CN118333859A - Method for realizing super resolution applied to automation equipment - Google Patents
Method for realizing super resolution applied to automation equipment
- Publication number: CN118333859A (application CN202410733970.6A)
- Authority: CN (China)
- Prior art keywords: layer, image, channels, convolution, resolution
- Prior art date: 2024-06-07
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention discloses a method for realizing super-resolution in automation equipment. Its aim is to reconstruct high-quality high-resolution images without increasing hardware cost, thereby improving the positioning accuracy of the vision module and raising either the sorting success rate of ultra-small-element sorting equipment or the success rate of detecting small-size defects on AOI equipment. The invention comprises the following steps: step S1: create a sample dataset S; step S2: construct a generator model; step S3: construct a discriminator model; step S4: create a VGG model; step S5: train the models; step S6: deploy the trained model between the camera module and the visual positioning module for precise sorting of ultra-small elements or for visual detection of small-size defects. The invention is applied in the technical field of automation equipment that requires super resolution.
Description
Technical Field
The invention relates to a method for realizing super-resolution, in particular to a method for realizing super-resolution applied to automation equipment.
Background
Background 1: Visual sorting equipment sorts elements of different categories or different quality grades into different containers. The sorting equipment collects images of its working area through a camera module; the vision module locates each element in the image, converts that position into a position in the working area, and guides the sorting device to complete sorting. The workflow of the sorting device is shown in Fig. 1. For ultra-small elements, the accuracy with which the vision module locates the elements is an important bottleneck of the system: an ultra-small element occupies too few pixels in the image, its features are not fully represented, and positioning therefore becomes difficult.
Background 2: With the development of automation technology, automated optical inspection (AOI) has gradually replaced manual defect detection. Compared with manual inspection, AOI is not influenced by subjective factors and applies a quantifiable, unified standard, effectively guaranteeing product quality. AOI requires an industrial camera combined with an industrial light source to capture images, and uses digital image processing algorithms to complete defect inspection. Digital image processing methods include conventional image processing and deep-learning-based image processing; both identify and localize defects by detecting the distinctive features that defects present in the image. The workflow is shown in Fig. 2. Product surface defects are generally varied in form and differ greatly in size. Because the field of view of the image acquisition system must accommodate products of many sizes, a small-size defect offers too few identifiable features in the image, which easily prevents the AOI equipment from detecting it accurately.
To solve the problems faced in both backgrounds above, a higher-resolution input image must be provided to the vision module. At present there are two main ways to obtain high-resolution images: first, improve the precision of the camera module and capture high-resolution images directly; second, reconstruct a Low-Resolution (LR) image into a High-Resolution (HR) image with a Super-Resolution (SR) method. A high-precision camera module significantly increases the production cost of the equipment and reduces its market competitiveness. Reconstructing the high-resolution image with a super-resolution method is therefore the more economical and practical design.
Super-resolution methods can be divided into interpolation-based methods and learning-based methods. Interpolation-based methods interpolate the gray level or color of image pixels according to prior knowledge; candidates include bilinear interpolation, bicubic interpolation and edge-preserving interpolation. Such methods are generally fast, but the texture regions of the reconstructed high-resolution image are blurred. Early learning-based methods were mostly dictionary methods: the image is partitioned into small patches and grouped, a dictionary of correspondences between low-resolution and high-resolution patches is built, and newly acquired low-resolution images are reconstructed with this dictionary. If an acquired low-resolution image is not similar to the images in the dictionary, the reconstructed high-resolution image is distorted. With the development of deep learning, deep-learning-based super-resolution has taken an important place among learning-based methods. These methods learn the relation between low-resolution and high-resolution images through model training and obtain a model that can be used to reconstruct high-resolution images. Deep-learning-based super-resolution directly optimizes the model output during training, without manually supplied priors or hand-designed features, and therefore has better robustness.
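For reference, interpolation-based upscaling amounts to a single resize call. The minimal sketch below uses OpenCV bicubic interpolation; the ×4 factor and the file names are illustrative assumptions, not values taken from the patent.

```python
import cv2

# Bicubic interpolation baseline: fast, but reconstructed textures tend to be blurred.
lr = cv2.imread("component_lr.png")                      # hypothetical low-resolution capture
hr_bicubic = cv2.resize(lr, None, fx=4, fy=4,            # 4x upscaling in both dimensions
                        interpolation=cv2.INTER_CUBIC)
cv2.imwrite("component_bicubic_x4.png", hr_bicubic)
```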
Although many deep-learning-based super-resolution methods have been proposed, they still have the following problems: 1. during training, some methods focus only on the similarity between the reconstruction of the current image and its corresponding high-resolution image, which easily leads to overfitting, poor generalization, and failure on newly acquired images; 2. during training, other methods focus only on the similarity between the reconstruction of the current image and the dataset images as a whole, so the reconstructed high-resolution image cannot accurately recover the details of the image.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a method for realizing super resolution in automation equipment. The goal is to reconstruct high-quality high-resolution images without increasing hardware cost, thereby improving the positioning accuracy of the vision module and raising either the sorting success rate of ultra-small-element sorting equipment or the success rate of detecting small-size defects on AOI equipment. In addition, the invention is plug-and-play: simply placing the model between the camera module and the visual positioning module improves the positioning accuracy of the visual detection module or its AOI detection precision.
The technical scheme adopted by the invention is as follows: the method for realizing super resolution applied in the automation equipment comprises the following steps:
Step S1: creating a sample dataset S, collecting low-resolution images and corresponding high-resolution images of the device workspace in pairs, S = {(LR_i, HR_i)}, where LR_i and HR_i represent the low-resolution image and the corresponding high-resolution image of the i-th ultra-small element in the dataset S, respectively;
Step S2: constructing a generator model, wherein a generator G consists of a plurality of convolution modules, a nonlinear module and an up-sampling module and is responsible for reconstructing an input low-resolution image into a corresponding high-resolution image;
Step S3: constructing a discriminator model, wherein a discriminator D consists of a plurality of convolution modules, a nonlinear module and a downsampling module and is responsible for discriminating whether an input image is a reconstructed image or a real high-resolution image in the training process;
step S4: creating a VGG model, wherein the VGG model extracts image characteristics and is used for calculating model training loss;
Step S5: training a model, namely setting optimizers of a generator G and a discriminator D as Adam and SGD respectively, setting a learning rate scheduling scheme as cosine simulated annealing, and alternately training the generator G and the discriminator D;
step S6: the trained model is deployed between a camera module and a visual positioning module for precise sorting of ultra-small elements or for visual detection of small-size defects.
Further, the apparatus in step S1 is an ultra-small component sorting apparatus, or an AOI optical inspection apparatus for small-size defect inspection.
Further, the generator G constructed in step S2 comprises a first convolution layer, a first residual block, a second residual block, a second convolution layer, a first BN layer, a first residual layer, a first upsampling block, a second upsampling block and a third convolution layer; after the low-resolution image is input to the first convolution layer, it passes sequentially through the first residual block, the second residual block, the second convolution layer, the first BN layer, the first residual layer, the first upsampling block, the second upsampling block and the third convolution layer, and the high-resolution image is finally output from the third convolution layer; the output of the first convolution layer is split into two paths, one entering the first residual block and the other entering the first residual layer.
Further, each module of the generator G is:
First convolution layer: the number of input channels is 3, the number of output channels is 64, the convolution kernel size is 9×9, padding is set to 4, and a PReLU activation function is adopted; this layer converts the 3-channel color input image into 64 feature channels;
First residual block: the number of input channels is 64;
Second residual block: the number of input channels is 64;
Second convolution layer: the number of input channels and the number of output channels are 64, the convolution kernel size is 3 multiplied by 3, the padding is set to be 1, and PReLU activation functions are adopted;
first BN layer: normalizing the convolved data, wherein the feature quantity is set to be 64;
first residual layer: performing pixel level addition with an input channel number of 64;
First upsampling layer: the number of input channels is 64;
Second upsampling layer: the number of input channels is 64;
third convolution layer: the number of input channels is 64, the number of output channels is 3, the convolution kernel size is 9×9, padding is set to 4, and PReLU activation functions are adopted; this layer converts the feature map into 3 channels to obtain the final high resolution image.
Further, the first residual block and the second residual block comprise a first convolution layer, a first BN layer, a second convolution layer, a second BN layer and a first residual layer in sequence, wherein,
Convolution layer one: the convolution kernel size is 3×3, padding is set to 1, the number of input and output channels is channels, and a PReLU activation function is adopted;
BN layer one: normalizing the convolved data, wherein the characteristic quantity is channels;
Convolution layer two: the convolution kernel size is 3 multiplied by 3, padding is set to be 1, the number of input and output channels is channels, and PReLU activation functions are adopted;
BN layer two: normalizing the convolved data, wherein the characteristic quantity is channels;
residual layer one: residual learning is achieved by adding the input to the convolved and batch normalized output.
Further, the first upsampling layer and the second upsampling layer each comprise a convolutional layer a and Pixel Shuffler layers, wherein,
Convolution layer a: the convolution kernel size is 3×3, padding is set to 1, the number of input channels is in_channels, the number of output channels is in_channels × up_scale², and a PReLU activation function is adopted; this expands the input feature map to in_channels × up_scale² channels and prepares the data for the subsequent Pixel Shuffler layer;
Pixel Shuffler layer: performs the up-sampling operation, rearranging the pixels in the feature map to increase resolution.
Further, the discriminator D in step S3 sequentially comprises a convolution layer I, a convolution layer II, a BN layer I, a convolution layer III, a BN layer II and a fully connected layer, wherein,
Convolution layer I: the number of input channels is 3, the number of output channels is 64, the convolution kernel size is 3×3, padding is set to 1, and a Leaky ReLU activation function with slope 0.2 is adopted;
Convolution layer II: the number of input channels is 64, the number of output channels is 64, the convolution kernel size is 3×3, stride is set to 2, and padding is set to 1; a Leaky ReLU activation function with slope 0.2 is used;
BN layer I: the feature quantity is set to 64;
Convolution layer III: the number of input channels is 64, the number of output channels is 128, the convolution kernel size is 3×3, stride is set to 1, and padding is set to 1; a Leaky ReLU activation function with slope 0.2 is used;
BN layer II: the feature quantity is set to 128;
Fully connected layer: implemented with Conv2d; the number of input channels is 512, the number of output channels is 1024, the convolution kernel is 1×1, a sigmoid activation function is adopted, and the judgment result is output.
Further, in step S5, for each training period the learning rate of the optimizers is set first, and one round of alternating training of the generator and the discriminator is then performed; the specific content of one round of training is as follows:
① Training the discriminator: the real high-resolution image HR_i is first fed into the discriminator D to obtain D_real_loss; the real low-resolution image LR_i is then fed into the generator to obtain a generated image, which is fed into the discriminator to obtain D_fake_loss; finally D_train_loss is obtained as:
D_train_loss = D_real_loss + D_fake_loss;
② Training the generator: the real low-resolution image is fed into the generator, and the image error image_loss is obtained as the mean square error between the generator output and the real high-resolution image; the generated image is then fed into the discriminator and adversarial_loss is computed, i.e. a loss based on the probability with which the discriminator judges the generated image to be a real image; finally the generated image and the real image are fed into the VGG19 model and the mean square error of their features is computed to obtain perception_loss. The final G_train_loss of the generator is:
G_train_loss = image_loss + 10⁻³ × adversarial_loss + 2×10⁻⁶ × perception_loss;
where image_loss and perception_loss reflect, respectively, the pixel-level and texture differences between the reconstructed image and the real high-resolution image, and adversarial_loss reflects the difference between the distribution of reconstructed images and that of real high-resolution images over the whole dataset.
Drawings
Fig. 1 is a flow chart of a visual sorting apparatus;
FIG. 2 is an AOI device workflow diagram;
FIG. 3 is a schematic diagram of the model structure of generator G;
FIG. 4 is a schematic diagram of a residual block structure;
FIG. 5 is a schematic diagram of an upsampling block structure;
FIG. 6 is a schematic diagram of a structure of a discriminator D;
FIG. 7 is a schematic diagram of the operation of a super-resolution model for sorting ultra-small elements;
FIG. 8 is a schematic diagram of the operation of a super-resolution model for defect feature enhancement.
Detailed Description
Example 1
In this example, a high-quality high-resolution image can be reconstructed without increasing hardware cost, improving the positioning accuracy of the vision module and raising the sorting success rate of the ultra-small-element sorting equipment; moreover, simply placing the model between the camera module and the visual positioning module improves the positioning accuracy of the visual detection module.
In this embodiment, the invention is a method for realizing super resolution in an ultra-small-element sorting apparatus, comprising the following steps:
Step S1: a sample dataset S is created. Low-resolution images and corresponding high-resolution images of the ultra-small elements in the device working area are collected in pairs, s= { (LRi, HRi) }, where LRi and HRi represent the low-resolution image and corresponding high-resolution image of the ith ultra-small element in the dataset S, respectively.
Step S2: a generator model G is constructed. The generator G is composed of a plurality of convolution modules, a nonlinear module and an upsampling module, and is responsible for reconstructing an input low resolution image into a corresponding high resolution image.
Step S3: a discriminator model D is constructed. The discriminator D is composed of a plurality of convolution modules, nonlinear modules and downsampling modules, and is responsible for discriminating during training whether an input image is a generator-reconstructed image or a real high-resolution image.
Step S4: a VGG model is created. The VGG model extracts image features for calculation of model training loss.
Step S5: the models are trained. The optimizers of the generator and the discriminator are set to Adam and SGD respectively, the learning-rate schedule is set to cosine annealing, and the generator G and the discriminator D are trained alternately.
Step S6: the trained model is deployed between the camera module and the vision positioning module for accurate sorting of the ultra-small elements.
In step S2, the specific implementation of the generator G is shown in Fig. 3, where the dashed arrows point to the residual layers of the respective modules. The specific definition of each module is as follows; a code sketch of the complete generator, covering the residual blocks and upsampling blocks described below, follows the upsampling-block description.
First convolution layer: the number of input channels is 3, the number of output channels is 64, the convolution kernel size is 9×9, padding is set to 4, and a PReLU activation function is adopted; this layer converts the 3-channel color input image into 64 feature channels;
First residual block: the number of input channels is 64;
Second residual block: the number of input channels is 64;
Second convolution layer: the number of input channels and the number of output channels are 64, the convolution kernel size is 3 multiplied by 3, the padding is set to be 1, and PReLU activation functions are adopted;
first BN layer: normalizing the convolved data, wherein the feature quantity is set to be 64;
first residual layer: performing pixel level addition with an input channel number of 64;
First upsampling layer: the number of input channels is 64;
Second upsampling layer: the number of input channels is 64;
third convolution layer: the number of input channels is 64, the number of output channels is 3, the convolution kernel size is 9×9, padding is set to 4, and PReLU activation functions are adopted; this layer converts the feature map into 3 channels to obtain the final high resolution image.
The structure of the residual block is shown in fig. 4, and the specific implementation of each layer is as follows:
Convolution layer one: the convolution kernel size is 3×3, padding is set to 1, the number of input and output channels is channels, and a PReLU activation function is adopted;
BN layer one: normalizing the convolved data, wherein the characteristic quantity is channels;
Convolution layer two: the convolution kernel size is 3 multiplied by 3, padding is set to be 1, the number of input and output channels is channels, and PReLU activation functions are adopted;
BN layer two: normalizing the convolved data, wherein the characteristic quantity is channels;
residual layer one: residual learning is achieved by adding the input to the convolved and batch normalized output.
The specific structure of the up-sampling block is shown in fig. 5, and the specific implementation of each layer is as follows:
Convolution layer a: the convolution kernel size is 3×3, padding is set to 1, the number of input channels is in_channels, the number of output channels is in_channels × up_scale², and a PReLU activation function is adopted; this expands the input feature map to in_channels × up_scale² channels and prepares the data for the subsequent Pixel Shuffler layer;
Pixel Shuffler layer: performs the up-sampling operation, rearranging the pixels in the feature map to increase resolution.
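Putting the module lists above together, a minimal PyTorch sketch of generator G is given below. It is a sketch only: the ×2 factor per upsampling block (and therefore ×4 overall magnification) is an assumption, since the patent does not state the magnification explicitly.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual block: conv - BN - PReLU - conv - BN plus an identity shortcut."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels), nn.PReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.body(x)                    # residual layer one: pixel-level addition

class UpsampleBlock(nn.Module):
    """Conv expands channels by up_scale**2, PixelShuffle rearranges them into spatial resolution."""
    def __init__(self, in_channels=64, up_scale=2):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, in_channels * up_scale ** 2, 3, padding=1)
        self.shuffle = nn.PixelShuffle(up_scale)
        self.act = nn.PReLU()

    def forward(self, x):
        return self.act(self.shuffle(self.conv(x)))

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(3, 64, 9, padding=4), nn.PReLU())   # first convolution layer
        self.res_blocks = nn.Sequential(ResidualBlock(64), ResidualBlock(64))   # first and second residual blocks
        self.mid = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.PReLU(),   # second convolution layer
                                 nn.BatchNorm2d(64))                             # first BN layer
        self.upsample = nn.Sequential(UpsampleBlock(64, 2), UpsampleBlock(64, 2))
        self.tail = nn.Sequential(nn.Conv2d(64, 3, 9, padding=4), nn.PReLU())   # third convolution layer

    def forward(self, lr):
        feat = self.head(lr)                       # output of the first convolution layer splits into two paths
        out = self.mid(self.res_blocks(feat))
        out = out + feat                           # first residual layer: add the head output back
        return self.tail(self.upsample(out))       # two upsampling blocks, then final 3-channel output
```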
In step S3, the specific implementation of the discriminator D is shown in Fig. 6; each module is defined as follows, and a code sketch follows the list:
Convolution layer I: the number of input channels is 3, the number of output channels is 64, the convolution kernel size is 3×3, padding is set to 1, and a Leaky ReLU activation function with slope 0.2 is adopted;
Convolution layer II: the number of input channels is 64, the number of output channels is 64, the convolution kernel size is 3×3, stride is set to 2, and padding is set to 1; a Leaky ReLU activation function with slope 0.2 is used;
BN layer I: the feature quantity is set to 64;
Convolution layer III: the number of input channels is 64, the number of output channels is 128, the convolution kernel size is 3×3, stride is set to 1, and padding is set to 1; a Leaky ReLU activation function with slope 0.2 is used;
BN layer II: the feature quantity is set to 128;
Fully connected layer: implemented with Conv2d; the number of input channels is 512, the number of output channels is 1024, the convolution kernel is 1×1, a sigmoid activation function is adopted, and the judgment result is output.
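A minimal PyTorch sketch of discriminator D following the enumerated layers is given below. The patent's final 1×1 convolution expects 512 input channels and produces 1024, which implies intermediate stages not enumerated here; the sketch therefore connects the listed layers directly and reduces the output to a single probability, which is an assumption rather than the exact head described above.

```python
import torch.nn as nn

class Discriminator(nn.Module):
    """Sketch of discriminator D built from the enumerated layers only."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.LeakyReLU(0.2),               # convolution layer I
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),    # convolution layer II
            nn.BatchNorm2d(64),                                              # BN layer I
            nn.Conv2d(64, 128, 3, stride=1, padding=1), nn.LeakyReLU(0.2),   # convolution layer III
            nn.BatchNorm2d(128),                                             # BN layer II
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),      # assumed pooling so the head is input-size independent
            nn.Conv2d(128, 1, 1),         # 1x1 convolution standing in for the 512 -> 1024 head
            nn.Sigmoid(),                 # probability that the input is a real HR image
        )

    def forward(self, x):
        return self.head(self.features(x)).flatten(1)
```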
In this embodiment, in step S5, for each training period the learning rate of the optimizers is set first, and one round of alternating training of the generator and the discriminator is then performed; the specific content of one round of training is as follows (a code sketch follows the description):
① Training the discriminator: the real high-resolution image HR_i is first fed into the discriminator D to obtain D_real_loss; the real low-resolution image LR_i is then fed into the generator to obtain a generated image, which is fed into the discriminator to obtain D_fake_loss; finally D_train_loss is obtained as:
D_train_loss = D_real_loss + D_fake_loss;
② Training the generator: the real low-resolution image is fed into the generator, and the image error image_loss is obtained as the mean square error between the generator output and the real high-resolution image; the generated image is then fed into the discriminator and adversarial_loss is computed, i.e. a loss based on the probability with which the discriminator judges the generated image to be a real image; finally the generated image and the real image are fed into the VGG19 model and the mean square error of their features is computed to obtain perception_loss. The final G_train_loss of the generator is:
G_train_loss = image_loss + 10⁻³ × adversarial_loss + 2×10⁻⁶ × perception_loss;
where image_loss and perception_loss reflect, respectively, the pixel-level and texture differences between the reconstructed image and the real high-resolution image, and adversarial_loss reflects the difference between the distribution of reconstructed images and that of real high-resolution images over the whole dataset.
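A sketch of one alternating training iteration with the losses above is given below. The learning rates, the use of binary cross-entropy for the real/fake and adversarial terms, and the scheduler horizon are assumptions beyond what the text specifies; Generator, Discriminator and VGGFeatureExtractor refer to the sketches earlier in this description.

```python
import torch
import torch.nn.functional as F

netG, netD, vgg = Generator(), Discriminator(), VGGFeatureExtractor()       # classes sketched above

optim_G = torch.optim.Adam(netG.parameters(), lr=1e-4)                      # illustrative learning rates
optim_D = torch.optim.SGD(netD.parameters(), lr=1e-4, momentum=0.9)
sched_G = torch.optim.lr_scheduler.CosineAnnealingLR(optim_G, T_max=200)    # cosine-annealed learning rate,
sched_D = torch.optim.lr_scheduler.CosineAnnealingLR(optim_D, T_max=200)    # stepped once per training period

def train_step(lr_img, hr_img):
    # 1) Train the discriminator: real HR images should score 1, generated images 0.
    sr_img = netG(lr_img).detach()
    d_real, d_fake = netD(hr_img), netD(sr_img)
    d_real_loss = F.binary_cross_entropy(d_real, torch.ones_like(d_real))
    d_fake_loss = F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
    d_train_loss = d_real_loss + d_fake_loss
    optim_D.zero_grad(); d_train_loss.backward(); optim_D.step()

    # 2) Train the generator with the weighted image, adversarial and perceptual losses.
    sr_img = netG(lr_img)
    pred = netD(sr_img)
    image_loss = F.mse_loss(sr_img, hr_img)                                  # pixel difference
    adversarial_loss = F.binary_cross_entropy(pred, torch.ones_like(pred))   # push D to judge SR as real
    perception_loss = F.mse_loss(vgg(sr_img), vgg(hr_img))                   # texture difference in VGG space
    g_train_loss = image_loss + 1e-3 * adversarial_loss + 2e-6 * perception_loss
    optim_G.zero_grad(); g_train_loss.backward(); optim_G.step()
    return d_train_loss.item(), g_train_loss.item()
```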
In this embodiment, the invention realizes plug-and-play super resolution for precise sorting of ultra-small elements; the schematic diagram is shown in Fig. 7. The super-resolution model trains the generator G and the discriminator D alternately on the dataset S. After the model converges, the generator G is deployed downstream of the camera module of the ultra-small-element sorting equipment, and the low-resolution image output by the camera module is reconstructed into a high-resolution image, improving the precision of the visual positioning module. The visual positioning module transmits the element positions identified on the high-resolution image to the sorting device controller to complete sorting of the ultra-small elements.
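A sketch of the plug-and-play deployment of the trained generator between the camera module and the vision positioning module is given below; the checkpoint name, the enhance_frame helper and the camera/vision interfaces are illustrative assumptions.

```python
import torch
from PIL import Image
from torchvision import transforms

netG = Generator()                                                       # generator class from the sketch above
netG.load_state_dict(torch.load("generator.pth", map_location="cpu"))   # assumed checkpoint file name
netG.eval()

to_tensor, to_image = transforms.ToTensor(), transforms.ToPILImage()

def enhance_frame(lr_frame: Image.Image) -> Image.Image:
    """Reconstruct an HR image from a camera frame before it reaches the vision module."""
    with torch.no_grad():
        sr = netG(to_tensor(lr_frame).unsqueeze(0)).clamp(0, 1)
    return to_image(sr.squeeze(0))

# lr_frame = camera.grab()                          # low-resolution capture (hypothetical camera API)
# vision_module.locate(enhance_frame(lr_frame))     # positioning runs on the reconstructed HR image
```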
Example 2
This example differs from Example 1 in that feature enhancement is performed for small-size defects; the schematic diagram is shown in Fig. 8. The super-resolution model trains the generator G and the discriminator D alternately on the dataset S. After the model converges, the generator G is deployed downstream of the camera module of the AOI equipment, and the low-resolution image output by the camera module is reconstructed into a high-resolution image, improving the precision of visual defect detection. The visual detection module identifies defect features on the high-resolution image and outputs the result of visual defect detection.
The method comprehensively considers the similarity between the reconstructed high-resolution image and the current image as well as its similarity to the dataset images. The resulting model can reconstruct high-resolution images in the ultra-small-element sorting scenario, or reconstruct high-resolution defect images in the AOI scenario, thereby realizing defect feature enhancement. By reconstructing the collected low-resolution image into a high-resolution image, the invention obtains high-quality high-resolution images without increasing hardware cost, improving the positioning accuracy of the vision module and raising either the sorting success rate of the ultra-small-element sorting equipment or the success rate of detecting small-size defects on the AOI equipment. In addition, the invention is plug-and-play: simply placing the model between the camera module and the visual positioning module improves the positioning accuracy of the visual detection module or its AOI detection precision.
While embodiments of the invention have been described above, they are not to be construed as limiting the invention; modifications of the embodiments and combinations with other aspects will be apparent to those skilled in the art from this description.
Claims (8)
1. A method for implementing super resolution for use in an automation device, characterized by: the method comprises the following steps:
Step S1: creating a sample dataset S, collecting low-resolution images and corresponding high-resolution images of the device workspace in pairs, S = {(LR_i, HR_i)}, where LR_i and HR_i represent the low-resolution image and the corresponding high-resolution image of the i-th ultra-small element in the dataset S, respectively;
Step S2: constructing a generator model, wherein a generator G consists of a plurality of convolution modules, a nonlinear module and an up-sampling module and is responsible for reconstructing an input low-resolution image into a corresponding high-resolution image;
Step S3: constructing a discriminator model, wherein a discriminator D consists of a plurality of convolution modules, a nonlinear module and a downsampling module and is responsible for discriminating whether an input image is a reconstructed image or a real high-resolution image in the training process;
step S4: creating a VGG model, wherein the VGG model extracts image characteristics and is used for calculating model training loss;
Step S5: training a model, namely setting optimizers of a generator G and a discriminator D as Adam and SGD respectively, setting a learning rate scheduling scheme as cosine simulated annealing, and alternately training the generator G and the discriminator D;
step S6: the trained model is deployed between a camera module and a visual positioning module for precise sorting of ultra-small elements or for visual detection of small-size defects.
2. A method for realizing super resolution for use in an automation device according to claim 1, wherein: the apparatus in step S1 is an ultra-small component sorting apparatus, or an AOI optical inspection apparatus for small-size defect inspection.
3. A method for realizing super resolution for use in an automation device according to claim 1, wherein: the generator G constructed in step S2 comprises a first convolution layer, a first residual block, a second residual block, a second convolution layer, a first BN layer, a first residual layer, a first upsampling block, a second upsampling block and a third convolution layer; after the low-resolution image is input to the first convolution layer, it passes sequentially through the first residual block, the second residual block, the second convolution layer, the first BN layer, the first residual layer, the first upsampling block, the second upsampling block and the third convolution layer, and the high-resolution image is finally output from the third convolution layer; the output of the first convolution layer is split into two paths, one entering the first residual block and the other entering the first residual layer.
4. A method for realizing super resolution for use in an automation device according to claim 3, wherein: the generator G comprises the following modules:
First convolution layer: the number of input channels is 3, the number of output channels is 64, the convolution kernel size is 9×9, padding is set to 4, and a PReLU activation function is adopted; this layer converts the 3-channel color input image into 64 feature channels;
First residual block: the number of input channels is 64;
Second residual block: the number of input channels is 64;
Second convolution layer: the number of input channels and the number of output channels are 64, the convolution kernel size is 3 multiplied by 3, the padding is set to be 1, and PReLU activation functions are adopted;
first BN layer: normalizing the convolved data, wherein the feature quantity is set to be 64;
first residual layer: performing pixel level addition with an input channel number of 64;
First upsampling layer: the number of input channels is 64;
Second upsampling layer: the number of input channels is 64;
third convolution layer: the number of input channels is 64, the number of output channels is 3, the convolution kernel size is 9×9, padding is set to 4, and PReLU activation functions are adopted; this layer converts the feature map into 3 channels to obtain the final high resolution image.
5. A method for realizing super resolution for use in an automation device according to claim 3 or 4, wherein: the first residual block and the second residual block sequentially comprise a first convolution layer, a first BN layer, a second convolution layer, a second BN layer and a first residual layer, wherein,
Convolution layer one: the convolution kernel size is 3×3, padding is set to 1, the number of input and output channels is channels, and a PReLU activation function is adopted;
BN layer one: normalizing the convolved data, wherein the characteristic quantity is channels;
Convolution layer two: the convolution kernel size is 3 multiplied by 3, padding is set to be 1, the number of input and output channels is channels, and PReLU activation functions are adopted;
BN layer two: normalizing the convolved data, wherein the characteristic quantity is channels;
residual layer one: residual learning is achieved by adding the input to the convolved and batch normalized output.
6. A method for realizing super resolution for use in an automation device according to claim 3 or 4, wherein: the first upsampling layer and the second upsampling layer each comprise a convolutional layer a and Pixel Shuffler layer, wherein,
Convolution layer a: the convolution kernel size is 3×3, padding is set to 1, the number of input channels is in_channels, the number of output channels is in_channels × up_scale², and a PReLU activation function is adopted; this expands the input feature map to in_channels × up_scale² channels and prepares the data for the subsequent Pixel Shuffler layer;
Pixel Shuffler layer: performs the up-sampling operation, rearranging the pixels in the feature map to increase resolution.
7. A method for realizing super resolution for use in an automation device according to claim 1, wherein: the discriminator D in step S3 sequentially comprises a convolution layer I, a convolution layer II, a BN layer I, a convolution layer III, a BN layer II and a fully connected layer, wherein,
Convolution layer I: the number of input channels is 3, the number of output channels is 64, the convolution kernel size is 3×3, padding is set to 1, and a Leaky ReLU activation function with slope 0.2 is adopted;
Convolution layer II: the number of input channels is 64, the number of output channels is 64, the convolution kernel size is 3×3, stride is set to 2, and padding is set to 1; a Leaky ReLU activation function with slope 0.2 is used;
BN layer I: the feature quantity is set to 64;
Convolution layer III: the number of input channels is 64, the number of output channels is 128, the convolution kernel size is 3×3, stride is set to 1, and padding is set to 1; a Leaky ReLU activation function with slope 0.2 is used;
BN layer II: the feature quantity is set to 128;
Fully connected layer: implemented with Conv2d; the number of input channels is 512, the number of output channels is 1024, the convolution kernel is 1×1, a sigmoid activation function is adopted, and the judgment result is output.
8. A method for realizing super resolution for use in an automation device according to claim 1, wherein:
in step S5, for each training period the learning rate of the optimizers is set first, and one round of alternating training of the generator and the discriminator is then performed; the specific content of one round of training is as follows:
① Training the discriminator: the real high-resolution image HR_i is first fed into the discriminator D to obtain D_real_loss; the real low-resolution image LR_i is then fed into the generator to obtain a generated image, which is fed into the discriminator to obtain D_fake_loss; finally D_train_loss is obtained as:
D_train_loss = D_real_loss + D_fake_loss;
② Training the generator: the real low-resolution image is fed into the generator, and the image error image_loss is obtained as the mean square error between the generator output and the real high-resolution image; the generated image is then fed into the discriminator and adversarial_loss is computed, i.e. a loss based on the probability with which the discriminator judges the generated image to be a real image; finally the generated image and the real image are fed into the VGG19 model and the mean square error of their features is computed to obtain perception_loss; the final G_train_loss of the generator is:
G_train_loss = image_loss + 10⁻³ × adversarial_loss + 2×10⁻⁶ × perception_loss;
where image_loss and perception_loss reflect, respectively, the pixel-level and texture differences between the reconstructed image and the real high-resolution image, and adversarial_loss reflects the difference between the distribution of reconstructed images and that of real high-resolution images over the whole dataset.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410733970.6A | 2024-06-07 | 2024-06-07 | Method for realizing super resolution applied to automation equipment |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN118333859A | 2024-07-12 |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |