WO2021181749A1

WO2021181749A1 - Learning device, image inspection device, learned parameter, learning method, and image inspection method

Info

Publication number: WO2021181749A1
Application number: PCT/JP2020/041655
Authority: WO
Inventors: 悟史岡本
Original assignee: 株式会社Ｓｃｒｅｅｎホールディングス
Priority date: 2020-03-10
Filing date: 2020-11-09
Publication date: 2021-09-16
Also published as: JP2021144314A; JP7426261B2

Abstract

An image inspection device (100) is provided with a learning unit (143), a statistic acquisition unit (31), and an abnormality detection unit (37). The learning unit (143), taking a plurality of target object images as learning data, learns a variational autoencoder so as to reduce an input/output error and so as to output a mean, variance, and a higher-order statistic of a distribution approximated with a specific distribution for respective unit pixels. The statistic acquisition unit (31) inputs an inspection image to be inspected to the variational autoencoder and acquires a mean, variance, and skewness of each of the pixels of the inspection image. The abnormality detection unit (37) detects an abnormality in the inspection image on the basis of the mean, variance, and higher-order statistic for each of the unit pixels of the inspection image obtained by the statistic acquisition unit (31).

Description

Learning device, image inspection device, learned parameters, learning method, and image inspection method

The present invention relates to a technique for detecting an abnormality in an object.

In the manufacturing process of foods, pharmaceuticals, industrial products, and the like, an inspection step may be provided on the manufacturing line to perform inspections such as the detection of defective products. When product inspection is performed visually by humans (visual inspection), labor costs are high. Therefore, systems that automatically inspect products by machine are being developed in order to automate part or all of the inspection process.

In recent years, defect inspection methods using a machine learning technique called a convolutional neural network have been proposed. A typical convolutional neural network consists of convolutional layers and fully connected layers, and is trained to classify products as non-defective or defective. The convolutional layers extract feature amounts of the image, and the fully connected layer in the final stage is trained to perform the classification using those feature amounts. When an image of a sample to be inspected is input, the trained network outputs a judgment result indicating whether the product is non-defective or defective. Machine learning that is trained with correct labels indicating whether each sample is non-defective or defective in this way is called supervised learning.

Patent Document 1 points out that supervised learning requires a sufficient number of both non-defective and defective samples, and proposes inspection by unsupervised learning. Specifically, a convolutional neural network called an autoencoder, composed of convolutional layers in the first half and deconvolutional layers in the second half, is trained to reconstruct the input image. Only non-defective images are used for learning. The convolutional layers compress the input image into smaller data, and the deconvolutional layers restore the original input image from the compressed data. Through such learning, the convolutional layers in the first half become able to output the feature amounts of the image. The feature amounts extracted using the trained convolutional layers are input to a classifier such as an isolation forest, and the quality of the product is judged.

Patent Document 2 acquires a difference image between an image input to the autoencoder and an image output by the autoencoder in order to discriminate defects in industrial parts in which a part of the image is slightly different. By learning the autoencoder using only non-defective images, the image reconstructed by the autoencoder becomes an image without defects, so that the defective portion can be detected by taking the difference.

Non-Patent Document 1 describes an inspection technique using a variational autoencoder (VAE). A variational autoencoder is a kind of autoencoder that enables more advanced learning by introducing a probabilistic model. In Non-Patent Document 1, the output of the variational autoencoder is modeled with a multivariate normal distribution, and instead of simply reconstructing each pixel, the network is trained to estimate the mean and variance of the normal distribution, taking the restoration error into account. By considering not only the difference between input and output but also how much error occurs during restoration, more accurate defect detection is possible.

Patent Document 1: JP-A-2019-087181
Patent Document 2: JP-A-2018-195119

However, the method described in Non-Patent Document 1 has a problem in that its accuracy is greatly reduced when the luminance distribution of each pixel across the plurality of images used for learning does not follow a normal distribution. For example, if the luminance distribution is strongly skewed to the left or right, the difference between the actual luminance and the mean becomes large, so that over-detection may increase.

An object of the present invention is to provide a technique for suppressing over-detection when detecting an abnormality in an image.

In order to solve the above problem, the first aspect is a learning device for constructing an image inspection device, the learning device including a learning unit that, using a plurality of object images as learning data, trains a probability model so that the error between input and output is reduced and so that, for each unit pixel, the mean, variance, and higher-order statistic of a distribution approximated by a specific distribution are output.

The second aspect is the learning device of the first aspect, and the object image is a non-defective product image obtained by capturing a non-defective product.

The third aspect is the learning device of the first aspect or the second aspect, and the specific distribution is a normal distribution.

The fourth aspect is a learning device according to any one of the first to third aspects, and the probability model is a variational autoencoder.

The fifth aspect is the learning device of any one of the first to fourth aspects, and the higher-order statistics are skewness or kurtosis.

The sixth aspect is an image inspection device using a learning model trained by the learning device of any one of the first to fifth aspects, the image inspection device including: a statistic acquisition unit that inputs an inspection image to be inspected to the probability model having the learned parameters obtained by the learning unit and acquires the mean, variance, and higher-order statistic for each unit pixel of the inspection image; and an abnormality detection unit that detects an abnormality in the inspection image based on the mean, variance, and higher-order statistic acquired by the statistic acquisition unit.

The seventh aspect is the image inspection device of the sixth aspect, in which the abnormality detection unit detects an abnormality while excluding, from the inspection image, unit pixels for which the higher-order statistic acquired by the statistic acquisition unit exceeds a predetermined threshold value.

The eighth aspect is a learned parameter of the probability model acquired by the learning device of any one of the first to fifth aspects.

The ninth aspect is a learning method for constructing an image inspection method, the learning method including a step of training a probability model, using a plurality of object images as learning data, so that the error between input and output is reduced and so that, for each unit pixel, the mean, variance, and higher-order statistic of a distribution approximated by a specific distribution are output.

The tenth aspect is an image inspection device including: a statistic acquisition unit that acquires the mean, variance, and higher-order statistic for each unit pixel of an inspection image by using a probability model having learned parameters, the probability model having been trained, with a plurality of object images as learning data, so that the error between input and output is reduced and so that, for each unit pixel, the mean, variance, and higher-order statistic of a distribution approximated by a specific distribution are output; and an abnormality detection unit that detects an abnormality in the inspection image based on the mean, variance, and higher-order statistic for each unit pixel of the inspection image obtained by the statistic acquisition unit.

The eleventh aspect is the image inspection device of the tenth aspect, in which the abnormality detection unit detects an abnormality while excluding, from the inspection image, unit pixels for which the higher-order statistic acquired by the statistic acquisition unit exceeds a predetermined threshold value.

The twelfth aspect is an image inspection method including: a statistic acquisition step of acquiring the mean, variance, and higher-order statistic for each unit pixel of an inspection image by using a probability model having learned parameters, the probability model having been trained, with a plurality of object images as learning data, so that the error between input and output is reduced and so that, for each unit pixel, the mean, variance, and higher-order statistic of a distribution approximated by a specific distribution are output; and an abnormality detection step of detecting an abnormality in the inspection image based on the mean, variance, and higher-order statistic for each unit pixel of the inspection image obtained in the statistic acquisition step.

According to the present invention, since unit pixels that do not follow a specific distribution can be specified based on higher-order statistics estimated using a probability model, overdetection can be suppressed.

FIG. 1 is a diagram showing the image inspection device of the embodiment.
FIG. 2 is a diagram showing the hardware configuration of the information processing device of the embodiment.
FIG. 3 is a diagram showing the functional configuration of the information processing device of the embodiment.
FIG. 4 is a diagram conceptually showing the variational autoencoder.
FIG. 5 is a diagram conceptually showing the flow of inspection by the inspection unit.

Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. It should be noted that the components described in this embodiment are merely examples, and the scope of the present invention is not limited to them. In the drawings, the dimensions and numbers of each part may be exaggerated or simplified as necessary for easy understanding.

<1. Embodiment>
FIG. 1 is a diagram showing an image inspection device 10 of an embodiment. The image inspection device 10 detects defects (abnormalities) in the object 90 by analyzing the image of the object 90. The object 90 is specifically a tablet, but is not limited to a tablet. The image inspection device 10 includes a camera 110 and an information processing device 120. The camera 110 is electrically connected to the information processing device 120. The camera 110 includes an image sensor. The camera 110 outputs an image signal obtained by imaging the object 90 using the image sensor to the information processing device 120. The object 90 imaged by the camera 110 may be stopped at a predetermined position, or may be moved in a predetermined direction by a transport mechanism such as a belt conveyor.

FIG. 2 is a diagram showing a hardware configuration of the information processing device 120 of the embodiment. The information processing device 120 is configured as a computer. Specifically, the information processing device 120 includes a processor 121, a RAM 123, a storage unit 125, an input unit 127, a display unit 129, a device I/F 131, and a communication I/F 133. The processor 121, the RAM 123, the storage unit 125, the input unit 127, the display unit 129, the device I/F 131, and the communication I/F 133 are electrically connected to each other via a bus 135.

Specifically, the processor 121 includes a CPU or a GPU. The RAM 123 is a storage medium capable of reading and writing information, and specifically, an SDRAM. The storage unit 125 is a recording medium capable of reading and writing information, and specifically includes an HDD (hard disk drive) or an SSD (solid state drive). The storage unit 125 may include a ROM, a portable optical disk, a magnetic disk, a semiconductor memory, or the like. The storage unit 125 stores the program P. The processor 121 realizes various functions by executing the program P with the RAM 123 as a work area. The program P may be provided or distributed to the information processing apparatus 120 via the network.

The input unit 127 is an input device that accepts user's operation input, specifically, a mouse, a keyboard, or the like. The display unit 129 is a display device that displays images representing various types of information, and is specifically a liquid crystal display.

The device I/F 131 is an interface for electrically connecting the camera 110 to the information processing device 120. The communication I/F 133 is an interface for connecting the information processing device 120 to a network such as the Internet. The camera 110 may be connected to the information processing device 120 via the communication I/F 133. That is, the image inspection device 10 need not include the camera 110, and may include only the information processing device 120.

FIG. 3 is a diagram showing a functional configuration of the information processing device 120 of the embodiment. The information processing device 120 includes an acquisition unit 141, a learning unit 143, and an inspection unit 145. The acquisition unit 141, the learning unit 143, and the inspection unit 145 are functions realized by the processor 121 operating according to the program P. The learning unit 143 need not be provided in the information processing device 120, and may be provided in another computer.

The acquisition unit 141 acquires the object image 91 obtained by capturing the object 90 with the camera 110. Of the object images 91, the image to be inspected is referred to as an inspection image. The learning unit 143 performs learning using the variational autoencoder 20, which is a probability model described later. The inspection unit 145 inputs the inspection image to the variational autoencoder 20 and detects an abnormality in the inspection image based on the output result.

<2. Network construction>
FIG. 4 is a diagram conceptually showing the variational autoencoder 20. An autoencoder is a neural network technology, also called a self-encoder. In the image inspection apparatus 100, a variational auto-encoder (VAE: Variational Auto Encoder) is used as an example of the autoencoder. A Generative Adversarial Network (GAN) may be used as the encoder.

The variational autoencoder 20 is a function composed of a neural network. In the variational autoencoder 20, the data x (object image 91) is input to the convolution layer 21 and converted into a dimensionally reduced latent variable z. Then, the latent variable z is input to the first deconvolution layer 231, and the reconstructed data x' is output. The convolution layer 21 is also referred to as an encoder, and the first deconvolution layer 231 is also referred to as a decoder. The encoder and the decoder are trained so that the reconstructed data x' is close to the data x.

In the variational autoencoder 20, the data x and the latent variable z are treated as random variables. That is, the encoder (convolution layer 21) and the decoder (first deconvolution layer 231) are not deterministic; they perform stochastic transformations that include sampling from the probability distributions p(z|x) and p(x|z). As the probability distribution p(z|x), the probability distribution q(z|x) approximated by the variational method is used. Further, in the variational autoencoder 20, the probability distributions q(z|x) and p(x|z) are approximated by a specific distribution determined by a limited number of parameters. When approximated by a specific distribution, the probability distributions q(z|x) and p(x|z) are expressed by the following equations.

q(z|x) = q(z|φ(x))
p(x|z) = p(x|θ(z))

Here, φ(x) and θ(z) are projection functions that output the parameters φ and θ of the respective probability distributions for the inputs x and z.

In this embodiment, the probability distributions q(z|x) and p(x|z) are approximated by a normal distribution. By approximating the probability distribution p(x|z) with a normal distribution, the output of the decoder becomes the parameters of the approximating normal distribution (mean μ_x and variance σ_x²). That is, when the object image 91 is input to the variational autoencoder 20 as the data x, the first deconvolution layer 231 outputs the mean μ_x and the variance σ_x² for each pixel of the object image 91.

The mean μ_x output by the first deconvolution layer 231 represents an image obtained by reconstructing the object image 91 input to the variational autoencoder 20. Further, the variance σ_x² output by the first deconvolution layer 231 represents the variation at the time of reconstruction.
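As a non-limiting illustration, the stochastic encoding and decoding described above can be sketched in Python with NumPy as follows. The linear maps `W_enc` and `W_dec`, the toy image and latent sizes, and the log-variance parameterization are assumptions made only for this sketch; they stand in for the convolution layer 21 and the first deconvolution layer 231 and are not taken from the embodiment.

```python
import numpy as np

rng = np.random.default_rng(0)

H, W, LATENT = 4, 4, 3          # toy image size and latent size (assumptions)
N_PIX = H * W

# Hypothetical linear projection functions standing in for phi(x) and theta(z).
W_enc = rng.normal(scale=0.1, size=(2 * LATENT, N_PIX))
W_dec = rng.normal(scale=0.1, size=(2 * N_PIX, LATENT))

def phi(x):
    """Return the parameters (mean, variance) of q(z|x)."""
    out = W_enc @ x.ravel()
    mean_z, log_var_z = out[:LATENT], out[LATENT:]
    return mean_z, np.exp(log_var_z)

def theta(z):
    """Return the per-pixel parameters (mu_x, sigma_x^2) of p(x|z)."""
    out = W_dec @ z
    mu_x, log_var_x = out[:N_PIX], out[N_PIX:]
    return mu_x.reshape(H, W), np.exp(log_var_x).reshape(H, W)

x = rng.uniform(size=(H, W))    # stand-in for an object image 91
mean_z, var_z = phi(x)
# Stochastic encoding: draw z from the normal distribution q(z|phi(x)).
z = mean_z + np.sqrt(var_z) * rng.standard_normal(LATENT)
mu_x, var_x = theta(z)          # mean image and variance image, one value per pixel
```

Predicting the logarithm of the variance and exponentiating it is one common way to keep σ_x² positive; other parameterizations are equally possible.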

It is not essential that the probability distributions q(z|x) and p(x|z) be approximated by a normal distribution; they may be approximated by a distribution other than the normal distribution, such as the Bernoulli distribution or the multinomial distribution.

As shown in FIG. 4, the variational autoencoder 20 has a second deconvolution layer 233. The second deconvolution layer 233 is connected to the output side of the convolution layer 21. The second deconvolution layer 233 outputs high-order statistics for each pixel of the object image 91 input to the convolution layer 21 from the latent variable z output by the convolution layer 21.

In the present embodiment, the higher-order statistic output by the second deconvolution layer 233 is skewness. Skewness is a statistic that indicates the degree of asymmetry of a distribution. It is 0 when the distribution is symmetric, and takes a positive or negative value when the distribution is skewed to one side.
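For reference, the behavior of skewness can be checked numerically. The following Python sketch computes the third standardized moment of synthetic luminance samples; the sample distributions and sizes are arbitrary choices for illustration and are not data from the embodiment.

```python
import numpy as np

def skewness(samples):
    """Third standardized moment: E[(x - mean)^3] / std^3."""
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean()
    std = samples.std()
    return np.mean((samples - mean) ** 3) / std ** 3

rng = np.random.default_rng(0)
symmetric = rng.normal(loc=0.5, scale=0.1, size=100_000)  # normally distributed luminance
right_tail = rng.exponential(scale=0.1, size=100_000)     # long tail toward high luminance

skew_sym = skewness(symmetric)      # close to 0
skew_right = skewness(right_tail)   # clearly positive
skew_left = skewness(-right_tail)   # mirrored samples: clearly negative
```

A pixel that is usually dark but occasionally very bright, such as a pixel in a region with intermittent reflections, would show a long tail toward high luminance and therefore a large positive skewness.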

The skewness output by the second deconvolution layer 233 is a value indicating how much the actual distribution is distorted relative to the normal distribution defined by the mean μ_x and the variance σ_x² output by the first deconvolution layer 231.

In the present embodiment, the average and the variance are output from the first deconvolution layer 231 and the skewness is output from the second deconvolution layer 233. By separating the deconvolution layers in this way, the accuracy of each output is improved. However, the mean, variance, and skewness may be output from a common deconvolution layer.

<Network learning>
The learning unit 143 performs learning using the variational autoencoder 20. As the learning data, non-defective images obtained by imaging non-defective objects 90 are used as the plurality of object images 91. The learning unit 143 updates the internal parameters so as to minimize the error (reconstruction error) between the input and the output of the variational autoencoder 20. An error function L(x) is defined for this learning. The learning unit 143 inputs each non-defective image to the variational autoencoder 20 and, using the stochastic gradient descent method, trains the network to reconstruct each input non-defective image, thereby updating the internal parameters of the convolution layer 21, the first deconvolution layer 231, and the second deconvolution layer 233. The following equation is the error function L(x) used during learning.

In the error function L(x), i and j indicate the element numbers of the non-defective images of the training data. The error function L(x) is the error function described in Non-Patent Document 1 to which an error S_VAE for optimizing the skewness is added. The mean μ_xi and variance σ_xi² of the normal distribution are learned by using the log-likelihood of the normal distribution as an error function. S_VAE is designed to approximately optimize, with a squared error, the skewness with respect to the mean and variance of the normal distribution. As a result, the output of the second deconvolution layer 233 is brought closer to the skewness.

Specifically, the value (x_i − μ_xi)³ / σ_xi³ is obtained from the i-th input image x_i and the estimated parameters of the normal distribution, namely the mean μ_xi and the standard deviation σ_xi, and the error between this value and the skewness output by the network is minimized. For each update, this computes the error of only one non-defective image with respect to the distribution, but the approximation is achieved stochastically by repeatedly updating the internal parameters using a very large number of non-defective images. When the learning unit 143 completes the learning using the variational autoencoder 20, the learning unit 143 stores the learned internal parameters (learned parameters) in the storage unit 125.
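The structure of such an error function can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `skew_vae_loss`, the balancing factor `weight`, and the summation over pixels are hypothetical, and the regularization term on the latent variable that a variational autoencoder normally includes is omitted for brevity. It is not the error function L(x) of the embodiment itself.

```python
import numpy as np

def skew_vae_loss(x, mu, var, skew_pred, weight=1.0):
    """Gaussian negative log-likelihood plus a squared skewness error.

    `weight` is a hypothetical balancing factor, not taken from the source.
    """
    sigma = np.sqrt(var)
    # Negative log-likelihood of x under N(mu, var), summed over pixels.
    nll = 0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)
    # Single-image skewness target (x_i - mu_xi)^3 / sigma_xi^3.
    skew_target = (x - mu) ** 3 / sigma ** 3
    s_vae = np.sum((skew_pred - skew_target) ** 2)
    return nll + weight * s_vae

# Why a single-image target suffices: over many images, the value that
# minimizes the mean squared error to the targets is their average,
# which is the skewness of the pixel's luminance distribution.
rng = np.random.default_rng(0)
pixel = rng.exponential(scale=0.1, size=50_000)  # synthetic skewed luminance of one pixel
mu, sigma = pixel.mean(), pixel.std()
targets = (pixel - mu) ** 3 / sigma ** 3
best_s = targets.mean()                          # roughly 2 for an exponential distribution
```

The second half shows, for one synthetic pixel, that the squared-error minimizer over many samples approaches the skewness even though each individual target is computed from a single image.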

As shown in FIG. 3, the inspection unit 145 includes a statistic acquisition unit 31, an abnormality degree acquisition unit 33, a correction unit 35, and an abnormality detection unit 37. The contents of the process executed by the inspection unit 145 will be described in detail below.

<Defect detection>
FIG. 5 is a diagram conceptually showing the flow of inspection by the inspection unit 145. FIG. 5 shows a case where the inspection image 93 to be inspected has a defective portion NG1.

First, as shown in FIG. 5, the statistic acquisition unit 31 of the inspection unit 145 inputs the inspection image 93 to the variational autoencoder 20 having the learned internal parameters. Then, the variational autoencoder 20 outputs, for each pixel of the inspection image 93, an average image 931 representing the mean of the normal distribution, a variance image 933 representing the variance of the normal distribution, and a skewness image 935 representing the skewness.

The abnormality degree acquisition unit 33 of the inspection unit 145 calculates the abnormality degree for each pixel of the inspection image 93 based on the inspection image 93 and on the average image 931 and the variance image 933 acquired by the statistic acquisition unit 31. The abnormality degree may be, for example, the Mahalanobis distance. The Mahalanobis distance is determined by, for example, (x_k − μ_k)² / σ_k² (where k represents the element number of each pixel).

More specifically, the abnormality degree acquisition unit 33 calculates, for the inspection image 93 and the average image 931, the difference (= x_k − μ_k) between the luminances (x_k and μ_k) of corresponding pixels. Further, the abnormality degree acquisition unit 33 divides the square of the obtained difference (= (x_k − μ_k)²) by the variance σ_k² of the corresponding pixel in the variance image 933. The abnormality degree acquisition unit 33 acquires the abnormality degree image 937 by performing such arithmetic processing on all the pixels of the inspection image 93.
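The per-pixel arithmetic described above can be sketched in Python as follows. The function name `anomaly_degree`, the small constant `eps` that guards against division by zero, and the luminance values are assumptions added for illustration.

```python
import numpy as np

def anomaly_degree(inspection, mean_img, var_img, eps=1e-8):
    """Per-pixel (x_k - mu_k)^2 / sigma_k^2.

    `eps` is a safeguard against division by zero added for this sketch;
    it is not part of the source.
    """
    diff = inspection - mean_img
    return diff ** 2 / (var_img + eps)

inspection = np.array([[0.50, 0.52],
                       [0.90, 0.48]])   # made-up luminances of an inspection image
mean_img = np.full((2, 2), 0.50)        # made-up average image
var_img = np.full((2, 2), 0.01)         # made-up variance image
degree = anomaly_degree(inspection, mean_img, var_img)
```

In this toy input, only the pixel at row 1, column 0 deviates strongly from its mean relative to its variance, so it receives by far the largest abnormality degree.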

As shown in FIG. 5, the defect portion NG1 included in the inspection image 93 appears in the abnormality degree image 937 as a high-luminance portion indicating a large abnormality degree. However, as indicated by the arrow, the abnormality degree image 937 also contains a portion with a large abnormality degree other than the defect portion NG1. Specifically, a shiny portion of the object 90 in the inspection image 93 is detected with a large abnormality degree. Therefore, when the abnormality determination is made based on the abnormality degree image 937, over-detection may occur in which a portion other than the defect portion NG1 is determined to be abnormal.

Here, the abnormality degree image 937 is obtained on the assumption that the luminance distribution of each pixel follows a normal distribution. Therefore, for pixels whose luminance distribution does not follow the normal distribution, the abnormality degree tends to be high, which may cause over-detection. The correction unit 35 of the inspection unit 145 therefore corrects the abnormality degree image 937 in order to suppress over-detection. Specifically, the correction unit 35 treats pixels whose skewness in the skewness image 935 exceeds a predetermined threshold value as pixels that do not follow the normal distribution, and removes them from the abnormality degree image 937. That is, the correction unit 35 generates the corrected image 939 based on the abnormality degree image 937 and the skewness image 935. As shown in FIG. 5, the shiny portion in the inspection image 93 has a relatively large skewness in the skewness image 935. Therefore, in the corrected image 939, the abnormality degree of the shiny portion is removed.
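One possible form of this correction is sketched below in Python. The sketch assumes that "removing" a pixel means setting its abnormality degree to 0 and that the threshold is applied to the absolute value of the skewness so that both left-skewed and right-skewed pixels are excluded; neither detail is specified in the embodiment. The function name and the numerical values are likewise hypothetical.

```python
import numpy as np

def correct_anomaly(degree_img, skew_img, skew_threshold):
    """Zero the abnormality degree where |skewness| exceeds the threshold.

    Using the absolute value and zero-filling are assumptions of this sketch.
    """
    non_normal = np.abs(skew_img) > skew_threshold
    corrected = np.where(non_normal, 0.0, degree_img)
    return corrected, non_normal

degree_img = np.array([[0.1, 12.0],
                       [16.0, 0.2]])    # made-up abnormality degree image
skew_img = np.array([[0.0, 2.5],
                     [0.1, -0.2]])      # pixel (0, 1) imitates a shiny region
corrected, mask = correct_anomaly(degree_img, skew_img, skew_threshold=1.0)
```

Here the large abnormality degree at pixel (0, 1) is suppressed because its skewness is large, while the equally large degree at pixel (1, 0), whose skewness is small, is kept.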

The abnormality detection unit 37 of the inspection unit 145 determines whether or not each pixel of the inspection image 93 is abnormal based on the degree of abnormality of the corrected image 939. Specifically, the abnormality detection unit 37 determines in the corrected image 939 that a pixel whose degree of abnormality exceeds a predetermined threshold value is abnormal. The abnormality detection unit 37 may display the determination result on the display unit 129. The abnormality detection unit 37 may display information indicating the coordinates of the pixel determined to be abnormal or the degree of abnormality on the display unit 129.
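The determination described above amounts to a per-pixel threshold comparison, which can be sketched in Python as follows. The function name, the threshold value, and the example corrected image are hypothetical.

```python
import numpy as np

def detect_abnormal_pixels(corrected_img, degree_threshold):
    """Return a boolean map, the coordinates, and the degrees of abnormal pixels."""
    abnormal = corrected_img > degree_threshold
    coords = np.argwhere(abnormal)          # (row, column) of each abnormal pixel
    return abnormal, coords, corrected_img[abnormal]

corrected_img = np.array([[0.1, 0.0],
                          [16.0, 0.2]])     # made-up corrected image
abnormal, coords, degrees = detect_abnormal_pixels(corrected_img, degree_threshold=9.0)
is_defective = bool(abnormal.any())         # one possible image-level judgment
```

The coordinates and abnormality degrees returned here correspond to the kind of information that may be shown on the display unit 129; treating the image as defective when any pixel is abnormal is only one possible image-level rule.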

As described above, in the present embodiment, the variational autoencoder 20 is trained to output the skewness, which is a higher-order statistic, so that pixels whose luminance distribution does not follow the normal distribution can be identified based on the skewness. Therefore, by correcting the abnormality degree image based on the skewness, over-detection of abnormalities in the inspection image 93 can be suppressed.

<3. Modification example>
Although the embodiments have been described above, the present invention is not limited to the above, and various modifications are possible.

For example, in the above embodiment, skewness is adopted as the higher-order statistic, but kurtosis or an even higher-order statistic may be adopted. The following equation may be adopted as S_VAE of the error function L(x) for training the variational autoencoder 20 so as to output the kurtosis.
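As a hedged illustration only, a kurtosis term can be written by analogy with the skewness term of the embodiment, replacing the third power with the fourth. The Python sketch below assumes a single-image target (x_i − μ_xi)⁴ / σ_xi⁴ and a squared error; the function names are hypothetical, and this form is an assumption of the sketch rather than the equation of this modification.

```python
import numpy as np

def kurtosis_target(x, mu, var):
    """Single-sample fourth standardized moment (x - mu)^4 / sigma^4."""
    return (x - mu) ** 4 / var ** 2

def kurtosis_error(kurt_pred, x, mu, var):
    """Squared error between a predicted kurtosis and the single-sample target."""
    return np.sum((kurt_pred - kurtosis_target(x, mu, var)) ** 2)

# Averaged over many normally distributed samples, the target approaches 3,
# the kurtosis of a normal distribution.
rng = np.random.default_rng(0)
pixel = rng.normal(loc=0.5, scale=0.1, size=200_000)   # synthetic luminance of one pixel
mean_target = kurtosis_target(pixel, pixel.mean(), pixel.var()).mean()
```

Under this convention a pixel that follows a normal distribution yields a value near 3, so a threshold on kurtosis would be set relative to 3, or relative to 0 if excess kurtosis (the value minus 3) is used instead.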

Although the present invention has been described in detail, the above description is exemplary in all aspects and the invention is not limited thereto. It is understood that innumerable variations not illustrated can be assumed without departing from the scope of the present invention. The configurations described in the above embodiments and the modifications can be appropriately combined or omitted as long as they do not conflict with each other.

10, 100 Image inspection device
20 Variational autoencoder
120 Information processing device (learning device)
125 Storage unit
143 Learning unit
145 Inspection unit
31 Statistic acquisition unit
33 Abnormality degree acquisition unit
35 Correction unit
37 Abnormality detection unit
90 Object
91 Object image
93 Inspection image
931 Average image
933 Variance image
935 Skewness image
937 Abnormality degree image
939 Corrected image

Claims

1. A learning device for constructing an image inspection device, comprising:
a learning unit that, using a plurality of object images as learning data, trains a probability model so that an error between input and output is reduced and so that, for each unit pixel, a mean, a variance, and a higher-order statistic of a distribution approximated by a specific distribution are output.

2. The learning device according to claim 1,
wherein the object images are non-defective images obtained by imaging non-defective products.

3. The learning device according to claim 1 or 2,
wherein the specific distribution is a normal distribution.

4. The learning device according to any one of claims 1 to 3,
wherein the probability model is a variational autoencoder.

5. The learning device according to any one of claims 1 to 4,
wherein the higher-order statistic is skewness or kurtosis.

6. An image inspection device using a learning model trained by the learning device according to any one of claims 1 to 5, comprising:
a statistic acquisition unit that inputs an inspection image to be inspected to the probability model having the learned parameters obtained by the learning unit, and acquires the mean, variance, and higher-order statistic for each unit pixel of the inspection image; and
an abnormality detection unit that detects an abnormality in the inspection image based on the mean, variance, and higher-order statistic acquired by the statistic acquisition unit.

7. The image inspection device according to claim 6,
wherein the abnormality detection unit detects an abnormality while excluding, from the inspection image, unit pixels for which the higher-order statistic acquired by the statistic acquisition unit exceeds a predetermined threshold value.

8. A learned parameter of the probability model acquired by the learning device according to any one of claims 1 to 5.

9. A learning method for constructing an image inspection method, comprising:
a step of training a probability model, using a plurality of object images as learning data, so that an error between input and output is reduced and so that, for each unit pixel, a mean, a variance, and a higher-order statistic of a distribution approximated by a specific distribution are output.

10. An image inspection device comprising:
a statistic acquisition unit that acquires a mean, a variance, and a higher-order statistic for each unit pixel of an inspection image by using a probability model having learned parameters, the probability model having been trained, with a plurality of object images as learning data, so that an error between input and output is reduced and so that, for each unit pixel, the mean, variance, and higher-order statistic of a distribution approximated by a specific distribution are output; and
an abnormality detection unit that detects an abnormality in the inspection image based on the mean, variance, and higher-order statistic for each unit pixel of the inspection image obtained by the statistic acquisition unit.

11. The image inspection device according to claim 10,
wherein the abnormality detection unit detects an abnormality while excluding, from the inspection image, unit pixels for which the higher-order statistic acquired by the statistic acquisition unit exceeds a predetermined threshold value.

12. An image inspection method comprising:
a statistic acquisition step of acquiring a mean, a variance, and a higher-order statistic for each unit pixel of an inspection image by using a probability model having learned parameters, the probability model having been trained, with a plurality of object images as learning data, so that an error between input and output is reduced and so that, for each unit pixel, the mean, variance, and higher-order statistic of a distribution approximated by a specific distribution are output; and
an abnormality detection step of detecting an abnormality in the inspection image based on the mean, variance, and higher-order statistic for each unit pixel of the inspection image obtained in the statistic acquisition step.