CN112435258A

CN112435258A - Image detection model construction method, image detection method and device

Info

Publication number: CN112435258A
Application number: CN202011495984.7A
Authority: CN
Inventors: 李杰明; 杨洋
Original assignee: Shenzhen Huahan Weiye Technology Co ltd
Current assignee: Shenzhen Huahan Weiye Technology Co ltd
Priority date: 2020-12-17
Filing date: 2020-12-17
Publication date: 2021-03-02

Abstract

The application relates to a method for constructing an image detection model, an image detection method and a device. The construction method comprises: establishing the structure of a cascade feedback network; configuring a corresponding loss function according to the structure of the cascade feedback network; and training the cascade feedback network with a plurality of normal sample images of the object to be detected, updating the network parameters of the cascade feedback network through the loss function, and obtaining an image detection model after training. Because shallow self-encoders form the network nodes of the cascade feedback network, the structure, which resembles a recurrent neural network, allows the parameters of every network node to be identical. The parameter count is therefore much smaller than in existing methods, which makes the model easy to transmit, store and deploy, simplifies the network training process, and speeds up convergence of the loss function.

Description

Image detection model construction method, image detection method and device

Technical Field

The invention relates to the technical field of image processing, in particular to a construction method of an image detection model, an image detection method and an image detection device.

Background

In recent years, deep learning has become a focus of attention in various fields at home and abroad, and the deep learning includes two types, namely supervised learning and unsupervised learning. In the field of computer vision, supervised learning refers to training a neural network through one-to-one correspondence of images and labeled information, so that the neural network can complete the work of classification, target detection, semantic segmentation and the like; unsupervised learning refers to training a neural network only by using image information without labels, so that the neural network can finish the work of clustering, anomaly detection, image generation and the like. In the field of industrial quality inspection, widely applied methods include a method for manually selecting features and a supervised deep learning method (hereinafter referred to as a supervised learning method).

The manual feature selection method still has limitations. The shape, pose, color and similar properties of the object to be detected must vary only within a limited range; when they vary too much, it is difficult to separate abnormal areas (such as holes, cracks, cuts and printing on the surface of the object) from normal areas with pixel-level precision using manually established criteria. Whether comparing a standard image with a defect image, two standard images, or two defect images of the same kind, detection by manually selected features is often difficult when the surface shape and pose of the object vary widely.

In recent years, supervised learning methods have been used in situations where manual feature selection struggles. A convolutional neural network is designed, images of the object to be detected (including a large number of normal and abnormal images) are collected and labeled to form a data set, and the convolutional neural network is then trained on that data set, so that features are selected and judged automatically. Although the supervised learning method can still produce accurate and robust results when the shape, pose, color and similar properties of the object vary widely, it has some obvious drawbacks: on the one hand, abnormal samples are difficult to obtain in sufficient quantity and variety; on the other hand, labeling a large number of images is time-consuming and costly.

Disclosure of Invention

The technical problem mainly solved by the invention is how to overcome the shortcomings of existing deep learning methods in industrial quality inspection. To solve this technical problem, the application provides a method for constructing an image detection model, an image detection method and an image detection device.

According to a first aspect, an embodiment is a method for constructing an image detection model, which includes: establishing a structure of a cascade feedback network; the cascade feedback network comprises a plurality of network nodes formed by a plurality of shallow layer self-encoders through cascade feedback; obtaining a corresponding loss function according to the structural configuration of the cascade feedback network; and training the cascade feedback network by using a plurality of normal sample images of the object to be detected, updating network parameters of the cascade feedback network through the loss function, and obtaining an image detection model after training.

Establishing the structure of the cascade feedback network comprises: forming each shallow self-encoder from convolutional neural units, where the shallow self-encoder comprises an encoder consisting of a convolutional layer and a downsampling layer, and a decoder consisting of an upsampling layer and a convolutional layer; the encoder receives the image input to the shallow self-encoder and converts it into semantic information, and the decoder restores the semantic information and outputs a reconstructed image; arranging the shallow self-encoders in sequence, feeding the output of each shallow self-encoder back to the input of the next shallow self-encoder, and taking each shallow self-encoder as a network node in the cascade feedback network; and setting the network nodes as a node grouping, and establishing the cascade feedback network by cascading the network nodes within the node grouping.

Obtaining the corresponding loss function according to the structural configuration of the cascade feedback network comprises the following.

Calculating the image Euclidean distances corresponding to the first and the last network nodes in the node grouping, so as to obtain a first image reconstruction quality characterized by image Euclidean distance and denoted Loss_b, wherein $x_0$ is the image input to the 1st network node in the cascade feedback network, $\hat{x}_1$ is the reconstructed image output by the 1st network node, and $\hat{x}_n$ is the reconstructed image output by the nth network node.

Calculating variance statistics over a number of network nodes within the node grouping, so as to obtain a second image reconstruction quality characterized by variance statistics and denoted Loss_var, wherein m is the number of network nodes, c is the number of channels of the reconstructed image output by the kth network node, $\hat{x}_{k,i}$ is the value of the ith channel of the reconstructed image output by the kth network node, and $\bar{x}_i$ is the average of the ith channel over the m reconstructed images, satisfying

$$\bar{x}_i = \frac{1}{m}\sum_{k=1}^{m}\hat{x}_{k,i}$$

Calculating a third image reconstruction quality from the first image reconstruction quality and the second image reconstruction quality, expressed as

$$\mathrm{Loss}_d = \mathrm{Loss}_b + \mathrm{Loss}_{var}$$

Configuring the loss function corresponding to the cascade feedback network using either the expression Loss_b of the first image reconstruction quality or the expression Loss_d of the third image reconstruction quality.
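The relationship between Loss_b, Loss_var and Loss_d can be illustrated with a small NumPy sketch. The exact expressions for Loss_b and Loss_var are not reproduced in this text, so the forms used here are assumptions: Loss_b is taken as the sum of the squared Euclidean distances from the input image to the first and last reconstructions, and Loss_var as the variance across the m node outputs averaged over all channels and pixels. The function names and the (c, h, w) array layout are likewise illustrative only.

```python
import numpy as np

def loss_b(x0, recons):
    """Assumed form: squared Euclidean distance from the input x0 to the
    first and to the last reconstruction in the node grouping."""
    first, last = recons[0], recons[-1]
    return float(np.sum((x0 - first) ** 2) + np.sum((x0 - last) ** 2))

def loss_var(recons):
    """Assumed form: variance of each value across the m node outputs,
    averaged over all channels and pixels."""
    stack = np.stack(recons)              # shape (m, c, h, w)
    mean_over_nodes = stack.mean(axis=0)  # average over the m reconstructions
    return float(np.mean((stack - mean_over_nodes) ** 2))

def loss_d(x0, recons):
    """Loss_d = Loss_b + Loss_var, as stated in the text."""
    return loss_b(x0, recons) + loss_var(recons)

# Toy data: a 1-channel 2x2 input and three hypothetical node outputs.
x0 = np.ones((1, 2, 2))
recons = [x0 * 0.9, x0 * 0.8, x0 * 0.7]
print(loss_b(x0, recons), loss_var(recons), loss_d(x0, recons))
```

Under these assumptions, Loss_var penalizes drift between successive node outputs, while Loss_b ties the first and last reconstructions to the input.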

Training the cascade feedback network with a plurality of normal sample images of the object to be detected, updating the network parameters of the cascade feedback network through the loss function, and obtaining an image detection model after training comprises: acquiring a plurality of normal sample images of the object to be detected, where a normal sample image does not contain any surface abnormal area of the object to be detected; using each normal sample image as the image input to the first network node in the cascade feedback network, and inputting the normal sample images to the cascade feedback network in sequence for training; ending training when the change in the loss function of the cascade feedback network between successive computations is smaller than a preset threshold, or when training reaches a preset number of iterations; and obtaining the image detection model of the object to be detected from the cascade feedback network with the network parameters as updated at the end of training.

According to a second aspect, an embodiment provides a method of image detection based on cascaded feedback, comprising: acquiring a to-be-detected image of an object to be detected; inputting the image to be detected into the image detection model constructed by the construction method in the first aspect, and detecting to obtain a reconstructed image output by any network node in the cascade feedback network; and comparing the reconstructed image with the image to be detected to obtain the surface abnormal area of the object to be detected.

Comparing the reconstructed image with the image to be detected to obtain the surface abnormal area of the object to be detected, comprising the following steps: constructing an evaluation function of a surface abnormal region by using the reconstructed image and the image to be detected; when the value of the evaluation function is larger than or equal to a preset value, determining that the reconstructed image contains the surface abnormal area of the object to be detected, and comparing the difference value of the reconstructed image and the image to be detected to obtain the surface abnormal area of the object to be detected; and outputting the image to be detected of the object to be detected and the surface abnormal area of the object to be detected.
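The two-stage comparison described above (an image-level evaluation followed by a pixel-level difference) might be sketched as follows. The evaluation function itself is not reproduced in this text, so the mean squared difference used as the score is only an assumed stand-in, and the function name and both thresholds are hypothetical.

```python
import numpy as np

def detect_abnormal_region(image, recon, score_threshold, pixel_threshold):
    """Return (has_anomaly, mask). The score used here (mean squared
    difference) is an assumed stand-in for the evaluation function."""
    diff = np.abs(image - recon)
    score = float(np.mean(diff ** 2))
    if score < score_threshold:
        # Below the preset value: report no abnormal region.
        return False, np.zeros(image.shape, dtype=bool)
    # Otherwise mark the pixels whose difference exceeds the pixel threshold.
    return True, diff > pixel_threshold

image = np.zeros((4, 4))
image[1, 2] = 1.0           # a single bright "defect" pixel
recon = np.zeros((4, 4))    # hypothetical reconstruction without the defect
found, mask = detect_abnormal_region(image, recon, 0.01, 0.5)
print(found, np.argwhere(mask))
```

In practice both thresholds would have to be chosen from the reconstruction errors observed on normal images.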

Constructing the evaluation function of the surface abnormal region by using the reconstructed image and the image to be detected comprises: obtaining the reconstructed image output by each network node in the cascade feedback network after the image to be detected is input into the image detection model, and thereby constructing an evaluation function of the surface abnormal region, in which the symbols have the following meanings.

$x'_0$ is the image to be detected; $\hat{x}'_n$ is the reconstructed image output by any network node when the image to be detected is input into the cascade feedback network; $M(\hat{x}'_j)$ is the output image of an intermediate network layer of any network node when the image to be detected is input into the cascade feedback network; $M(x'_0)$ is the output image of the intermediate network layer of the first network node when the image to be detected is input into the cascade feedback network; s is a set formed by a number of network nodes; and the subscripts n, j and k are serial numbers of network nodes.

m is the number of network nodes; c is the number of channels of the reconstructed image output by the kth network node; $\hat{x}'_{k,i}$ is the value of the ith channel of the reconstructed image output by the kth network node when the image to be detected is input into the cascade feedback network; and $\bar{x}'_i$ is the average value of the ith channel over the m reconstructed images when the image to be detected is input into the cascade feedback network.

According to a third aspect, an embodiment provides an image detection apparatus, comprising: the image acquisition component is used for acquiring an image to be detected of an object to be detected; a processor, connected to the image acquisition component, configured to construct an image detection model according to the construction method in the first aspect, and/or obtain a surface abnormal region of an object to be detected in the image to be detected according to the image detection method in the second aspect; and the display is connected with the processor and used for displaying the image to be detected and the surface abnormal area of the object to be detected.

The processor comprises a model construction module and an abnormality detection module. The model construction module is used for training a pre-established cascade feedback network with one or more normal sample images and updating the network parameters through a loss function to obtain an image detection model; the cascade feedback network comprises a plurality of network nodes formed by a plurality of shallow self-encoders through cascade feedback, and the loss function is configured according to the structure of the cascade feedback network. The abnormality detection module is connected with the model construction module and is used for inputting the image to be detected into the image detection model and outputting the surface abnormal area of the object to be detected through detection processing.

According to a fourth aspect, an embodiment provides a computer-readable storage medium comprising a program executable by a processor to implement the construction method described in the first aspect above and/or to implement the image detection method described in the second aspect above.

The beneficial effect of this application is:

According to the above embodiments, a method for constructing an image detection model, an image detection method and an image detection device are provided. The construction method comprises: establishing the structure of a cascade feedback network; configuring a corresponding loss function according to the structure of the cascade feedback network; and training the cascade feedback network with a plurality of normal sample images of the object to be detected, updating the network parameters of the cascade feedback network through the loss function, and obtaining an image detection model after training. The image detection method comprises: acquiring an image to be detected of an object to be detected; inputting the image to be detected into the constructed image detection model and obtaining the reconstructed image output by any network node in the cascade feedback network; and comparing the reconstructed image with the image to be detected to obtain the surface abnormal area of the object to be detected.

In the first aspect, the image detection model is obtained by training the cascade feedback network. This unsupervised learning approach needs only a number of normal images for training and requires neither abnormal sample images nor prior labeling, so the training set is easy to obtain, no time or effort is spent on labeling, and the image detection model is constructed more efficiently. In the second aspect, shallow self-encoders form the network nodes of the cascade feedback network; this structure, which resembles a recurrent neural network, allows the parameters of every network node to be identical, so the parameter count is much smaller than in existing methods, the model is easy to transmit, store and deploy, the network training process is simplified, and convergence of the loss function is accelerated. In the third aspect, inputting the image to be detected into the image detection model readily yields the reconstructed image output by any network node in the cascade feedback network; the reconstructed image has high pixel precision and reconstructs features well, while the reconstruction error of an abnormal region is larger, so the surface abnormal region can be detected simply by separating abnormal regions from normal regions with a simple criterion.

Drawings

FIG. 1 is a flowchart illustrating a method for constructing an image detection model according to an embodiment of the present disclosure;

FIG. 2 is a flow chart of establishing a cascaded feedback network;

FIG. 3 is a flow chart of configuring the loss function;

FIG. 4 is a flow chart of training to obtain the image detection model;

FIG. 5 is a schematic diagram of a shallow layer self-encoder, in which FIG. 5a is a schematic diagram of the connection between the encoder and the decoder, and FIG. 5b is a schematic diagram of the connection between the convolutional layer, the down-sampling layer and the up-sampling layer;

FIG. 6 is a schematic diagram of cascade feedback of multiple shallow autoencoders;

FIG. 7 is a flowchart of an image detection method based on cascade feedback according to a second embodiment of the present application;

FIG. 8 is a flowchart for obtaining the surface abnormal region of the object to be detected;

FIG. 9 shows detection results for the object to be detected, where FIG. 9a is the detection result for an object with no abnormal region, FIG. 9b for an object with a hole region, FIG. 9c for an object with a crack region, FIG. 9d for an object with a cut region, and FIG. 9e for an object with a printed region;

FIG. 10 is a schematic structural diagram of an image detection apparatus according to a third embodiment of the present application;

FIG. 11 is a schematic diagram of a processor;

FIG. 12 is a schematic structural diagram of an image detection apparatus according to a fourth embodiment of the present application.

Detailed Description

The present invention is described in further detail below with reference to specific embodiments and the accompanying drawings. Like elements in different embodiments are given like reference numbers. In the following description, numerous details are set forth to provide a better understanding of the present application. However, those skilled in the art will readily recognize that some of these features may be omitted, or replaced by other elements, materials or methods, in different cases. In some instances, certain operations related to the present application are not shown or described in detail, to avoid burying the core of the application in excessive description; a detailed description of these operations is unnecessary for those skilled in the art, who can fully understand them from the specification and from general knowledge in the art.

Furthermore, the features, operations or characteristics described in the specification may be combined in any suitable manner to form various embodiments. Likewise, the steps or actions in the method descriptions may be reordered or interchanged in ways that will be apparent to one of ordinary skill in the art. The various sequences in the specification and drawings therefore serve only to describe particular embodiments and do not imply a required order, unless it is otherwise stated that a given order must be followed.

The numbering of the components as such, e.g., "first", "second", etc., is used herein only to distinguish the objects as described, and does not have any sequential or technical meaning. The term "connected" and "coupled" when used in this application, unless otherwise indicated, includes both direct and indirect connections (couplings).

Unsupervised image detection methods mainly include the generative adversarial network (GAN), the self-encoder (Auto Encoder, AE), and the variational self-encoder (Variational Auto Encoder, VAE). A generative adversarial network consists of a generator network and a discriminator network; the generator takes random samples from the latent space as input, and its output must imitate the real samples of the training set as closely as possible. A self-encoder consists of an encoder and a decoder. The encoder turns the image information into high-dimensional, low-resolution semantic information, which is used directly as the latent variable; the decoder restores the latent variable, through upsampling and a convolutional neural network, into an image in the same format as the original, and the output image must imitate the input image as closely as possible to achieve image reconstruction. A variational self-encoder also consists of an encoder and a decoder. The encoder turns the image information into high-dimensional, low-resolution semantic information; statistics such as the mean and variance are computed from this semantic information, and the latent variable is obtained by sampling from a random distribution such as Gaussian noise; the decoder restores the latent variable into image information through upsampling and a convolutional neural network, and its output must imitate the input image as closely as possible to achieve image reconstruction.

When the generative adversarial network, the self-encoder and the variational self-encoder are applied to anomaly detection, they work on the same principle. The neural network is trained on normal images and its output must imitate the input image as closely as possible, so the neural network has a small reconstruction error in normal regions (an input original image I passes through the neural network to produce a reconstructed image O, and the reconstruction error is the difference between O and I). Because the neural network has never been trained on abnormal image data, it usually has a larger reconstruction error in abnormal regions. By computing the reconstruction error, regions with a small error are judged normal and regions with a large error are judged abnormal, so that abnormal regions are detected with pixel-level precision. However, each method has shortcomings when used alone. The disadvantage of the generative adversarial network and the variational self-encoder is that, because the information in the original image is not fully used, it is difficult to generate a reconstructed image with high pixel precision (the generated image is only roughly close to the input image, and its pixel position precision and numerical precision are poor), which makes it difficult to establish a criterion that distinguishes defective regions from normal regions. The disadvantage of the self-encoder is that, because of multi-layer downsampling, the pixel position accuracy is poor and smaller features are reconstructed badly, so it is difficult to tell normal regions from abnormal regions by comparing the original image with the reconstructed image (that is, by computing the reconstruction error).

For image reconstruction, several approaches are available for improving the accuracy of feature reconstruction. One is to adopt a deeper network structure and ensure the position precision and feature precision of the output by fusing low-level and high-level features. Another is to adopt a wider network structure and ensure adaptability to objects of different sizes by fusing features from different receptive fields: a large receptive field detects large objects well, while a small receptive field meets the detection requirement for small objects. In the technical scheme of the application, for the input image x, a function of x with parameter ω needs to be constructed to better reflect the transformation and mapping relationship from the input image to the output image, where ω is the parameter to be solved for in reconstruction.

When a self-encoder is used to reconstruct an image, a deep convolutional neural network structure (that is, one with many convolutional layers and downsampling layers) allows normal and abnormal regions to be distinguished by comparing their reconstruction errors, but the many convolutional and downsampling layers give the reconstructed image poor pixel position precision and a poor reconstruction of smaller features. A shallow convolutional neural network structure gives high pixel position precision and a good reconstruction of smaller features, but because a shallow self-encoder has few convolutional and downsampling layers, its encoder generates the semantic code from lower-level features. During reconstruction an abnormal region is likely to contain similar low-level features, so the reconstruction error of an abnormal image may be numerically close to that of a normal image, and normal and abnormal regions are difficult to distinguish by comparing their reconstruction errors. The generated reconstructed image should therefore satisfy the following: high pixel precision, good reconstruction of small features, and a large reconstruction error in abnormal regions, so that abnormal regions and normal regions can be separated by a simple criterion.
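The loss of small features under repeated downsampling can be seen in a toy NumPy sketch. It uses plain average pooling and nearest-neighbour upsampling as a stand-in for an encoder and decoder, with no learned convolutions, so it only illustrates the resolution effect and is not the network described here; the function names are hypothetical.

```python
import numpy as np

def downsample(img, factor):
    """Average-pool a square image by an integer factor."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upsample(img, factor):
    """Nearest-neighbour upsampling by an integer factor."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def reconstruct(img, levels):
    """Stand-in for an encoder/decoder with `levels` 2x downsampling stages."""
    factor = 2 ** levels
    return upsample(downsample(img, factor), factor)

img = np.zeros((8, 8))
img[3, 3] = 1.0  # a one-pixel feature

# Sum of squared reconstruction errors for one stage and for three stages.
err_shallow = float(np.sum((img - reconstruct(img, 1)) ** 2))
err_deep = float(np.sum((img - reconstruct(img, 3)) ** 2))
print(err_shallow, err_deep)  # 0.75 0.984375
```

With three downsampling stages the single-pixel feature is smeared over the whole image, whereas one stage confines it to a 2x2 block.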

The technical scheme of the application constructs an image detection model based on the concepts of cascading and feedback. The purpose of cascading is to build a deeper network structure while retaining the position information and feature information specific to a shallow network; the purpose of feedback is to preserve normal structural features during reconstruction and gradually increase the distance between abnormal features and normal features. The method combines a recurrent neural network (RNN) with a self-encoder (Auto Encoder): a shallow self-encoder produces a high-quality reconstruction of the original image, the output reconstructed image is input into the shallow self-encoder again, and so on, with the reconstruction result of the previous iteration always serving as the input of the current iteration. Over multiple iterations the reconstruction error of an abnormal region is gradually amplified, while the reconstructed image of a normal region remains essentially unchanged. Because the shallow self-encoder contains few consecutive downsampling layers, the reconstructed image keeps high pixel position precision and a good reconstruction of smaller features throughout the reconstruction process.

The technical solution of the present application will be specifically described with reference to the following examples.

First Embodiment

Referring to fig. 1, the present embodiment discloses a method for constructing an image detection model, which includes steps S110 to S130, which are described below.

Step S110, a structure of a cascaded feedback network is established, where the cascaded feedback network includes a plurality of network nodes formed by a plurality of shallow layer self-encoders through cascaded feedback.

The cascade feedback network is a network structure formed by a plurality of shallow self-encoders through cascade feedback, the shallow self-encoders are sequentially sequenced to form a hierarchical structure form, then the output of each shallow self-encoder is fed back to the input of the next shallow self-encoder, and at the moment, each shallow self-encoder is used as a network node in the cascade feedback network.

The shallow self-encoder is a self-encoder with a small number of convolutional layers and downsampling layers. It is also an artificial neural network that learns an efficient representation of its input data in an unsupervised manner. This efficient representation is called an encoding, and its dimensionality is typically much smaller than that of the input data, which makes the self-encoder useful for dimensionality reduction. More importantly, the self-encoder can serve as a powerful feature detector and be applied to the pre-training of deep neural networks.

Step S120, obtaining a corresponding loss function according to the structural configuration of the cascade feedback network.

Because the cascade feedback network has a plurality of network nodes, and the network nodes are connected in a cascade feedback mode, the input and the output of each network node can be represented, so that the image reconstruction quality is obtained through image Euclidean distance calculation, pixel statistical analysis and other modes, and the loss function corresponding to the cascade feedback network is configured.

Step S130, training the cascade feedback network by using a plurality of normal sample images of the object to be detected, updating network parameters of the cascade feedback network through a loss function, and obtaining an image detection model after training.

To learn the surface characteristics of the object to be detected, a plurality of normal sample images can be input into the cascade feedback network in sequence for training. Training ends when the change in the loss function of the cascade feedback network between successive computations is smaller than a preset threshold, or when training reaches a preset number of iterations, and the trained cascade feedback network is then used as the image detection model of the object to be detected.
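The stopping rule described above can be sketched as a small loop. Here `step_fn` is a hypothetical callable that performs one parameter update and returns the current loss; the default threshold and iteration limit are illustrative values, not ones given in this text.

```python
def train(step_fn, threshold=1e-4, max_iters=1000):
    """Run step_fn() until the loss change is below threshold or
    max_iters is reached. step_fn is a hypothetical callable that performs
    one parameter update and returns the current loss."""
    prev_loss = None
    history = []
    for _ in range(max_iters):
        loss = step_fn()
        history.append(loss)
        # Stop once two successive loss values differ by less than threshold.
        if prev_loss is not None and abs(prev_loss - loss) < threshold:
            break
        prev_loss = loss
    return history

# Toy stand-in for a training step: the loss halves on every call.
state = {"loss": 1.0}

def fake_step():
    state["loss"] *= 0.5
    return state["loss"]

history = train(fake_step, threshold=1e-3, max_iters=100)
print(len(history), history[-1])  # 10 0.0009765625
```

In the toy run the loop stops after ten steps because the loss change first falls below the threshold there; with a threshold of zero it would instead run until the iteration limit.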

It should be noted that the object to be detected can be a product on an industrial production line, a mechanical part in an article box, a tool on an operation table, and the like. When an image of the surface of an object to be detected is shot and obtained, the surface characteristics of the object are displayed or presented on the corresponding pattern, and if the surface of the object to be detected has defects such as holes, cracks, cuts, printing, dust, flaws, dirt and the like, the shot image is an abnormal sample image; if the surface of the object to be detected does not have these defects, the captured image will be a normal sample image.

In this embodiment, referring to fig. 2, the step S110 mainly relates to a process of establishing a cascaded feedback network, and may specifically include steps S111 to S113, which are respectively described as follows.

Step S111, forming each shallow self-encoder from convolutional neural units.

Referring to fig. 5a and 5b, the shallow self-encoder includes an encoder composed of a convolutional layer and a downsampling layer, and a decoder composed of an upsampling layer and a convolutional layer; the convolutional and downsampling layers of the encoder correspond one-to-one with the convolutional and upsampling layers of the decoder. The encoder receives the image input to the shallow self-encoder and converts it into semantic information, such as a latent semantic code of the features A, B, C, D in the input image; correspondingly, the decoder restores the semantic information and outputs a reconstructed image, such as the output image features A', B', C', D', which typically differ from the features A, B, C, D only in information content.

Because each shallow self-encoder uses few convolution and downsampling operations, the pixel positions of the reconstructed image are easy to keep accurate, and smaller features are reconstructed well. The working principle of the shallow self-encoder can be expressed by the following formula:

$$z = E(x), \qquad \hat{x} = D(z)$$

where $x$ is the input image, $\hat{x}$ is the output feature, $z$ is the latent semantic code, $E$ is the encoder neural network, and $D$ is the decoder neural network. The working goal of the shallow self-encoder is to make the output $\hat{x}$ as consistent as possible with the input $x$.
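The encode/decode round trip of a single shallow self-encoder can be sketched in Python with NumPy. This is only a toy illustration under stated assumptions: a fixed 2×2 average pooling stands in for the learned convolution and downsampling of the encoder E, nearest-neighbour upsampling stands in for the learned upsampling and convolution of the decoder D, and the function names `encode` and `decode` are hypothetical.

```python
import numpy as np

def encode(x: np.ndarray) -> np.ndarray:
    """E (assumed stand-in): 2x2 average pooling instead of learned conv + downsampling."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def decode(z: np.ndarray) -> np.ndarray:
    """D (assumed stand-in): nearest-neighbour upsampling instead of learned upsampling + conv."""
    return np.repeat(np.repeat(z, 2, axis=0), 2, axis=1)

x = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 single-channel input image
z = encode(x)                                  # latent code, half the resolution
x_hat = decode(z)                              # reconstruction at the input resolution

# Euclidean distance between the reconstruction and the input.
reconstruction_error = float(np.linalg.norm(x_hat - x))
```

With only one pooling step the latent code stays close to pixel level, which mirrors the point above that a shallow network preserves pixel positions well; a flat image passes through this toy round trip unchanged.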

It should be noted that, nominally, the shallow self-encoder simply learns to copy its input to its output (that is, it takes the training data as input and then outputs the training data). Constraints can be imposed on the neural network in different ways to make this task much harder. For example, the size of the internal representation may be limited, or noise may be added to the training data and the self-encoder trained to restore the original, noise-free data; such constraints prevent the self-encoder from mechanically copying the input to the output and force it to learn an efficient representation of the data.

Because the shallow self-encoder has few convolutional and downsampling layers, its semantic code is generated from lower-level features. An abnormal region is likely to contain similar low-level features, so during reconstruction the reconstruction error of an abnormal image may be numerically close to that of a normal image.

Step S112: arrange the shallow self-encoders in sequence, feed the output of each shallow self-encoder to the input of the next shallow self-encoder, and take each shallow self-encoder as a network node in the cascade feedback network.

Referring to fig. 6, N shallow self-encoders (shallow self-encoder 1, shallow self-encoder 2, …, shallow self-encoder N) are arranged in sequence. The normal sample image is input into shallow self-encoder 1, which outputs an image reconstruction result, the reconstructed image R₁. The reconstructed image R₁ is then used as the input of shallow self-encoder 2, which outputs the reconstructed image R₂, and R₂ is input to the next shallow self-encoder. By analogy, the output of each iteration is taken as the input of the next, and the reconstruction error is gradually amplified over multiple loop iterations. In the final image reconstruction result, the reconstructed image R_N, the reconstruction error of an abnormal region is significantly larger than that of a normal region, so the abnormal region can be distinguished from the normal region.

The image iteration process of the shallow self-encoders can be expressed by the following formulas:

$$\hat{x}_1 = \phi(x_0;\, \omega), \qquad \hat{x}_n = \phi(\hat{x}_{n-1};\, \omega) \quad (n > 1)$$

where $\phi$ is the shallow self-encoder, $\omega$ is the weight parameter of the neural network, $x_0$ is the input image of the cascade feedback network (in the training phase, the input image is a normal sample image), $\hat{x}_1$ is the reconstructed image of $x_0$ generated by the 1st network node, and $\hat{x}_n$ is the reconstructed image of $x_0$ generated after passing through the $n$-th ($n > 0$) network node. The hat symbol marks a reconstruction result to distinguish it from the original image.
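The cascade iteration, in which each node's output becomes the next node's input, can be sketched as follows. This is a minimal sketch under assumptions: `phi` is a toy stand-in for a trained shallow self-encoder (it merely pulls every pixel toward the image mean), a single scalar `omega` plays the role of the shared weight parameter, and the function name `cascade` is hypothetical.

```python
import numpy as np

def phi(x: np.ndarray, omega: float) -> np.ndarray:
    """Toy stand-in for one shallow self-encoder with shared weight omega."""
    return omega * x + (1.0 - omega) * x.mean()

def cascade(x0: np.ndarray, omega: float, n_nodes: int) -> list:
    """Feed x0 through n_nodes nodes; node n receives the output of node n-1."""
    outputs = []
    x = x0
    for _ in range(n_nodes):
        x = phi(x, omega)       # same parameters reused at every node
        outputs.append(x)
    return outputs

x0 = np.array([[0.0, 0.0], [0.0, 4.0]])          # one bright pixel on a dark background
recons = cascade(x0, omega=0.5, n_nodes=4)

# Distance of each node's reconstruction from the original input.
errors = [float(np.linalg.norm(r - x0)) for r in recons]
```

In this toy case the distance from the original image grows with every pass, which illustrates how repeated reconstruction amplifies whatever a single node fails to reproduce.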

Referring to fig. 6, the N shallow self-encoders together form a cascade feedback structure. The unit built from the shallow self-encoders contains a plurality of shallow self-encoders with the same (or different) parameters, and each shallow self-encoder may be referred to as a network node. It should be noted that these network nodes are trained with normal sample images, with the training target that the output of the network nodes be as consistent as possible with the input normal image; this yields the trained neural network.

Step S113: set the network nodes as a node grouping, and establish the cascade feedback network by cascading the network nodes within the node grouping. For example, the N network nodes in fig. 6 are set as one node grouping, and their cascade feedback connection forms the cascade feedback network.

In this embodiment, referring to fig. 3, the step S120 mainly relates to a process of configuring a loss function, and specifically may include steps S121 to S124, which are respectively described as follows.

Step S121: calculate the image Euclidean distances corresponding to the first and the last network nodes in the node grouping, thereby obtaining a first image reconstruction quality characterized by the image Euclidean distances, expressed by the formula

$$\mathrm{Loss}_b = \lVert x_0 - \hat{x}_1 \rVert_2 + \lVert x_0 - \hat{x}_N \rVert_2$$

where $x_0$ is the image input to the 1st network node in the cascade feedback network, $\hat{x}_1$ is the reconstructed image output by the 1st network node, and $\hat{x}_N$ is the reconstructed image output by the N-th network node.

Assume the cascade feedback network has N network nodes and its input is $x_0$. Then the 1st network node has input $x_0$ and output $\hat{x}_1$; the 2nd network node has input $\hat{x}_1$ and output $\hat{x}_2$; and, by analogy, the N-th network node has input $\hat{x}_{N-1}$ and output $\hat{x}_N$.

The first image reconstruction quality Loss_b is configured mainly according to the difference between the reconstruction result of the cascade feedback network and the original image.

The first image reconstruction quality Loss_b represents the reconstruction quality of the training image and consists mainly of two parts: the Euclidean distance between the reconstruction result output by the first node and the training image, and the Euclidean distance between the reconstruction result output by the last node and the training image. It can be understood that, because the cascade feedback network is trained with normal sample images, each network node reconstructs normal regions well (a small Euclidean distance from the original image) and reconstructs abnormal regions, which it was never trained on, poorly (a large Euclidean distance from the original image).
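The two-part structure of Loss_b can be sketched with NumPy as follows. This is a hedged illustration: the function name `loss_b` is hypothetical, and summing the two plain (unsquared) Euclidean distances is an assumption about how the two parts are combined.

```python
import numpy as np

def loss_b(x0: np.ndarray, x_hat_first: np.ndarray, x_hat_last: np.ndarray) -> float:
    """Sum of the Euclidean distances of the first and last reconstructions from x0."""
    d_first = np.linalg.norm(x_hat_first - x0)   # first node vs. training image
    d_last = np.linalg.norm(x_hat_last - x0)     # last node vs. training image
    return float(d_first + d_last)

x0 = np.zeros((2, 2))                 # toy training image
x_hat_1 = np.ones((2, 2))             # assumed output of the first node
x_hat_N = 2.0 * np.ones((2, 2))       # assumed output of the last node
value = loss_b(x0, x_hat_1, x_hat_N)
```

Penalising the last node as well as the first keeps the reconstruction of normal regions stable over the whole cascade, not just after one pass.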

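Step S122: calculate a variance statistic over several network nodes in the node grouping, thereby obtaining a second image reconstruction quality characterized by the variance statistic, expressed by the formula

$$\mathrm{Loss}_{var} = \frac{1}{m} \sum_{k \in s} \sum_{i=1}^{c} \left( \hat{x}_{k,i} - \bar{x}_i \right)^2$$

where $s$ is a set of $m$ network nodes, $c$ is the number of channels of the reconstructed image output by the $k$-th network node, $\hat{x}_{k,i}$ is the value of the $i$-th channel of the reconstructed image output by the $k$-th network node, and $\bar{x}_i$ is the average of the $i$-th channel over the $m$ reconstructed images, which satisfies

$$\bar{x}_i = \frac{1}{m} \sum_{k \in s} \hat{x}_{k,i}$$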

In the cascade feedback network, the reconstructed image output by each network node can be denoted $\hat{x}$, with nodes distinguished by subscripts; for example, the reconstruction result output by the $i$-th node is $\hat{x}_i$.

Each network node outputs a reconstructed image of size w × h × c (w is the image width, h is the image height, and c is the number of image channels). Select l (2 < l ≤ N) network nodes from the N network nodes and take the output of each; together with the original image this gives l + 1 images in total, of which 1 is the original normal sample image and l are reconstructed images from different network nodes. Analyzing each pixel point yields w × h × c histograms, each composed of l + 1 data values. Statistical analysis of these histograms (such as calculating the variance, or the differences between any number of groups) yields the abnormal regions. The second image reconstruction quality Loss_var is configured mainly according to statistical information from a plurality of network nodes of the cascade feedback network.
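The per-pixel statistics described above can be sketched by stacking the original image with the selected reconstructions and computing a variance along the stacking axis. This is only a sketch under assumptions: the function name `pixelwise_variance` and the threshold value are hypothetical, and variance is just one of the statistics the text mentions.

```python
import numpy as np

def pixelwise_variance(x0: np.ndarray, reconstructions: list) -> np.ndarray:
    """Variance of the l + 1 values observed at every (row, column, channel) position."""
    stack = np.stack([x0, *reconstructions], axis=0).astype(np.float64)  # (l + 1, h, w, c)
    return stack.var(axis=0)                                            # (h, w, c)

x0 = np.zeros((2, 2, 1))                       # original image, h = w = 2, c = 1
r1 = np.zeros((2, 2, 1))
r2 = np.zeros((2, 2, 1))
r3 = np.zeros((2, 2, 1))
r3[1, 1, 0] = 4.0                              # one node disagrees at a single pixel

var_map = pixelwise_variance(x0, [r1, r2, r3])  # l = 3 selected nodes
abnormal = var_map > 1.0                        # assumed threshold on the statistic
```

Each position of `var_map` summarises one of the w × h × c per-pixel data groups; positions where the nodes disagree stand out with a large variance.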

Step S123: calculate a third image reconstruction quality from the first image reconstruction quality and the second image reconstruction quality, expressed by the formula

$$\mathrm{Loss}_d = \mathrm{Loss}_b + \mathrm{Loss}_{var}$$

As can be appreciated, the third image reconstruction quality Loss_d is the combination of the first image reconstruction quality Loss_b and the second image reconstruction quality Loss_var.

Step S124: configure the expression Loss_b of the first image reconstruction quality, or the expression Loss_d of the third image reconstruction quality, as the loss function of the cascade feedback network.

It can be understood that the key to training the cascade feedback network is constructing a suitable loss function, and the quality of the loss function reflects, to a certain extent, the capability of the constructed image detection model. When Loss_d is configured as the loss function of the cascade feedback network, the loss accounts both for the difference between the reconstruction result of the cascade feedback network and the original image and for the statistical information from a plurality of network nodes of the cascade feedback network.
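The combined loss Loss_d = Loss_b + Loss_var can be sketched as below. The sketch rests on assumptions that should be read as such: the exact normalization of the variance term is not recoverable here, so `loss_var` sums squared deviations from the across-node mean over all pixels and channels and averages over the nodes; the function names are hypothetical.

```python
import numpy as np

def loss_b(x0: np.ndarray, recons: list) -> float:
    """Euclidean distances of the first and last node outputs from the input x0."""
    return float(np.linalg.norm(recons[0] - x0) + np.linalg.norm(recons[-1] - x0))

def loss_var(recons: list) -> float:
    """Assumed variance statistic: squared deviation from the node mean, averaged over nodes."""
    stack = np.stack(recons, axis=0)            # (m, h, w, c)
    mean = stack.mean(axis=0)                   # mean over the m nodes
    return float(((stack - mean) ** 2).sum() / stack.shape[0])

def loss_d(x0: np.ndarray, recons: list) -> float:
    """Third image reconstruction quality: Loss_b + Loss_var."""
    return loss_b(x0, recons) + loss_var(recons)

x0 = np.zeros((1, 1, 1))
recons = [np.full((1, 1, 1), 1.0), np.full((1, 1, 1), 3.0)]   # toy outputs of m = 2 nodes
total = loss_d(x0, recons)
```

The first term ties the reconstructions to the input image, while the second discourages the node outputs from drifting apart on normal samples.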

In this embodiment, referring to fig. 4, the step S130 mainly relates to the process of training to obtain the image detection model, and may specifically include steps S131 to S133, which are respectively described as follows.

Step S131, a plurality of normal sample images of the object to be detected are obtained. The normal sample image here does not include the surface abnormality region of the object to be detected.

Step S132: take the normal sample images as the images input to the first network node in the cascade feedback network, and input the normal sample images one after another into the cascade feedback network for training.

Step S133: end training when the change in the value of the loss function of the cascade feedback network between successive iterations is smaller than a preset threshold, or when training reaches a preset number of iterations; the cascade feedback network with the network parameters as updated at the end of training is the image detection model of the object to be detected.
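The two stopping conditions of step S133 can be sketched as a small helper. The function name `should_stop` and its parameter names are hypothetical; the loss history is assumed to hold one loss value per completed training iteration.

```python
def should_stop(loss_history: list, threshold: float, max_iterations: int) -> bool:
    """Return True when either stopping condition of the training loop is met."""
    # Condition 1: the preset number of iterations has been reached.
    if len(loss_history) >= max_iterations:
        return True
    # Condition 2: the loss changed by less than the preset threshold
    # between the two most recent iterations.
    if len(loss_history) >= 2:
        return abs(loss_history[-1] - loss_history[-2]) < threshold
    return False

still_improving = should_stop([1.0, 0.5], threshold=0.1, max_iterations=10)
converged = should_stop([1.0, 0.95], threshold=0.1, max_iterations=10)
out_of_budget = should_stop([1.0, 0.5, 0.2], threshold=0.01, max_iterations=3)
```

A training loop would append the current loss to the history after each parameter update and break as soon as the helper returns True.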

It can be understood that, in this embodiment, the image detection model is obtained by training the cascade feedback network. This unsupervised learning approach needs only a plurality of normal images for training and requires neither abnormal sample images nor prior labeling. The training set is therefore easy to obtain, no time or effort is spent on labeling, and the image detection model can be constructed more efficiently.

It can be understood that, in this embodiment, shallow self-encoders form the network nodes of the cascade feedback network. This structure, which resembles a recurrent neural network, allows the parameters of all network nodes to be kept completely identical, so the number of parameters is greatly reduced compared with existing methods. The model is consequently convenient to transmit, store and deploy, the network training process is simplified, and the loss function converges faster.

Example II,

Referring to fig. 7, the present embodiment discloses an image detection method based on cascade feedback, which includes steps S210 to S230, described below.

Step S210, acquiring an image to be detected of the object to be detected. The image to be detected may include a surface normal region and a surface abnormal region of the object to be detected, and it is necessary to obtain the surface abnormal region by image detection.

Step S220: input the image to be detected into the constructed image detection model, and obtain by detection the reconstructed image output by any network node in the cascade feedback network.

It should be noted that the image detection model is an image detection model constructed by the construction method in the first embodiment, and is a trained cascade feedback network, and includes a plurality of shallow layer self-encoders formed by cascade feedback, and each shallow layer self-encoder serves as a network node in the cascade feedback network.

For any network node, the output reconstructed image can be expressed as

$$\hat{x}'_n = \phi(\hat{x}'_{n-1};\, \omega)$$

where $\phi$ is the shallow self-encoder, $\omega$ is the weight parameter of the neural network, $\hat{x}'_{n-1}$ is the input image of the $n$-th network node when the image to be detected is input into the image detection model (for the 1st network node, the image to be detected $x'_0$ itself), and $\hat{x}'_n$ is the reconstructed image output by the $n$-th network node when the image to be detected is input into the image detection model.

Step S230: compare the reconstructed image with the image to be detected to obtain the surface abnormal region of the object to be detected.

In this embodiment, referring to fig. 8, the step S230 mainly relates to the process of obtaining the surface abnormal region of the object to be detected by comparison, and may specifically include steps S231 to S233, which are respectively described as follows.

Step S231: construct an evaluation function of the surface abnormal region from the reconstructed image and the image to be detected.

In a specific embodiment, after the image to be detected is input into the image detection model, the reconstructed image output by each network node in the cascade feedback network is obtained, and an evaluation function of the surface abnormal region is constructed from these images; the evaluation function is expressed by any one of the following formulas

where $x'_0$ is the image to be detected; $\hat{x}'_n$ is the reconstructed image output by any network node when the image to be detected is input into the cascade feedback network; $M(\hat{x}'_n)$ is the output image of an intermediate network layer of any network node when the image to be detected is input into the cascade feedback network; $M(x'_0)$ is the output image of an intermediate network layer of the first network node when the image to be detected is input into the cascade feedback network; $s$ is a set formed by a plurality of network nodes; the subscripts $n$, $j$ and $k$ are serial numbers of network nodes; $m$ is the number of network nodes in the set; $c$ is the number of channels of the reconstructed image output by the $k$-th network node; $\hat{x}'_{k,i}$ is the value of the $i$-th channel of the reconstructed image output by the $k$-th network node when the image to be detected is input into the cascade feedback network; and $\bar{x}'_i$ is the average value of the $i$-th channel over the $m$ reconstructed images when the image to be detected is input into the cascade feedback network.

It should be noted that, since M () represents an output result of the intermediate network layer, M may represent any one of a convolutional layer, a downsampling layer, and an upsampling layer.

Step S232, when the value of the evaluation function is larger than or equal to the preset value, determining that the reconstructed image contains the surface abnormal area of the object to be detected, and comparing the difference value of the reconstructed image and the image to be detected to obtain the surface abnormal area of the object to be detected.

For any one of the evaluation functions above, if the function value is greater than or equal to a preset value (which may be set in advance by the user or by system default), the reconstructed image $\hat{x}'_n$ differs from the image to be detected $x'_0$ or from the other reconstructed images, and the reconstructed image $\hat{x}'_n$ is therefore determined to contain the surface abnormal region of the object to be detected.

It should be noted that when the difference value between the reconstructed image and the image to be detected is compared, only the gray difference value between the reconstructed image and the image to be detected needs to be calculated, and the pixel points whose calculation result is greater than the preset threshold value are the pixel points in the surface abnormal region; and obtaining the surface abnormal area of the object to be detected after counting the pixel points.
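The gray-difference comparison can be sketched with NumPy as follows. The function name `anomaly_mask` and the threshold value are hypothetical; the cast to floating point is an implementation detail added here so that the subtraction of 8-bit images cannot wrap around.

```python
import numpy as np

def anomaly_mask(image: np.ndarray, reconstruction: np.ndarray, threshold: float) -> np.ndarray:
    """Mark pixels whose absolute gray difference exceeds the preset threshold."""
    diff = np.abs(image.astype(np.float64) - reconstruction.astype(np.float64))
    return diff > threshold

# Toy 2x2 grayscale image to be detected and its reconstruction.
image = np.array([[10, 200], [30, 40]], dtype=np.uint8)
reconstruction = np.array([[12, 50], [30, 41]], dtype=np.uint8)

mask = anomaly_mask(image, reconstruction, threshold=20)   # True marks abnormal pixels
abnormal_pixel_count = int(mask.sum())
```

Collecting the True positions of `mask` gives the pixel points of the surface abnormal region; the difference image itself is what a figure such as fig. 9 visualises as gray values.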

Step S233, outputting the image to be detected of the object to be detected and the surface abnormal region of the object to be detected.

For an object to be detected, which is hazelnut, the image to be detected and the surface abnormal area can be referred to fig. 9. The left image in fig. 9a is an image to be detected with no surface abnormality of hazelnut, and the right image is a comparison result of a difference between a reconstructed image output by any network node and the image to be detected, which indicates that no surface abnormality region exists on the hazelnut surface. The left image in fig. 9b is an image to be detected with a hole on the hazelnut surface, and the right image is a comparison result of a difference between a reconstructed image output by any network node and the image to be detected, which shows that a surface abnormal region with a shape of a hole and the like exists on the hazelnut surface. The left image in fig. 9c is an image to be detected with a burst on the hazelnut surface, and the right image is a comparison result of a difference between a reconstructed image output by any network node and the image to be detected, which shows that a surface abnormal region with a shape of burst and the like exists on the hazelnut surface. The left image in fig. 9d is an image to be detected with a cut mark on the hazelnut surface, and the right image is a comparison result of a difference between a reconstructed image output by any network node and the image to be detected, which shows that a surface abnormal region with a shape of the cut mark and the like exists on the hazelnut surface. The left image in fig. 9e is the image to be detected with the printed hazelnut surface, and the right image is the difference comparison result between the reconstructed image output by any network node and the image to be detected, which indicates that the hazelnut surface has a surface abnormal area with the shape of printing and the like.

In fig. 9, the hazelnut surface anomaly region is represented by a large gray value because it is a region with a large difference; the normal area of the surface of hazelnut is represented by a smaller gray value because the difference is smaller.

It can be understood that, in this embodiment, inputting the image to be detected into the image detection model makes it easy to obtain the reconstructed image output by any network node in the cascade feedback network. The reconstructed image has high pixel accuracy and reconstructs features well, while the reconstruction error of an abnormal region is relatively large, so the surface abnormal region can be detected simply by separating the abnormal region from the normal region with a simple criterion.

Example III,

Referring to fig. 10, the present embodiment discloses an image detection apparatus. The image detection apparatus 3 mainly includes an image acquisition component 31, a processor 32 and a display 33, which are respectively described below.

The image acquisition component 31 is used for acquiring an image to be detected of an object to be detected.

It should be noted that the image to be detected can be acquired with a grayscale or color camera or video camera, such as a CCD camera, a CMOS camera or a 3D camera. If the camera or video camera captures a color image, the color image needs to be converted into a grayscale image to form the image to be detected. Of course, the image acquisition component 31 may also acquire normal sample images of the object to be detected, thereby providing samples for training the cascade feedback network.

It should be noted that the normal sample images of the object to be detected are used to train the cascade feedback network, and the image to be detected of the object to be detected is input into the image detection model to identify the surface abnormal region present in the image. In addition, the object to be detected may be a product on a production line, a part on a tool table, or an object such as a human, an animal or a plant, and is not specifically limited herein.
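The color-to-grayscale conversion can be sketched as a weighted sum of the color channels. The text does not say which conversion is used; the weights below are the ITU-R BT.601 luma coefficients, chosen here only as a common convention, and the function name `to_gray` and the R, G, B channel order are assumptions.

```python
import numpy as np

def to_gray(rgb: np.ndarray) -> np.ndarray:
    """Convert an (h, w, 3) RGB image to an (h, w) grayscale image."""
    weights = np.array([0.299, 0.587, 0.114])      # assumed BT.601 luma weights
    return rgb.astype(np.float64) @ weights

# Toy 1x2 color image: one white pixel and one pure red pixel.
rgb = np.array([[[255, 255, 255], [255, 0, 0]]], dtype=np.uint8)
gray = to_gray(rgb)
```

The resulting single-channel array can then be used as the image to be detected or as a normal sample image.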

The processor 32 is connected to the image acquisition component 31, and the processor 32 is configured to obtain an image detection model through the construction method disclosed in the first embodiment, and/or obtain a surface abnormal region of the object to be detected in the image to be detected through the image detection method disclosed in the second embodiment.

The display 33 is connected to the processor 32, and the display 33 is used for displaying the image to be detected and the surface abnormal region of the object to be detected.

In this implementation, referring to FIG. 11, processor 32 may include a model building module 321 and an anomaly detection module 322, described separately below.

The model building module 321 is configured to train a pre-established cascade feedback network with one or more normal sample images, and update network parameters through a loss function to obtain an image detection model. The cascade feedback network comprises a plurality of network nodes formed by cascade feedback of a plurality of shallow self-encoders, and the loss function is configured according to the structure of the cascade feedback network. For specific functions of the model building module 321, reference may be made to steps S110 to S130 in the first embodiment, which are not described herein again.

The anomaly detection module 322 is connected to the model building module 321, and is configured to input the image to be detected into the image detection model, and output the surface abnormal region of the object to be detected through detection processing. For specific functions of the anomaly detection module 322, reference may be made to steps S210 to S230 in the second embodiment, which are not described herein again.

Example IV,

On the basis of the construction method disclosed in the first embodiment and the image detection method disclosed in the second embodiment, an image detection apparatus is disclosed in the present embodiment.

Referring to fig. 12, the image detection apparatus 4 mainly includes a memory 41 and a processor 42. The memory 41 serves as a computer-readable storage medium for storing a program, which may be a program code corresponding to the building methods S110 to S130 in the first embodiment, or a program code corresponding to the image detection methods S210 to S230 in the second embodiment.

The processor 42 is connected to the memory 41 for executing the programs stored in the memory 41 in a corresponding manner. The functions implemented by the processor 42 can refer to the processor 32 in the third embodiment, and will not be described in detail here.

Those skilled in the art will appreciate that all or part of the functions of the various methods in the above embodiments may be implemented by hardware, or may be implemented by computer programs. When all or part of the functions of the above embodiments are implemented by a computer program, the program may be stored in a computer-readable storage medium, and the storage medium may include: a read only memory, a random access memory, a magnetic disk, an optical disk, a hard disk, etc., and the program is executed by a computer to realize the above functions. For example, the program may be stored in a memory of the device, and when the program in the memory is executed by the processor, all or part of the functions described above may be implemented. In addition, when all or part of the functions in the above embodiments are implemented by a computer program, the program may be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a removable hard disk, and may be downloaded or copied to a memory of a local device, or may be version-updated in a system of the local device, and when the program in the memory is executed by a processor, all or part of the functions in the above embodiments may be implemented.

The present invention has been described in terms of specific examples, which are provided to aid understanding of the invention and are not intended to be limiting. For a person skilled in the art to which the invention pertains, several simple deductions, modifications or substitutions may be made according to the idea of the invention.

Claims

1. A construction method of an image detection model is characterized by comprising the following steps:

establishing a structure of a cascade feedback network; the cascade feedback network comprises a plurality of network nodes formed by a plurality of shallow layer self-encoders through cascade feedback;

obtaining a corresponding loss function according to the structural configuration of the cascade feedback network;

and training the cascade feedback network by using a plurality of normal sample images of the object to be detected, updating network parameters of the cascade feedback network through the loss function, and obtaining an image detection model after training.

2. The method of constructing according to claim 1, wherein the structure for establishing a cascaded feedback network comprises:

forming each shallow layer self-encoder from convolutional neural units; the shallow layer self-encoder comprises an encoder consisting of a convolutional layer and a downsampling layer, and a decoder consisting of an upsampling layer and a convolutional layer; the encoder is used for receiving the image input to the shallow layer self-encoder and converting the image into semantic information, and the decoder is used for restoring the semantic information and outputting a reconstructed image;

sequencing the shallow self-encoders in sequence, feeding back the output of each shallow self-encoder to the input of the next shallow self-encoder, and taking each shallow self-encoder as a network node in the cascade feedback network;

and setting each network node as a node grouping, and establishing the cascade feedback network in a cascade form of each network node in the node grouping.

3. The method of constructing according to claim 2, wherein the obtaining of the corresponding loss function according to the structural configuration of the cascaded feedback network comprises:

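calculating the image Euclidean distances corresponding to the first and the last network nodes in the node grouping respectively, so as to obtain a first image reconstruction quality characterized by the image Euclidean distances, expressed by the formula

$$\mathrm{Loss}_b = \lVert x_0 - \hat{x}_1 \rVert_2 + \lVert x_0 - \hat{x}_N \rVert_2$$

wherein $x_0$ is the image input to the 1st network node in the cascade feedback network, $\hat{x}_1$ is the reconstructed image output by the 1st network node, and $\hat{x}_N$ is the reconstructed image output by the N-th network node;

calculating variance statistics of a number of network nodes within the node grouping to obtain a second image reconstruction quality characterized by the variance statistics, expressed by the formula

$$\mathrm{Loss}_{var} = \frac{1}{m} \sum_{k \in s} \sum_{i=1}^{c} \left( \hat{x}_{k,i} - \bar{x}_i \right)^2$$

wherein $s$ is a set of $m$ network nodes, $c$ is the number of channels of the reconstructed image output by the $k$-th network node, $\hat{x}_{k,i}$ is the value of the $i$-th channel of the reconstructed image output by the $k$-th network node, and $\bar{x}_i$ is the average of the $i$-th channel over the $m$ reconstructed images and satisfies

$$\bar{x}_i = \frac{1}{m} \sum_{k \in s} \hat{x}_{k,i}$$

calculating a third image reconstruction quality from the first image reconstruction quality and the second image reconstruction quality, expressed by the formula

$$\mathrm{Loss}_d = \mathrm{Loss}_b + \mathrm{Loss}_{var}$$

configuring the expression Loss_b of the first image reconstruction quality or the expression Loss_d of the third image reconstruction quality as the loss function corresponding to the cascade feedback network.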

4. The constructing method of claim 3, wherein the training of the cascade feedback network by using a plurality of normal sample images of the object to be detected, the updating of the network parameters of the cascade feedback network by the loss function, and the obtaining of the image detection model after the training, comprises:

acquiring a plurality of normal sample images of an object to be detected; the normal sample image does not contain the surface abnormal area of the object to be detected;

the normal sample image is used as an image input by a first network node in the cascade feedback network, and each normal sample image is sequentially input to the cascade feedback network for training;

finishing training when the calculation difference value before and after the loss function corresponding to the cascade feedback network is smaller than a preset threshold value or the corresponding loss function reaches a preset iteration number;

and obtaining the image detection model of the object to be detected by using the cascade feedback network with updated network parameters when the training is finished.

5. An image detection method based on cascade feedback is characterized by comprising the following steps:

acquiring a to-be-detected image of an object to be detected;

inputting the image to be detected into an image detection model constructed by the construction method of any one of claims 1 to 4, and detecting to obtain a reconstructed image output by any network node in the cascade feedback network;

and comparing the reconstructed image with the image to be detected to obtain the surface abnormal area of the object to be detected.

6. The image detection method of claim 5, wherein comparing the reconstructed image with the image to be detected to obtain the surface abnormal region of the object to be detected comprises:

constructing an evaluation function of a surface abnormal region by using the reconstructed image and the image to be detected;

when the value of the evaluation function is larger than or equal to a preset value, determining that the reconstructed image contains the surface abnormal area of the object to be detected, and comparing the difference value of the reconstructed image and the image to be detected to obtain the surface abnormal area of the object to be detected;

and outputting the image to be detected of the object to be detected and the surface abnormal area of the object to be detected.

7. The image detection method according to claim 6, wherein said constructing an evaluation function of the surface abnormal region using the reconstructed image and the image to be detected comprises:

obtaining a reconstructed image output by each network node in the cascade feedback network after the image to be detected is input into the image detection model, thereby constructing an evaluation function of the surface abnormal area, wherein the evaluation function is expressed as the following formula

wherein $x'_0$ is the image to be detected, and $\bar{x}'_i$ is the average value of the $i$-th channel over the $m$ reconstructed images when the image to be detected is input into the cascade feedback network.

8. An image detection apparatus, characterized by comprising:

the image acquisition component is used for acquiring an image to be detected of an object to be detected;

a processor, connected to the image acquisition component, for constructing an image detection model by the construction method according to any one of claims 1 to 4, and/or for obtaining a surface abnormal region of an object to be detected in the image to be detected by the image detection method according to any one of claims 5 to 7;

and the display is connected with the processor and used for displaying the image to be detected and the surface abnormal area of the object to be detected.

9. The image detection apparatus of claim 8, wherein the processor comprises a model building module and an anomaly detection module;

the model building module is used for training a pre-established cascade feedback network by utilizing one or more normal sample images and updating network parameters through a loss function to obtain an image detection model; the cascade feedback network comprises a plurality of network nodes formed by a plurality of shallow layer self-encoders through cascade feedback, and the loss function is configured according to the structure of the cascade feedback network;

and the anomaly detection module is connected with the model building module and used for inputting the image to be detected into the image detection model and outputting the surface abnormal area of the object to be detected through detection processing.

10. A computer-readable storage medium, characterized by comprising a program executable by a processor to implement the construction method according to any one of claims 1 to 4 and/or to implement the image detection method according to any one of claims 5 to 7.