CN111083365B

CN111083365B - Method and device for rapidly detecting optimal focal plane position

Info

Publication number: CN111083365B
Application number: CN201911343111.1A
Authority: CN
Inventors: 陈根生
Original assignee: Individual
Current assignee: Individual
Priority date: 2019-12-24
Filing date: 2019-12-24
Publication date: 2021-01-15
Anticipated expiration: 2039-12-24
Also published as: CN111083365A

Abstract

The invention discloses a method and a device for rapidly detecting the position of an optimal focal plane, wherein each original image is preprocessed by utilizing a first focusing evaluation function and a second focusing evaluation function which are constructed, wherein the first focusing evaluation function and the second focusing evaluation function are both in an image definition description mode; and then, a depth learning model is utilized to realize that the focusing state of the current image can be quickly judged by only utilizing one image, and the type of defocusing can be further judged.

Description

Method and device for rapidly detecting optimal focal plane position

Technical Field

The invention relates to the technical field of automatic focusing, in particular to a method and a device for quickly detecting the position of an optimal focal plane.

Background

With the rapid development of computer technology and the increasing maturity of digital image processing theory, the automatic focusing technology enters a new digital era. Auto Focus (Auto Focus) is a method that uses the principle of object light reflection to receive reflected light by a sensor CCD on a camera, and drives an electric focusing device to Focus through computer processing. In the automatic focusing technology, a problem to be solved is how to quickly and accurately determine the position of the optimal focal plane.

Generally, for a particular imaging system, the sharpness of its imaging reflects the state of focus of the system. When the focusing effect is good, the image is clear, the information such as contour details is rich, and different characteristic information is highlighted in a space domain or a frequency domain. For example, in the spatial domain, the gray value of the image is used as the main characteristic information; in the frequency domain, the characteristic information is a high frequency component. The sharpness evaluation function of an image, i.e., the focus function (focus function), can be used to measure whether the image is in the best focus state.

In the prior art, for example, chinese patent CN105785724 describes a mask pattern optimization method, an optimal focal plane position measurement method and a system, which construct an objective function by presetting transmittances and phases in different areas on an initial mask pattern, where the initial mask pattern corresponds to a phase shift mask, and continuously optimize the transmittances and phases of the objective function by combining an optimization algorithm to find the transmittance and phase parameters of the optimal mask pattern, that is, the optimal focal plane parameters. The patent uses an optimization algorithm to repeatedly optimize the objective function, so that the detection time of the optimal focal plane position is greatly increased.

Disclosure of Invention

The invention provides a method and a device for quickly detecting an optimal focal plane position, which are used for overcoming the defects of long detection time and the like in the prior art and realizing short detection time and high accuracy of the optimal focal plane position.

In order to achieve the above object, the present invention provides a method for rapidly detecting an optimal focal plane position, including:

101, acquiring N original images, wherein N is a positive integer;

102, preprocessing each original image based on the constructed first focusing evaluation function and the second focusing evaluation function, and marking the preprocessed images to obtain a data set { X, Y }, wherein X is a characteristic data set, and Y is a label set;

103, training the built deep learning model by using a data set { X, Y } to determine a weight parameter of the deep learning model and a mapping relation from input to output;

104, inputting the image to be detected preprocessed by the first focusing evaluation function and the second focusing evaluation function into a trained deep learning model to obtain whether the image to be detected is in a focusing or defocusing state and a defocusing type; and if the image is in the out-of-focus state, controlling the shooting system to move according to the out-of-focus type to obtain the image again, preprocessing the image, and inputting the preprocessed image into the trained deep learning model again for detection until the optimal focal plane position of the shooting system is obtained.

In order to achieve the above object, the present invention further provides a device for rapidly detecting an optimal focal plane position, including:

the image acquisition module is used for acquiring an original image;

the image processing module is used for marking the original images, preprocessing each original image based on the constructed first focusing evaluation function and the second focusing evaluation function, and obtaining a data set { X, Y }, wherein X is a characteristic data set, and Y is a label set;

the model training module is used for training the built deep learning model by utilizing a data set { X, Y } so as to determine the weight parameters of the deep learning model and the mapping relation from input to output;

the detection module is used for inputting the image to be detected preprocessed by the first focusing evaluation function and the second focusing evaluation function into the trained deep learning model to obtain whether the image to be detected is in a focusing or defocusing state and a defocusing type; and if the image is in the out-of-focus state, controlling the shooting system to move according to the out-of-focus type to obtain the image again, preprocessing the image, and inputting the preprocessed image into the trained deep learning model again for detection until the optimal focal plane position of the shooting system is obtained.

To achieve the above object, the present invention further provides a computer device, which includes a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the method when executing the computer program.

Compared with the prior art, the invention has the beneficial effects that:

the invention provides a method for rapidly detecting the position of an optimal focal plane, which is characterized in that each original image is preprocessed by utilizing a first focusing evaluation function and a second focusing evaluation function which are constructed, wherein the first focusing evaluation function and the second focusing evaluation function are both in an image definition description mode, and the first focusing evaluation function aims at the problems of poor unimodal performance and poor anti-manufacturing effect of the traditional focusing evaluation function when the image brightness changes uniformly; the second focusing evaluation function aims at the problems that when the brightness of the image changes violently, the traditional focusing evaluation function cannot be suitable for various types of focusing images, the calculation speed is low, and the accuracy is low. According to the method, the accuracy of detection of the deep learning model is improved through the constructed first focusing evaluation function and the second focusing evaluation function; then, the method provided by the invention can rapidly judge the focusing state (in focus or out of focus) of the current image by using only one image by using the deep learning model, and can further judge the out-of-focus type (upper focal plane or lower focal plane and the like).

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the structures shown in the drawings without creative efforts.

FIG. 1 is a flow chart of a method for rapidly detecting the position of an optimal focal plane according to the present invention;

FIG. 2 is a schematic structural diagram of a depth residual error network model according to an embodiment of the present invention;

fig. 3 is a schematic structural diagram of a residual error learning unit according to an embodiment of the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In addition, the technical solutions in the embodiments of the present invention may be combined with each other, but it must be based on the realization of those skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination of technical solutions should not be considered to exist, and is not within the protection scope of the present invention.

The present embodiment provides a method for rapidly detecting an optimal focal plane position, as shown in fig. 1, including:

101, acquiring N original images, wherein N is a positive integer;

The method adopted in the embodiment is to convert the problem of judging the optimal focal plane position into the problem of classifying images of different focal planes, and combine the artificial intelligence technology to quickly and accurately detect the focusing state and the defocusing type of the images. The classification types in this embodiment include three classifications, i.e., an upper focal plane, a lower focal plane and an in-focal plane (i.e., an optimal focal plane), but the method of the present invention is not limited to detecting these three classifications, and may also include a left focal plane, a right focal plane, and so on.

The method of the embodiment specifically comprises the following steps: in the training stage, a large amount of sample data is collected in advance (namely, mass images on different focal planes are obtained), a data set is obtained through preprocessing, and the built deep learning model is trained through the data set to obtain an optimal focal plane position detection model; an image to be measured is given and preprocessed, and then the image is input into a trained deep learning model, so that the image is in a focusing or defocusing state and is in a defocusing type (an upper focal plane or a lower focal plane). The method of the embodiment rapidly judges the quality of the current image through the absolute definition of the image, outputs the focusing state and the defocusing type of the current image, further rapidly refocuses the blurred image, and accurately outputs the optimal focal plane position of the shooting system.

In 101, the original image includes an image taken at a best focus plane position of the photographing system and an image taken at an out-of-focus plane position of the photographing system. The image taken at the best focus plane position is a sharp original image, and the image taken at the out-of-focus plane position is a blurred original image, and the out-of-focus plane includes an upper focal plane and a lower focal plane in this embodiment, and the larger the deviation value from the best focus plane, the more blurred the image.

The original image is acquired by an image acquisition system, which can be any imaging device with adjustable focal length. For example: digital optical microscopes, network surveillance cameras, etc. The present embodiment employs a network monitoring camera.

And adjusting the focal length of the image acquisition system by adopting a fixed step length according to respective focusing modes of the image acquisition system, and further acquiring original images under different focal lengths. And acquiring the same number of original images at each non-focal plane position.

Furthermore, the acquired raw image comprises a large number of samples taken in different environments, situations and scenes.

In this embodiment, 20000 original images are acquired, including 10000 clear original images and 10000 blurred original images.

In 102, the first focus evaluation function and the second focus evaluation function are constructed based on a gray scale variance method.

In this embodiment, the focus evaluation functions (the first focus evaluation function and the second focus evaluation function) are used to pre-process the image, so that the deep learning model can learn the image features of a more complex structure, and based on this, the focus evaluation function adopted in this embodiment should have easy calculation and can extract the underlying features (edges and contour information) in the image.

The gray variance method is to use the gray variation of an image as the basis of focus evaluation, and the basis is that when the image is completely focused, the image is clearest, the high-frequency components in the image are the most, and the contrast variation of the image is obvious in a spatial domain, that is, the gray variation of the adjacent pixels of the image is large.

The gray scale variance method has good calculation performance, can fully extract the edge and contour information in the image, and integrates the calculation speed and the evaluation accuracy, and is an image definition evaluation method widely applied in the traditional method.

The image focusing function based on the gray scale variance method has the disadvantages that when the brightness change of an image is uniform, the difference between data values obtained by calculating by the gray scale variance method is small, the unimodal requirement of the image definition evaluation function is not met, namely the sensitivity is not enough near the focus, and the accuracy of evaluating the image definition with uniform gray scale change is not high. Therefore, the present embodiment adopts an improved gray variance method as a focus evaluation function of image sharpness.

When the brightness change of the image is uniform, the difference between the data values calculated by the gray variance method is small and the unimodal requirement of the image focusing function is not met, so that the focusing effect is poor and the definition degree of the images at different focal plane positions cannot be obviously represented. Therefore, this embodiment proposes a first focus evaluation function constructed based on a gray variance method, where the calculation method includes, for a 1-neighborhood region of each pixel f (x, y), calculating a median of a gray difference between two pixels on the left and right of the pixel f (x, y) in the x direction, calculating a median of a gray difference between two pixels on the top and bottom in the y direction, and then calculating a product of the two medias, where the first focus evaluation function is defined as follows:

in the formula, f (x +1, y) represents the gray value of the pixel at the x +1 th row and the y-th column of the image; f (x-1, y) represents the gray value of the pixel of the x-1 th row and the y-th column of the image; f (x, y +1) represents the gray value of the pixel at the x row and the y +1 column of the image; f (x, y-1) represents the gray value of the pixel at the x row and the y-1 column of the image; q and P represent the number of rows and columns of the image;

compared with the traditional gray difference method, the first focusing evaluation function provided by the embodiment calculates the product of the median values of the gray difference values in the x direction and the y direction in the field of each pixel 1, so that the focusing evaluation function is steep near an extreme point, namely, the image focusing evaluation function has better unimodal performance and real-time performance, and the noise interference in the image can be further avoided by adopting the median values of the gray difference values, namely, the anti-noise effect is good.

According to the image focusing function based on the gray scale variance method, when the brightness change of an image is severe, the difference between data values obtained by calculation of the gray scale variance method is large, and the image definition evaluation function has a plurality of peak values, so that the focus position is not easy to find out quickly and accurately. Based on this, it is considered that when the gradation change of an image is severe, the edge of the image becomes sharper, and the image also becomes sharper. Therefore, another second focusing evaluation function constructed based on the gray variance method is proposed in this embodiment, and the calculation method is as follows: for each pixel point P (x, y), drawing a circular area with the radius of 2 pixels by taking the point P (x, y) as the center of a circle, taking 8 pixel points on the circumference of the circular area as a neighborhood pixel point set { P } of the point P (x, y), and carrying out distance weighting on the gray level change of 8 neighborhood of the circular pixel, wherein the distance depends on the weight, and the weight is small when the distance is long. Therefore, in the present embodiment, the distances from the 8 pixels on the circumference of the circular area to the point P (x, y) are equal, and all the weight values are 1. The second focus merit function is defined as follows:

in the formula, I (x, y) represents a pixel in a pixel point set { P }, the pixel point set { P } represents a circular area with radius of r pixels drawn by taking the pixel point P (x, y) as a circle center, and the field pixel point set of the pixel point P (x, y) is formed by n pixel points on the circumference of the circular area; dx represents the distance increment of the two pixel points in the x direction; q and P represent the number of rows and columns of the image.

The second focus evaluation function proposed in this embodiment describes the sharpness of an image by using the change rate of the gray scale. When the brightness of the image changes violently, the sum of the gray scale change rates of each pixel point in the 8-neighborhood pixel point set to the central pixel point is calculated, the change rate of the gray scale reflects the edge sharpness information of the image, and the clearer the edge of the image is, the clearer the image is.

In addition, the second focusing evaluation function not only has the characteristics of being easy to realize and fast calculating the spatial domain gray scale parameters, but also has the characteristics of being sensitive to the change of the image gray scale distribution difference, and can fast and accurately evaluate the definition of various digital images.

Then, each of the 20000 original images acquired at 101 is preprocessed by the first and second focus evaluation functions constructed in this embodiment, and all the preprocessed images constitute a feature data set X. Finally, the preprocessed 20000 images are labeled, and the clear original image is labeled 0, which indicates that the image is taken at the best focal plane position. The blurred original images are marked as 1 or-1, specifically, for the blurred original images collected by the defocused plane, the blurred original image shot at the position of the upper focal plane is marked as-1, the blurred original image shot at the position of the lower focal plane is marked as 1, and the mark of each original image forms a label set Y ═ Y¹,y²,......,y^NIn which y is^NAnd the mark representing the Nth image takes the value of 0, 1 or-1.

In 103, the deep learning model is a deep residual error network model, and sequentially includes:

the input layer is used for inputting the characteristic image processed by the first focusing evaluation function and the second focusing evaluation function into the depth residual error network model;

the characteristic extraction layer is used for extracting the characteristics of the input characteristic image and reducing the dimensionality of the characteristics;

and the output layer is used for classifying the image characteristics to output the label value.

The feature extraction layer sequentially comprises:

and the convolution layer is used for extracting the features of the image on each layer. Firstly, the color and brightness characteristics of the bottom layer of the image, secondly, the local detail characteristics of edges, angular points, straight lines and the like in the image, and finally, the complex structure information of image texture, set shape and the like;

and the maximum pooling layer is used for reducing the dimension of the image characteristics and reducing the model parameters. The method for calculating the maximum value of a certain region of the image is adopted, the maximum value is used for replacing the image region, the reduction of the characteristic dimension of the image and the reduction of model parameters are realized, the model calculation speed is improved, and meanwhile, the image is not sensitive to translation and rotation;

and a residual learning unit for learning residual representation between the input and the output based on the parameter layers to improve the accuracy of the model. By using a plurality of parameter layers to learn residual representation between input and output, the network can keep the accuracy of the model under the condition of depth increase, and simultaneously reduce the training time of the model;

and the average pooling layer is used for reducing the size of the image and reducing the size of the model. The method for calculating the average value of a certain area of the image is adopted, the average value is used for replacing the image area, the size of the image is reduced, the size of the model is reduced, the calculation speed is improved, and the robustness of the translation and rotation of the image is good;

the deep learning model used in this embodiment is a deep residual error network model, as shown in fig. 2. In the present embodiment, the input size of the input layer of the deep learning model is 224 × 224, and the output layer is the softmax layer.

In this embodiment, the feature extraction layer of the depth residual error network model sequentially includes:

a: convolution layer, convolution kernel size 77, step size 2;

b: the maximum pooling layer, the convolution kernel size is 3 multiplied by 3, and the step length is 2;

c: 8 residual error learning units, wherein the 8 residual error learning units comprise 16 convolutional layers in total, and each two convolutional layers are added with a short circuit connection (short connection).

The unit consisting of two convolutional layers + short-circuited connections is called the residual learning unit, as shown in fig. 3. The convolution layer in each residual learning unit adopts a convolution kernel of 3 multiplied by 3, and the number of feature maps output by 8 residual learning units is 64, 128, 256, 512 and 512 in sequence.

In this embodiment, all the short-circuit connections use Identity Mapping (Identity Mapping), that is, when the input and output dimensions are consistent, the input may be directly added to the output, and when the dimensions are inconsistent (that is, when the corresponding dimensions are doubled), the dimensions are increased by zero-padding.

D: average pooling layer, convolution kernel size 7 × 7, step size 1, output feature dimension 1000.

In this embodiment, the training process of the deep learning model is as follows:

randomly disordering the data sequence in the data set { X, Y } and inputting the built deep learning model;

setting an initial learning rate of the deep learning model and starting training;

and when the training error does not decrease any more, resetting the learning rate of the deep learning model and setting the training iteration number to determine the weight parameters of the deep learning model and the mapping relation from input to output.

The specific training process in this embodiment is: the deep learning model was trained in a TensorFlow environment installed on the Ubuntu16.0 system. The specific method comprises the following steps: and (3) randomly disordering the data sequence in the data set { X, Y }, inputting the data sequence into a deep learning model, and obtaining a trained network model after operations such as convolution, maximum pooling, a plurality of residual learning units, average pooling and the like. In the training process, the initial learning rate is set to be 0.1, when the error is not reduced, the learning rate is reduced to be 0.01, and the iteration number is set to be 60 multiplied by 10⁴And adopting random horizontal mirror image, subtracting the mean value, training by adopting an SGD optimization algorithm, and determining the weight parameters of the deep learning model and the mapping relation from input to output.

The trained deep learning model can be used for detecting the definition of an image, namely detecting the focusing state of the image (in-focus or out-of-focus), and further judging the type of out-of-focus (upper focus surface or lower focus surface).

In 104, the specific process of step 104 is:

1041 preprocessing the image to be detected based on the first focus evaluation function and the second focus evaluation function;

1042 inputting the preprocessed image into the trained deep learning model for detection, and if the output label value is 1, the input image is located in the lower focal plane; if the output label value is-1, the input image is positioned on the upper focal plane; if the output label value is 0, the input image is located at the optimal focal plane position;

1043 outputting images with label values of-1 and 1, controlling the shooting system to move according to the output result to obtain the images again, inputting the images into the trained deep learning model again for detection after preprocessing until the output label value is 0, and recording the position of the shooting system when the label value is 0.

This embodiment also provides a best focal plane position rapid detection device, including:

the image acquisition module is used for acquiring an original image;

The present embodiment also provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the method when executing the computer program.

The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention, and all modifications and equivalents of the present invention, which are made by the contents of the present specification and the accompanying drawings, or directly/indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. A method for rapidly detecting the position of an optimal focal plane is characterized by comprising the following steps:

101 acquisitionNThe original image is printed on the original image,Nis a positive integer;

102 preprocessing each original image based on the constructed first focusing evaluation function and the second focusing evaluation function, marking the preprocessed images, and obtaining a data setX，Y}，XIn order to be able to characterize the data set,Yis a label set; the first focus merit function is:

in the formula (I), the compound is shown in the specification,

representing the gray value of the pixel of the x +1 th row and the y-th column of the image;

representing the gray value of the pixel of the x-1 row and the y column of the image;

representing the gray value of the pixel of the x-th row and the y + 1-th column of the image;

representing the gray value of the pixel of the x-th row and the y-1 th column of the image;QandPthe number of rows and columns representing the image;

the second focus merit function is:

in the formula (I), the compound is shown in the specification,

representing a set of pixel points

A pixel, a collection of pixels

Representing by pixel points

Drawing a circular area with radius of r pixels for the circle center, wherein the pixel point is formed by n pixel points on the circumference of the circular area

The domain pixel point set of (1);dxrepresenting the distance increment of two pixel points in the x direction;QandPthe number of rows and columns representing the image;

103 using a data setX，YTraining the built deep learning model to determine the weight parameters of the deep learning model and the mapping relation from input to output;

2. The method for rapidly detecting a best focus plane position as claimed in claim 1, wherein said original image comprises an image taken at a best focus plane position and an image taken at an out-of-focus plane position.

3. The method for rapidly detecting the position of the best focal plane according to claim 1, wherein in the step 102, the first focus evaluation function and the second focus evaluation function are constructed based on a gray variance method.

4. The method for rapidly detecting the position of the best focal plane according to claim 1, wherein the deep learning model is a deep residual error network model, and sequentially comprises:

5. The method for rapidly detecting the position of the best focal plane according to claim 4, wherein the feature extraction layer sequentially comprises:

the convolution layer is used for extracting the features of the image on each layer;

the maximum pooling layer is used for reducing the dimension of the image characteristics and reducing the model parameters;

a plurality of residual learning units that learn residual representations between input and output based on a plurality of parameter layers to improve accuracy of the model;

and the average pooling layer is used for reducing the size of the image and reducing the size of the model.

6. The method for rapidly detecting the position of the best focal plane according to claim 5, wherein the training process of the deep learning model comprises the following steps:

mapping a data setX，YThe data order inRandomly disorganizing and inputting the built deep learning model;

7. The method for rapidly detecting the position of the best focal plane according to claim 1, wherein the 104 specifically comprises:

8. A device for rapidly detecting the position of an optimal focal plane, comprising:

the image acquisition module is used for acquiring an original image;

an image processing module for marking the original image, preprocessing each original image based on the constructed first and second focus evaluation functions to obtain a data setX，Y}，XIn order to be able to characterize the data set,Yis a label set; the first focus merit function is:

in the formula (I), the compound is shown in the specification,

the second focus merit function is:

in the formula (I), the compound is shown in the specification,

representing a set of pixel points

A pixel, a collection of pixels

Representing by pixel points

a model training module for using the data setX，YTraining the built deep learning model to determine the weight parameters of the deep learning model and the mapping relation from input to output;

9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor when executing the computer program implements the steps of the method of any of claims 1 to 7.