CN112508171A - Image depth estimation method and device based on multilayer convolutional neural network - Google Patents

Image depth estimation method and device based on multilayer convolutional neural network

Info

Publication number
CN112508171A
Authority
CN
China
Prior art keywords
image
neural network
data set
convolutional neural
light field
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011320209.8A
Other languages
Chinese (zh)
Inventor
乔霈
李德源
牛蒙青
Current Assignee
China Institute for Radiation Protection
Original Assignee
China Institute for Radiation Protection
Priority date
Filing date
Publication date
Application filed by China Institute for Radiation Protection filed Critical China Institute for Radiation Protection
Priority application: CN202011320209.8A
Publication: CN112508171A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • G06T7/557 Depth or shape recovery from multiple images from light fields, e.g. from plenoptic cameras
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10052 Images from lightfield camera
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning

Abstract

The invention discloses an image depth estimation method and device based on a multilayer convolutional neural network. The method comprises the following steps: S100, acquiring a plurality of light field images of a physical model in a three-dimensional scene with a light field camera to form an image data set; S200, preprocessing the light field images in the image data set; S300, performing model training and feature learning on the multilayer convolutional neural network based on the preprocessed image data set to obtain the depth information of each image in the data set. The invention designs a novel multilayer convolutional neural network; the image depth information it produces allows the three-dimensional dose distribution of radiation in tissue-equivalent material to be measured more accurately.

Description

Image depth estimation method and device based on multilayer convolutional neural network
Technical Field
The invention relates to the technical field of image depth information, in particular to an image depth estimation method and device based on a multilayer convolutional neural network.
Background
At present, many areas of the nuclear industry, such as radiation detection and research on the effects of radiation on the human body, require three-dimensional dose measurement for data support. Radiotherapy is widely used to treat diseases such as cancer, and dose estimation is crucial during treatment: an accurate estimate avoids the risk of over-irradiating the patient. Light field imaging, a branch of modern image measurement, is widely applied in industrial manufacturing, machine vision, and related fields. A light field image is acquired with light field imaging technology and inverted with a depth estimation algorithm to obtain the three-dimensional distribution of the light field. Acquiring the depth information of the image is therefore central to the three-dimensional dose measurement task.
Currently, depth estimation methods fall into two main categories: active and passive. An active method controls the camera system and the imaging environment according to scene information and then acquires the scene depth. It obtains depth information with high accuracy but is rarely used in practice because the measurement system is large and expensive. A passive method works quite differently: by adjusting the lens aperture, the best shooting angle is found for each pixel when a picture is taken; the sharpness of each pixel is then computed from the camera's imaging principle, the sharp pixels are assembled into an all-in-focus image, depth is estimated from that image, and finally an optimization step yields the scene depth information. This process is complicated. Therefore, to further reduce the complexity of epipolar-plane-image (EPI) depth estimation and improve the resolution of image reconstruction, an image depth estimation algorithm must weigh the computation process, accuracy, cost, and other factors when reconstructing the light field image, ultimately improving the three-dimensional dose measurement result.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide an image depth estimation method and device based on a multilayer convolutional neural network. A novel multilayer convolutional neural network is designed for light field image depth estimation; the image depth information it obtains allows the three-dimensional dose distribution of radiation in tissue-equivalent material to be measured more accurately.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows:
an image depth estimation method based on a multilayer convolutional neural network comprises the following steps:
s100, acquiring a plurality of light field images of a physical model in a three-dimensional scene through a light field camera to form an image data set;
s200, preprocessing a light field image in the image data set;
s300, based on the preprocessed image data set, model training and feature learning are carried out on the multilayer convolutional neural network, and depth information of each image in the image data set is obtained.
Further, as above, in the image depth estimation method based on the multilayer convolutional neural network, S100 includes:
s101, when a scintillator in the physical model is subjected to energy deposition of rays of a radiation source, luminescence information is generated, and the light yield is in direct proportion to the energy deposition of the rays;
s102, the light field camera collects the luminescence information generated by the physical model, and reconstructs a light field image of three-dimensional emission light formed after the physical model is radiated according to the collected luminescence information to form an image data set.
Further, as above, the light field camera comprises a main lens, a micro-lens array, and an image sensor; the image sensor is aligned with the micro-lens array, the main lens faces the physical model, and each micro-lens in the array covers a plurality of sensor pixels.
Further, in the image depth estimation method based on the multilayer convolutional neural network as described above, S200 includes:
s201, extracting polar line diagram area blocks in the horizontal direction and the vertical direction corresponding to a plurality of pixel points from each light field image in the image data set;
s202, removing the polar line diagram area blocks with unclear textures from the extracted polar line diagram area blocks;
s203, carrying out balancing processing on the screened image data set to enable the number of training samples with different characteristics to be basically the same.
Further, in the image depth estimation method based on the multilayer convolutional neural network as described above, S300 includes:
s301, respectively training polar line diagram region blocks in the horizontal direction and the vertical direction based on the preprocessed image data set and the multilayer convolutional neural network, and respectively extracting feature vectors in the horizontal direction and the vertical direction of the image in the image data set;
s302, receiving the feature vectors output by the multilayer convolutional neural network through a softmax function, and acquiring the depth information of each image in the image data set.
Further, as above, the multilayer convolutional neural network comprises two identical sub-networks, used respectively to train on the horizontal and vertical epipolar-plane-image region blocks and to extract the horizontal and vertical feature vectors of the images in the image data set;
each sub-network consists of 7 convolutional layers, a pooling layer and a full-connection layer which are connected in sequence, the size of a convolutional kernel is 2 x 2, the number of first layer convolutional kernels is 16, the number of second layer convolutional kernels is 32, the convolutional kernels are doubled layer by layer, the number of seventh layer convolutional kernels is 1024, the convolutional layers are used for extracting the characteristics of an input image, the pooling layer is used for reducing the dimension of the extracted characteristics, and the full-connection layer is used for integrating the output information of the convolutional layers and the pooling layer and converting the output information into characteristic vectors.
The embodiment of the invention also provides an image depth estimation device based on the multilayer convolutional neural network, which comprises the following steps:
a first acquisition module for acquiring a plurality of light field images of a physical model in a three-dimensional scene by a light field camera to form an image dataset;
a preprocessing module for preprocessing a light field image in the image dataset;
and the second acquisition module is used for carrying out model training and feature learning on the multilayer convolutional neural network based on the preprocessed image data set so as to acquire the depth information of each image in the image data set.
Further, as above, in the image depth estimation apparatus based on a multilayer convolutional neural network, the first obtaining module is configured to:
when the scintillator in the physical model is subjected to energy deposition of rays of a radiation source, luminous information is generated, and the light yield is in direct proportion to the energy deposition of the rays;
the light field camera collects the luminescence information generated by the physical model, and reconstructs a light field image of three-dimensional emission light formed after the physical model is radiated according to the collected luminescence information to form an image data set.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, implements the aforementioned multilayer convolutional neural network-based image depth estimation method.
An image depth estimation apparatus based on a multilayer convolutional neural network, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the aforementioned multi-layer convolutional neural network-based image depth estimation method via execution of the executable instructions.
The beneficial effects of the invention are as follows: in the method and device provided by the invention, a multilayer convolutional neural network performs light field image depth estimation, converting the traditional image depth calculation problem into a classification problem and learning image features from an artificial intelligence perspective, which satisfies both the high resolution of image acquisition and the accuracy of the image depth information. Image depth estimation is a key link in three-dimensional image reconstruction and can provide technical support for the field of scintillator-based three-dimensional dose distribution measurement.
Drawings
Fig. 1 is a schematic flowchart of an image depth estimation method based on a multilayer convolutional neural network according to an embodiment of the present invention;
FIG. 2 is an imaging schematic of a focusing light field camera provided in an embodiment of the present invention;
FIG. 3 is a technical route diagram of a light field image depth estimation method provided in an embodiment of the present invention;
FIG. 4 is a white image taken by a light field camera provided in an embodiment of the present invention under fixed camera parameters;
FIG. 5 is a schematic diagram of a model of a multi-layer convolutional neural network provided in an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an image depth estimation apparatus based on a multilayer convolutional neural network according to an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and the detailed description.
As shown in fig. 1, the present embodiment provides an image depth estimation method based on a multilayer convolutional neural network, including:
s100, acquiring a plurality of light field images of a physical model in a three-dimensional scene through a light field camera to form an image data set;
s100 includes:
s101, when a scintillator in the physical model is subjected to energy deposition of rays of a radiation source, luminescence information is generated, and the light yield is in direct proportion to the energy deposition of the rays;
s102, the light field camera collects the luminescence information generated by the physical model, and reconstructs a light field image of three-dimensional emission light formed after the physical model is radiated according to the collected luminescence information to form an image data set.
In this embodiment, the light field camera comprises a main lens, a micro-lens array, and an image sensor; the image sensor is aligned with the micro-lens array, the main lens faces the physical model, and each micro-lens in the array covers a plurality of sensor pixels. During image acquisition, a focused light field camera is used to obtain high-definition images. Its imaging principle is shown in fig. 2: light rays from all directions of the object converge through the main lens onto the micro-lenses, and each micro-lens images one region of the main lens image plane. Because their directions differ, the rays spread onto the sensor pixels behind each micro-lens, realizing a secondary imaging of the scene on the main lens image plane. Let a and b be the distances from the micro-lens array to the main lens image plane and to the sensor, respectively; the camera's directional resolution of the light field is then a/b. The distances a and b satisfy the basic lens imaging formula, and directional sampling of the light field is achieved by adjusting them.
Before shooting, the fixed parameters of the light field camera can be adjusted to optimize the focal length. Images of different scenes are then acquired as the image data set. Since every point of the image on an epipolar plane projects onto the corresponding epipolar line, the information in the image's epipolar plane is contained in the corresponding epipolar line sequence. When analyzing the image sequence, in order to process this information collectively, the epipolar line sequences must be arranged in a specific order to form a new image; this new image is the EPI, and it covers all the feature information in the image's epipolar plane.
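The EPI construction above can be sketched as follows. This is an illustrative sketch, not code from the patent: it slices horizontal and vertical EPIs out of a 4-D light field array, where the (U, V, H, W) axis convention (angular rows, angular columns, spatial rows, spatial columns) is an assumption.

```python
import numpy as np

def epi_slices(lf, x, y):
    """Extract the horizontal and vertical epipolar-plane images (EPIs)
    for pixel (x, y) of the central sub-aperture view.  `lf` is a 4-D
    light field array of shape (U, V, H, W): angular rows, angular
    columns, spatial rows, spatial columns (an assumed convention)."""
    U, V, H, W = lf.shape
    # Horizontal EPI: fix the central angular row u = U//2 and the
    # spatial row y; the slope of the lines in this (V, W) slice
    # encodes depth.
    epi_h = lf[U // 2, :, y, :]          # shape (V, W)
    # Vertical EPI: fix the central angular column v = V//2 and the
    # spatial column x.
    epi_v = lf[:, V // 2, :, x]          # shape (U, H)
    return epi_h, epi_v
```

Stacking these slices for every pixel of the central view yields the orthogonal EPI representation used as the training data set.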
S200, preprocessing the light field image in the image data set.
S200 comprises the following steps:
s201, extracting polar line diagram area blocks in the horizontal direction and the vertical direction corresponding to a plurality of pixel points from each light field image in an image data set;
s202, removing the polar line diagram area blocks with unclear textures from the extracted polar line diagram area blocks;
s203, carrying out balancing processing on the screened image data set to enable the number of training samples with different characteristics to be basically the same.
Specifically, the image data set may be divided into M parts in proportion, where the number of training samples in each part is the same, where N parts are used as training sets, M-N parts are used as test sets, M, N is a positive integer, and M is greater than N.
After the light field image is obtained in S100, it must be converted into epipolar-plane images in this step. Specifically, the horizontal EPI (epipolar-plane image) block and the vertical EPI block at a given pixel can each be obtained programmatically, forming an orthogonal EPI representation of the light field image that serves as the training data set for deep learning. Converting the light field image into EPI blocks maps the multi-dimensional light field data onto two-dimensional images, with the EPI as intermediary, and thus reduces the data dimension. Invalid data in the image data set, i.e. region blocks with unclear texture, must be removed so that excessive noise does not distort the experimental results; region blocks in which the Canny edge detection algorithm detects no edge information are discarded. Because a convolutional neural network is sensitive to data set balance, a model trained on an unbalanced data set usually classifies poorly. The data are therefore balanced so that the different label classes are distributed close to the average, keeping the number of training samples of different features substantially the same. The data set may then be divided proportionally into 5 shares (keeping the class proportions the same in each share), with 4 shares used as the training set and 1 share as the test set.
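The screening and splitting described above can be sketched as follows. This is a self-contained stand-in, not the patent's code: the patent uses Canny edge detection to reject textureless blocks, while this sketch uses a mean-gradient-magnitude threshold as a simpler proxy, and the threshold value and 80/20 split fraction are illustrative assumptions.

```python
import numpy as np

def filter_textured_blocks(blocks, grad_thresh=10.0):
    """Discard EPI region blocks with unclear texture (step S202).
    Stand-in for Canny-based screening: keep a block only when its
    mean gradient magnitude exceeds a threshold (value illustrative).
    `blocks` is an (N, H, W) array."""
    gy, gx = np.gradient(blocks.astype(float), axis=(1, 2))
    energy = np.sqrt(gx ** 2 + gy ** 2).mean(axis=(1, 2))
    return blocks[energy > grad_thresh]

def split_train_test(blocks, train_frac=0.8, rng=None):
    """Divide the screened data set into training and test sets,
    e.g. 4 of 5 shares for training and 1 for testing."""
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(blocks))
    cut = int(train_frac * len(blocks))
    return blocks[idx[:cut]], blocks[idx[cut:]]
```

A production version would also stratify the split per depth class, as the text requires the class proportions to stay the same in each share.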
S300, based on the preprocessed image data set, model training and feature learning are carried out on the multilayer convolutional neural network, and depth information of each image in the image data set is obtained.
S300 comprises the following steps:
s301, respectively training polar line diagram region blocks in the horizontal direction and the vertical direction based on the preprocessed image data set and the multilayer convolutional neural network, and respectively extracting feature vectors in the horizontal direction and the vertical direction of the image in the image data set;
s302, receiving the feature vectors output by the multilayer convolutional neural network through a softmax function, and acquiring the depth information of each image in the image data set.
As shown in fig. 3, the multilayer convolutional neural network includes two identical sub-networks, which are respectively used for training the polar line map region blocks in the horizontal direction and the vertical direction, and extracting the feature vectors in the horizontal direction and the vertical direction of the image in the image data set; each sub-network consists of 7 convolutional layers, a pooling layer and a full-connection layer which are sequentially connected, the size of a convolutional kernel is 2 x 2, the number of the first layer of convolutional kernels is 16, the number of the second layer of convolutional kernels is 32, the convolutional kernels are doubled layer by layer, the number of the seventh layer of convolutional kernels is 1024, the convolutional layers are used for extracting the characteristics of an input image, the pooling layer is used for reducing the dimension of the extracted characteristics, and the full-connection layer is used for integrating the output information of the convolutional layers and the pooling layer and converting the output information into a characteristic vector.
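The two-branch architecture described above can be sketched in PyTorch. The 2 x 2 kernels, the 16-to-1024 channel doubling over 7 convolutional layers, the pooling layer, and the fully connected layer follow the text; everything the text does not specify (ReLU activations, average pooling, padding, the 256-dimensional feature vector, and the final linear classifier feeding softmax) is an illustrative assumption.

```python
import torch
import torch.nn as nn

class EpiSubNet(nn.Module):
    """One of the two identical sub-networks: 7 convolutional layers
    with 2x2 kernels whose channel count doubles from 16 to 1024,
    then a pooling layer and a fully connected layer that turns the
    pooled maps into a feature vector (feat_dim=256 is an assumed
    choice; the text does not give it)."""
    def __init__(self, feat_dim=256):
        super().__init__()
        layers, in_ch = [], 1
        for out_ch in (16, 32, 64, 128, 256, 512, 1024):
            layers += [nn.Conv2d(in_ch, out_ch, kernel_size=2, padding=1),
                       nn.ReLU(inplace=True)]
            in_ch = out_ch
        self.conv = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)   # dimension reduction
        self.fc = nn.Linear(1024, feat_dim)   # integrate into a vector

    def forward(self, x):
        return self.fc(self.pool(self.conv(x)).flatten(1))

class EpiDepthNet(nn.Module):
    """Two identical sub-networks for the horizontal and vertical EPI
    blocks; their feature vectors are combined and mapped through
    softmax to a distribution over depth classes."""
    def __init__(self, n_classes, feat_dim=256):
        super().__init__()
        self.sub_h = EpiSubNet(feat_dim)
        self.sub_v = EpiSubNet(feat_dim)
        self.head = nn.Linear(2 * feat_dim, n_classes)

    def forward(self, epi_h, epi_v):
        feats = torch.cat([self.sub_h(epi_h), self.sub_v(epi_v)], dim=1)
        return torch.softmax(self.head(feats), dim=1)
```

In training one would normally return the raw logits and use `nn.CrossEntropyLoss`, which applies the softmax internally; the explicit softmax here mirrors the text's description.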
Convolutional Neural Networks (CNNs) are a deep learning model connected by multi-layer neurons, the construction of which is inspired by the processing mechanisms of the human visual system. And carrying out convolution operation on the input training set image by using the neurons shared by the weight values. The general CNN model comprises a convolution layer, a pooling layer and a full-connection layer. The convolutional layers are mainly used for extracting the features of an input image, the image features are represented by feature maps, each feature map is obtained by convolution of one convolution kernel and the previous layer, each feature map corresponds to one convolution kernel, each convolutional layer can generate a plurality of feature maps, and abstract features of more images are extracted by adopting different convolution kernels. The pooling layer is mainly used for reducing the dimension of the extracted features, and the calculated amount of the whole neural network model is reduced through maximum pooling or average pooling, so that more abstract and more robust image features are extracted. The full-connection layer is of a multilayer perceptron structure, and is mainly used for classifying the operation results of input data passing through the convolution layer and the pooling layer, and outputting probability distribution of different categories by calculating classification scores.
Fig. 3 shows the overall technical route of this embodiment. A light field image is obtained with a light field camera whose micro-lens array is 4 x 4, and the focal length is adjusted so the camera focuses accurately, yielding the white image shown in fig. 4. Keeping the camera parameters unchanged, the scene is varied to acquire more than 200 light field images. Each sub-image has a resolution of 512 x 512, and the horizontal and vertical EPI regions corresponding to each pixel of the central sub-image are 512 x 9. Extracting the horizontal and vertical EPI region blocks corresponding to the central viewpoint (x, y) yields a large number of EPI regions per image, satisfying the data volume a deep learning training set requires. Invalid EPI region blocks, i.e. blocks with unclear texture, are removed; blocks with no detected edge information are eliminated with the Canny edge detection algorithm. Finally, the whole data set is balanced.
In particular, the depth range of the light field image is determined to be [m, n] from the parameter distribution of the training data set, and the spacing between two adjacent depth planes is set to d, so the depth can be divided into

(n - m)/d + 1

classes. The structure of the multilayer convolutional neural network model is shown in fig. 5. The horizontal and vertical EPI region blocks are input into the two sub-networks as a pair of features, and each sub-network outputs a feature vector. Finally, the feature vectors from the two sub-networks are received by a softmax function, which predicts the depth information of the image.
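The class count implied by the depth range [m, n] and spacing d can be computed as follows. The exact formula is an assumption, since the original equation did not survive extraction; (n - m)/d + 1 depth planes (rounded down when the range is not an exact multiple of d) is the natural reading.

```python
def depth_classes(m, n, d):
    """Number of discrete depth classes when the depth range [m, n]
    is divided with spacing d between adjacent depth planes:
    (n - m)/d + 1, rounded down if the range is not an exact
    multiple of d.  (Reconstructed formula, an assumption.)"""
    return int((n - m) / d) + 1
```

Each EPI block's label is then the index of the depth plane nearest its true depth, which is what turns depth estimation into the classification problem described above.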
In this embodiment, the related training parameters of the multilayer convolutional neural network model are set as follows: the value of Batchsize (which refers to the number of samples selected for one training) is 64, the learning rate is set to 0.01, the learning rate reduction factor is set to 0.99, dropout is set to 0.5, and the number of iterations is 10.
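These training parameters can be written down as a small configuration sketch. Whether the 0.99 reduction factor is applied per epoch or per iteration is not stated in the text; per-epoch application is assumed here, and the helper name is illustrative.

```python
def learning_rate(epoch, base_lr=0.01, decay=0.99):
    """Learning rate after `epoch` reductions, matching the stated
    setup: base rate 0.01 with a reduction factor of 0.99 (assumed
    to apply once per epoch)."""
    return base_lr * decay ** epoch

TRAIN_CONFIG = {
    "batch_size": 64,   # samples selected for one training step
    "dropout": 0.5,
    "epochs": 10,       # "number of iterations" in the text
}
```

In a PyTorch training loop the same schedule would typically be expressed with `torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.99)`.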
The invention designs a novel multilayer convolutional neural network structure; model training and feature learning are performed with light field images as the data set, finally yielding the depth information of the images. A three-dimensional dose measurement system formed by a tissue-equivalent scintillating material and a light field camera can accurately measure the three-dimensional dose distribution of radiation in the tissue-equivalent material, providing a more accurate reference for dose estimation in radiotherapy. The depth estimation algorithm strongly influences the longitudinal resolution of the image, and hence the accuracy of the three-dimensional dose distribution, so studying it is of great significance for a three-dimensional dose measurement system.
As shown in fig. 6, the present embodiment further provides an image depth estimation apparatus based on a multilayer convolutional neural network, including:
a first acquisition module 100 for acquiring a plurality of light field images of a physical model in a three-dimensional scene by a light field camera to form an image dataset;
a preprocessing module 200 for preprocessing the light field image in the image data set;
the second obtaining module 300 is configured to perform model training and feature learning on the multilayer convolutional neural network based on the preprocessed image data set, and obtain depth information of each image in the image data set.
The first obtaining module 100 is configured to:
when the scintillator in the physical model is subjected to energy deposition of rays of the radiation source, luminous information is generated, and the light yield is in direct proportion to the energy deposition of the rays;
the light field camera collects the luminescence information generated by the physical model, and reconstructs a light field image of three-dimensional emission light formed after the physical model is radiated according to the collected luminescence information to form an image data set.
The light field camera comprises a main lens, a micro-lens array, and an image sensor; the image sensor is aligned with the micro-lens array, the main lens faces the physical model, and each micro-lens in the array covers a plurality of sensor pixels.
In this embodiment, the image depth estimation apparatus based on a multilayer convolutional neural network may be a computer comprising: a processor; and a memory for storing instructions executable by the processor, wherein the processor is configured to perform the aforementioned image depth estimation method by executing those instructions. The memory and the processor may be connected by a bus. The memory unit may include readable media in the form of volatile memory, such as random access memory (RAM) and/or cache memory, and may further include read-only memory (ROM). The computer also includes a display unit connected to the bus, which can display the estimated depth information and related results.
The present embodiment also provides a computer-readable storage medium on which a computer program is stored, which, when executed by the above-mentioned processor, implements the aforementioned image depth estimation method.
It should be noted that, in the present embodiment, the image depth estimation method based on the multilayer convolutional neural network and the image depth estimation device based on the multilayer convolutional neural network are the same inventive concept, and details about the function of the image depth estimation device may be found in the embodiment of the image depth estimation method.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is intended to include such modifications and variations.

Claims (10)

1. An image depth estimation method based on a multilayer convolutional neural network is characterized by comprising the following steps:
s100, acquiring a plurality of light field images of a physical model in a three-dimensional scene through a light field camera to form an image data set;
s200, preprocessing a light field image in the image data set;
s300, based on the preprocessed image data set, model training and feature learning are carried out on the multilayer convolutional neural network, and depth information of each image in the image data set is obtained.
2. The image depth estimation method based on the multilayer convolutional neural network of claim 1, wherein S100 comprises:
s101, when a scintillator in the physical model is subjected to energy deposition of rays of a radiation source, luminescence information is generated, and the light yield is in direct proportion to the energy deposition of the rays;
s102, the light field camera collects the luminescence information generated by the physical model, and reconstructs a light field image of three-dimensional emission light formed after the physical model is radiated according to the collected luminescence information to form an image data set.
3. The multilayer convolutional neural network-based image depth estimation method of claim 1, wherein the light field camera comprises a main lens, a micro-lens array, and an image sensor; the image sensor is aligned with the micro-lens array, the main lens faces the physical model, and each micro-lens in the array covers a plurality of sensor pixels.
4. The image depth estimation method based on the multilayer convolutional neural network of claim 1, wherein S200 comprises:
s201, extracting polar line diagram area blocks in the horizontal direction and the vertical direction corresponding to a plurality of pixel points from each light field image in the image data set;
s202, removing the polar line diagram area blocks with unclear textures from the extracted polar line diagram area blocks;
s203, carrying out balancing processing on the screened image data set to enable the number of training samples with different characteristics to be basically the same.
5. The image depth estimation method based on the multilayer convolutional neural network of claim 4, wherein S300 comprises:
S301, training the horizontal-direction and vertical-direction epipolar plane image (EPI) region blocks separately, based on the preprocessed image data set and the multilayer convolutional neural network, and extracting the horizontal and vertical feature vectors of the images in the image data set;
S302, receiving the feature vectors output by the multilayer convolutional neural network through a softmax function, and obtaining the depth information of each image in the image data set.
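Step S302 treats depth estimation as classification: the softmax turns the network's output into a probability over discrete depth labels. A minimal sketch, assuming one logit per depth label (the number of labels and the feature-to-logit mapping are not specified in the claims):

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([1.0, 3.0, 0.5, 3.0])   # toy: one logit per depth label
probs = softmax(logits)                   # probabilities summing to 1
depth_label = int(np.argmax(probs))       # most probable depth label
```

Subtracting the per-row maximum before exponentiating avoids overflow without changing the result, a standard choice for softmax layers.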
6. The method according to claim 5, wherein the multilayer convolutional neural network comprises two identical sub-networks that train the horizontal and vertical epipolar plane image (EPI) region blocks, respectively, and extract the horizontal and vertical feature vectors of the images in the image data set;
each sub-network consists of seven convolutional layers, a pooling layer and a fully-connected layer connected in sequence. The convolution kernel size is 2 × 2; the first layer has 16 convolution kernels and the second layer 32, the number doubling layer by layer so that the seventh layer has 1024 kernels. The convolutional layers extract features from the input image, the pooling layer reduces the dimensionality of the extracted features, and the fully-connected layer integrates the outputs of the convolutional and pooling layers into a feature vector.
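The sub-network of claim 6 can be checked with a pure shape trace: seven 2 × 2 convolutions with channels doubling 16, 32, …, 1024, then pooling and flattening for the fully-connected layer. Stride 1, no padding, a 2 × 2 pooling window, and the 9 × 9 single-channel input patch are all assumptions here; the claim does not state them:

```python
def conv2x2(shape, out_channels):
    """Shape after a 2x2 convolution, stride 1, no padding (assumed)."""
    h, w, _ = shape
    return (h - 1, w - 1, out_channels)

def trace_subnetwork(input_shape):
    """Trace one sub-network's tensor shape and return the FC input size."""
    shape = input_shape
    channels = [16 * 2 ** i for i in range(7)]  # 16, 32, 64, ..., 1024
    for c in channels:
        shape = conv2x2(shape, c)
    h, w, c = shape
    return (h // 2) * (w // 2) * c  # after 2x2 pooling, flattened for the FC layer

print(trace_subnetwork((9, 9, 1)))  # -> 1024 feature dimensions per EPI patch
```

Doubling from 16 for seven layers indeed gives 16 × 2⁶ = 1024 kernels in the seventh layer, consistent with the claim; under the assumed 9 × 9 patch, each sub-network hands a 1024-dimensional vector to its fully-connected layer.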
7. An image depth estimation device based on a multilayer convolutional neural network, comprising:
a first acquisition module for acquiring a plurality of light field images of a physical model in a three-dimensional scene by a light field camera to form an image dataset;
a preprocessing module for preprocessing a light field image in the image dataset;
a second acquisition module for performing model training and feature learning on the multilayer convolutional neural network based on the preprocessed image data set, so as to obtain the depth information of each image in the image data set.
8. The apparatus according to claim 7, wherein the first obtaining module is configured to:
when rays from a radiation source deposit energy in the scintillator of the physical model, luminescence information is generated, the light yield being directly proportional to the deposited energy;
the light field camera collects the luminescence information generated by the physical model and, from the collected information, reconstructs a light field image of the three-dimensional emitted light formed after the physical model is irradiated, so as to form the image data set.
9. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the multilayer convolutional neural network-based image depth estimation method of any one of claims 1 to 6.
10. An image depth estimation device based on a multilayer convolutional neural network, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform, via execution of the executable instructions, the image depth estimation method based on a multilayer convolutional neural network of any one of claims 1 to 6.
CN202011320209.8A 2020-11-23 2020-11-23 Image depth estimation method and device based on multilayer convolutional neural network Pending CN112508171A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011320209.8A CN112508171A (en) 2020-11-23 2020-11-23 Image depth estimation method and device based on multilayer convolutional neural network


Publications (1)

Publication Number Publication Date
CN112508171A true CN112508171A (en) 2021-03-16

Family

ID=74959398

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011320209.8A Pending CN112508171A (en) 2020-11-23 2020-11-23 Image depth estimation method and device based on multilayer convolutional neural network

Country Status (1)

Country Link
CN (1) CN112508171A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104469110A (en) * 2014-11-26 2015-03-25 西北工业大学 Light field collecting device with changeable angle sampling number
CN107993260A (en) * 2017-12-14 2018-05-04 浙江工商大学 A kind of light field image depth estimation method based on mixed type convolutional neural networks
CN108983273A (en) * 2018-08-22 2018-12-11 中国辐射防护研究院 A kind of real-time measurement system and method for the distribution of inside of human body 3-dimensional dose
US20190244379A1 (en) * 2018-02-07 2019-08-08 Fotonation Limited Systems and Methods for Depth Estimation Using Generative Models


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113900608A (en) * 2021-09-07 2022-01-07 北京邮电大学 Display method and device of three-dimensional light field, electronic equipment and medium
CN113900608B (en) * 2021-09-07 2024-03-15 北京邮电大学 Method and device for displaying stereoscopic three-dimensional light field, electronic equipment and medium

Similar Documents

Publication Publication Date Title
JP6855587B2 (en) Devices and methods for acquiring distance information from a viewpoint
CN108416791B (en) Binocular vision-based parallel mechanism moving platform pose monitoring and tracking method
CN111462206B (en) Monocular structure light depth imaging method based on convolutional neural network
US9658443B2 (en) Optics apparatus with detection of light rays received at different angles for output indicative of aliased views
US10217293B2 (en) Depth camera-based human-body model acquisition method and network virtual fitting system
US8290305B2 (en) Registration of 3D point cloud data to 2D electro-optical image data
CN107560592B (en) Precise distance measurement method for photoelectric tracker linkage target
CN111709985B (en) Underwater target ranging method based on binocular vision
WO2011137140A1 (en) Range measurement using a coded aperture
CN106705849A (en) Calibration method of linear-structure optical sensor
CN109509213B (en) Harris corner detection method applied to asynchronous time domain vision sensor
CN112884820A (en) Method, device and equipment for training initial image registration and neural network
Kurmi et al. Pose error reduction for focus enhancement in thermal synthetic aperture visualization
CN109978957B (en) Binocular system calibration method based on quantum behavior particle swarm
CN112508171A (en) Image depth estimation method and device based on multilayer convolutional neural network
CN117291913B (en) Apparent crack measuring method for hydraulic concrete structure
CN112070675A (en) Regularization light field super-resolution method based on graph and light field microscopic device
Makhov et al. Methods of spatial and temporal processing of images in optoelectronic control systems
CN114923665B (en) Image reconstruction method and image reconstruction test system for wave three-dimensional height field
CN113670268B (en) Binocular vision-based unmanned aerial vehicle and electric power tower distance measurement method
CN114998980A (en) Iris detection method and device, electronic equipment and storage medium
CN114937068A (en) Method, system and device for obtaining tissue equivalent scintillator dose distribution
CN113808019A (en) Non-contact measurement system and method
CN109872353B (en) White light data and CT data registration method based on improved iterative closest point algorithm
Zhang et al. Research on Binocular Stereo Vision Ranging Based on Improved YOLOv5s

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination