CN113450396B

CN113450396B - Three-dimensional/two-dimensional image registration method and device based on bone characteristics

Info

Publication number: CN113450396B
Application number: CN202110682717.9A
Authority: CN
Inventors: 付天宇; 杨健; 范敬凡; 宋红; 胡国语
Original assignee: Beijing Institute of Technology BIT
Current assignee: Beijing Institute of Technology BIT
Priority date: 2021-06-17
Filing date: 2021-06-17
Publication date: 2023-05-30
Anticipated expiration: 2041-06-17
Also published as: CN113450396A

Abstract

The three-dimensional/two-dimensional image registration method and device based on the skeleton features have small occupied resources, can be used for back propagation optimization of registration parameters, solve the problem that an image similarity function is not convex or concave in the three-dimensional/two-dimensional registration process, enlarge the initial registration value range, realize the registration of the three-dimensional/two-dimensional skeleton features in the TIPS operation, and improve the registration precision. The method comprises the following steps: (1) preprocessing the segmented three-dimensional bone data; (2) Generating a large amount of simulation data for subsequent training and testing; (3) The three-dimensional data is processed by a three-dimensional convolution network jump connection module to obtain parameterized three-dimensional characteristic data, and the superimposed original image and the three-dimensional characteristic data are projected to two dimensions by a differential rendering module based on grid sampling; (4) Calculating similarity measure between the two-dimensional decoding network and the target image by using the two-dimensional decoding network; in order for the similarity measure to have the property of a concave function in European space, a gradient of geodesic distance of a rigid transformation parameter is used as training target.

Description

Three-dimensional/two-dimensional image registration method and device based on bone characteristics

Technical Field

The invention relates to the technical field of medical image processing, in particular to a three-dimensional/two-dimensional image registration method based on skeleton characteristics and a three-dimensional/two-dimensional image registration device based on skeleton characteristics.

Background

In the field of three-dimensional/two-dimensional registration, in the iterative process of the traditional registration method, re-projection and similarity measure calculation are required to be repeatedly carried out, so that excessive calculation resources are caused, and iterative optimization of parameters of a network model is inconvenient. Moreover, the similarity measure has a great limitation in solving the cross-modal problem due to the presence of the multipole problem.

Disclosure of Invention

In order to overcome the defects of the prior art, the technical problem to be solved by the invention is to provide a three-dimensional/two-dimensional image registration method based on skeleton features, which occupies small resources and can be used for counter-propagating and optimizing registration parameters, so that the problem that an image similarity function is not convex or concave in the three-dimensional/two-dimensional registration process is solved, the registration initial value range is enlarged, the registration of the three-dimensional/two-dimensional skeleton features in a TIPS (tip-in-place) operation is realized, and the method has an effective effect on improving the registration precision.

The technical scheme of the invention is as follows: the three-dimensional/two-dimensional image registration method based on the bone characteristics comprises the following steps:

(1) Preprocessing the segmented three-dimensional bone data, including region-of-interest clipping and downsampling;

(2) Carrying out random translation and rotation on the processed TIPS preoperative CT skeleton data to obtain different initial poses, and then carrying out random translation and rotation again to obtain target poses; projecting the TIPS preoperative CT data of the two poses to generate a large amount of simulation data for subsequent training and testing;

(3) In the training stage, three-dimensional data is firstly processed by a three-dimensional convolution network jump connection module to obtain parameterized three-dimensional characteristic data, and then the superimposed original image and the three-dimensional characteristic data are projected to two dimensions by a differential rendering module based on grid sampling;

(4) According to the two-dimensional image obtained by projection, calculating the similarity measure between the two-dimensional image and the target image by using a two-dimensional decoding network; in order to make the similarity measure obtained by network training have the property of a concave function in European space, a gradient of geodesic distance of a rigid transformation parameter is used as a training target.

Firstly, a traditional three-dimensional/two-dimensional registration frame is built; then constructing a differential rendering module based on grid sampling, wherein the differential rendering module occupies small resources and can be used for back propagation to optimize registration parameters, and replaces rigid transformation and re-projection of the volume data in each iterative process in a registration algorithm; finally, in the training stage of the concave function similarity measure model, the geodesic distance is used as a target, the problem that the image similarity function is not convex or concave in the three-dimensional/two-dimensional registration process is solved, the registration initial value range is enlarged, the registration of three-dimensional/two-dimensional skeleton features in the TIPS operation is realized, and the method has an effective effect on improving the registration precision.

There is also provided a three-dimensional/two-dimensional image registration apparatus based on bone features, comprising:

a data preprocessing module configured to preprocess the segmented three-dimensional bone data, including region of interest clipping and downsampling;

the simulation data generation module is configured to perform random translation and rotation on the processed TIPS preoperative CT skeleton data to obtain different initial poses, and then perform random translation and rotation again to obtain target poses; projecting the TIPS preoperative CT data of the two poses to generate a large amount of simulation data for subsequent training and testing;

the training module is configured to obtain parameterized three-dimensional characteristic data by processing the three-dimensional data through the three-dimensional convolution network jump connection module at the training stage, and then project the superimposed original image and the three-dimensional characteristic data to two dimensions through the differential rendering module based on grid sampling;

the similarity measure calculating module is used for calculating the similarity measure of the two-dimensional image obtained by projection and the target image according to the two-dimensional image obtained by projection by registration; in order to make the similarity measure obtained by network training have the property of a concave function in European space, a gradient of geodesic distance of a rigid transformation parameter is used as a training target.

Drawings

Fig. 1 is a flow chart of a three-dimensional/two-dimensional image registration method based on bone features according to the present invention.

Fig. 2 shows the steps of the covariance matrix adaptive evolution strategy registration algorithm.

Detailed Description

As shown in fig. 1, the three-dimensional/two-dimensional image registration method based on bone features comprises the following steps:

Preferably, in the step (4), two three-dimensional/two-dimensional registration frames are included:

(I) The test data is subjected to iterative optimization through concave function similarity measurement comprising a differentiable rendering module so as to obtain a rigid transformation parameter corresponding to the target image;

(II) optimization using a three-dimensional/two-dimensional registration optimizer, a conventional parameterized registration framework is implemented.

In the process of three-dimensional/two-dimensional matching image registration, the angle of three-dimensional projection needs to be continuously transformed, so that three-dimensional data is projected into a two-dimensional space. Such a process may alsoEquivalent expression is inverse rigid transformation of three-dimensional volume data, i.e. projection with the projection environment fixed, which is adopted by this subsection as T (θ _x ,θ _y ,θ _z ,t _x ,t _y ,t _z )·V _m Simplified expression was performed. Preferably, in the step (3), the image projected to two dimensions is expressed by formula (4.1):

I _m ＝P·T(θ _x ,θ _y ,θ _z ,t _x ,t _y ,t _z )·V _m (4.1)

wherein V_m Is three-dimensional volume data, I _m Is a corresponding image projected to two dimensions, wherein T (alpha), alpha E SE (3) is a three-dimensional rigid transformation matrix, the formulas are (4.2) - (4.5), P is an expression of a 3 x 4 point source projection matrix, and alpha= (theta) _x ,θ _y ,θ _z ,t _x ,t _y ,t _z ) Refer to the rigidity transformation parameters

wherein θ_x ,θ _y ,θ _z The rotation angles around the x coordinate axis, the y coordinate axis and the z coordinate axis are respectively t _x ,t _y ,t _z To translate along three axes, x ₀ ,y ₀ ,z ₀ Is V _m Coordinates of the pixel points on the pixel points, x, y, z are x ₀ ,t ₀ ,z ₀ Coordinates of the rigidly transformed points。

The three-dimensional/two-dimensional registration is aimed at the I after projection _m With reference image (Fixed image) I _f More closely, the key to the problem of registration is to measure I _f and I_m Similarity measure between the two. Preferably, in the step (3),

measurement of I _f and I_m Similarity measure between them, which represents a loss function of formula (4.6)

wherein ,p″_CT Representing the projection of three-dimensional data onto a two-dimensional camera internal reference matrix, the gradient difference (Gradient Difference, GD) and the normalized cross-correlation coefficient (Normalized Cross Correlation, NCC) are similarity measure functions commonly used in conventional registration methods;

gradient difference

Is based on image i _M and i_f The similarity calculated for the gradients of (2) is suitable for processing relatively long and narrow structures, as shown in formulas (4.7), (4.8) and (4.9)

/>

Diff _width and Diff_high Representing the difference in gradients of the two images in the vertical and horizontal directions respectively,

representing the gradient of the image I in the vertical direction, +.>

Representing the gradient in the horizontal direction, s being a gray scale decay factor, sigma _width and σ_high Respectively represent I _f Variance in the vertical and horizontal directions. It is clear from the formula (4.9) that if gradient difference is used as the evaluation function of registration, the floating image and the reference image need to have higher similarity in gradient information such as edges, and are not affected by the gray level distribution of the image;

normalized cross-correlation coefficient

Evaluation of similarity and correlation of two images in pixel distribution, NC and NCC are formulas (4.10) and (4.11)

wherein ,I_f,i and I_m,i Respectively represent I _f and I_m Is used for the pixel value of the (i) th pixel,

and />

Respectively represent I _f and I_m Average pixel value of (a); in the process of calculating the normalized cross-correlation coefficient, the image is required to be subjected to centering treatment to obtain the centered cosine similarity, which is equivalent to the normalized cross-correlation coefficient, and as can be seen from a formula (4.11), the robustness of the similarity measure on gray value difference can be effectively increased by the zero-averaging operation of the data;

the gradient normalized cross-correlation coefficient GradNCC combining the gradient information and the normalized cross-correlation coefficient is the formula (4.12)

wherein ,

and />

Respectively representing a gradient image of the ith pixel value of the two-dimensional image I in the vertical direction and a gradient image of the ith pixel value in the horizontal direction, and (2)>

and />

Representing average pixel values of the gradient image in the vertical direction and the gradient image in the horizontal direction of the two-dimensional image I, respectively. .

The goal of three-dimensional/two-dimensional registration is to have the post-projection I _m With reference image I _f More closely, for the final purpose of rigid registration, the result of solving a globally optimal rigid transformation parameter. The process of solving for optima can be generally divided into two types: and resolving and searching. The optimization of analytical solution is generally to solve a linear equation by specific coordinates or parameters to obtain transformation parameters, such as PnP algorithm. The search solution is generally that an initial value is given, and the optimal solution is gradually approximated in a plurality of iterations with a certain step length. Among the local optimization algorithms commonly used in registration algorithms, the evolution strategies (Evolutionary Strategies, ES) are better performing, and the main idea of such methods is to simulate the principle of biological evolution. The optimization process of iteratively solving the optimal result can also be used for three-dimensional/two-dimensional registration problems.

The most common method in evolutionary strategies is covariance matrix adaptive evolutionPolicy (Covariance matrix adaptation evolution strategy, CMA-ES). The three-dimensional/two-dimensional registration problem is a typical nonlinear optimization problem, and because three-dimensional data is required to be mapped to two dimensions in a projection mode, information of a plurality of three-dimensional data is missing in the two-dimensional data, and the change in unit data cannot find rules on the two-dimensional effect of projection, especially in the case of large translation and rotation transformation ranges. There are many challenges in nonlinear optimization problems, such as non-smoothness of the function (non-leadership, multi-modal, noise), excessive number of parameters, variable-to-variable dependencies, etc. These are problems that conventional optimization algorithms cannot solve, but CMA-ES does not rely on gradient optimization compared to other methods, and thus can solve the above-mentioned complex problems. On the other hand, the gradient-based optimization method requires gradient calculation of the objective function, namely

While the process of improved DRR is designed to be straightforward, it requires very large computing resources, a condition that existing computing resources are difficult to meet.

Preferably, as shown in fig. 2, in the step (3), the evolution strategy implements the spatial search by iterating the following four steps: (a) sampling a new set of candidate solutions; (b) calculating an optimal value for the new candidate solution; (c) selecting an optimal solution based on f (α); (d) updating the parameters according to the selected solution; in the course of the evolutionary algorithm, a complete iteration comprising (a) - (d) is referred to as a generation, each iteration producing new candidate solutions as children for generating the children as parents; for CMA-ES, the average value m according to the current generation after choosing the optimal point in step (c) ^(g-1) Calculating covariance matrix C of next generation ^(g) And at step (4) using the updated average value m ^(g) 。

Differential rendering has strong rendering capability, and can realize reverse drawing (Inverse Rendering) while guaranteeing rendering effect. The three-dimensional scene information required for generating the image is obtained by reversely drawing the information of the two-dimensional image, wherein the information can be one or more of geometry, lamplight, material and visual angle, and even three-dimensional reconstruction is carried out. The common rendering process is to input a three-dimensional scene into a renderer to obtain a two-dimensional image, and different from the common rendering process, the differential rendering which introduces reverse drawing can link the three-dimensional scene and the two-dimensional image, so that the method has become a popular method in recent years.

In the three-dimensional/two-dimensional registration problem, what needs to be solved is the corresponding three-dimensional shooting view angle when a two-dimensional image is shot, but many existing differentiable rendering methods are realized by performing the grid parameterization of the surface-mount processing on the three-dimensional scene, because the grid parameterization provides a relatively proper compromise between spatial resolution and data volume compared with the volumetric representation. For medical applications, in particular data observed during surgery, are mostly obtained based on a volumetric representation method, and each individual pixel has its specific physical properties. But the mesh data cannot represent the internal features and data of each closed surface, so the differentiable rendering of the study volume datamation process is particularly important.

The spatial transformation network proposed by Google deep allows the traditional convolutional neural network to learn characteristics such as clipping, translation, rotation and scaling effectively. In the spatial transformation network architecture, a feature map or an original image is input to a positioning network, which outputs transformation parameters of the image, and a grid generator and a sampler transform the feature map or the original image using the parameters output from the positioning network. Currently, spatial transformation networks have been used to solve the perspective transformation problem of three-dimensional data reconstruction and the estimation problem of three-dimensional registration deformation fields.

Preferably, in the step (3),

for a two-dimensional image I with the size of MxN, in the DRR process based on Ray-casting, the image is formed by integrating MxN rays through volume data to obtain MxN pixels I ^(m,n) Is formula (4.13); each ray is emitted by source, andand K control points G are uniformly distributed on each light ray ^(m,n,k) Representing a continuous integration over the light by discrete integration;

where (m, n) is coordinates on the image, G represents a control point in space;

for the rigid transformation matrix T (alpha), alpha epsilon SE (3), the coordinates of the point source are changed into T (alpha) · [0, 0], the parameter representation of the imaging plane image is changed into T (alpha) · [0, SID,1], and the M multiplied by N multiplied by K control points are subjected to rigid transformation and then subjected to difference calculation, and the transformation result is shown as a formula (4.14):

G′＝f _{Interpolation} (T(α)·G) (4.14)

wherein f_{Interpolation} Representing bilinear difference function, G' is control point after rigid transformation, and the transformed image is formula (4.15); the sampling process enables the parameter to carry out back propagation training through the module in the process of carrying out network training; deriving G' yields equation (4.16):

for the three-dimensional/two-dimensional registration problem, the presence of the non-linearity condition results in a large number of extreme points in the objective function constructed based on the conventional similarity measure, which makes the range of the initial rigid transformation parameters of the registration process a large limitation. The ideal objective function is in the shape of a concave function, so that the objective function can be effectively ensured not to be affected by the limitation of the range of the excessively small initial value in the registration process.

In order to obtain a similarity measure applicable to a larger initial parameter range, the present invention solves this problem using a concave function similarity measure model based on the segmentation results. The network training counter-propagation is realized by designing a differential rendering module based on grid sampling, so that in the integral concave function similarity measure, only the three-dimensional convolutional neural network and the two-dimensional image decoder need to be optimized for learning parameters.

Preferably, in the step (4), the three-dimensional volume data is added with a 6-layer three-dimensional convolutional neural network before being input into the differential rendering module based on grid sampling, the three-dimensional convolutional neural network is composed of five convolutional layers, the convolutional kernel of each convolutional layer is 3 multiplied by 3, the step length is (1, 1), after the three-dimensional data are subjected to convolutional neural network to obtain three-channel characteristic images, adding the original images with the obtained characteristic images through one jump connection, outputting three-dimensional data of three channels by a network, wherein the size of each channel is the same as that of the original images, and obtaining four-dimensional data containing image characteristics and the original images after superposition;

for the projected two-dimensional image I _m Reference image I _f Corresponding feature vectors are obtained through two decoders with the same network structure respectively, and the two decoders do not share parameters in the training process; the decoder is formed by connecting a convolution layer with a convolution kernel of 3 multiplied by 3 and a step length of (1, 1), a residual bottleneck module and a pooling layer, and finally outputting a feature vector through one convolution layer;

obtain I _m and I_f Feature vector E of (2) _m and E_f Then, the concave function similarity measure will be E _m and E_f As output of the network, this is I _m and I_f Is a similarity measure of (1).

Preferably, in the step (4),

for image similarity measure L in three-dimensional/two-dimensional registration process _C ＝MSE(E _m ,E _f ) For the concave function, the network uses the Euclidean space geodesic distance with concave function characteristics as the training objective function, as formula (4.17):

L _C ＝MSE(E _m ,E _f )～L _G (α,α _f ) (4.17)

wherein L_G Represents geodesic distance, alpha, in European space _f Represents I _f The corresponding rigidity transformation parameter is a gold standard in the registration task; in order to make the two functions more similar in terms of trend, the most straightforward way is to make the gradient of the two functions

and />

As close as possible, obtaining a target of network training;

during one iteration of the registration, the output is denoted as I _C (φ；V _m ,I _f α), wherein φ represents a learning parameter required by the network; during the back propagation, the networks respectively generate

and />

Two gradient vectors, the loss function is designed as alpha for L _C and L_G Two gradient vectors of the two>

and />

The distance measures of (2) are the formulas (4.18), (4.19) and (4.20), ensuring that the network parameter phi is not updated during back propagation, and the rotation parameter and the translation parameter are calculated respectively;

wherein v represents a vector, trans represents a translation parameter, rot represents a rotation parameter, M _D Representing a distance measure;

gradient-based optimization of the rigid transformation parameter α during registration testing by back propagation is achieved as equation (4.21)

Optimizing the parameter alpha by iteration and according to I _m Calculate L _C To solve for the registration parameters.

It will be understood by those skilled in the art that all or part of the steps in implementing the above embodiment method may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, where the program when executed includes the steps of the above embodiment method, and the storage medium may be: ROM/RAM, magnetic disks, optical disks, memory cards, etc. Accordingly, the present invention also includes, corresponding to the method of the present invention, a three-dimensional/two-dimensional image registration device based on skeletal features, which is generally represented in the form of functional modules corresponding to the steps of the method. The device comprises:

The present invention is not limited to the preferred embodiments, but can be modified in any way according to the technical principles of the present invention, and all such modifications, equivalent variations and modifications are included in the scope of the present invention.

Claims

1. The three-dimensional/two-dimensional image registration method based on skeleton features is characterized by comprising the following steps of: which comprises the following steps:

(4) According to the two-dimensional image obtained by projection, calculating the similarity measure between the two-dimensional image and the target image by using a two-dimensional decoding network; in order to enable the similarity measure obtained by network training to have the property of a concave function in European space, a gradient of geodesic distance of a rigid transformation parameter is used as a training target;

in the step (4), two three-dimensional/two-dimensional registration frames are included:

(II) optimizing using a three-dimensional/two-dimensional registration optimizer, implementing a traditional parameterized registration framework;

in the step (3), the step of (c),

the projection to two dimensions is expressed by formula (4.1):

I _m ＝P·T(θ _x ，θ _y ，θ _z ，t _x ，t _y ，t _z )·V _m (4.1)

wherein V_m Is three-dimensional volume data, I _m Is a corresponding image projected to two dimensions, wherein T (alpha), alpha E SE (3) is a three-dimensional rigid transformation matrix, the formulas are (4.2) - (4.5), P is an expression of a 3 x 4 point source projection matrix, and alpha= (theta) _x ，θ _y ，θ _z ，t _x ，t _y ，t _z ) Refer to the rigidity transformation parameters

wherein θ_x ，θ _y ，θ _z The rotation angles around the x coordinate axis, the y coordinate axis and the z coordinate axis are respectively t _x ，t _y ，t _z To translate along three axes, x ₀ ，y ₀ ，z ₀ Is V _m Coordinates of the pixel points on the pixel points, x, y, z are x ₀ ，y ₀ ，z ₀ Coordinates of the rigidly transformed points;

in the step (3), the step of (c),

wherein ,p″_CT Representing that three-dimensional data are projected to a two-dimensional camera internal reference matrix, wherein the gradient difference GD and the normalized cross correlation coefficient NCC are similarity measure functions commonly used by a traditional registration method;

gradient difference

Diff _width and Diff_high Representing two images in the vertical and horizontal directions, respectivelyThe difference in the gradient is such that,

representing the gradient of the image I in the vertical direction, +.>

Representing the gradient in the horizontal direction, s being a gray scale decay factor, sigma _width and σ_high Respectively represent I _f Variance in the vertical and horizontal directions;

normalized cross-correlation coefficient

wherein ,I_f，i and I_m，i Respectively represent I _f and I_m Is used for the pixel value of the (i) th pixel,

and />

Respectively represent I _f and I_m Average pixel value of (a);

wherein ,

and />

and />

Average pixel values of the gradient image in the vertical direction and the gradient image in the horizontal direction of the two-dimensional image I are represented respectively;

in the step (3), the evolution strategy realizes the space search by iterating the following four steps: (a) sampling a new set of candidate solutions; (b) calculating an optimal value for the new candidate solution; (c) selecting an optimal solution based on f (α); (d) updating the parameters according to the selected solution; in the course of the evolutionary algorithm, a complete iteration comprising (a) - (d) is referred to as a generation, each iteration producing new candidate solutions as children for generating the children as parents; for CMA-ES, the average value m according to the current generation after choosing the optimal point in step (c) ^(g-1) Calculating covariance matrix C of next generation ^(g) And at step (4) using the updated average value m ^(g) 。

2. The bone feature-based three-dimensional/two-dimensional image registration method of claim 1, wherein: in the step (3), the step of (c),

for a two-dimensional image I with the size of MxN, in the DRR process based on Ray-casting, the image is formed by integrating MxN rays through volume data to obtain MxN pixels I ^(m，n) Is formula (4.13); each ray is emitted by source, and eachThe light is uniformly distributed with K control points G ^(m，n，k) Representing a continuous integration over the light by discrete integration;

G′＝f _{Interpolation} (T(α)·G) (4.14)

3. the bone feature-based three-dimensional/two-dimensional image registration method of claim 2, wherein: in the step (4), a three-dimensional convolutional neural network of 6 layers is added to the three-dimensional volume data before the three-dimensional volume data is input to the differentiable rendering module based on grid sampling, the three-dimensional convolutional neural network is composed of five convolutional layers, the convolutional kernel of each convolutional layer is 3 multiplied by 3, the step length is (1, 1), after the three-dimensional data are subjected to convolutional neural network to obtain three-channel characteristic images, adding the original images with the obtained characteristic images through one jump connection, outputting three-dimensional data of three channels by a network, wherein the size of each channel is the same as that of the original images, and obtaining four-dimensional data containing image characteristics and the original images after superposition;

for the projected two-dimensional image I _m Reference image I _f Corresponding feature vectors are obtained through two decoders with the same network structure respectively, and the two decoders do not share parameters in the training process; the decoder is formed by connecting a convolution layer with a convolution kernel of 3 multiplied by 3 and a step length of (1, 1), a residual bottleneck module and a pooling layer, and finally outputting a feature vector through one convolution layer; obtain I _m and I_f Feature vector E of (2) _m and E_f Then, the concave function similarity measure will be E _m and E_f As output of the network, this is I _m and I_f Is a similarity measure of (1).

4. A three-dimensional/two-dimensional image registration method based on bone features according to claim 3, characterized in that: in the step (4) of the above-mentioned method,

for image similarity measure L in three-dimensional/two-dimensional registration process _C ＝MSE(E _m ，E _f ) For the concave function, the network uses the Euclidean space geodesic distance with concave function characteristics as the training objective function, as formula (4.17):

L _C ＝MSE(E _m ，E _f )～L _G (α，α _f ) (4.17)

and />

As close as possible, obtaining a target of network training;

during one iteration of the registration, the output is denoted as L _C (φ；V _m ，I _f α), wherein φ represents a learning parameter required by the network; during the back propagation, the networks respectively generate

and />

and />

/>

5. The apparatus of the bone feature-based three-dimensional/two-dimensional image registration method according to claim 1, wherein: it comprises the following steps: