CN113496494A - Two-dimensional skeleton segmentation method and device based on DRR simulation data generation - Google Patents

Two-dimensional skeleton segmentation method and device based on DRR simulation data generation

Info

Publication number
CN113496494A
Authority
CN
China
Prior art keywords
segmentation
dimensional
net
drr
shaped
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110682691.8A
Other languages
Chinese (zh)
Inventor
李文杰
杨健
范敬凡
宋红
艾丹妮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202110682691.8A
Publication of CN113496494A
Legal status: Pending

Classifications

    • G06T 7/11 — Region-based segmentation
    • G06N 3/04 — Neural networks; architecture, e.g. interconnection topology
    • G06N 3/08 — Neural networks; learning methods
    • G06T 2207/10116 — X-ray image
    • G06T 2207/20081 — Training; learning
    • G06T 2207/20084 — Artificial neural networks [ANN]
    • G06T 2207/30008 — Bone


Abstract

The two-dimensional bone segmentation method and device based on DRR (Digitally Reconstructed Radiograph) simulation data generation can obtain a sufficient volume of algorithm training samples, breaking through the limitation that medical image volume places on method performance: although no real X-ray images are used for training at all, the method can still segment the bones in X-ray images, and the segmentation result closely matches the true bone regions. The method comprises the following steps: (1) generating simulation data: improved DRRs are obtained by sector-area control-point sampling and GPU acceleration, and the improved DRRs are used to generate simulated two-dimensional X-ray images of TIPS (transjugular intrahepatic portosystemic shunt) surgery and the corresponding simulated bone-segmentation gold standards; (2) model training: a bone segmentation network model for two-dimensional X-ray images is trained with the simulation data set; (3) bone segmentation: U2-Net, a nested U-shaped network improved from the traditional U-shaped network structure, performs bone segmentation of the intraoperative two-dimensional image.

Description

Two-dimensional skeleton segmentation method and device based on DRR simulation data generation
Technical Field
The invention relates to the technical field of medical image processing, in particular to a two-dimensional skeleton segmentation method based on DRR simulation data generation and a two-dimensional skeleton segmentation device based on DRR simulation data generation.
Background
In the field of image segmentation, methods based on neural networks often perform excellently. However, most existing methods target natural images, for which the required data sets and gold standards are easy to acquire. In medical imaging, pixel-level gold standards must be annotated by trained medical professionals, and the overlap of non-bone tissue with bone in X-ray images further increases the difficulty of annotation. The data-labeling task is therefore time-consuming, labor-intensive, and costly, so public medical image data sets are in short supply, and the medical image data available for training falls far short of the richness and abundance of natural images.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a two-dimensional bone segmentation method based on DRR simulation data generation, which can obtain a sufficient volume of algorithm training samples, breaks through the limitation that medical image volume places on method performance, uses no real X-ray images for training at all, and can still segment the bones in X-ray images with a segmentation result that closely matches the true bone regions.
The technical scheme of the invention is as follows: the two-dimensional bone segmentation method based on DRR simulation data generation comprises the following steps:
(1) generating simulation data: obtaining improved DRRs (Digitally Reconstructed Radiographs) by sector-area control-point sampling and GPU acceleration, and using the improved DRRs to generate simulated two-dimensional X-ray images of TIPS (transjugular intrahepatic portosystemic shunt) surgery and the corresponding simulated bone-segmentation gold standards;
(2) model training: training a bone segmentation network model for two-dimensional X-ray images by using the simulation data set;
(3) bone segmentation: performing bone segmentation of the intraoperative two-dimensional image with U2-Net, a nested U-shaped network improved from the traditional U-shaped network structure.
The invention obtains improved DRRs by sector-area control-point sampling and GPU acceleration; the improved DRRs generate simulated two-dimensional X-ray images of TIPS surgery and the corresponding simulated bone-segmentation gold standards; the simulation data set is used to train a bone segmentation network model for two-dimensional X-ray images; and U2-Net, a nested U-shaped network improved from the traditional U-shaped network structure, performs bone segmentation of the intraoperative two-dimensional image. A sufficient volume of algorithm training samples can thus be obtained, breaking through the limitation that medical image volume places on the performance of the algorithm: although no real X-ray images are used for training at all, the bones in X-ray images can still be segmented, and the segmentation result closely matches the true bone regions.
There is also provided a two-dimensional bone segmentation device based on DRR simulation data generation, comprising:
the simulation data generation module is configured to obtain improved DRRs through sector-area control-point sampling and GPU acceleration, and to generate, from the improved DRRs, simulated two-dimensional X-ray images of TIPS (transjugular intrahepatic portosystemic shunt) surgery and the corresponding simulated bone-segmentation gold standards;
a model training module configured to train a bone segmentation network model for two-dimensional X-ray images using the simulation data set;
a bone segmentation module configured to perform bone segmentation of the intraoperative two-dimensional image with U2-Net, a nested U-shaped network improved from the traditional U-shaped network structure.
Drawings
FIG. 1 is a flow chart of a method of two-dimensional bone segmentation based on DRR simulation data generation in accordance with the present invention.
FIG. 2 shows a Ray-casting algorithm model based on the sector area processing method.
FIG. 3 shows improved DRR results on CT data under the sector-area-based processing method, using only the CPU and using GPU acceleration. (a) and (c) are CPU-only DRR results for the bone data and the original volume data, respectively; (b) and (d) are GPU-accelerated improved DRR results for the bone data and the original volume data, respectively.
Fig. 4 shows a comparison of the residual block and the residual U-shaped block.
FIG. 5 is a schematic diagram of the RSU-L(C_in, M, C_out) structure.
FIG. 6 shows the U2-Net architecture.
FIG. 7 shows the U2-Net output results after applying different thresholds, together with the global accuracy, Dice and IoU index curves of the mask data thresholded at 20/255 against the gold standard.
Fig. 8 shows a comparison of bone segmentation on simulated test data for different two-dimensional segmentation methods.
Fig. 9 shows PR curves of segmentation results of different two-dimensional segmentation methods.
FIG. 10 shows a comparison between the two-dimensional bone segmentation results of the method proposed in this chapter on simulation data and the gold standard.
Fig. 11 shows a comparison of bone segmentation on real X-ray data for different two-dimensional segmentation methods.
Detailed Description
As shown in fig. 1, the two-dimensional bone segmentation method based on DRR simulation data generation includes the following steps:
(1) generating simulation data: obtaining improved DRRs (Digitally Reconstructed Radiographs) by sector-area control-point sampling and GPU acceleration, and using the improved DRRs to generate simulated two-dimensional X-ray images of TIPS (transjugular intrahepatic portosystemic shunt) surgery and the corresponding simulated bone-segmentation gold standards;
(2) model training: training a bone segmentation network model for two-dimensional X-ray images by using the simulation data set;
(3) bone segmentation: performing bone segmentation of the intraoperative two-dimensional image with U2-Net, a nested U-shaped network improved from the traditional U-shaped network structure.
The method obtains improved DRRs by sector-area control-point sampling and GPU acceleration; the improved DRRs generate simulated two-dimensional X-ray images of TIPS surgery and the corresponding simulated bone-segmentation gold standards; the simulation data set is used to train a bone segmentation network model for two-dimensional X-ray images; and U2-Net, a nested U-shaped network improved from the traditional U-shaped network structure, performs bone segmentation of the intraoperative two-dimensional image. A sufficient volume of algorithm training samples can thus be obtained, breaking through the limitation that medical image volume places on method performance: although no real X-ray images are used for training at all, the bones in X-ray images can still be segmented, and the segmentation result closely matches the true bone regions.
Preferably, in the step (1), in the process of generating DRRs by simulating C-arm X-ray imaging with Ray-casting, the DRR plane corresponds to the imaging plane of the C-arm projection; assuming the plane size is M × N, M × N rays are simulated in the Ray-casting process, emitted from the simulated light source and passing through the volume data to the simulated imaging plane (imager), so that the rays finally form a pyramid-like shape in space; the absorption of X-rays by human tissue during imaging is simulated through attenuation of the ray pixel values, and the CT volume data is kept from moving too far from the virtual light source and the virtual imaging plane during rotation.
Preferably, in the step (1), an Iso-center coordinate system is first established for the subsequent calculation, and M × N rays are emitted from the light source, each ray passing through the volume data to its corresponding (m, n) pixel position on the DRR plane; the time complexity of the algorithm is reduced by the sector-area processing method and ray-parallel operation: the sector area is realized by uniformly selecting K points, called control points, on a fixed segment of each ray, and the X-ray projection process is a simulated integration that accumulates the control-point values along the ray; the processing of the control points on each ray is further sped up with octree coding; parallel simulation of the imaging rays is realized by GPU acceleration, and the projected image is generated quickly under the sector-sampling-area optimization.
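The control-point accumulation described above can be sketched for a single ray. Everything below (function name, nearest-neighbour sampling, uniform placement of the K control points) is an illustrative assumption rather than the patent's actual implementation:

```python
import numpy as np

def drr_ray_integral(volume, source, pixel_pos, k=64):
    """Approximate one DRR pixel value: accumulate K uniformly spaced
    control points along the ray from the virtual source to the pixel
    position on the imaging plane (a simple Riemann-sum line integral)."""
    ts = (np.arange(k) + 0.5) / k                       # uniform samples in (0, 1)
    pts = source[None, :] + ts[:, None] * (pixel_pos - source)[None, :]
    idx = np.round(pts).astype(int)                     # nearest-neighbour voxel lookup
    # sector-area idea: only control points inside the volume contribute
    inside = np.all((idx >= 0) & (idx < np.array(volume.shape)), axis=1)
    vals = volume[idx[inside, 0], idx[inside, 1], idx[inside, 2]]
    step = np.linalg.norm(pixel_pos - source) / k       # path length per sample
    return float(vals.sum() * step)
```

For a unit-valued volume, the integral approximates the path length of the ray inside the volume, which is the behaviour a real attenuation accumulator would refine with proper attenuation coefficients.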
Preferably, in the step (2),
the residual U-shaped module replaces the common single-stream convolution in the original residual module with a structure similar to U-Net, further replaces the original characteristic x with a local characteristic formed by a weighting layer, and the output condition of the residual U-shaped module is as shown in a formula (3.3):
HRSU(x)=μ(F1(x))+F1(x) (3.3)
wherein mu represents a multi-layer U-shaped structure,
the structure of the residual U-shaped module is as follows:
inputting a convolution layer as a common convolution layer for extracting local feature information, which inputs a feature map x (H × W × C)in) Transformation into an intermediate feature map F1(x) The number of channels of the intermediate feature map is Cout
A symmetrical encoder-decoder structure of a U-like network of height L, which combines an intermediate feature map F1(x) As input, learning to extract and encode multi-scale context information μ (F)1(x));
Residual concatenation, fusing local features and multi-scale features by a summation operation:
μ(F1(x))+F1(x)。
preferably, in the step (2), in U2In the Net network architecture, a residual U-shaped module is adopted as a bottom layer component, U2The top layer of Net is a U-shaped structure consisting of 11 levels, each level of the network architecture is filled with well-configured residual U-shaped structures; u shape2-Net comprises: six-stage coder on left and lower side of U-shaped junction, five-stage decoder on right side of U-shaped junction, and multi-layer joint output module.
Preferably, in the step (2), the six-level encoding path of U2-Net consists of En_1-En_6; for the encoder levels En_5 and En_6, the input feature maps of these two levels have a lower resolution, so at these levels the pooling and upsampling operations in the residual U-shaped module are replaced with dilated convolutions; the five-level decoding path of U2-Net consists of De_5-De_1, each decoder level having a structure similar to its symmetric encoder level among En_1-En_5, and, similarly to the configuration of the encoder levels En_5 and En_6, the residual U-shaped module using dilated convolution is also applied to the decoder level De_5; the input of each decoder level in the U2-Net network is the concatenation of the upsampled feature map from the previous level with the feature map from the symmetric encoding level; the multi-level joint output module of the U2-Net network is used to generate the final segmentation result map: the U2-Net network first uses 3 × 3 convolutions and Sigmoid functions to generate the 6 side-output segmentation result maps S_side^(1), ..., S_side^(6) corresponding to the 6 levels; these side-output segmentation result maps are then upsampled to the size of the input image and concatenated along the channel dimension, and finally the final segmentation result map S_fuse is generated through a 1 × 1 convolution layer and a Sigmoid function.
Preferably, in the step (2), the training loss is the binary cross entropy combination given by formula (3.4):
L = Σ_{m=1}^{M} w_side^(m) · l_side^(m) + w_fuse · l_fuse    (3.4)
where M = 6, l_side^(m) denotes the loss of the side-output segmentation result map S_side^(m), l_fuse denotes the loss of the final output segmentation result map S_fuse, and w_side^(m) and w_fuse denote the weights of the respective loss terms; each loss term is computed as the standard binary cross entropy between the prediction and the gold standard. In the subsequent experimental evaluation of the U2-Net network on the data sets, S_fuse is selected as the final segmentation result of the U2-Net network; in addition, a thresholding operation on S_fuse yields the binary mask data of the bone segmentation.
It will be understood by those skilled in the art that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, performs the steps of the methods of the above embodiments; the storage medium may be ROM/RAM, a magnetic disk, an optical disk, a memory card, or the like. Therefore, corresponding to the method of the present invention, the present invention also includes a two-dimensional bone segmentation device based on DRR simulation data generation, which is generally represented in the form of functional modules corresponding to the steps of the method. The device includes:
the simulation data generation module is configured to obtain improved DRRs through sector-area control-point sampling and GPU acceleration, and to generate, from the improved DRRs, simulated two-dimensional X-ray images of TIPS (transjugular intrahepatic portosystemic shunt) surgery and the corresponding simulated bone-segmentation gold standards;
a model training module configured to train a bone segmentation network model for two-dimensional X-ray images using the simulation data set;
a bone segmentation module configured to perform bone segmentation of the intraoperative two-dimensional image with U2-Net, a nested U-shaped network improved from the traditional U-shaped network structure.
The present invention will be described in more detail below.
1. Improved DRR method based on sector area processing
At the theoretical level, the Ray-casting algorithm is more suitable than other image generation algorithms for simulating X-ray image generation, because its illumination model is closest to the imaging principle of X-ray point-source projection. On one hand, the algorithm can provide a more accurate and scientific similarity measure for the optimization of registration parameters; on the other hand, using the generated DRR images as training data ensures that the network model behaves more stably when migrated to approximate X-ray data. Therefore, the Ray-casting algorithm is selected here to implement the DRR.
In generating DRRs using Ray-casting to simulate C-arm X-ray imaging, the DRR plane corresponds to the imaging plane of the C-arm projection. Assuming the plane size is M × N, M × N rays need to be simulated in the Ray-casting process; they are emitted from the simulated light source and pass through the volume data to the simulated imaging plane (imager), finally forming a pyramid-like shape in space. The absorption of X-rays by human tissue during imaging is simulated through attenuation of the ray pixel values, and the CT volume data is kept from moving too far from the virtual light source and the virtual imaging plane during rotation.
Fig. 2 shows the flow of DRR projection by the Ray-casting algorithm based on the sector-area processing method. The Iso-center coordinate system is first established for the subsequent calculations; M × N rays are emitted from the light source, each ray passing through the volume data to its corresponding (m, n) pixel location on the DRR plane. The basic Ray-casting projection algorithm takes too long to compute, so this chapter reduces the time complexity of the algorithm with the sector-area processing method and ray-parallel operation. Since the rays pass through many non-volume-data regions during propagation, and these regions contribute nothing to the DRR image, a sector-like region is established in space and only the values of the volume data inside it are considered, effectively reducing the resources consumed by the Ray-casting computation. The sector area is realized by uniformly selecting K points on a fixed segment of each ray; these points are called control points, and the X-ray projection process is a simulated integration that accumulates the control-point values along the ray.
Existing improvements of the Ray-casting imaging algorithm mainly focus on effectively reducing its time complexity. Some methods reduce complexity by reducing the number of pixels on the DRR imaging plane, and thereby the number of projected rays, but this may produce two-dimensional images of too low a resolution. Another effective approach is to speed up the processing of the control points on each ray: the sector-area processing method proposed in this chapter effectively removes the computation of non-contributing values, and octree coding effectively reduces the processing time of the non-interest regions in the volume data. A further, more recently common optimization is ray-parallel computation, which processes the data synchronously and brings effective time savings on modern GPU hardware. This chapter mainly realizes parallel simulation of the imaging rays through GPU acceleration, and achieves fast generation of the projection image under the sector-sampling-area optimization.
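The ray-parallel idea can be sketched on the CPU with NumPy broadcasting: all M × N ray integrals are evaluated at once as array operations, the same access pattern a GPU kernel would parallelise. The function name and the nearest-neighbour sampling are illustrative assumptions, and this is a CPU sketch, not the GPU code:

```python
import numpy as np

def drr_plane(volume, source, plane_pts, k=64):
    """Compute a whole (M, N) DRR image at once: `plane_pts` holds the
    3-D position of every imaging-plane pixel, shape (M, N, 3)."""
    ts = (np.arange(k) + 0.5) / k
    # (M, N, K, 3): K control points for every ray simultaneously
    pts = source + ts[None, None, :, None] * (plane_pts[:, :, None, :] - source)
    idx = np.round(pts).astype(int)
    shape = np.array(volume.shape)
    inside = np.all((idx >= 0) & (idx < shape), axis=-1)   # sector/bounds mask
    idx = np.clip(idx, 0, shape - 1)                       # safe gather indices
    vals = volume[idx[..., 0], idx[..., 1], idx[..., 2]] * inside
    step = np.linalg.norm(plane_pts - source, axis=-1) / k
    return vals.sum(axis=-1) * step                        # (M, N) DRR image
```

The per-ray loop has disappeared entirely; on a GPU each (m, n) pair would simply become one thread.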
FIG. 3 shows improved DRR results on CT data under the sector-area-based processing method, using only the CPU and using GPU acceleration. (a) and (c) are CPU-only DRR results for the bone data and the original volume data, respectively; (b) and (d) are GPU-accelerated improved DRR results for the bone data and the original volume data, respectively.
As shown in fig. 3, the improved DRR using only the CPU and using GPU acceleration shows no significant difference in the overall results, but GPU acceleration greatly improves the generation time: each image takes about 7.97-17.73 seconds using only the CPU, while the GPU-accelerated improved DRR reaches a speed of 0.08-0.09 seconds per image.
2. Two-dimensional bone segmentation method based on U2-Net
This section presents a network structure that nests U-shaped modules within a U-shaped structure. First, the composition of the residual U-shaped module is explained; then the specific details of the two-layer nested U-shaped architecture built from residual U-shaped modules are introduced, focusing on the three component modules of the architecture: the six-level encoder, the five-level decoder, and the multi-level joint output module. Finally, the supervision strategy of the U2-Net network and its training loss are described. U2-Net is a two-layer nested U-shaped structure; the nesting enables the model to extract deeper features while ensuring that memory consumption and computational cost do not increase significantly.
2.1 residual U-shaped module
Local-global context information is very important feature information for the segmentation task. Small convolution filters of size 1 × 1 or 3 × 3 are the most common way to extract features, thanks to their small memory footprint and efficient computation, but such small filters cannot capture global feature information because their receptive field is too small. In order to extract more global feature information from the shallow high-resolution feature maps, different convolution-module improvements have successively been proposed: expanding the receptive field with dilated convolution, Inception-like blocks (INC), PoolNet with its Pyramid Pooling Module (PPM), and so on. However, multiple dilated-convolution operations make a network require more computing resources and storage space, and directly fusing features of different scales through upsampling, concatenation and addition operations generally causes degradation of the high-resolution features.
The U2-Net network can obtain accurately localized, contour-consistent segmentation results from the data; this requires the ReSidual U-block (RSU) at the bottom layer of the network architecture to extract global contrast features at different levels together with local detail features. The residual U-shaped module is a further improvement of the original residual module; fig. 4(a) shows the original residual module, which was proposed to solve the degradation of training and testing accuracy caused by increasing the number of network layers. Compared with an ordinary two-layer network, the residual module adds a shortcut connection (the line from the input x to "add" in fig. 4(a)), a design equivalent to introducing an identity mapping into the module. The output of the residual block is H(x) = F_2(F_1(x)) + x, where H(x) represents the output before the final activation operation, i.e. the target mapping of the input feature x, and F_2, F_1 represent the weight layers performing the convolution operations. Moreover, thanks to the shortcut-connection design, the optimization difficulty of the residual module is greatly reduced: optimizing the module only requires optimizing the residual mapping F(x) = H(x) − x towards 0.
As a continuation of the residual module, the residual U-shaped module retains the in-module shortcut connection, so that the network obtains good output accuracy in the training stage while the number of parameters is reduced and the feature-map types are richer. In addition, in order to better capture the multi-scale features within a network level, a series of designs and improvements are made to the residual U-shaped module. As shown in fig. 4(b), the main structural difference between the residual U-shaped module and the original residual module is the following: the residual U-shaped module replaces the ordinary single-stream convolution in the original residual module with a U-Net-like structure, and further replaces the original feature x with the local feature produced by a weight layer. The output of the residual U-shaped module is:
H_RSU(x) = μ(F_1(x)) + F_1(x)    (3.3)
where μ represents the multi-layer U-shaped structure in fig. 4(b). The multi-layer U-shaped structure enables the U2-Net network to extract multi-scale feature information directly from within the residual U-shaped module. Since most operations in the residual U-shaped module are applied to downsampled feature maps, the module has little computational overhead.
The structure of the residual U-shaped module is shown in FIG. 5; RSU-L(C_in, M, C_out) denotes a residual U-shaped module, where L denotes the number of encoder layers of the module, M denotes the number of channels in the inner layers of the residual U-shaped module, and C_in and C_out denote the input and output channel numbers of the module. The module comprises the following three parts:
(1) The input convolution layer. As an ordinary convolution layer for extracting local feature information, it transforms the input feature map x (H × W × C_in) into an intermediate feature map F_1(x) whose number of channels is C_out.
(2) A symmetric encoder-decoder structure of height L, similar to U-Net. The structure takes the intermediate feature map F_1(x) as input and learns to extract and encode the multi-scale context information μ(F_1(x)); μ is the U-Net-like structure shown in fig. 5. The height L of the symmetric encoder-decoder structure plays an important role in regulating the performance of the module: a larger height L corresponds to a deeper residual U-shaped module, which in turn means more pooling operations, a larger receptive-field range, and richer local and global features.
(3) The residual connection. This part uses the connection method of fig. 4(b) to fuse local features and multi-scale features through a summation operation: μ(F_1(x)) + F_1(x).
FIG. 5 shows a schematic diagram of the RSU-L(C_in, M, C_out) structure. The RSU is the bottom-layer module of the nested U-shaped network; it mainly consists of an input convolution layer, a symmetric encoder-decoder structure, and a residual connection, where L is the number of encoder layers (7 in this figure).
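A shape-level sketch of the RSU forward pass may clarify formula (3.3). The stand-ins below (a random 1 × 1 channel projection for the input convolution, a single down/upsample plus tanh for μ) are loud simplifications of the real learned layers; only the residual-fusion pattern and the tensor shapes are faithful to the text:

```python
import numpy as np

def input_conv(x, c_out, seed=0):
    """Stand-in for the input convolution layer F1: a fixed random 1x1
    projection from C_in to C_out channels (a real layer learns these)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((x.shape[-1], c_out)) * 0.1
    return x @ w

def u_structure(f):
    """Stand-in for the U-shaped encoder-decoder mu(.): downsample by 2,
    upsample back to the input resolution, apply a nonlinearity."""
    down = f[::2, ::2, :]                               # 'pooling'
    up = down.repeat(2, axis=0).repeat(2, axis=1)       # 'upsampling'
    return np.tanh(up)

def rsu_forward(x, c_out):
    """H_RSU(x) = mu(F1(x)) + F1(x): residual fusion of multi-scale
    context features with the local feature map F1(x), as in Eq. (3.3)."""
    f1 = input_conv(x, c_out)
    return u_structure(f1) + f1
```

Note that the output keeps the H × W resolution of the input while the channel count changes from C_in to C_out, exactly the contract of RSU-L(C_in, M, C_out).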
2.2 U2-Net architecture
In the U2-Net network architecture, the residual U-shaped module is adopted as the bottom component of the network, so that the network extracts the abundant local and global context features in the data to the maximum extent to enhance the segmentation performance of U2-Net. Unlike the cascaded U-shaped structures of the previous section, a nested U-shaped structure requires less computational cost and memory space, and can more fully capture multi-scale, multi-level features in both the intra-level and aggregation stages; U^n-Net is used here to denote a nested U-shaped structure. When the value of n is large, the constructed architecture has too many nested layers and becomes too complex overall, which is not conducive to concrete implementation and use.
Setting the exponent n to 2, a two-layer nested U2-Net architecture is constructed for bone segmentation of two-dimensional images; its network structure is shown in fig. 6. It can be seen that the top layer of U2-Net is a U-shaped structure consisting of 11 levels, i.e. the cube structures in fig. 6. In addition, each level of the network architecture is filled with a well-configured residual U-shaped structure. U2-Net mainly comprises the following three parts: (1) the six-level encoder on the left and lower sides of the U-shaped structure; (2) the five-level decoder on the right side of the U-shaped structure; (3) the multi-level joint output module.
FIG. 6 is the U2-Net architecture. The main structure is a U-Net-like encoder-decoder, where each stage consists of the RSU proposed in this chapter.
TABLE 1
[Table rendered as an image in the original document: detailed configuration of each U2-Net encoder stage.]
Part one: the six-stage encoding path of U2-Net consists of En_1-En_6. The input feature maps of encoder stages En_5 and En_6 have a low resolution, and downsampling them further would cause a loss of useful context information. Therefore, at encoder stages En_5 and En_6, the pooling and upsampling operations in the residual U-shaped module are replaced with dilated (atrous) convolutions. Table 1 lists the detailed configuration of each encoder stage of the U2-Net network.
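The reason dilated convolution can stand in for pooling is that spacing the kernel taps apart enlarges the receptive field without reducing resolution. A minimal 1-D NumPy sketch (the helper name and toy signal are assumptions, not part of the network):

```python
import numpy as np

def dilated_conv1d(signal, kernel, dilation):
    # 'Same'-padded 1-D convolution with holes (atrous/dilated):
    # kernel taps are spaced 'dilation' samples apart, so the
    # receptive field grows while the output keeps full resolution.
    k = len(kernel)
    pad = dilation * (k // 2)
    padded = np.pad(signal, pad)
    out = np.zeros_like(signal, dtype=float)
    for i in range(len(signal)):
        for j in range(k):
            out[i] += kernel[j] * padded[i + j * dilation]
    return out

x = np.ones(8)
k = np.array([1.0, 1.0, 1.0])
y1 = dilated_conv1d(x, k, dilation=1)   # receptive field 3
y2 = dilated_conv1d(x, k, dilation=2)   # receptive field 5
assert y1.shape == y2.shape == x.shape  # no downsampling either way
```

With dilation 2 the three taps span five samples, which is why a dilated RSU at En_5/En_6 can keep context aggregation without the resolution loss of pooling.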
TABLE 2
[Table rendered as an image in the original document: detailed configuration of each U2-Net decoder stage.]
Part two: the five-stage decoding path of U2-Net consists of De_5-De_1. The decoder stages have structures similar to the encoder stages, symmetric with respect to En_1-En_5. Furthermore, similar to the configuration of encoder stages En_5 and En_6, a residual U-shaped module using dilated convolution is also applied at decoder stage De_5. As can be seen in Fig. 6, the input of each decoder stage in the U2-Net network is the concatenation of the upsampled feature map from the previous stage and the feature map from the symmetric encoder stage. Table 2 lists the detailed configuration of each decoder stage of the U2-Net network.
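The decoder-input rule described above (upsample the map from the stage below, then concatenate it with the symmetric encoder's skip features along the channel axis) can be sketched with NumPy arrays in (C, H, W) layout; the shapes and function names below are illustrative assumptions only.

```python
import numpy as np

def upsample2x(f):
    # Nearest-neighbour 2x upsampling of a (C, H, W) feature map.
    return np.repeat(np.repeat(f, 2, axis=1), 2, axis=2)

def decoder_input(prev_decoder_out, encoder_skip):
    # Each decoder stage takes the channel-wise concatenation of the
    # upsampled map from the stage below and the skip connection from
    # the symmetric encoder stage.
    up = upsample2x(prev_decoder_out)
    assert up.shape[1:] == encoder_skip.shape[1:]  # resolutions must match
    return np.concatenate([up, encoder_skip], axis=0)

prev = np.zeros((64, 8, 8))    # output of the stage below (e.g. De_5)
skip = np.ones((64, 16, 16))   # features from the symmetric encoder stage
x_in = decoder_input(prev, skip)
assert x_in.shape == (128, 16, 16)  # channels add, resolution matches skip
```

Concatenation (rather than summation) is what lets the decoder stage see both feature sources with their channels kept separate.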
Part three: the multi-level joint output module of the U2-Net network generates the final segmentation result map. With a fusion strategy similar to the HED (Holistically-Nested Edge Detection) algorithm, the U2-Net network first uses a 3 × 3 convolution and a Sigmoid function to generate six side-output segmentation result maps S_side^(1), ..., S_side^(6), corresponding to the six stages. These side-output maps are then upsampled to the size of the input image and concatenated along the channel dimension; finally, a 1 × 1 convolution layer and a Sigmoid function produce the final segmentation result map S_fuse.
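A minimal NumPy sketch of this fusion step, with a weighted channel sum standing in for the 1 × 1 convolution and nearest-neighbour resize standing in for upsampling (the toy side-map sizes and weights are assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def upsample_to(side, size):
    # Nearest-neighbour resize of a (h, w) side output to (size, size);
    # assumes size is an integer multiple of the side map's resolution.
    r = size // side.shape[0]
    return np.repeat(np.repeat(side, r, axis=0), r, axis=1)

def fuse_side_outputs(side_logits, weights, bias, size):
    # Concatenate the upsampled side outputs along the channel axis,
    # then apply a 1x1 convolution (here: a weighted sum over channels)
    # and a sigmoid to produce the fused map S_fuse.
    stack = np.stack([upsample_to(s, size) for s in side_logits])
    fused_logits = np.tensordot(weights, stack, axes=1) + bias
    return sigmoid(fused_logits)

sides = [np.zeros((2 ** i, 2 ** i)) for i in range(3, 6)]  # 3 toy side maps
s_fuse = fuse_side_outputs(sides, np.ones(3) / 3, 0.0, size=32)
assert s_fuse.shape == (32, 32)
```

Because a 1 × 1 convolution acts only across channels, it is exactly a per-pixel weighted sum of the stacked side outputs, which is what `tensordot` computes here.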
In summary, the U2-Net network architecture is designed to have powerful local-global multi-scale feature extraction capability while keeping the computational and memory costs of the network relatively low. In addition, because the bottom-level architecture of the U2-Net network is built entirely from residual U-shaped modules, without adopting any existing pre-trained backbone network, U2-Net suffers little performance loss, adapts more flexibly to different application scenarios, and is highly robust.
2.3 loss function
During training, the U2-Net network adopts a deep supervision strategy similar to HED. Specifically, for each codec stage, the loss of that stage is calculated with a cross-entropy loss function; the binary cross-entropy based total loss is given by formula (3.4):

L = Σ_{m=1}^{M} w_side^(m) · l_side^(m) + w_fuse · l_fuse (3.4)

where l_side^(m) (M = 6, for Sup1, Sup2, ..., Sup6 in Fig. 6) denotes the loss of the side-output segmentation result map S_side^(m), and l_fuse (Sup0 in Fig. 6) denotes the loss of the final output segmentation result map S_fuse. In addition, w_side^(m) and w_fuse denote the weight of each loss term. The training process of the U2-Net network is equivalent to minimizing the total loss L in formula (3.4). In the subsequent experimental evaluation of the U2-Net network on the data sets, S_fuse is selected as the final segmentation result of the network; in addition, thresholding S_fuse yields binary mask data for bone segmentation.
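The deeply supervised total loss above can be sketched directly in NumPy: one binary cross-entropy term per side output plus one for the fused map, each with its own weight (here all weights are set to 1, which is an illustrative choice, not the patent's training configuration):

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    # Pixel-wise binary cross-entropy, averaged over the map.
    p = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(target * np.log(p) + (1 - target) * np.log(1 - p)))

def total_loss(side_preds, fuse_pred, target, w_side=1.0, w_fuse=1.0):
    # L = sum_m w_side^(m) * l_side^(m) + w_fuse * l_fuse, with all
    # weights set to 1 here for illustration.
    return w_side * sum(bce(s, target) for s in side_preds) + \
        w_fuse * bce(fuse_pred, target)

target = np.array([[1.0, 0.0], [0.0, 1.0]])
perfectish = np.where(target > 0, 0.99, 0.01)   # near-perfect prediction
loss = total_loss([perfectish] * 6, perfectish, target)
assert loss < 0.1  # near-perfect predictions give a small total loss
```

Because every side output is supervised against the same target, gradients reach the intermediate stages directly rather than only through S_fuse.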
3. Experimental results and discussion
3.1 Experimental data and data Generation
The data set used in this method is divided into two parts. The first part is a simulated X-ray data set generated by the improved DRR method from three-dimensional chest CT data in the public data sets LIDC-IDRI, CPTAC-LUAD and CPTAC-LSCC, where the data from LIDC-IDRI are identical to the data used in Chapter 2. One portion of the CT data is used to train the two-dimensional bone segmentation network, and the other portion is used for network testing and quantitative evaluation. The second part is a real X-ray data set obtained from the Chinese PLA General Hospital, comprising a total of 119 images from 52 different patients; these data have no bone segmentation gold standard and are therefore all used for qualitative assessment of the final X-ray bone segmentation results. Since part of these data have corresponding three-dimensional CT images, they can be used in the subsequent three-dimensional/two-dimensional registration experiments. The data of the first part were strictly screened: CT data with good imaging quality and a slice thickness of no more than 1.25 mm were selected, ensuring complete bone structures and excluding other large lesions that would affect the DRR images. In addition, for data augmentation, small-angle rotations and left-right mirroring were applied during DRR generation; the improved DRR finally produced 1200 cases of simulated X-ray data, of which 900 were used for training and 300 for testing.
Each sample in this part of the data set consists of two images. First, a piece of CT data V and bone segmentation mask data are required; applying the mask to the CT data yields CT data V_bone containing only bone information. V and V_bone, at the same spatial position and under the same simulated imaging conditions, are passed through the improved DRR to generate I_0 and I_bone respectively, as shown in Fig. 7. I_0 serves as the input of the U2-Net network and I_bone as its target output. Unlike a general two-dimensional (single-class) image segmentation network, the input and output here are not binary mask data but a probability-map-like bone extraction result; that is, the goal is not simply to obtain a bone mask of I_0, but to make S_fuse as close as possible to I_bone.
3.2 implementation details
TABLE 3 U2-Net training parameter configuration
[Table rendered as an image in the original document.]
The training details of the network in this chapter are shown in Table 3. To make the training data better approximate the real X-ray data used during TIPS surgery, the ratio of the randomly cropped region to the complete two-dimensional chest image is randomly distributed between 0.5 and 1; with random cropping at this ratio, the network in this chapter is trained for 800 epochs. To verify the effect of the proposed two-dimensional bone segmentation method, the state-of-the-art two-dimensional image segmentation networks DeepLabV3, DNL (Disentangled Non-Local Neural Networks), UNet+FCN, Non-local (Non-Local Neural Networks) and UNet+PSPNet (Pyramid Scene Parsing Network) were trained on the same data for comparison. For the training data of these segmentation networks, the mask data were obtained by thresholding I_bone at 20/255.
3.3 simulation data experiment result analysis
FIG. 7 shows the global accuracy, Dice and IoU curves of the U2-Net output under different thresholds, evaluated against gold-standard mask data thresholded at 20/255. The dotted lines in the figure mark the magnitude of each index when the U2-Net output is thresholded at 20/255.
For the evaluation on simulated data, five evaluation indices are used, including the accuracy (Acc) of the segmentation foreground, the Dice coefficient, the overlap ratio (IoU) coefficient, and the PR curve for evaluating recall and precision. To measure the segmentation results against the gold standard I_bone when computing Acc, Dice and IoU, the output S_fuse of the U2-Net network is likewise thresholded at 20/255 to obtain mask data. As can be seen from Fig. 7, after thresholding the bone extraction result of U2-Net, the indices are best at a threshold of 20/255, while Acc, Dice and IoU all drop markedly at other thresholds, which shows that the U2-Net bone extraction result proposed in this method has high similarity to the gold standard.
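The thresholding and overlap metrics used here are standard; a minimal NumPy sketch on a toy 1 × 4 mask (the helper names and example values are assumptions for illustration):

```python
import numpy as np

def threshold_mask(prob_map, thr=20 / 255):
    # Binarise the probability-like bone map, as done for S_fuse.
    return (prob_map >= thr).astype(np.uint8)

def acc_dice_iou(pred, gt):
    # Standard overlap metrics between two binary masks.
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()      # true positives
    union = np.logical_or(pred, gt).sum()
    acc = (pred == gt).mean()                # fraction of matching pixels
    dice = 2 * tp / (pred.sum() + gt.sum())
    iou = tp / union
    return float(acc), float(dice), float(iou)

gt = np.array([[1, 1, 0, 0]])
pred = threshold_mask(np.array([[0.9, 0.05, 0.5, 0.01]]))
acc, dice, iou = acc_dice_iou(pred, gt)
assert pred.tolist() == [[1, 0, 1, 0]]  # 20/255 ≈ 0.078 is the cut-off
```

On this toy example one pixel is a true positive, one a false positive and one a false negative, giving Acc 0.5, Dice 0.5 and IoU 1/3.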
On the test set of simulated data, Table 4 shows that the proposed U2-Net, even after its result is thresholded, achieves the best accuracy of the bone segmentation result and the best overlap with the gold standard relative to the other methods; in particular, its Acc is 2%-5% higher than the other methods, a clear improvement, which proves that the U2-Net proposed in this chapter can extract bone more accurately than the other methods.
TABLE 4
[Table rendered as an image in the original document: quantitative comparison of the segmentation methods on the simulated test data.]
Fig. 8 is a comparison of bone segmentation on simulated test data by different two-dimensional segmentation methods. The first column is the original image to be segmented, i.e., the result directly generated by the improved DRR. The mask data obtained by thresholding the gold standard I_bone and the U2-Net output S_fuse at 20/255 correspond to the last and second-to-last columns, respectively. Each of the remaining columns shows the segmentation results of a different method.
From the results of the different segmentation methods shown in Fig. 8, it is easy to see that the thresholded bone segmentation result of the proposed U2-Net is very close to the thresholded gold standard: complete segmentation is achieved regardless of changes in bone thickness, the overall bone shape, or complicated bone overlap. In the comparison of the mask results, U2-Net also performs better than the other segmentation methods and is closer to the real bone (first and sixth rows of Fig. 8); this is most evident when dealing with low-contrast rib regions and bone sections occluded by organs (second to fifth rows of Fig. 8). U2-Net shows none of these shortcomings and its overall structure is very complete; therefore, U2-Net can effectively extract bone structures that are occluded by organ regions.
Observing Fig. 9, the PR curve of the U2-Net segmentation result is closer to the upper-right corner than those of the other common deep-learning segmentation methods, indicating that U2-Net has stronger discrimination capability for bone information in two-dimensional X-ray images. The shape of the curve clearly reflects that U2-Net maintains a high recall while also achieving high precision, proving that the U2-Net proposed in this chapter performs the two-dimensional image segmentation task better.
FIG. 10 compares the two-dimensional bone segmentation results of the present method on simulated data with the gold standard. Data from five patients were randomly selected; from left to right, the columns show the simulated image directly generated by the improved DRR from the original CT data, the output S_fuse of the U2-Net network, and the bone segmentation gold standard I_bone directly generated by the improved DRR from the CT data.
In addition, the comparison in Fig. 10 between the proposed U2-Net two-dimensional bone segmentation result and the gold standard shows that the bone segmentation result of this chapter is very similar to the gold standard in both overall structure and local detail. Apart from slight blurring caused by the inability to restore the textures on the bones, the structure of the bones in the chest is restored accurately and clearly, without rib breakage, incorrect bone pose, or over-segmentation.
Fig. 11 is a comparison of bone segmentation on real X-ray data by different two-dimensional segmentation methods. The first column is the original image to be segmented. The U2-Net output S_fuse is additionally thresholded at 20/255 to obtain mask data. Each of the remaining columns shows the bone segmentation result of a different method. The real X-ray data have no bone gold standard.
3.4 analysis of true X-ray data Experimental results
Some real X-ray data were selected for a bone extraction experiment; the results are shown in Fig. 11. Because delineation and extraction are difficult, these data have no gold standard. As is evident from Fig. 11, DeepLabV3, DNL, Non-local, UNet+FCN and UNet+PSPNet can segment bones on DRR images of the same modality as their training data, but once the modality changes to real X-ray data their bone segmentation results deteriorate, with many obvious cases of breakage, blurring and mis-segmentation. The DNL results show much over-segmentation, with many non-bone regions segmented; Non-local has low segmentation accuracy with much under-segmentation; the DeepLabV3 results show the most obvious bone breakage, with no normal bone structure at all; the UNet+FCN and UNet+PSPNet results are also poor, with considerable breakage and incompleteness. This indicates that these networks clearly lack generalization capability. In contrast, U2-Net has strong domain adaptability: observing the second-to-last column, the spine and rib information is successfully segmented from every data sample, the whole result is clear and complete, and breakage and mis-segmentation rarely occur.
This chapter presents a two-dimensional bone segmentation method based on DRR simulation data generation. First, DRR based on the ray-casting principle is improved through sector-region control-point sampling and GPU acceleration, and is used to generate two-dimensional bone segmentation simulation data and for the subsequent registration step. It is worth mentioning that the image generation speed of the improved DRR is two orders of magnitude faster than the original. Then, for the bone segmentation problem of two-dimensional chest images, this method introduces a nested U-shaped network. Specifically, the method combines the skip connection of the residual network with the symmetric feature-resolution structure of the U-shaped network to build a residual U-shaped module, which serves as the bottom-level module for constructing a six-stage encoder and a five-stage decoder. As the core components of the network, the six-stage encoder and five-stage decoder are joined by U-shaped connections to form U2-Net, which completes the task of bone segmentation on X-ray images.
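The control-point idea behind the improved DRR can be illustrated with a toy NumPy sketch: the X-ray line integral from the source to a detector pixel is approximated by accumulating volume values at K evenly spaced control points along the ray. This is a minimal illustration of the sampling idea only; it includes no sector-region optimization, octree lookup, or GPU acceleration, uses nearest-voxel lookup instead of interpolation, and all names are hypothetical.

```python
import numpy as np

def drr_pixel(volume, source, pixel_pos, num_control_points):
    # Approximate the line integral along source -> detector pixel by
    # summing volume values at K evenly spaced control points.
    ts = np.linspace(0.0, 1.0, num_control_points)
    total = 0.0
    for t in ts:
        p = (1 - t) * np.asarray(source) + t * np.asarray(pixel_pos)
        idx = np.clip(np.round(p).astype(int), 0, np.array(volume.shape) - 1)
        total += volume[tuple(idx)]
    return total / num_control_points  # average attenuation along the ray

vol = np.zeros((8, 8, 8))
vol[3:5, 3:5, 3:5] = 1.0                            # a small dense "bone" block
through = drr_pixel(vol, (0, 4, 4), (7, 4, 4), 16)  # ray through the block
beside = drr_pixel(vol, (0, 0, 0), (7, 0, 0), 16)   # ray missing it
assert through > beside
```

In the full method, one such integral is computed per detector pixel (M × N rays), which is why evaluating the rays in parallel on the GPU and restricting sampling to the sector region gives such a large speed-up.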
Qualitative and quantitative experiments on simulated and real data show that, owing to the local-global context feature extraction capability of the residual U-shaped module, this method has stronger bone feature extraction and domain adaptability than the other algorithms. Analysis of the algorithm's bone segmentation results on two-dimensional X-ray or DRR images readily shows that U2-Net solves the bone segmentation task well in complex scenarios and obtains segmentation results closer to the real bone anatomy.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and all simple modifications, equivalent variations and modifications made to the above embodiment according to the technical spirit of the present invention still belong to the protection scope of the technical solution of the present invention.

Claims (8)

1. A two-dimensional bone segmentation method based on DRR simulation data generation, characterized by comprising the following steps:
(1) simulation data generation: obtaining an improved DRR (digitally reconstructed radiograph) through sector-region control-point sampling and GPU (graphics processing unit) acceleration, and using the improved DRR to respectively generate simulated two-dimensional X-ray images for TIPS (transjugular intrahepatic portosystemic shunt) surgery and the corresponding simulated bone segmentation gold standards;
(2) model training: training a skeleton segmentation network model in a two-dimensional X-ray image by using a simulation data set;
(3) bone segmentation: performing bone segmentation of the intraoperative two-dimensional image with U2-Net, a nested U-shaped network improved from the traditional U-shaped network structure.
2. The two-dimensional bone segmentation method based on DRR simulation data generation as claimed in claim 1, characterized in that: in step (1), in the process of generating the DRR by using ray casting to simulate C-arm X-ray imaging, the DRR plane corresponds to the imaging plane of the C-arm projection; assuming the plane size is M × N, M × N rays are simulated in the ray-casting process, emitted from a simulated light source, passing through the volume data, and reaching the simulated imaging plane (imager), the rays finally forming a pyramid-like shape in space; the absorption of X-rays by human tissue during image formation is simulated through the attenuation of the ray pixel values, and the CT volume data is kept from moving too far from the virtual light source and the virtual imaging plane during rotation.
3. The two-dimensional bone segmentation method based on DRR simulation data generation as claimed in claim 2, characterized in that: in step (1), an iso-center coordinate system is first established for subsequent calculation, and M × N rays are emitted from the light source through the volume data, each ray corresponding to pixel position (m, n) on the DRR plane; the time complexity of the algorithm is reduced by a sector-region processing method and ray-parallel operation; the sector region is realized by uniformly selecting K points, called control points, on a fixed segment of each ray, and the X-ray projection process is simulated as an integral by accumulating the values at the control points along the ray; the lookup speed of the control points on each ray is optimized by octree encoding; parallel simulation of the imaging rays is realized through GPU acceleration, and the projected image is quickly generated under the optimization of the sector sampling region.
4. The two-dimensional bone segmentation method based on DRR simulation data generation as claimed in claim 3, characterized in that: in step (2),
the residual U-shaped module replaces the ordinary single-stream convolution in the original residual module with a U-Net-like structure, and further replaces the original feature x with local features produced by a weight layer; the output of the residual U-shaped module is given by formula (3.3):
H_RSU(x) = μ(F1(x)) + F1(x) (3.3)
wherein μ denotes the multi-layer U-shaped structure;
the structure of the residual U-shaped module is as follows:
an input convolution layer, an ordinary convolution layer for extracting local feature information, which transforms an input feature map x (H × W × Cin) into an intermediate feature map F1(x) with Cout channels;
a symmetric encoder-decoder structure of height L, similar to U-Net, which takes the intermediate feature map F1(x) as input and learns to extract and encode multi-scale context information μ(F1(x));
a residual connection, which fuses local features and multi-scale features by a summation operation:
μ(F1(x)) + F1(x).
5. The two-dimensional bone segmentation method based on DRR simulation data generation as claimed in claim 4, characterized in that: in step (2), in the U2-Net network architecture, a residual U-shaped module is adopted as the bottom-level component; the top level of U2-Net is a U-shaped structure consisting of 11 stages, and each stage of the network architecture is filled with a well-configured residual U-shaped structure; U2-Net comprises: a six-stage encoder on the left and lower sides of the U-shaped structure, a five-stage decoder on the right side of the U-shaped structure, and a multi-level joint output module.
6. The two-dimensional bone segmentation method based on DRR simulation data generation as claimed in claim 5, characterized in that: in step (2), the six-stage encoding path of U2-Net consists of En_1-En_6; the input feature maps of encoder stages En_5 and En_6 have a low resolution, so at these two stages the pooling and upsampling operations in the residual U-shaped module are replaced with dilated convolutions; the five-stage decoding path of U2-Net consists of De_5-De_1, the decoder stages having structures similar to the encoder stages, symmetric with respect to En_1-En_5, and, similar to the configuration of encoder stages En_5 and En_6, a residual U-shaped module using dilated convolution is also applied at decoder stage De_5; the input of each decoder stage in the U2-Net network is the concatenation of the upsampled feature map from the previous stage and the feature map from the symmetric encoder stage; the multi-level joint output module of the U2-Net network generates the final segmentation result map: the U2-Net network first uses a 3 × 3 convolution and a Sigmoid function to generate six side-output segmentation result maps S_side^(1), ..., S_side^(6) corresponding to the six stages, then upsamples these side-output segmentation result maps to the size of the input image and concatenates them along the channel dimension, and finally generates the final segmentation result map S_fuse through a 1 × 1 convolution layer and a Sigmoid function.
7. The two-dimensional bone segmentation method based on DRR simulation data generation as claimed in claim 6, characterized in that: in step (2),
the binary cross-entropy total loss is obtained by formula (3.4):
L = Σ_{m=1}^{M} w_side^(m) · l_side^(m) + w_fuse · l_fuse (3.4)
wherein M = 6, l_side^(m) denotes the loss of the side-output segmentation result map S_side^(m), l_fuse denotes the loss of the final output segmentation result map S_fuse, and w_side^(m) and w_fuse denote the weights of the loss terms; in the subsequent experimental evaluation of the U2-Net network on the data sets, S_fuse is selected as the final segmentation result of the network, and thresholding S_fuse yields binary mask data for bone segmentation.
8. A two-dimensional bone segmentation device based on DRR simulation data generation, characterized by comprising:
a simulation data generation module, configured to obtain an improved DRR through sector-region control-point sampling and GPU acceleration, and to use the improved DRR to respectively generate simulated two-dimensional X-ray images for TIPS (transjugular intrahepatic portosystemic shunt) surgery and the corresponding simulated bone segmentation gold standards;
a model training module configured to train a bone segmentation network model in a two-dimensional X-ray image using the simulation dataset;
a bone segmentation module, configured to perform bone segmentation of the intraoperative two-dimensional image with U2-Net, a nested U-shaped network improved from the traditional U-shaped network structure.
CN202110682691.8A 2021-06-17 2021-06-17 Two-dimensional skeleton segmentation method and device based on DRR simulation data generation Pending CN113496494A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110682691.8A CN113496494A (en) 2021-06-17 2021-06-17 Two-dimensional skeleton segmentation method and device based on DRR simulation data generation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110682691.8A CN113496494A (en) 2021-06-17 2021-06-17 Two-dimensional skeleton segmentation method and device based on DRR simulation data generation

Publications (1)

Publication Number Publication Date
CN113496494A true CN113496494A (en) 2021-10-12

Family

ID=77998073

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110682691.8A Pending CN113496494A (en) 2021-06-17 2021-06-17 Two-dimensional skeleton segmentation method and device based on DRR simulation data generation

Country Status (1)

Country Link
CN (1) CN113496494A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114331921A (en) * 2022-03-09 2022-04-12 南昌睿度医疗科技有限公司 Low-dose CT image noise reduction method and device
CN114442629A (en) * 2022-01-25 2022-05-06 吉林大学 Mobile robot path planning method based on image processing
CN114782414A (en) * 2022-06-15 2022-07-22 深圳市迈捷生命科学有限公司 Artificial bone data analysis method based on image data processing
CN115223063A (en) * 2022-07-13 2022-10-21 河南省农业科学院农业经济与信息研究所 Unmanned aerial vehicle remote sensing wheat new variety lodging area extraction method and system based on deep learning
WO2024093287A1 (en) * 2022-11-01 2024-05-10 南京邮电大学 Instrument identification method based on improved u2 network

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114442629A (en) * 2022-01-25 2022-05-06 吉林大学 Mobile robot path planning method based on image processing
CN114442629B (en) * 2022-01-25 2022-08-09 吉林大学 Mobile robot path planning method based on image processing
CN114331921A (en) * 2022-03-09 2022-04-12 南昌睿度医疗科技有限公司 Low-dose CT image noise reduction method and device
CN114782414A (en) * 2022-06-15 2022-07-22 深圳市迈捷生命科学有限公司 Artificial bone data analysis method based on image data processing
CN114782414B (en) * 2022-06-15 2022-12-13 深圳市迈捷生命科学有限公司 Artificial bone data analysis method based on image data processing
CN115223063A (en) * 2022-07-13 2022-10-21 河南省农业科学院农业经济与信息研究所 Unmanned aerial vehicle remote sensing wheat new variety lodging area extraction method and system based on deep learning
WO2024093287A1 (en) * 2022-11-01 2024-05-10 南京邮电大学 Instrument identification method based on improved u2 network

Similar Documents

Publication Publication Date Title
CN113496494A (en) Two-dimensional skeleton segmentation method and device based on DRR simulation data generation
CN111627019B (en) Liver tumor segmentation method and system based on convolutional neural network
CN108615237B (en) Lung image processing method and image processing equipment
WO2020108562A1 (en) Automatic tumor segmentation method and system in ct image
CN111260653B (en) Image segmentation method and device, storage medium and electronic equipment
CN116309650B (en) Medical image segmentation method and system based on double-branch embedded attention mechanism
CN113012172A (en) AS-UNet-based medical image segmentation method and system
CN109166104A (en) A kind of lesion detection method, device and equipment
CN112598649B (en) 2D/3D spine CT non-rigid registration method based on generation of countermeasure network
CN114549552A (en) Lung CT image segmentation device based on space neighborhood analysis
CN113554665A (en) Blood vessel segmentation method and device
Zhang et al. Dual encoder fusion u-net (defu-net) for cross-manufacturer chest x-ray segmentation
CN116681679A (en) Medical image small target segmentation method based on double-branch feature fusion attention
CN115471470A (en) Esophageal cancer CT image segmentation method
CN115578404A (en) Liver tumor image enhancement and segmentation method based on deep learning
Liao et al. HarDNet-DFUS: An enhanced harmonically-connected network for diabetic foot ulcer image segmentation and colonoscopy polyp segmentation
CN117710760B (en) Method for detecting chest X-ray focus by using residual noted neural network
Zhang et al. Attention-guided feature extraction and multiscale feature fusion 3d resnet for automated pulmonary nodule detection
Alhudhaif et al. A novel nonlinear automated multi-class skin lesion detection system using soft-attention based convolutional neural networks
CN114943721A (en) Neck ultrasonic image segmentation method based on improved U-Net network
CN113436128B (en) Dual-discriminator multi-mode MR image fusion method, system and terminal
Ameen et al. Explainable residual network for tuberculosis classification in the IoT era
Saravagi et al. Diagnosis of lumbar spondylolisthesis using optimized pretrained CNN models
CN116129124A (en) Image segmentation method, system and equipment
CN115965785A (en) Image segmentation method, device, equipment, program product and medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
CB03 Change of inventor or designer information

Inventor after: Lin Yucong

Inventor after: Yang Jian

Inventor after: Fan Jingfan

Inventor after: Song Hong

Inventor after: Ai Danni

Inventor after: Hu Guoyu

Inventor before: Li Wenjie

Inventor before: Yang Jian

Inventor before: Fan Jingfan

Inventor before: Song Hong

Inventor before: Ai Danni