CN111369567B - Method and device for segmenting target object in three-dimensional image and electronic equipment

Info

Publication number
CN111369567B
CN111369567B (application CN201811603470.1A)
Authority
CN
China
Prior art keywords
dimensional
branch
feature
fusion
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811603470.1A
Other languages
Chinese (zh)
Other versions
CN111369567A (en)
Inventor
卓嘉璇
李悦翔
郑冶枫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201811603470.1A
Publication of CN111369567A
Application granted
Publication of CN111369567B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/10012 Stereo images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application provide a method and an apparatus for segmenting a target object in a three-dimensional image, and an electronic device. The method comprises the following steps: performing feature extraction on three-dimensional images of a plurality of modality groups to be segmented, respectively, according to a plurality of branch feature three-dimensional networks of a multi-channel three-dimensional network model, to obtain branch feature maps of a plurality of branches; performing feature extraction and fusion on the branch feature maps of the plurality of branches according to the fusion feature three-dimensional network to obtain a fused feature map; and fusing and size-enlarging the fused feature map and the branch feature maps of the plurality of branches according to the size-enlargement three-dimensional network to obtain a three-dimensional image of the segmented target object. In the embodiments of the present application, the different morphological features that the same target object exhibits in the three-dimensional images of different modality groups are extracted and fused, which greatly improves the accuracy of identifying the category and edges of the target object and thereby improves the segmentation accuracy of the target object.

Description

Method and device for segmenting target object in three-dimensional image and electronic equipment
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method and an apparatus for segmenting a target object in a three-dimensional image, and an electronic device.
Background
At present there are various three-dimensional images, including medical three-dimensional images such as CT (Computed Tomography) images and MRI (Magnetic Resonance Imaging) images. It is difficult for a user to identify every target object in a three-dimensional image visually; for example, it is difficult for a doctor to screen and identify target objects one by one in a medical three-dimensional image, so false detections and missed detections occur easily and the process is time-consuming. Accordingly, segmentation methods for target objects in three-dimensional images have been developed in the related art.
In the prior art, there is a method for segmenting target objects based on instance segmentation, which segments each target object (e.g., a cell nucleus) in a three-dimensional image of a single modality by classifying that image.
However, the inventors of the present application have found that target objects segmented from three-dimensional images with the prior art have low accuracy, or that the segmentation process is slow and time-consuming.
Disclosure of Invention
The present application provides a method and an apparatus for segmenting a target object in a three-dimensional image, and an electronic device, which can solve the problems of low segmentation accuracy of the target object and low speed of the segmentation process.
The technical solution is as follows:
In a first aspect, a method for segmenting a target object in a three-dimensional image is provided, comprising:
performing feature extraction on three-dimensional images of a plurality of modality groups to be segmented, respectively, according to a plurality of branch feature three-dimensional networks of a multi-channel three-dimensional network model, to obtain branch feature maps of a plurality of branches; the multi-channel three-dimensional network model comprises a branch feature three-dimensional network group, a fusion feature three-dimensional network, and a size-enlargement three-dimensional network that are cascaded in sequence, and the branch feature three-dimensional network group comprises a plurality of parallel branch feature three-dimensional networks;
performing feature extraction and fusion on the branch feature maps of the plurality of branches according to the fusion feature three-dimensional network to obtain a fused feature map;
and fusing and size-enlarging the fused feature map and the branch feature maps of the plurality of branches according to the size-enlargement three-dimensional network to obtain a three-dimensional image of the segmented target object.
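To make the three-stage pipeline concrete, the following is a minimal sketch in PyTorch (the patent names no framework; the module names, channel counts, strides, and the use of three modality groups of two modalities each are illustrative assumptions, and the skip connections from the branch maps into the size-enlargement stage are omitted for brevity):

```python
import torch
import torch.nn as nn

class BranchFeatureNet(nn.Module):
    """One parallel branch: extracts features from one modality group."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, 3, stride=2, padding=1),
            nn.BatchNorm3d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.conv(x)

class MultiChannel3DNet(nn.Module):
    def __init__(self, num_branches=3, in_ch=2, feat_ch=32, num_classes=2):
        super().__init__()
        # Stage 1: one branch feature 3D network per modality group.
        self.branches = nn.ModuleList(
            BranchFeatureNet(in_ch, feat_ch) for _ in range(num_branches))
        # Stage 2: fusion feature 3D network over the concatenated branches.
        self.fusion = nn.Sequential(
            nn.Conv3d(feat_ch * num_branches, feat_ch * 2, 3, stride=2, padding=1),
            nn.BatchNorm3d(feat_ch * 2), nn.ReLU(inplace=True))
        # Stage 3: size-enlargement 3D network back to input resolution.
        self.upscale = nn.Sequential(
            nn.ConvTranspose3d(feat_ch * 2, feat_ch, 2, stride=2),
            nn.ConvTranspose3d(feat_ch, num_classes, 2, stride=2))

    def forward(self, modality_groups):
        # modality_groups: list of (N, in_ch, D, H, W) tensors, one per group.
        branch_maps = [b(x) for b, x in zip(self.branches, modality_groups)]
        fused = self.fusion(torch.cat(branch_maps, dim=1))
        return self.upscale(fused)  # segmented volume, same D/H/W as input

model = MultiChannel3DNet()
groups = [torch.randn(1, 2, 32, 64, 64) for _ in range(3)]
print(model(groups).shape)  # torch.Size([1, 2, 32, 64, 64])
```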
Optionally, the fusion feature three-dimensional network comprises a cascaded first-level fusion feature three-dimensional sub-network and second-level fusion feature three-dimensional sub-network;
and the performing feature extraction and fusion on the branch feature maps of the plurality of branches according to the fusion feature three-dimensional network to obtain a fused feature map comprises:
fusing the branch feature maps of the branches according to the first-level fusion feature three-dimensional sub-network, and performing feature extraction on the fused branch feature maps to obtain a first-level fused feature map of the fused feature map;
and performing feature extraction on the first-level fused feature map according to the second-level fusion feature three-dimensional sub-network to obtain a second-level fused feature map of the fused feature map.
Optionally, the first-level fusion feature three-dimensional sub-network comprises a cascaded fused three-dimensional convolution block and at least one three-dimensional convolution block;
and the fusing the branch feature maps of the branches according to the first-level fusion feature three-dimensional sub-network and performing feature extraction on the fused branch feature maps to obtain the first-level fused feature map comprises:
convolving the branch feature maps of the branches, respectively, according to the fused three-dimensional convolution block, and fusing the convolved branch feature maps of the branches to obtain an original fused feature map;
and performing feature extraction on the original fused feature map according to the at least one three-dimensional convolution block to obtain the first-level fused feature map.
Optionally, the convolving the branch feature maps of the branches respectively according to the fused three-dimensional convolution block and fusing the convolved branch feature maps of the branches to obtain an original fused feature map comprises:
convolving the branch feature maps of the branches, respectively, according to the three-dimensional convolution layers in the three-dimensional convolution units of the fused three-dimensional convolution block, the three-dimensional convolution units being parallel to one another;
and connecting the convolved branch feature maps of the branches in the channel dimension according to a connection layer in the fused three-dimensional convolution block to obtain the original fused feature map, the connection layer being cascaded after the three-dimensional convolution units.
Optionally, before connecting the convolved branch feature maps of the branches in the channel dimension, the method further comprises:
sequentially normalizing and nonlinearly processing the convolved branch feature map of each branch according to a batch normalization layer and an activation function layer cascaded in sequence in each three-dimensional convolution unit of the fused three-dimensional convolution block;
and the connecting the convolved branch feature maps of the branches in the channel dimension according to the connection layer in the fused three-dimensional convolution block to obtain the original fused feature map comprises:
connecting, in the channel dimension according to the connection layer, the branch feature maps of the branches that have been sequentially convolved, normalized, and nonlinearly processed, to obtain a connected feature map;
and smoothing the connected feature map according to a three-dimensional convolution layer in the fused three-dimensional convolution block to obtain the original fused feature map.
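A minimal sketch of the fused three-dimensional convolution block as just described: one parallel three-dimensional convolution unit (convolution, batch normalization, activation) per branch, a connection layer that concatenates along the channel dimension, and a smoothing three-dimensional convolution. PyTorch and all channel counts are assumptions:

```python
import torch
import torch.nn as nn

class FusedConvBlock(nn.Module):
    def __init__(self, num_branches, in_ch, out_ch):
        super().__init__()
        # One parallel 3D convolution unit (conv + BN + activation) per branch.
        self.units = nn.ModuleList(
            nn.Sequential(nn.Conv3d(in_ch, out_ch, 3, padding=1),
                          nn.BatchNorm3d(out_ch),
                          nn.ReLU(inplace=True))
            for _ in range(num_branches))
        # Smoothing 3D convolution applied after channel-wise concatenation.
        self.smooth = nn.Conv3d(out_ch * num_branches, out_ch, 1)

    def forward(self, branch_maps):
        processed = [u(x) for u, x in zip(self.units, branch_maps)]
        connected = torch.cat(processed, dim=1)  # connection layer
        return self.smooth(connected)            # original fused feature map

block = FusedConvBlock(num_branches=3, in_ch=32, out_ch=64)
maps = [torch.randn(1, 32, 8, 16, 16) for _ in range(3)]
print(block(maps).shape)  # torch.Size([1, 64, 8, 16, 16])
```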
Optionally, the second-level fusion feature three-dimensional sub-network comprises a cascaded strided (stepping) three-dimensional convolution block and at least one three-dimensional convolution block;
and/or the at least one three-dimensional convolution block comprises a plurality of cascaded three-dimensional convolution blocks; a three-dimensional convolution block comprises at least one of a deep three-dimensional dense network block and a three-dimensional graph convolution network block (3D GCN block); the three-dimensional graph convolution network block comprises parallel two-dimensional convolution-unit branches, a parallel connection block cascaded after the two-dimensional convolution-unit branches, and a first-dimension convolution unit cascaded after the parallel connection block; and a two-dimensional convolution-unit branch comprises a cascaded second-dimension convolution unit and third-dimension convolution unit, or a cascaded third-dimension convolution unit and second-dimension convolution unit.
Optionally, the performing feature extraction on the original fused feature map according to the at least one three-dimensional convolution block to obtain the first-level fused feature map comprises:
for each cascaded three-dimensional graph convolution network block, taking the original fused feature map, or the intermediate fused feature map output by the previous three-dimensional graph convolution network block, as the input feature map;
performing feature extraction on the input feature map in the second dimension and the third dimension, respectively, according to the two-dimensional convolution-unit branches of the three-dimensional graph convolution network block, to obtain a second-and-third-dimension feature map for each branch;
connecting the second-and-third-dimension feature maps of the branches in parallel according to the parallel connection block of the three-dimensional graph convolution network block;
and performing feature extraction in the first dimension on the parallel-connected second-and-third-dimension feature maps according to the first-dimension convolution unit of the three-dimensional graph convolution network block, and superposing the result with the input feature map, to obtain and output the intermediate fused feature map or first-level fused feature map corresponding to that three-dimensional graph convolution network block.
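The following sketch illustrates one plausible reading of the three-dimensional graph convolution network block: two parallel branches of single-axis convolutions over the second (H) and third (W) dimensions in opposite orders, a parallel connection (rendered here as concatenation plus a 1x1x1 reduction, which is an assumption), a first-dimension (D) convolution unit, and superposition of the input feature map. The kernel size and channel counts are illustrative:

```python
import torch
import torch.nn as nn

def axis_conv(ch, kernel):
    # Single-axis 3D convolution unit with "same" padding.
    return nn.Sequential(nn.Conv3d(ch, ch, kernel,
                                   padding=tuple(k // 2 for k in kernel)),
                         nn.BatchNorm3d(ch), nn.ReLU(inplace=True))

class GCN3DBlock(nn.Module):
    def __init__(self, ch, k=7):
        super().__init__()
        self.branch_hw = nn.Sequential(axis_conv(ch, (1, k, 1)),  # H then W
                                       axis_conv(ch, (1, 1, k)))
        self.branch_wh = nn.Sequential(axis_conv(ch, (1, 1, k)),  # W then H
                                       axis_conv(ch, (1, k, 1)))
        self.reduce = nn.Conv3d(2 * ch, ch, 1)  # after the parallel connection
        self.conv_d = axis_conv(ch, (k, 1, 1))  # first-dimension (D) unit

    def forward(self, x):
        cat = torch.cat([self.branch_hw(x), self.branch_wh(x)], dim=1)
        return x + self.conv_d(self.reduce(cat))  # superpose input feature map

block = GCN3DBlock(ch=64)
print(block(torch.randn(1, 64, 8, 16, 16)).shape)  # torch.Size([1, 64, 8, 16, 16])
```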
Optionally, each branch feature three-dimensional network comprises a cascaded first-level branch feature three-dimensional sub-network and second-level branch feature three-dimensional sub-network;
and the performing feature extraction on the three-dimensional images of the plurality of modality groups to be segmented, respectively, according to the plurality of branch feature three-dimensional networks, to obtain branch feature maps of the plurality of branches comprises:
performing feature extraction on the three-dimensional image of each modality group to be segmented according to the first-level branch feature three-dimensional sub-network of the corresponding branch feature three-dimensional network, to obtain a first-level branch feature map of the branch feature map of that branch;
and performing feature extraction on the first-level branch feature map according to the second-level branch feature three-dimensional sub-network of the branch feature three-dimensional network, to obtain a second-level branch feature map of the branch feature map of that branch.
Optionally, the second-level branch feature three-dimensional sub-network comprises a two-dimensional cascaded dense block or a cascaded deep three-dimensional residual network block; the two-dimensional cascaded dense block comprises a cascaded first two-dimensional convolution unit, second two-dimensional convolution unit, and third two-dimensional convolution unit;
and the performing feature extraction on the first-level branch feature map according to the second-level branch feature three-dimensional sub-network to obtain the second-level branch feature map of the branch feature map of that branch comprises:
performing feature extraction on the first-level branch feature map in a first two-dimensional plane according to the first two-dimensional convolution unit to obtain a first two-dimensional feature map;
performing first-level superposition of the first two-dimensional feature map and the first-level branch feature map, and performing feature extraction on the superposed feature map in a second two-dimensional plane according to the second two-dimensional convolution unit to obtain a second two-dimensional feature map;
performing second-level superposition of the second two-dimensional feature map, the first two-dimensional feature map, and the first-level branch feature map, and performing feature extraction on the superposed feature map in a third two-dimensional plane according to the third two-dimensional convolution unit to obtain a third two-dimensional feature map;
and superposing the third two-dimensional feature map, the first two-dimensional feature map, and the first-level branch feature map to obtain the second-level branch feature map of the three-dimensional image to be segmented of that modality group.
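A sketch of the two-dimensional cascaded dense block, reading each "two-dimensional plane" as one 2D plane of the volume ((H,W), (D,W), (D,H), an assumption) and "superposition" as element-wise addition, following the steps above:

```python
import torch
import torch.nn as nn

def plane_conv(ch, kernel):
    # 2D convolution unit acting in one plane of the 3D volume.
    return nn.Sequential(nn.Conv3d(ch, ch, kernel,
                                   padding=tuple(k // 2 for k in kernel)),
                         nn.BatchNorm3d(ch), nn.ReLU(inplace=True))

class CascadedDenseBlock2D(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv1 = plane_conv(ch, (1, 3, 3))  # first 2D plane:  H x W
        self.conv2 = plane_conv(ch, (3, 1, 3))  # second 2D plane: D x W
        self.conv3 = plane_conv(ch, (3, 3, 1))  # third 2D plane:  D x H

    def forward(self, x):
        f1 = self.conv1(x)
        f2 = self.conv2(f1 + x)       # first-level superposition
        f3 = self.conv3(f2 + f1 + x)  # second-level superposition
        return f3 + f1 + x            # second-level branch feature map

block = CascadedDenseBlock2D(ch=32)
print(block(torch.randn(1, 32, 16, 32, 32)).shape)  # torch.Size([1, 32, 16, 32, 32])
```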
Optionally, the size-enlargement three-dimensional network comprises cascaded first to fourth size-enlargement superposition blocks;
and the fusing and size-enlarging the fused feature map and the branch feature maps of the plurality of branches according to the size-enlargement three-dimensional network to obtain a three-dimensional image of the segmented target object comprises:
performing channel-number conversion and size enlargement on the second-level fused feature map according to the first size-enlargement superposition block to obtain a first-level enlarged image;
performing channel-number conversion on the first-level fused feature map according to the second size-enlargement superposition block, superposing the result with the first-level enlarged image, and size-enlarging the superposed three-dimensional image to obtain a second-level enlarged image;
superposing the second-level branch feature maps of the branches, performing channel-number conversion on the superposed second-level branch feature map according to the third size-enlargement superposition block, superposing the result with the second-level enlarged image, and size-enlarging the superposed three-dimensional image to obtain a third-level enlarged image;
and superposing the first-level branch feature maps of the branches, performing channel-number conversion on the superposed first-level branch feature map according to the fourth size-enlargement superposition block, and superposing the result with the third-level enlarged image to obtain the three-dimensional image of the segmented target object.
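A sketch of the size-enlargement three-dimensional network: four enlarge-and-superpose blocks that convert channel counts with 1x1x1 convolutions (an assumption), superpose skip feature maps, and enlarge by trilinear interpolation (one of the enlargement options listed later). Channel counts, spatial sizes, and the final classification head are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def up2(x):
    # Size enlargement by a factor of 2 (trilinear interpolation).
    return F.interpolate(x, scale_factor=2, mode='trilinear',
                         align_corners=False)

class SizeEnlargeNet(nn.Module):
    def __init__(self, c2f, c1f, c2b, c1b, num_classes):
        super().__init__()
        self.p1 = nn.Conv3d(c2f, c1f, 1)  # channel conversion, 2nd-level fused map
        self.p2 = nn.Conv3d(c1f, c1f, 1)  # channel conversion, 1st-level fused map
        self.p3 = nn.Conv3d(c2b, c1f, 1)  # channel conversion, summed 2nd-level branch maps
        self.p4 = nn.Conv3d(c1b, c1f, 1)  # channel conversion, summed 1st-level branch maps
        self.head = nn.Conv3d(c1f, num_classes, 1)  # assumed output head

    def forward(self, fuse2, fuse1, branch2, branch1):
        u1 = up2(self.p1(fuse2))              # first block: convert + enlarge
        u2 = up2(self.p2(fuse1) + u1)         # second block
        u3 = up2(self.p3(sum(branch2)) + u2)  # third block
        x = self.p4(sum(branch1)) + u3        # fourth block (no enlargement)
        return self.head(x)                   # per-voxel class scores

net = SizeEnlargeNet(c2f=128, c1f=64, c2b=64, c1b=32, num_classes=2)
fuse2 = torch.randn(1, 128, 4, 8, 8)
fuse1 = torch.randn(1, 64, 8, 16, 16)
branch2 = [torch.randn(1, 64, 16, 32, 32) for _ in range(3)]
branch1 = [torch.randn(1, 32, 32, 64, 64) for _ in range(3)]
print(net(fuse2, fuse1, branch2, branch1).shape)  # torch.Size([1, 2, 32, 64, 64])
```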
Optionally, the multi-channel three-dimensional network model is pre-trained by the following method:
determining an extended sample set from an original sample set;
dividing the extended sample set into a validation set and a training set;
preliminarily training the original multi-channel three-dimensional network model with the training set to obtain a preliminarily trained multi-channel three-dimensional network model;
performing target object segmentation on the validation set with the preliminarily trained multi-channel three-dimensional network model for validation, to obtain segmentation results;
determining difficult-sample three-dimensional images from the segmentation results, a difficult-sample three-dimensional image being a sample three-dimensional image in which the area occupied by the target object is smaller than an area threshold or the classification error rate of the target object is higher than an error-rate threshold;
and training the preliminarily trained multi-channel three-dimensional network model with the difficult-sample three-dimensional images to obtain the selected multi-channel three-dimensional network model.
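The two-phase training flow with difficult-sample mining can be summarized as below; `augment`, `split`, `train`, `target_area`, and `error_rate` are hypothetical helpers passed in by the caller, standing in for data augmentation, set division, a standard training loop, and the two difficulty criteria:

```python
def train_with_hard_examples(model, original_set, augment, split, train,
                             target_area, error_rate,
                             area_threshold, error_rate_threshold):
    """Two-phase training with difficult-sample mining, per the steps above."""
    extended = augment(original_set)      # extended sample set
    train_set, val_set = split(extended)  # e.g. an 80/20 division

    train(model, train_set)               # preliminary training

    hard_samples = []                     # difficult-sample 3D images
    for sample in val_set:
        prediction = model(sample.volumes)  # validation segmentation
        if (target_area(sample) < area_threshold
                or error_rate(prediction, sample.label) > error_rate_threshold):
            hard_samples.append(sample)

    train(model, hard_samples)            # refine on difficult samples
    return model
```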
Optionally, during training the multi-channel three-dimensional network model comprises a loss function layer cascaded after the size-enlargement three-dimensional network, the loss function layer comprising a cross-entropy function and an auxiliary weighted loss function;
and the training the preliminarily trained multi-channel three-dimensional network model with the difficult-sample three-dimensional images to obtain the selected multi-channel three-dimensional network model comprises:
iteratively training the preliminarily trained multi-channel three-dimensional network model with the difficult-sample three-dimensional images until a preset convergence condition is met, one iteration of the training comprising:
inputting the current difficult-sample three-dimensional image into the multi-channel three-dimensional network model obtained from the previous iteration, and outputting the current prediction result through the loss function layer of the multi-channel three-dimensional network model;
determining the error between the current prediction result and the sample segmentation result corresponding to the current difficult-sample three-dimensional image, back-propagating the error to each hidden layer in the multi-channel three-dimensional network model, computing the gradient of the error at each hidden layer, and updating the parameters of all hidden layers in the multi-channel three-dimensional network model according to the gradients, to obtain the multi-channel three-dimensional network model of the current iteration.
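A minimal sketch of one such training iteration: forward pass, a combined loss (cross-entropy plus an auxiliary weighted term, with `aux_weighted_loss` and its weight `lam` as assumptions), error back-propagation, and a gradient update:

```python
import torch.nn.functional as F

def train_step(model, optimizer, volumes, label, aux_weighted_loss, lam=0.5):
    """One iteration: prediction, combined loss, back-propagation, update."""
    optimizer.zero_grad()
    logits = model(volumes)                          # current prediction
    loss = F.cross_entropy(logits, label) \
           + lam * aux_weighted_loss(logits, label)  # cross-entropy + auxiliary term
    loss.backward()   # back-propagate the error, compute per-layer gradients
    optimizer.step()  # update the hidden-layer parameters from the gradients
    return loss.item()
```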
Optionally, the method for segmenting a target object in a three-dimensional image provided in the first aspect further includes at least one of the following:
the three-dimensional images of the plurality of modality groups comprise medical three-dimensional images of three modality groups, the medical three-dimensional images of the three modality groups comprising: a computed tomography three-dimensional image and a four-dimensional perfusion diffusion-weighted imaging three-dimensional image of a first modality group, a cerebral blood flow three-dimensional image and a cerebral blood volume three-dimensional image of a second modality group, and a contrast-agent mean transit time three-dimensional image and a cerebral blood flow time-to-peak three-dimensional image of a third modality group; and the target object comprises one of a tissue, an internal organ, and a lesion;
the size of the three-dimensional image of the segmented target object is consistent with that of the three-dimensional images to be segmented;
the size enlargement comprises one of upsampling, deconvolution, and interpolation, as sketched below.
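For reference, the three enlargement options map onto standard 3D operations as follows (PyTorch and the channel counts are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 64, 8, 16, 16)  # (N, C, D, H, W) feature map

upsampled    = F.interpolate(x, scale_factor=2, mode='nearest')        # upsampling
interpolated = F.interpolate(x, scale_factor=2, mode='trilinear',
                             align_corners=False)                      # interpolation
deconvolved  = nn.ConvTranspose3d(64, 64, kernel_size=2, stride=2)(x)  # deconvolution

print(upsampled.shape, interpolated.shape, deconvolved.shape)
# each: torch.Size([1, 64, 16, 32, 32])
```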
In a second aspect, an embodiment of the present application provides an apparatus for segmenting a target object in a three-dimensional image, comprising:
a branch feature extraction module, configured to perform feature extraction on three-dimensional images of a plurality of modality groups to be segmented, respectively, according to a plurality of branch feature three-dimensional networks of a multi-channel three-dimensional network model, to obtain branch feature maps of a plurality of branches; the multi-channel three-dimensional network model comprises a branch feature three-dimensional network group, a fusion feature three-dimensional network, and a size-enlargement three-dimensional network that are cascaded in sequence, and the branch feature three-dimensional network group comprises a plurality of parallel branch feature three-dimensional networks;
a feature fusion module, configured to perform feature extraction and fusion on the branch feature maps of the plurality of branches according to the fusion feature three-dimensional network to obtain a fused feature map;
and a fusion enlargement module, configured to fuse and size-enlarge the fused feature map and the branch feature maps of the plurality of branches according to the size-enlargement three-dimensional network to obtain a three-dimensional image of the segmented target object.
Optionally, the feature fusion module comprises:
a feature fusion unit, configured to fuse the branch feature maps of the branches according to the first-level fusion feature three-dimensional sub-network, and perform feature extraction on the fused branch feature maps to obtain a first-level fused feature map of the fused feature map;
and a feature extraction unit, configured to perform feature extraction on the first-level fused feature map according to the second-level fusion feature three-dimensional sub-network to obtain a second-level fused feature map of the fused feature map.
Optionally, the feature fusion unit is specifically configured to convolve the branch feature maps of the branches according to the fused three-dimensional convolution block and fuse the convolved branch feature maps of the branches to obtain an original fused feature map, and to perform feature extraction on the original fused feature map according to the at least one three-dimensional convolution block to obtain the first-level fused feature map; the first-level fusion feature three-dimensional sub-network comprises a cascaded fused three-dimensional convolution block and at least one three-dimensional convolution block.
Optionally, the feature fusion unit is specifically configured to convolve the branch feature maps of the branches, respectively, according to the three-dimensional convolution layers in the three-dimensional convolution units of the fused three-dimensional convolution block, the three-dimensional convolution units being parallel to one another, and to connect the convolved branch feature maps of the branches in the channel dimension according to a connection layer in the fused three-dimensional convolution block to obtain the original fused feature map, the connection layer being cascaded after the three-dimensional convolution units.
Optionally, the feature fusion unit is further configured to, before the convolved branch feature maps of the branches are connected in the channel dimension, sequentially normalize and nonlinearly process the convolved branch feature map of each branch according to a batch normalization layer and an activation function layer cascaded in sequence in each three-dimensional convolution unit of the fused three-dimensional convolution block; to connect, in the channel dimension according to the connection layer, the branch feature maps of the branches that have been sequentially convolved, normalized, and nonlinearly processed, to obtain a connected feature map; and to smooth the connected feature map according to a three-dimensional convolution layer in the fused three-dimensional convolution block to obtain the original fused feature map.
Optionally, the feature fusion unit is specifically configured to, for each cascaded three-dimensional graph convolution network block, take the original fused feature map, or the intermediate fused feature map output by the previous three-dimensional graph convolution network block, as the input feature map; perform feature extraction on the input feature map in the second dimension and the third dimension, respectively, according to the two-dimensional convolution-unit branches of the three-dimensional graph convolution network block, to obtain a second-and-third-dimension feature map for each branch; connect the second-and-third-dimension feature maps of the branches in parallel according to the parallel connection block of the three-dimensional graph convolution network block; and perform feature extraction in the first dimension on the parallel-connected second-and-third-dimension feature maps according to the first-dimension convolution unit of the three-dimensional graph convolution network block, and superpose the result with the input feature map, to obtain and output the intermediate fused feature map or first-level fused feature map corresponding to that three-dimensional graph convolution network block.
The second-level fusion feature three-dimensional sub-network comprises a cascaded strided three-dimensional convolution block and at least one three-dimensional convolution block; and/or the at least one three-dimensional convolution block comprises a plurality of cascaded three-dimensional convolution blocks; a three-dimensional convolution block comprises at least one of a deep three-dimensional dense network block and a three-dimensional graph convolution network block; the three-dimensional graph convolution network block comprises parallel two-dimensional convolution-unit branches, a parallel connection block cascaded after the two-dimensional convolution-unit branches, and a first-dimension convolution unit cascaded after the parallel connection block; and a two-dimensional convolution-unit branch comprises a cascaded second-dimension convolution unit and third-dimension convolution unit, or a cascaded third-dimension convolution unit and second-dimension convolution unit.
Optionally, the branch feature extraction module comprises:
a first-level branch feature extraction unit, configured to perform feature extraction on the three-dimensional image of each modality group to be segmented according to the first-level branch feature three-dimensional sub-network of the corresponding branch feature three-dimensional network, to obtain a first-level branch feature map of the branch feature map of that branch; the branch feature three-dimensional network comprises a cascaded first-level branch feature three-dimensional sub-network and second-level branch feature three-dimensional sub-network;
and a second-level branch feature extraction unit, configured to perform feature extraction on the first-level branch feature map according to the second-level branch feature three-dimensional sub-network of the branch feature three-dimensional network, to obtain a second-level branch feature map of the branch feature map of that branch.
Optionally, the second-level branch feature extraction unit is specifically configured to perform feature extraction on the first-level branch feature map in a first two-dimensional plane according to the first two-dimensional convolution unit to obtain a first two-dimensional feature map; perform first-level superposition of the first two-dimensional feature map and the first-level branch feature map, and perform feature extraction on the superposed feature map in a second two-dimensional plane according to the second two-dimensional convolution unit to obtain a second two-dimensional feature map; perform second-level superposition of the second two-dimensional feature map, the first two-dimensional feature map, and the first-level branch feature map, and perform feature extraction on the superposed feature map in a third two-dimensional plane according to the third two-dimensional convolution unit to obtain a third two-dimensional feature map; and superpose the third two-dimensional feature map, the first two-dimensional feature map, and the first-level branch feature map to obtain the second-level branch feature map of the three-dimensional image to be segmented of that modality group.
The second-level branch feature three-dimensional sub-network comprises a two-dimensional cascaded dense block or a cascaded deep three-dimensional residual network block; the two-dimensional cascaded dense block comprises a cascaded first two-dimensional convolution unit, second two-dimensional convolution unit, and third two-dimensional convolution unit.
Optionally, the fusion enlargement module comprises:
a first enlargement superposition unit, configured to perform channel-number conversion and size enlargement on the second-level fused feature map according to the first size-enlargement superposition block to obtain a first-level enlarged image; the size-enlargement three-dimensional network comprises cascaded first to fourth size-enlargement superposition blocks;
a second enlargement superposition unit, configured to perform channel-number conversion on the first-level fused feature map according to the second size-enlargement superposition block, superpose the result with the first-level enlarged image, and size-enlarge the superposed three-dimensional image to obtain a second-level enlarged image;
a third enlargement superposition unit, configured to superpose the second-level branch feature maps of the branches, perform channel-number conversion on the superposed second-level branch feature map according to the third size-enlargement superposition block, superpose the result with the second-level enlarged image, and size-enlarge the superposed three-dimensional image to obtain a third-level enlarged image;
and a fourth enlargement superposition unit, configured to superpose the first-level branch feature maps of the branches, perform channel-number conversion on the superposed first-level branch feature map according to the fourth size-enlargement superposition block, and superpose the result with the third-level enlarged image to obtain the three-dimensional image of the segmented target object.
Optionally, the apparatus for segmenting a target object in a three-dimensional image provided in the second aspect of the embodiments of the present application further comprises:
a training module, configured to pre-train the multi-channel three-dimensional network model by the following method: determining an extended sample set from an original sample set; dividing the extended sample set into a validation set and a training set; preliminarily training the original multi-channel three-dimensional network model with the training set to obtain a preliminarily trained multi-channel three-dimensional network model; performing target object segmentation on the validation set with the preliminarily trained multi-channel three-dimensional network model for validation, to obtain segmentation results; determining difficult-sample three-dimensional images from the segmentation results, a difficult-sample three-dimensional image being a sample three-dimensional image in which the area occupied by the target object is smaller than an area threshold or the classification error rate of the target object is higher than an error-rate threshold; and training the preliminarily trained multi-channel three-dimensional network model with the difficult-sample three-dimensional images to obtain the selected multi-channel three-dimensional network model.
Optionally, the training module is specifically configured to iteratively train the preliminarily trained multi-channel three-dimensional network model with the difficult-sample three-dimensional images until a preset convergence condition is met, one iteration of the training comprising: inputting the current difficult-sample three-dimensional image into the multi-channel three-dimensional network model obtained from the previous iteration, and outputting the current prediction result through the loss function layer of the multi-channel three-dimensional network model; determining the error between the current prediction result and the sample segmentation result corresponding to the current difficult-sample three-dimensional image, back-propagating the error to each hidden layer in the multi-channel three-dimensional network model, computing the gradient of the error at each hidden layer, and updating the parameters of all hidden layers in the multi-channel three-dimensional network model according to the gradients, to obtain the multi-channel three-dimensional network model of the current iteration.
In a third aspect, an electronic device is provided, which includes:
a processor, a memory, and a bus;
the bus is used for connecting the processor and the memory;
the memory is used for storing operation instructions;
the processor is configured to execute any one of the segmentation methods for a target object in a three-dimensional image provided in the first aspect of the embodiments of the present application by calling the operation instruction.
In a fourth aspect, a computer-readable storage medium is provided that stores at least one instruction, at least one program, a code set, or an instruction set that is loaded and executed by a processor to implement the method for segmenting a target object in a three-dimensional image according to any one of the first aspect of the embodiments of the present application.
The technical solution provided by the embodiments of the present application has the following beneficial effects:
In the embodiments of the present application, the plurality of branch feature three-dimensional networks act as a plurality of input channels: they receive the three-dimensional images of the plurality of modality groups and extract the different morphological features that the same target object exhibits in the three-dimensional images of different modality groups. The fusion feature three-dimensional network fuses and further extracts these different morphological features, making comprehensive use of the target object's appearance across modalities, which greatly improves the accuracy of identifying the category and edges of the target object and thereby the segmentation accuracy. The size-enlargement three-dimensional network fuses the higher-resolution shallow branch feature maps with the deep fused feature maps, whose features (i.e., semantic information) are richer and more prominent, so that the rich features improve the identification of the target object's category while the high resolution improves the identification of its edges, raising the overall segmentation accuracy of the target object in the three-dimensional image.
Optionally, in the training method for the multi-channel three-dimensional network model provided by the embodiments of the present application, the preliminarily trained multi-channel three-dimensional network model is further trained on the difficult-sample three-dimensional images, which improves the segmentation accuracy and other performance of the model. In addition, an auxiliary weighted loss function is added to the loss function layer of the multi-channel three-dimensional network model, which improves the model's classification accuracy for the target object and thus its segmentation accuracy and other performance as a whole.
Optionally, in the embodiments of the present application, the electronic device convolves the input three-dimensional feature map in three different two-dimensional directions (i.e., along different axes) according to the two-dimensional cascaded dense block, which is equivalent to extracting features of the three-dimensional feature map in each dimension and, compared with direct three-dimensional convolution, greatly reduces the amount of computation. Feature maps of multiple layers are fused through the dense superposition operations, which is equivalent to fusing the feature maps of each two-dimensional plane; this greatly enriches the feature information of the target object in the feature maps, improves the accuracy of the target object's edge information, improves the quality of the feature maps, and improves the overall segmentation accuracy.
Optionally, in the embodiments of the present application, the electronic device convolves the input feature map in multiple dimensions with multiple single-dimension convolution units oriented along different axes, extracts the features of the input feature map in each dimension, and superposes the input feature map onto the result. While preserving feature-extraction performance, each single-dimension convolution unit greatly reduces the parameter count of the three-dimensional graph convolution network block; the parameter count of the multi-channel three-dimensional network model can therefore be reduced and its processing and training speed increased, so segmentation speed is increased without sacrificing segmentation accuracy.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic structural diagram of a segmentation system for a target object in a three-dimensional image according to an embodiment of the present disclosure;
fig. 2 is a flowchart illustrating a method for segmenting a target object in a medical image according to an embodiment of the present application;
fig. 3 is a schematic diagram of a frame structure of a multi-channel three-dimensional network model provided in an embodiment of the present application;
fig. 4a is a schematic diagram of a sample three-dimensional image of each modality provided by an embodiment of the present application;
fig. 4b is a schematic diagram of a sample three-dimensional image of each modality after preprocessing provided by the embodiment of the present application;
FIG. 5 is a schematic diagram of a frame structure of a multi-channel three-dimensional network model in a training process according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of the first-level and second-level fusion feature three-dimensional sub-networks according to an embodiment of the present disclosure;
FIG. 7 is a diagram illustrating a specific structure of a multi-channel three-dimensional network model according to an embodiment of the present application;
fig. 8 is a schematic diagram of an internal structure of a VoxRes block according to an embodiment of the present application;
FIG. 9 is a schematic diagram of another specific structure of a multi-channel three-dimensional network model provided in an embodiment of the present application;
fig. 10 is a schematic flowchart of another method for segmenting a target object in a three-dimensional image according to an embodiment of the present application;
fig. 11 is a schematic diagram of a specific internal structure and the working principle of a 2D cascaded dense block according to an embodiment of the present application;
FIG. 12a is a diagram illustrating a specific internal structure of a fused three-dimensional volume block according to an embodiment of the present application;
FIG. 12b is a schematic diagram illustrating the operation of the connection layer in the fused three-dimensional volume block according to the embodiment of the present application;
fig. 13 is a schematic diagram of a specific internal structure and a working principle of a 3D GCN block according to an embodiment of the present application;
FIG. 14 is a schematic diagram of the parallel-concat block according to the embodiment of the present application;
FIG. 15a is a diagram illustrating a specific internal structure of a fusion block according to an embodiment of the present application;
FIG. 15b is a schematic diagram illustrating an example of feature maps of various stages obtained by processing a multi-channel three-dimensional network model according to an embodiment of the present disclosure;
FIG. 16 is a schematic diagram of a three-dimensional image of brain pathology and a lesion of multiple modality groups to be segmented according to an embodiment of the present application;
fig. 17 is a schematic diagram of an internal structure of an apparatus for segmenting a target object in a three-dimensional image according to an embodiment of the present application;
fig. 18 is a schematic internal structural diagram of another segmentation apparatus for a target object in a three-dimensional image according to an embodiment of the present application;
fig. 19 is a schematic internal structure diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or to elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary, are intended only to explain the present application, and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any elements and all combinations of one or more of the associated listed items.
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Several terms referred to in this application will first be introduced and explained:
ISLES (Ischemic Stroke Lesion) refers to hemiplegia and disturbance of consciousness caused by cerebral infarction or cerebral artery occlusion arising from cerebral thrombosis or cerebral embolism.
MRI (Magnetic Resonance Imaging) is one type of medical image.
CT (Computed Tomography) is another type of medical image, used in the examination of various diseases.
Instance segmentation means segmenting each object in a picture individually and giving its category information.
Interactive segmentation refers to segmenting objects in a picture through human-computer interaction.
A fully convolutional network is a convolutional network commonly used in image segmentation, composed entirely of convolutional layers and pooling layers.
A feature map is the output obtained by convolving an image with a filter; a feature map can itself be convolved with a filter to generate a new feature map.
The inventors of the present application have found that different imaging methods yield three-dimensional images of different modalities, and that the same target object exhibits different morphological features in three-dimensional images of different modalities. For example, the appearance of the same lesion differs between CT and MRI images. However, prior-art methods for segmenting a target object in a three-dimensional image usually adopt a single-channel input, accept a three-dimensional image of only one modality, and can therefore exploit only the target object's morphological features in that one modality; they cannot exploit the morphological features of the same target object across three-dimensional images of multiple modalities, so the segmentation accuracy of the target object in the three-dimensional image is low.
The target object includes one of a tissue, an internal organ, and a lesion. A lesion is a part of the body where pathological change occurs, or a limited area of diseased tissue containing pathogenic microorganisms. For example, a part of a lung destroyed by tubercle bacilli is a tuberculosis lesion; other examples include brain tumors and stroke lesions.
The inventors of the present application have further found that prior-art methods for segmenting a target object in a three-dimensional image use many three-dimensional (3D) convolution modules, which makes the parameter count of the whole model huge, greatly increases the computational burden, and results in slow, time-consuming segmentation.
The application provides a method and a device for segmenting a target object in a three-dimensional image and electronic equipment, and aims to solve the technical problems in the prior art.
The following describes the technical solution of the present application and how to solve the above technical problems in detail by specific embodiments. These several specific embodiments may be combined with each other below, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
An embodiment of the present application provides a segmentation system for a target object in a three-dimensional image, as shown in fig. 1, the segmentation system includes: terminal equipment and electronic equipment.
The terminal device is electrically connected with the electronic device through a network. The network in the embodiment of the present application may include at least one of the internet and a mobile communication network; the internet may include a local area network.
The terminal device has networking, input/output and display functions, and for example, the terminal device may be a desktop computer, a smart phone, a tablet computer or the like. The terminal device may also have a function of acquiring a three-dimensional image, such as a CT (Computed Tomography) instrument or an MRI (Magnetic Resonance Imaging) instrument.
The terminal equipment can be accessed to the network through a local area network or a mobile communication network.
For example, the terminal device may access the internet through a WiFi (Wireless Fidelity) local area network.
For another example, the terminal device may access the internet through a mobile communication network such as 3G (third-generation mobile communication technology), Long Term Evolution (LTE), and the like.
The electronic device may be at least one of a single server, a cluster of servers, and a distributed server.
The electronic device may access the network in a wired manner. For example, the electronic equipment accesses the wide area network or backbone network of the internet through fiber optics.
Optionally, in the segmentation system for a target object in a three-dimensional image according to the embodiments of the present application, there may be a single terminal device, configured to send the acquired three-dimensional images to the electronic device; the electronic device is configured to implement the method for segmenting a target object (instance) in a three-dimensional image provided later in the embodiments of the present application (the specific method is introduced in detail later and not repeated here) and to output an image containing each independent target object to the terminal device, which displays the image.
Alternatively, there may be multiple terminal devices in the segmentation system; the difference from the system containing only a single terminal device is that the electronic device may output (i.e., return) the image containing each independent target object either to the terminal device that provided the three-dimensional images or to another terminal device, as the situation requires.
Based on the same inventive concept, the present application provides a method for segmenting a target object in a medical image, as shown in fig. 2, the method includes the following steps:
s201: and respectively extracting the features of the three-dimensional images of the plurality of modal groups to be segmented according to the plurality of branch feature three-dimensional networks of the multi-channel three-dimensional network model to obtain branch feature maps of the plurality of branches.
Specifically, according to a plurality of branch feature three-dimensional networks of a multi-channel three-dimensional network model, the electronic device performs feature extraction on three-dimensional images of a plurality of modality groups to be segmented respectively to obtain branch feature maps of a plurality of branches.
In the present application, the multi-channel three-dimensional network model comprises a branch feature three-dimensional network group, a fusion feature three-dimensional network, and a size-enlargement three-dimensional network that are cascaded in sequence; the branch feature three-dimensional network group comprises a plurality of parallel branch feature three-dimensional networks.
Optionally, a schematic diagram of the frame structure of the multi-channel three-dimensional network model of the present application is shown in fig. 3, where the branch feature three-dimensional network group includes a 1st branch feature three-dimensional network to an Nth branch feature three-dimensional network that are parallel to one another; N is a positive integer greater than 1.
S202: and according to the fusion characteristic three-dimensional network, performing characteristic extraction and fusion on the branch characteristic graphs of the multiple branches to obtain a fusion characteristic graph.
Specifically, the electronic device performs feature extraction and fusion on the branch feature maps of the multiple branches according to the fusion feature three-dimensional network to obtain a fusion feature map.
Optionally, the three-dimensional images of the plurality of modality groups include medical three-dimensional images of three modality groups.
Optionally, the medical three-dimensional images include a CT three-dimensional image and MRI three-dimensional images; the CT three-dimensional image may be synthesized from CT slices, and an MRI three-dimensional image may be synthesized from MRI slices. The MRI images include images of the 4PDWI (four-dimensional perfusion diffusion-weighted imaging) modality, the CBF (Cerebral Blood Flow) modality, the CBV (Cerebral Blood Volume) modality, the MTT (contrast-agent Mean Transit Time) modality, and the Tmax (time to peak cerebral blood flow) modality; the images of the 4PDWI modality include images of three sub-4PDWI modalities.
Optionally, the medical three-dimensional images of the three modality groups include: the CT and 4PDWI three-dimensional images of a first modality group, the CBF and CBV three-dimensional images of a second modality group, and the MTT and Tmax three-dimensional images of a third modality group; the target object includes one of a tissue, an internal organ, and a lesion.
S203: and according to the size amplification three-dimensional network, fusing and size-amplifying the fused feature map and the branch feature maps of the multiple branches to obtain a three-dimensional image of the segmented target object.
Specifically, the electronic device of the application performs fusion and size enlargement on the fusion feature map and the branch feature maps of the plurality of branches according to the size enlargement three-dimensional network to obtain a three-dimensional image of the segmented target object. The target object in this application is three-dimensional.
Alternatively, the target object in the present application includes one of a tissue, an internal organ, and a lesion. A lesion is a diseased part of the body, that is, a limited region of tissue invaded by pathogenic microorganisms; for example, a part of the lung destroyed by tubercle bacillus is the lesion of pulmonary tuberculosis, and brain tumors or brain stroke lesions are further examples. The target object may also be cell nuclei, including the nuclei of various organs or tissues of the body, for example, the cell nuclei of a brain tumor, or the cell nuclei of lesions of gastric cancer, rectal cancer, breast cancer, and the like.
Optionally, the size enlargement in the present application comprises one of upsampling, deconvolution, and interpolation.
In the embodiments of the present application, the branch feature three-dimensional networks are equivalent to a plurality of input channels: they receive the three-dimensional images of the plurality of modality groups and extract different morphological features of the same target object from the three-dimensional images of different modality groups. The fusion feature three-dimensional network fuses and further extracts these different morphological features, comprehensively utilizing the appearance of the target object across modalities; this greatly improves the identification precision of the target object's category and edges and thereby improves the segmentation precision. The size-amplification three-dimensional network fuses the shallow branch feature maps, which have higher resolution, with the deep fusion feature maps, which have richer and more prominent features (i.e., semantic information); the rich features improve the identification precision of the target object's category, while the high resolution improves the identification precision of the target object's edges, so that the segmentation precision of the target object in the three-dimensional image is improved as a whole.
In the present application, unless otherwise specified, feature maps related to a multi-channel three-dimensional network model are all three-dimensional feature maps, and even though they are abbreviated as feature maps, they should be regarded as three-dimensional feature maps.
Another embodiment of the present application is described below, which provides another possible implementation manner of the segmentation method for the target object in the three-dimensional image.
Optionally, the multi-channel three-dimensional network model in the present application is trained by the electronic device of the embodiments of the present invention before the method for segmenting the target object in the three-dimensional image is carried out.
The following describes a training method of a multi-channel three-dimensional network model according to an embodiment of the present invention.
Optionally, the electronic device of the present application determines the extended sample set according to the original sample set. Specifically, the electronic device of the present application obtains an original sample set, and performs preprocessing (expansion) based on the original sample set to obtain an expanded sample set.
For example, the electronic device obtains the published ISLES2017 and ISLES2018 ischemic stroke lesion segmentation datasets as the original sample set for training. The original sample set includes medical sample three-dimensional images and marked lesions. The medical sample three-dimensional images include sample three-dimensional images of the CT, 4PDWI, CBF, CBV, MTT, and Tmax modalities; as shown in fig. 4a, from left to right are sample three-dimensional images of the CT, MR_4PDWI (MR is an abbreviation of MRI), MR_CBF, MR_CBV, MR_MTT, and MR_Tmax modalities, respectively.
The electronic device pre-processes the sample medical three-dimensional image in the original sample set.
Specifically, because there is a large gray-level difference between the voxels (volume pixels) of the brain region and the voxels of the other regions in the sample three-dimensional image of the CBV modality, the electronic device may extract the brain region as the foreground (brain-region foreground for short) and the other regions as the background from the sample three-dimensional image of the CBV modality by using a gray-threshold algorithm, and then separate the brain-region foreground and background of the sample three-dimensional images of the other modalities according to the extracted foreground and background.
The electronic device performs histogram equalization on the brain region of the sample three-dimensional image of each modality; rotates each equalized sample three-dimensional image by 90, 180, and 270 degrees about the center point; downsamples each rotated sample three-dimensional image (for example, to one quarter of the original size) to obtain the preprocessed sample three-dimensional image of each modality, as shown in fig. 4b; and randomly samples each sample three-dimensional image 100 times, so that the sample set is expanded to 100 times the original sample set and taken as the expanded sample set.
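For illustration only, the following is a minimal Python sketch of this preprocessing pipeline, assuming numpy volumes and a scikit-image-style histogram equalization; the gray threshold, the crop shape, and all function names are illustrative assumptions rather than the patent's concrete implementation.

```python
import numpy as np
from skimage import exposure  # scikit-image

def preprocess_volume(vol, gray_threshold=0.1):
    """Foreground extraction, equalization, rotation, and downsampling
    for one modality volume of shape (Z, X, Y)."""
    # Separate brain-region foreground from background by gray threshold
    # (the patent derives the mask from the CBV-modality volume).
    foreground = vol > gray_threshold
    # Histogram equalization restricted to the brain region.
    eq = vol.astype(np.float64).copy()
    eq[foreground] = exposure.equalize_hist(vol[foreground])
    # Rotate by 90, 180, and 270 degrees about the center (Z) axis.
    rotations = [np.rot90(eq, k, axes=(1, 2)) for k in (1, 2, 3)]
    # Downsample each volume, e.g. to one quarter of the original size.
    return [v[:, ::4, ::4] for v in [eq] + rotations]

def expand_samples(volumes, crop_shape=(8, 16, 16), n_crops=100, seed=0):
    """Randomly sample each volume n_crops times (100x expansion)."""
    rng = np.random.default_rng(seed)
    crops = []
    for v in volumes:
        for _ in range(n_crops):
            z, x, y = (rng.integers(0, max(1, s - c + 1))
                       for s, c in zip(v.shape, crop_shape))
            crops.append(v[z:z+crop_shape[0], x:x+crop_shape[1], y:y+crop_shape[2]])
    return crops
```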
Optionally, in the embodiment of the present application, during training the multi-channel three-dimensional network model further includes a loss function layer cascaded after the size-amplification three-dimensional network, as shown in fig. 5.
In the embodiments of the present application, the electronic device initializes the multi-channel three-dimensional network model with Kaiming normal initialization to obtain the original multi-channel three-dimensional network model. Specifically, each batch normalization weight parameter in the multi-channel three-dimensional network model is set to 1, and each bias parameter is set to 0.
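A minimal PyTorch sketch of this initialization scheme follows; torch.nn.init.kaiming_normal_ implements Kaiming normal initialization, and the module types covered are assumptions about the model's layers.

```python
import torch.nn as nn

def init_weights(module):
    if isinstance(module, (nn.Conv3d, nn.ConvTranspose3d)):
        # Kaiming normal initialization for convolution weights
        nn.init.kaiming_normal_(module.weight, nonlinearity='relu')
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, nn.BatchNorm3d):
        nn.init.ones_(module.weight)   # batch normalization weight set to 1
        nn.init.zeros_(module.bias)    # bias parameter set to 0

# model.apply(init_weights)  # applies recursively to every submodule
```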
Optionally, the specific training process of the multi-channel three-dimensional network model of the present application is divided into a first stage and a second stage.
In the first stage, the electronic device performs K-fold cross validation (where K is a positive integer; for example, 5-fold cross validation) on the original multi-channel three-dimensional network model according to the expanded sample set, obtaining a preliminarily trained multi-channel three-dimensional network model and a validation segmentation result.
Specifically, the electronic device divides the expanded sample set into a validation set and a training set; preliminarily trains the original multi-channel three-dimensional network model with the training set to obtain the preliminarily trained multi-channel three-dimensional network model; and performs target object segmentation on the validation set with the preliminarily trained multi-channel three-dimensional network model to obtain a segmentation result.
In the second stage, the electronic device determines the three-dimensional images of all difficult samples according to the segmentation result, and trains the preliminarily trained multi-channel three-dimensional network model according to the three-dimensional images of the difficult samples to obtain the selected multi-channel three-dimensional network model. Optionally, a difficult sample three-dimensional image is a sample three-dimensional image in which the area occupied by the target object is smaller than an area threshold or the classification error rate of the target object is higher than an error-rate threshold.
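The selection criterion can be sketched as a simple predicate; the thresholds and record fields below are illustrative assumptions, not values from the patent.

```python
def is_difficult_sample(target_voxels, error_rate,
                        area_threshold=500, error_rate_threshold=0.3):
    # A sample is "difficult" if its target area is small or its error rate high.
    return target_voxels < area_threshold or error_rate > error_rate_threshold

validation_results = [
    {"id": 1, "target_voxels": 120,  "error_rate": 0.10},   # small lesion
    {"id": 2, "target_voxels": 9000, "error_rate": 0.45},   # high error rate
    {"id": 3, "target_voxels": 8000, "error_rate": 0.05},   # easy sample
]
difficult = [s for s in validation_results
             if is_difficult_sample(s["target_voxels"], s["error_rate"])]
print([s["id"] for s in difficult])  # [1, 2]
```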
Optionally, in this embodiment of the present application, the loss function layer of the multi-channel three-dimensional network model includes a cross-entropy function and an auxiliary loss function.
The auxiliary loss function, also called the BF-loss function in the embodiments of the present application, provides foreground-and-background distance supervision and assists the cross entropy in classifying and judging positive and negative samples (a positive sample is a sample three-dimensional image containing the target object, and a negative sample is a sample three-dimensional image not containing the target object), as shown in the following expressions:
$$\mathrm{loss}_{\mathrm{foreground}} = \left\|\,\mathrm{mean}(gt \times prob_{\mathrm{foreground}}) - \mathrm{mean}(gt)\,\right\|_2 \qquad \text{Expression (1)}$$

$$\mathrm{loss}_{\mathrm{background}} = \left\|\,\mathrm{mean}((1-gt) \times prob_{\mathrm{background}}) - \mathrm{mean}(1-gt)\,\right\|_2 \qquad \text{Expression (2)}$$

$$\mathrm{loss}_{bf} = \mathrm{loss}_{\mathrm{foreground}} + \mathrm{loss}_{\mathrm{background}} \qquad \text{Expression (3)}$$
In the above expressions, $gt$ denotes the actual lesion region (ground truth), $prob_{\mathrm{foreground}}$ denotes the predicted foreground probability map, $prob_{\mathrm{background}}$ denotes the predicted background probability map, and $\mathrm{loss}_{bf}$ is composed of the foreground and background loss functions, performing supervised learning on the predicted region (the predicted target object region); expressions (1) and (2) are written with the two-norm.
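A direct PyTorch transcription of expressions (1)-(3) might look as follows; since the means are scalars, the two-norm of their difference reduces to an absolute value, and taking the mean over the whole volume is an assumption.

```python
import torch

def bf_loss(gt, prob_foreground, prob_background):
    # Expression (1): || mean(gt * prob_fg) - mean(gt) ||_2
    loss_fg = torch.abs(torch.mean(gt * prob_foreground) - torch.mean(gt))
    # Expression (2): || mean((1-gt) * prob_bg) - mean(1-gt) ||_2
    loss_bg = torch.abs(torch.mean((1 - gt) * prob_background) - torch.mean(1 - gt))
    return loss_fg + loss_bg  # Expression (3)

gt = torch.randint(0, 2, (1, 1, 8, 32, 32)).float()  # binary ground truth
prob_fg = torch.rand_like(gt)
print(bf_loss(gt, prob_fg, 1 - prob_fg))
```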
Optionally, in this embodiment of the application, the electronic device trains the preliminarily trained multi-channel three-dimensional network model according to the three-dimensional images of the difficult samples to obtain the selected multi-channel three-dimensional network model as follows.

The electronic device iteratively trains the preliminarily trained multi-channel three-dimensional network model according to the three-dimensional images of the difficult samples until a preset convergence condition is met; for example, it performs the iterative training with an Adam (Adaptive moment estimation) based gradient descent method.

One iteration of the training includes: inputting the three-dimensional image of the current difficult sample into the multi-channel three-dimensional network model obtained by the previous iteration; outputting the current prediction result through the loss function layer of the multi-channel three-dimensional network model; determining the error between the current prediction result and the sample segmentation result corresponding to the current difficult sample three-dimensional image; back-propagating the error to each hidden layer in the model and computing the gradient of the back-propagated error at each hidden layer; and updating the parameters of all hidden layers according to the gradients, for example the weight parameters w and bias parameters b of the convolution modules, to obtain the multi-channel three-dimensional network model of this iteration.
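One such iteration can be sketched in PyTorch as follows; the stand-in model, the synthetic batch, and the unweighted loss sum are assumptions, and bf_loss is the auxiliary loss sketched above.

```python
import torch
import torch.nn as nn

# bf_loss: the auxiliary loss defined in the previous sketch
model = nn.Conv3d(2, 2, 3, padding=1)        # stand-in for the real network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
cross_entropy = nn.CrossEntropyLoss()

volumes = torch.randn(1, 2, 8, 32, 32)       # one synthetic difficult sample
labels = torch.randint(0, 2, (1, 8, 32, 32)) # voxelwise target labels

logits = model(volumes)
probs = torch.softmax(logits, dim=1)
loss = cross_entropy(logits, labels) \
     + bf_loss(labels.float(), probs[:, 1], probs[:, 0])
optimizer.zero_grad()
loss.backward()     # back-propagate the error to every hidden layer
optimizer.step()    # update weight w and bias b of each layer via Adam
```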
According to the training method of the multi-channel three-dimensional network model described above, the preliminarily trained model is further trained on the difficult sample three-dimensional images, which improves performance such as the segmentation precision of the target object; moreover, the auxiliary loss function added to the loss function layer improves the classification precision of the model for the target object, so that performance such as segmentation precision is improved as a whole.
The internal structure of the multi-channel three-dimensional network model according to the embodiment of the present application is described below.
Optionally, as shown in fig. 3, in the multi-channel three-dimensional network model of the embodiments of the present application, each branch feature three-dimensional network includes a primary branch feature three-dimensional sub-network and a secondary branch feature three-dimensional sub-network in cascade; the fusion feature three-dimensional network includes a primary fusion feature three-dimensional sub-network and a secondary fusion feature three-dimensional sub-network in cascade; and the size-amplification three-dimensional network includes a first, a second, a third, and a fourth size-amplification superposition block cascaded in sequence.
Optionally, in this embodiment of the present application, the primary branch feature three-dimensional sub-network includes cascaded three-dimensional convolution blocks, and the secondary branch feature three-dimensional sub-network includes a 2D cascaded Dense block or cascaded VoxRes blocks (deep voxelwise residual network blocks).
Optionally, as shown in fig. 6, the primary fusion feature three-dimensional sub-network includes a fused three-dimensional convolution block and at least one three-dimensional convolution block in cascade; the secondary fusion feature three-dimensional sub-network includes a strided (stride = 2) three-dimensional convolution block and at least one three-dimensional convolution block in cascade.
Optionally, in this embodiment of the present application, the at least one three-dimensional convolution block includes a plurality of cascaded three-dimensional convolution blocks; a three-dimensional convolution block includes at least one of a VoxDense (deep voxelwise Dense network) block and a 3D GCN (three-dimensional Global Convolutional Network) block.
Optionally, the last three-dimensional convolution blocks of the primary branch feature three-dimensional sub-networks are connected to each other and to the fourth size-amplification superposition block; the 2D cascaded Dense blocks or last VoxRes blocks of the secondary branch feature three-dimensional sub-networks are connected to each other and to the third size-amplification superposition block; and the last three-dimensional convolution block of the primary fusion feature three-dimensional sub-network is connected to the second size-amplification superposition block.
Optionally, fig. 7 is a schematic diagram of a specific structure of the multi-channel three-dimensional network model according to the embodiment of the present application.
As shown in fig. 7, the multi-channel three-dimensional network model includes 3 branch feature three-dimensional networks, which respectively receive the medical three-dimensional images of the first modality group (CT + 4PDWI), the second modality group (CBF + CBV), and the third modality group (MTT + Tmax). Since 4PDWI actually contains 3 sub-modalities, the first branch provides 4 channels for its 4 modalities; each of the other branches provides 2 channels for its 2 modalities.
Each primary branch-characteristic three-dimensional subnetwork comprises two concatenated Conv (three-dimensional convolution) blocks, the last Conv block of the first branch (i.e. the first branch-characteristic network) being connected to the last Conv block of the second branch, which is connected to the last Conv block of the third branch.
Each secondary branch feature three-dimensional sub-network comprises two cascaded VoxRes blocks. Fig. 8 is a schematic diagram of the internal structure of the VoxRes block. The last VoxRes block of the first branch is connected to the last VoxRes block of the second branch, which is in turn connected to the last VoxRes block of the third branch.
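Since fig. 8 is not reproduced here, the following PyTorch sketch assumes the common VoxRes design from the VoxResNet literature (two BN-ReLU-Conv3d stages plus an identity skip connection); the channel count is illustrative.

```python
import torch
import torch.nn as nn

class VoxRes(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm3d(channels), nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, 3, padding=1),
            nn.BatchNorm3d(channels), nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)   # residual (skip) addition

x = torch.randn(1, 64, 8, 16, 16)
print(VoxRes()(x).shape)          # torch.Size([1, 64, 8, 16, 16])
```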
The primary fusion feature three-dimensional sub-network comprises a fused three-dimensional convolution block (Conv, stride = 2) and two VoxDense blocks in cascade; the secondary fusion feature three-dimensional sub-network comprises a strided three-dimensional convolution block (Conv, stride = 2) and two VoxDense blocks in cascade.
In fig. 7, the two rectangles at ×8 represent the first size-amplification superposition block in the size-amplification three-dimensional network; the two rectangles at ×4, at ×2, and at ×1 similarly represent the second, third, and fourth size-amplification superposition blocks, respectively, and the first to fourth blocks are cascaded in sequence. The last Conv block in the third branch is connected to the fourth size-amplification superposition block, the last VoxRes block in the third branch is connected to the third size-amplification superposition block, and the last VoxDense block in the primary fusion feature three-dimensional sub-network is connected to the second size-amplification superposition block.
In fig. 7, the lowermost rectangle represents a loss function layer used only during training. Softmax loss represents the cross entropy loss function and auxiliary loss represents the auxiliary weighted loss function.
Optionally, fig. 9 is a schematic diagram of another specific structure of the multi-channel three-dimensional network model of the embodiments of the present application. It differs from the specific structure of fig. 7 in that a 2D cascaded Dense block replaces the two cascaded VoxRes blocks of fig. 7, and a 3D GCN block replaces the VoxDense block of fig. 7.
Another method for segmenting a target object in a three-dimensional image according to an embodiment of the present application is described below based on a multi-channel three-dimensional network model, and a flow diagram of the method is shown in fig. 10, and includes the following steps:
S1001: the electronic device performs feature extraction on the three-dimensional image of each modality group to be segmented according to the primary branch feature three-dimensional sub-network in each branch feature three-dimensional network, to obtain a primary branch feature map in the branch feature map of that branch.
For example, as shown in fig. 7 or fig. 9, the first-level branch-feature three-dimensional sub-network in each branch-feature three-dimensional network includes two Conv blocks (three-dimensional volume blocks) in cascade.
And the electronic equipment performs feature extraction on the three-dimensional images of the CT mode and the 4PDWI mode in the first mode group to be segmented according to the two Conv blocks cascaded in the first primary branch feature three-dimensional sub-network to obtain a primary branch feature map of the first branch.
And the electronic equipment performs feature extraction on the three-dimensional images of the CBF mode and the CBV mode in the second mode group to be segmented according to the two Conv blocks cascaded in the second primary branch feature three-dimensional sub-network to obtain a primary branch feature map of the second branch.
And the electronic equipment performs feature extraction on the three-dimensional images of the MTT mode and the Tmax mode in the third mode group to be segmented according to the two Conv blocks cascaded in the third primary branch feature three-dimensional sub-network to obtain a primary branch feature map of the third branch.
S1002: and the electronic equipment performs feature extraction on the primary branch feature diagram according to the secondary branch feature three-dimensional sub-network in the branch feature network to obtain a secondary branch feature diagram in the branch feature diagram of the branch.
Optionally, the secondary branch feature three-dimensional sub-network comprises a 2D cascaded Dense block or cascaded VoxRes blocks.

Optionally, the electronic device performs feature extraction on the primary branch feature map of each branch according to the 2D cascaded Dense block or the cascaded VoxRes blocks in the corresponding secondary branch feature three-dimensional sub-network, obtaining the secondary branch feature map of that branch.
The internal structure and the working principle of the 2D cascaded Dense block are described below.
Optionally, the 2D cascaded Dense block comprises a first two-dimensional convolution unit, a second two-dimensional convolution unit, and a third two-dimensional convolution unit in cascade.

Optionally, according to the first two-dimensional convolution unit in the 2D cascaded Dense block, the electronic device extracts features of the primary branch feature map in a first two-dimensional direction to obtain a first two-dimensional feature map.
The electronic equipment performs primary superposition on the first two-dimensional feature map and the primary branch feature map; and according to a second two-dimensional convolution unit, extracting the characteristics of the characteristic diagram after the first-level superposition on a second two-dimension to obtain a second two-dimensional characteristic diagram.
The electronic equipment carries out two-stage superposition on the second two-dimensional feature map, the first two-dimensional feature map and the first-stage branch feature map; and extracting the features of the feature map after the secondary superposition on the third two dimension according to a third two-dimension convolution unit to obtain a third two-dimension feature map.
And the electronic equipment superposes (sum) the third two-dimensional feature map, the first two-dimensional feature map and the first-level branch feature map to obtain a second-level branch feature map of the three-dimensional image to be segmented of the mode group.
For example, fig. 11 shows a specific internal structure and the operation principle of the 2D cascaded Dense block. In fig. 11, a rectangular box labeled Z × X × Y represents a convolution unit whose receptive fields along the Z axis, X axis, and Y axis are Z, X, and Y pixels, respectively. A 1 × 3 × 3 convolution unit, a 3 × 1 × 3 convolution unit, and a 3 × 3 × 1 convolution unit are cascaded in sequence. The 1 × 3 × 3 convolution unit is the first two-dimensional (XY-dimension) convolution unit, with receptive fields of 1, 3, and 3 pixels along the Z, X, and Y axes; the 3 × 1 × 3 convolution unit is the second two-dimensional (YZ-dimension) convolution unit, with receptive fields of 3, 1, and 3 pixels; the 3 × 3 × 1 convolution unit is the third two-dimensional (ZX-dimension) convolution unit, with receptive fields of 3, 3, and 1 pixels. The input feature map (input, not shown) is delivered along the arrow lines to the outputs of the 1 × 3 × 3, 3 × 1 × 3, and 3 × 3 × 1 convolution units for superposition, and the output of the 1 × 3 × 3 convolution unit is additionally delivered to the outputs of the 3 × 1 × 3 and 3 × 3 × 1 convolution units for superposition.
In the embodiments of the present application, according to the 2D cascaded Dense block, the electronic device convolves the input three-dimensional feature map along three different two-dimensional directions, which is equivalent to extracting features of the three-dimensional feature map in each dimension and greatly reduces the amount of computation compared with direct three-dimensional convolution. Moreover, the dense superposition operations fuse the feature maps of multiple layers, which is equivalent to fusing the feature maps of all two-dimensional dimensions; this greatly enriches the feature information of the target object in the feature maps, improves the accuracy of the target object's edge information, improves the quality of the feature maps, and thus improves the segmentation accuracy of the target object as a whole.
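A PyTorch sketch of the 2D cascaded Dense block as described above follows; kernel sizes are given in the document's Z × X × Y order, equal channel counts are assumed so the superpositions are valid additions, and the skip pattern follows fig. 11 (the input and the first unit's output are routed to the later units' outputs).

```python
import torch
import torch.nn as nn

class Cascaded2DDenseBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # Receptive fields in Z x X x Y order; padding keeps the volume size.
        self.conv_xy = nn.Conv3d(channels, channels, (1, 3, 3), padding=(0, 1, 1))
        self.conv_yz = nn.Conv3d(channels, channels, (3, 1, 3), padding=(1, 0, 1))
        self.conv_zx = nn.Conv3d(channels, channels, (3, 3, 1), padding=(1, 1, 0))

    def forward(self, x):
        c1 = self.conv_xy(x)                # first two-dimensional feature map
        c2 = self.conv_yz(c1 + x)           # after first-level superposition
        c3 = self.conv_zx(c2 + c1 + x)      # after second-level superposition
        return c3 + c1 + x                  # final dense superposition (fig. 11)

x = torch.randn(1, 64, 8, 16, 16)
print(Cascaded2DDenseBlock()(x).shape)      # torch.Size([1, 64, 8, 16, 16])
```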
S1003: the electronic device fuses the branch feature maps of the branches according to the primary fusion feature three-dimensional sub-network, and performs feature extraction on the fused branch feature maps to obtain the primary fusion feature map in the fusion feature map.
Optionally, the primary fusion feature three-dimensional sub-network of the embodiments of the present application includes a fused three-dimensional convolution block and at least one three-dimensional convolution block in cascade.

Optionally, the electronic device convolves the branch feature maps of the branches respectively according to the fused three-dimensional convolution block, and fuses the convolved branch feature maps to obtain an original fusion feature map; then, according to the at least one three-dimensional convolution block, it performs feature extraction on the original fusion feature map to obtain the primary fusion feature map.

Optionally, the fused three-dimensional convolution block includes a plurality of three-dimensional convolution units arranged in parallel and a connection layer cascaded after the three-dimensional convolution units, each three-dimensional convolution unit including a three-dimensional convolution layer. The electronic device convolves the branch feature maps of the branches respectively according to the three-dimensional convolution layer in each three-dimensional convolution unit of the fused three-dimensional convolution block, and connects the convolved branch feature maps in the channel dimension according to the connection layer to obtain the original fusion feature map.
For example, fig. 12a shows one specific internal structure of the fused three-dimensional convolution block. As shown in fig. 12a, the fused three-dimensional convolution block includes a plurality of parallel three-dimensional convolution units, each of which includes a three-dimensional convolution (Conv) layer, a batch normalization (Batch Normalization) layer, and an activation function (ReLU) layer cascaded in sequence; the block further includes a connection (concatenate) layer cascaded after the activation function layers of the three-dimensional convolution units and a three-dimensional convolution (Conv) layer cascaded after the connection layer. In each three-dimensional convolution unit, the 64 of the three-dimensional convolution layer indicates that the layer has 64 convolution kernels, 3 × 3 × 3 indicates that the receptive field of each kernel in each of the three dimensions (X axis, Y axis, and Z axis) is 3 pixels, and /2 indicates that the three-dimensional convolution layer extends the boundary of the input image outward by 2 pixels in each dimension before performing convolution.
Optionally, before connecting the convolved branch feature maps in the channel dimension, the electronic device further performs, in sequence, normalization and nonlinear processing on the convolved branch feature map of each branch, according to the batch normalization layer and activation function layer cascaded in each three-dimensional convolution unit of the fused three-dimensional convolution block.

Connecting the convolved branch feature maps in the channel dimension according to the connection layer of the fused three-dimensional convolution block to obtain the original fusion feature map then includes: connecting, in the channel dimension, the branch feature maps of the branches that have been sequentially convolved, normalized, and nonlinearly processed, to obtain a connection feature map; and smoothing the connection feature map according to the three-dimensional convolution layer in the fused three-dimensional convolution block to obtain the original fusion feature map.
Connecting the convolved branch feature maps of the branches in the channel dimension includes the following steps:

Before connection, the electronic device performs feature extraction (including 3D convolution) on the three-dimensional images of the first, second, and third modality groups to be segmented according to the three branch feature three-dimensional networks, respectively, obtaining a 3D stereoscopic branch feature map with a plurality of channels for each branch.

The electronic device then convolves the branch feature maps of the branches respectively according to the three-dimensional convolution layers in the three-dimensional convolution units of the fused three-dimensional convolution block, obtaining the convolved branch feature map of each branch.
For example, as shown in fig. 12b, the CT and 4PDWI three-dimensional images of the first modality group to be segmented pass, in sequence, through the first branch feature three-dimensional network and the three-dimensional convolution layer in the first three-dimensional convolution unit of the fused three-dimensional convolution block, yielding the convolved branch feature map of channels 0 to 31 of the first branch (channel number 32, i.e., channel = 32); the CBF and CBV three-dimensional images of the second modality group pass through the second branch feature three-dimensional network and the three-dimensional convolution layer in the second three-dimensional convolution unit, yielding the convolved branch feature map of channels 0 to 31 of the second branch (channel = 32); and the MTT and Tmax three-dimensional images of the third modality group pass through the third branch feature three-dimensional network and the three-dimensional convolution layer in the third three-dimensional convolution unit, yielding the convolved branch feature map of channels 0 to 31 of the third branch (channel = 32).
According to the connection layer (concatenate) in the fused three-dimensional convolution block, the electronic device arranges and connects the convolved branch feature maps of the channels of the branches into a connection feature map of a total path whose total number of channels equals the sum of the channel numbers of the branches.
For example, as shown in fig. 12b, according to the connection layer in the fused three-dimensional convolution block, the electronic device arranges the convolved branch feature maps of channels 0 to 31 of the first branch as channels 0 to 31 of the connection feature map of the total path, the convolved branch feature maps of channels 0 to 31 of the second branch as channels 32 to 63, and the convolved branch feature maps of channels 0 to 31 of the third branch as channels 64 to 95.
The electronic device then performs 3D convolution on the connection feature map of the total path according to the three-dimensional convolution layer in the fused three-dimensional convolution block, converting the number of channels. For example, as shown in fig. 12b, the electronic device applies a 1 × 1 × 1 3D convolution to the 96-channel connection feature map, obtaining a 128-channel 3D feature map as the original fusion feature map.
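The fused three-dimensional convolution block can be sketched in PyTorch with the channel numbers of fig. 12b (three 32-channel branch units, a 96-channel concatenation, and a 1 × 1 × 1 convolution to 128 channels); the input channel count, kernel size, and stride are assumptions based on figs. 12a and 7.

```python
import torch
import torch.nn as nn

class FusedConvBlock(nn.Module):
    def __init__(self, in_ch=64, branch_ch=32, out_ch=128, branches=3):
        super().__init__()
        # One Conv3d-BN-ReLU unit per branch (fig. 12a); stride 2 follows
        # the "Conv, stride=2" label of fig. 7.
        self.units = nn.ModuleList([
            nn.Sequential(
                nn.Conv3d(in_ch, branch_ch, 3, stride=2, padding=1),
                nn.BatchNorm3d(branch_ch),
                nn.ReLU(inplace=True),
            ) for _ in range(branches)
        ])
        # 1x1x1 convolution converts the 96 concatenated channels to 128.
        self.fuse = nn.Conv3d(branch_ch * branches, out_ch, 1)

    def forward(self, branch_maps):
        feats = [unit(m) for unit, m in zip(self.units, branch_maps)]
        return self.fuse(torch.cat(feats, dim=1))  # concat on channel dim

maps = [torch.randn(1, 64, 8, 16, 16) for _ in range(3)]
print(FusedConvBlock()(maps).shape)  # torch.Size([1, 128, 4, 8, 8])
```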
Optionally, the at least one three-dimensional convolution block in the primary fusion feature three-dimensional sub-network comprises a plurality of cascaded three-dimensional convolution blocks; a three-dimensional convolution block includes at least one of a VoxDense block and a 3D GCN block.
Optionally, the 3D GCN block includes two-dimensional convolution unit branches connected in parallel, a parallel connection block cascaded after the two-dimensional convolution unit branches, and a first-dimensional convolution unit cascaded after the parallel connection block; the two-dimensional convolution unit branch comprises a second-dimensional convolution unit and a third-dimensional convolution unit which are cascaded, or a third-dimensional convolution unit and a second-dimensional convolution unit which are cascaded.
Optionally, the electronic device performs feature extraction on the original fusion feature map according to at least one three-dimensional volume block to obtain a primary fusion feature map, including:
the electronic device takes as an input feature map either the original fused feature map or an intermediate fused feature map output by a previous 3D GCN block for each 3D GCN block of the cascade.
And the electronic equipment respectively extracts the features of the input feature map in a second dimension and a third dimension according to the two-dimensional convolution unit branches of the 3D GCN block to obtain the feature maps of the second dimension and the third dimension of each branch.
The electronic device connects the second and third dimensional feature maps of each branch in parallel according to the parallel connection block of the 3D GCN block.
And the electronic equipment performs feature extraction on the second-dimensional feature map and the third-dimensional feature map which are obtained by parallel connection on the first dimension according to the first dimension convolution unit of the 3D GCN block, and superposes the input feature maps to obtain and output a middle fusion feature map or a first-level fusion feature map corresponding to the 3D GCN block.
In the embodiments of the present invention, the electronic device convolves the input feature map in several different dimension directions with multiple single-dimension convolution modules, extracting the features of the input feature map in each dimension, and superposes the input feature map. On the basis of preserving feature-extraction performance, the single-dimension convolution modules greatly reduce the parameter count of the 3D GCN block, which in turn reduces the parameter count of the multi-channel three-dimensional network model as a whole; this improves the processing and training speed of the model and improves the segmentation speed while maintaining the segmentation precision.
For example, fig. 13 shows a specific internal structure and operation principle of the 3D GCN block. As shown in fig. 13, input represents an input feature map.
In fig. 13, one two-dimensional convolution unit branch includes a 1 × 7 × 1 convolution unit and a 1 × 1 × 7 convolution unit in cascade. A convolution unit labeled Z × X × Y operates along whichever dimensions have a receptive field larger than 1: the 1 × 7 × 1 convolution unit is an X-axis convolution unit (a second-dimension convolution unit) and the 1 × 1 × 7 convolution unit is a Y-axis convolution unit (a third-dimension convolution unit), so this branch convolves the input feature map in the X-axis and then the Y-axis direction, extracting features in the X-axis and Y-axis directions.

In fig. 13, the other two-dimensional convolution unit branch includes a 1 × 1 × 7 convolution unit and a 1 × 7 × 1 convolution unit in cascade, convolving the input feature map in the Y-axis and then the X-axis direction to extract features in the Y-axis and X-axis directions.

In fig. 13, the parallel concat block is cascaded after the last convolution unit of the two branches; it connects in parallel the feature map containing X-axis-then-Y-axis features and the feature map containing Y-axis-then-X-axis features output by the two branches.

In fig. 13, a 7 × 1 × 1 convolution unit is cascaded after the parallel concat block. It is a Z-axis convolution unit (a first-dimension convolution unit): it convolves the parallel-connected feature map output by the parallel concat block in the Z-axis direction to extract features in that direction, and the result is superposed with the input feature map to obtain and output the intermediate fusion feature map or primary fusion feature map corresponding to this 3D GCN block.
As can be seen from fig. 13, if the 3D GCN block contained a single 7 × 7 × 7 convolution unit instead, the parameter count of that unit would be 7 to the third power, i.e., 343. The 3D GCN block of the embodiments of the present application instead uses 1 × 7 × 1, 1 × 1 × 7, and 7 × 1 × 1 convolution units to perform convolution (feature extraction) in the three dimensions. According to the connection structure of fig. 13, the parameter counts of the two cascaded convolution units in each branch add up to 7 + 7; there are two branches, so this sum is multiplied by 2; and the final cascaded convolution unit adds another 7. The parameter count of the 3D GCN block of fig. 13 is therefore (7 + 7) × 2 + 7 = 35, a reduction of roughly 90% compared with 343, while the feature-extraction performance is substantially the same. On this basis, the processing and training speed of the 3D GCN block is greatly improved, the parameter count of the multi-channel three-dimensional network model is reduced as a whole, the processing and training speed of the model is improved, and the segmentation speed is improved while the segmentation precision is maintained.
Alternatively, fig. 14 is a schematic diagram of the operation of the parallel concat block. The preceding convolutions in the 3D GCN block extract features along two branches (branch 1: convolve the Y axis and then the X axis; branch 2: convolve the X axis and then the Y axis), yielding the three-dimensional feature maps of the two branches, represented by the two cubes in fig. 14. The slices of each three-dimensional feature map along the z-axis direction correspond physically to slice 1, slice 2, ..., slice n of a nuclear magnetic resonance scan, where n is a positive integer.
When the two cubes are merged (connected), the relative positions of the slices must remain unchanged; that is, slice 1 must be arranged before the slices with larger indices, so that the concatenated three-dimensional feature map remains physically meaningful. If the merge were performed with the whole three-dimensional feature map as the unit, as in the first row of fig. 14, slice n of the first branch would end up in front of slice 1 of the second branch. The embodiments of the present application therefore adopt the merging manner of the next row of fig. 14: the slices numbered 1 in the three-dimensional feature maps of the two branches are taken out, merged, and sorted first; then all slices numbered 2 are merged and arranged after the merged slices numbered 1; and so on, until all slices numbered n are merged and arranged, yielding a parallel-connected three-dimensional feature map containing the X-axis and Y-axis features.
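This slice-interleaving merge can be expressed compactly in PyTorch with a stack followed by a reshape; the (N, C, D, H, W) layout with D as the Z (slice) axis is an assumption.

```python
import torch

def parallel_concat(a, b):
    # stack -> (N, C, D, 2, H, W); flatten slice pairs -> (N, C, 2*D, H, W),
    # so slice i of branch a stays adjacent to slice i of branch b.
    stacked = torch.stack((a, b), dim=3)
    n, c, d, two, h, w = stacked.shape
    return stacked.reshape(n, c, d * two, h, w)

a = torch.zeros(1, 1, 3, 2, 2)   # slices 1..n of branch 1
b = torch.ones(1, 1, 3, 2, 2)    # slices 1..n of branch 2
print(parallel_concat(a, b)[0, 0, :, 0, 0])  # tensor([0., 1., 0., 1., 0., 1.])
```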
S1004: and the electronic equipment extracts the characteristics of the primary fusion characteristic diagram according to the secondary fusion characteristic three-dimensional sub-network to obtain a secondary fusion characteristic diagram in the fusion characteristic diagram.
Optionally, the secondary fusion feature three-dimensional sub-network of the embodiments of the present application includes a strided three-dimensional convolution block and at least one three-dimensional convolution block in cascade. Optionally, the at least one three-dimensional convolution block comprises a plurality of cascaded three-dimensional convolution blocks; a three-dimensional convolution block includes at least one of a VoxDense block and a 3D GCN block.

Optionally, the electronic device of the embodiments of the present application performs feature extraction on the primary fusion feature map sequentially according to the strided three-dimensional convolution block and the at least one three-dimensional convolution block, obtaining the secondary fusion feature map.
S1005: and the electronic equipment performs channel number conversion and size amplification on the secondary fusion characteristic diagram according to the first size amplification superposition block to obtain a primary size amplified image.
For example, as shown in fig. 7 or 9, the electronic device takes the size of the three-dimensional image of each modality to be segmented in each dimension as a reference size (specifically, 1024, 512, or 256 pixels, etc.), which is denoted as 1; the size of the first-level branch characteristic diagram is also 1, and the size of the second-level branch characteristic diagram is reduced to 1/2 of the reference size and is marked as 1/2; similarly, the size of the first-level fusion feature map is reduced to 1/4, and the size of the second-level fusion feature map is reduced to 1/8.
Therefore, as shown in fig. 7 or 9, the sizes of the three-dimensional feature maps (input from the previous stage) received by the first to fourth size-amplification superposition blocks are 1/8, 1/4, 1/2, and 1, respectively, and amplification by ×8, ×4, ×2, and ×1, respectively, is required before they match the size of the three-dimensional image to be segmented.
Each of the first to third size-amplification superposition blocks includes a conversion convolution unit and a size-amplification unit in cascade; optionally, a smoothing convolution unit is further cascaded after the size-amplification unit.

The fourth size-amplification superposition block includes a conversion convolution unit; optionally, a smoothing convolution unit is cascaded after the conversion convolution unit.

Optionally, the size amplification comprises one of upsampling, deconvolution, and interpolation, and each size-amplification superposition block may accordingly be an upsampling superposition block, a deconvolution superposition block, or an interpolation superposition block. For example, the upsampling superposition block may be a fusion block as shown in fig. 15a, which includes a 1 × 1 × 1 Conv unit, an upsampling unit, and a 3 × 3 × 3 Conv unit in cascade; the 1 × 1 × 1 Conv unit is the conversion convolution unit, and the 3 × 3 × 3 Conv unit is the smoothing convolution unit.
For example, according to the conversion convolution unit in the first size-amplification superposition block at ×8 in fig. 7 or fig. 9, the electronic device performs channel-number conversion on the 1/8 secondary fusion feature map; according to the size-amplification unit, it amplifies the converted 1/8 secondary fusion feature map to 1/4, obtaining the primary size-enlarged image; and according to the smoothing convolution unit, it smooths the 1/4 primary size-enlarged image and delivers it to the second size-amplification superposition block at ×4.
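A PyTorch sketch of such a fusion block follows, using trilinear upsampling as the size-amplification unit; the choice of upsampling (over deconvolution or interpolation) and the channel numbers are assumptions consistent with figs. 15a and 15b.

```python
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    def __init__(self, in_ch=256, out_ch=128, scale=2):
        super().__init__()
        self.convert = nn.Conv3d(in_ch, out_ch, 1)            # channel conversion
        self.upsample = nn.Upsample(scale_factor=scale, mode='trilinear',
                                    align_corners=False)       # size amplification
        self.smooth = nn.Conv3d(out_ch, out_ch, 3, padding=1)  # smoothing conv

    def forward(self, x, skip=None):
        x = self.convert(x)
        if skip is not None:       # superpose with the previous enlarged image
            x = x + skip
        return self.smooth(self.upsample(x))

x = torch.randn(1, 256, 4, 8, 8)   # 1/8-size secondary fusion feature map
print(FusionBlock()(x).shape)      # torch.Size([1, 128, 8, 16, 16])
```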
S1006: and the electronic equipment performs channel number conversion on the primary fusion characteristic diagram according to the second size amplification superposition block, then superposes the converted primary fusion characteristic diagram with the primary size enlarged image, and performs size amplification on the superposed three-dimensional image to obtain the secondary size enlarged image.
For example, according to the conversion convolution unit in the second size-amplification superposition block at ×4 in fig. 7 or fig. 9, the electronic device performs channel-number conversion on the 1/4 primary fusion feature map and superposes the converted map with the 1/4 primary size-enlarged image; according to the size-amplification unit, it amplifies the superposed three-dimensional image to 1/2, obtaining the secondary size-enlarged image; and according to the smoothing convolution unit, it smooths the 1/2 secondary size-enlarged image and delivers it to the third size-amplification superposition block at ×2.
S1007: the electronic equipment superposes the secondary branch characteristic graphs of all the branches; and according to the third size enlargement superposition block, performing channel number conversion on the superposed secondary branch characteristic diagram, superposing the converted secondary branch characteristic diagram with the secondary size enlarged image, and performing size enlargement on the superposed three-dimensional image to obtain a tertiary size enlarged image.
For example, the electronic device superposes the 1/2-size secondary branch feature maps of the branches; according to the conversion convolution unit in the third size-amplification superposition block at ×2 in fig. 7 or fig. 9, it performs channel-number conversion on the superposed 1/2 secondary branch feature map and superposes the converted map with the 1/2 secondary size-enlarged image; according to the size-amplification unit, it amplifies the superposed three-dimensional image to 1, obtaining the tertiary size-enlarged image; and according to the smoothing convolution unit, it smooths the tertiary size-enlarged image of size 1 and delivers it to the fourth size-amplification superposition block at ×1.
S1008: the electronic equipment superposes the primary branch characteristic graphs of all the branches; and according to the fourth size enlargement superposition block, performing channel number conversion on the superposed primary branch characteristic diagram, and superposing the converted primary branch characteristic diagram with the tertiary size enlargement diagram to obtain a three-dimensional image of the segmented target object.
For example, the electronic device superposes the size-1 primary branch feature maps of the branches; according to the conversion convolution unit in the fourth size-amplification superposition block at ×1 in fig. 7 or fig. 9, it performs channel-number conversion on the superposed primary branch feature map and superposes the converted map with the size-1 tertiary size-enlarged image (the amplification factor here is ×1, so the size remains 1), obtaining the three-dimensional image of the segmented target object; and according to the smoothing convolution unit, it smooths the three-dimensional image of the segmented target object before delivering it.
Optionally, the size of the three-dimensional image of the segmented target object is consistent with the size of the three-dimensional image to be segmented, so that end-to-end output of the three-dimensional image is realized.
Alternatively, as shown in fig. 3, fig. 7, and fig. 9, in any block or unit with a convolution function in the multi-channel three-dimensional network model of the embodiments of the present application, a BN (batch normalization) layer and a ReLU (activation function) layer may be cascaded after the convolution layer.
Blocks with a convolution function include, but are not limited to: the fused three-dimensional convolution block, the strided three-dimensional convolution block, the 2D cascaded Dense block, the VoxRes block, the 3D GCN block, the size-amplification superposition blocks, the fusion block, and the like.
Units with convolution functions include, but are not limited to: each two-dimensional convolution unit in the 2D cascaded Dense block, each one-dimensional convolution unit in the 3D GCN block, a conversion convolution unit and a smoothing convolution unit in the upscaling superposition block, and a deconvolution unit (belonging to one of the upscaling units), and so on.
In the embodiments of the present application, the primary and secondary branch feature maps have high resolution (i.e., large size) but insufficiently rich feature information (i.e., semantic information) of the target object, while the primary and secondary fusion feature maps have rich feature information of the target object but low resolution. Starting from the secondary fusion feature map with the smallest resolution, the electronic device amplifies the size layer by layer and successively superposes the primary fusion feature map, the secondary branch feature maps, and the primary branch feature maps, which is equivalent to fusing the resolution and the feature information of each layer. This improves the richness of the feature information of the target object extracted by the multi-channel three-dimensional network model, which facilitates classifying the target object more precisely according to the richer feature information, and the fused high resolution of the shallow layers improves the model's identification precision of the target object's edges, so that the segmentation precision of the target object is improved as a whole.
Optionally, in the embodiment of the present invention, the electronic device superimposes the primary branch feature image and the secondary branch feature image of the multiple branches onto the feature map, which is equivalent to fuse the feature information of the three-dimensional images of multiple modalities corresponding to the multiple branches together, so that the comprehensiveness and integrity of the feature information of the target object extracted by the multi-channel three-dimensional network model can be improved, and the segmentation accuracy of the target object is integrally improved.
An example of implementing an embodiment of the present application is described below.
The terminal device acquires or receives the brain pathology three-dimensional images of the plurality of modality groups to be segmented for the same target object, as shown in fig. 16, and transmits them to the electronic device. In fig. 16, the brain pathology three-dimensional images of the plurality of modality groups to be segmented include the brain pathology three-dimensional images of the CT and MR_4DPWI modalities belonging to the first modality group, of the MR_CBF and MR_CBV modalities belonging to the second modality group, and of the MR_MTT and MR_Tmax modalities belonging to the third modality group.
The electronic equipment carries out target object segmentation on the brain pathology three-dimensional images of a plurality of modality groups to be segmented based on the multi-channel three-dimensional network model.
Optionally, the electronic device inputs the brain pathology three-dimensional images of the plurality of modality groups to be segmented into the multi-channel three-dimensional network model, resulting in the feature maps of the respective stages as shown in fig. 15 b.
In fig. 15b, input represents the brain pathology three-dimensional images of the plurality of modality groups to be segmented as the input image, and each cuboid represents a (three-dimensional) feature map of a stage. The labels 1/1, 1/2, 1/4, and 1/8 on the cuboids indicate that the size of the feature map at that stage is 1/1, 1/2, 1/4, or 1/8 of the size of the brain pathology three-dimensional image to be segmented (i.e., the reference size); c (channel) = 32, c = 64, c = 128, and c = 256 indicate that the number of channels of the feature map at that stage is 32, 64, 128, or 256. output represents the three-dimensional image of the segmented target object as the output image, and the first to fourth size-amplification superposition blocks are fusion blocks.
Since the structures of the three-dimensional networks of the branch features in the multi-channel three-dimensional network model of the embodiment of the present application are the same, and the sizes and the channel numbers of the output branch feature maps are all the same, the branch feature map of each branch is represented by the branch feature map of only one branch in fig. 15 b.
Specifically, the electronic device inputs the three-dimensional images of the plurality of modality groups to be segmented into the plurality of primary branch feature three-dimensional sub-networks, obtaining the primary branch feature maps of the branches, with c = 32 and size 1/1.
The electronic device inputs a primary branch feature map of each branch c =32 and 1/1 in size into a three-dimensional sub-network of each secondary branch feature to obtain a secondary branch feature map of each branch, wherein c =64 and 1/2 in size of the secondary branch feature map.
And the electronic equipment inputs the secondary branch feature map of each branch into the primary fusion feature three-dimensional sub-network to obtain a primary fusion feature map, wherein the c =128 and the size of the primary fusion feature map is 1/4.
And the electronic equipment inputs the primary fusion feature map into the secondary fusion feature three-dimensional sub-network to obtain a secondary fusion feature map, wherein c =256 and the size of the secondary fusion feature map is 1/8.
The electronic device inputs the secondary fusion feature map into the first size-enlarged superposition block (not shown in fig. 15b), which converts its channel number to c=128 at size 1/8 (the lowest cuboid group of the right branch in fig. 15b, marked 1/8 and c=128), enlarges the converted map to size 1/4 to obtain a primary enlarged-size map with c=128 and a size of 1/4, and passes it to the first fusion block (the lowest fusion block in fig. 15b, which represents the second size-enlarged superposition block).
The electronic device inputs the primary fusion feature map (c=128, size 1/4) and the primary enlarged-size map (c=128, size 1/4) into the first fusion block, obtaining a superposed three-dimensional image with c=128 and a size of 1/4 (the second cuboid group of the right branch from bottom to top in fig. 15b, marked 1/4 and c=128); the superposed image is enlarged to size 1/2 to obtain a secondary enlarged-size map with c=128 and a size of 1/2, which is passed to the second fusion block (the middle fusion block in fig. 15b, which represents the third size-enlarged superposition block).
The electronic device superposes the secondary branch feature maps of all branches (each with c=64, size 1/2) and inputs the superposed map into the second fusion block, which converts its channel number to c=128; the second fusion block then superposes the converted secondary branch feature map (c=128, size 1/2) with the secondary enlarged-size map (c=128, size 1/2), obtaining a superposed three-dimensional image with c=128 and a size of 1/2 (the third cuboid group of the right branch from bottom to top in fig. 15b, marked 1/2 and c=128); the superposed image is enlarged to size 1/1 to obtain a tertiary enlarged-size map with c=128 and a size of 1/1, which is passed to the third fusion block (the uppermost fusion block in fig. 15b, which represents the fourth size-enlarged superposition block).
The electronic device superposes the primary branch feature maps of all branches (each with c=32, size 1/1) and inputs the superposed map into the third fusion block, which converts its channel number to c=128; the third fusion block then superposes the converted primary branch feature map (c=128, size 1/1) with the tertiary enlarged-size map (c=128, size 1/1), obtaining a superposed three-dimensional image with c=128 and a size of 1/1 (the fourth cuboid group of the right branch from bottom to top in fig. 15b, marked 1/1 and c=128). This three-dimensional image is output as the brain pathology three-dimensional image with the lesion (the target object) segmented; the segmented lesion may appear as shown in the rightmost part of fig. 16.
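The decoder walkthrough above can be sketched in the same spirit, reusing the names from the encoder sketch. Trilinear interpolation stands in for the unspecified size-enlargement operation and a 1x1x1 convolution for the channel number conversion; the final head convolution that produces the segmentation output is also an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResizeSuperposeBlock(nn.Module):
    # converts channels to 128 with a 1x1x1 conv, adds an incoming enlarged map,
    # then optionally doubles the spatial size
    def __init__(self, cin, upsample=True):
        super().__init__()
        self.to128 = nn.Conv3d(cin, 128, kernel_size=1)
        self.upsample = upsample

    def forward(self, x, skip=None):
        x = self.to128(x)                                   # channel number conversion
        if skip is not None:
            x = x + skip                                    # superposition
        if self.upsample:
            x = F.interpolate(x, scale_factor=2, mode='trilinear', align_corners=False)
        return x

class Decoder(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.block1 = ResizeSuperposeBlock(256)                 # 1/8 -> 1/4
        self.block2 = ResizeSuperposeBlock(128)                 # 1/4 -> 1/2
        self.block3 = ResizeSuperposeBlock(64)                  # 1/2 -> 1/1
        self.block4 = ResizeSuperposeBlock(32, upsample=False)  # stays at 1/1
        self.head = nn.Conv3d(128, num_classes, kernel_size=1)  # assumed output head

    def forward(self, g2, g1, f2_sum, f1_sum):
        u1 = self.block1(g2)                # primary enlarged-size map (1/4)
        u2 = self.block2(g1, skip=u1)       # secondary enlarged-size map (1/2)
        u3 = self.block3(f2_sum, skip=u2)   # tertiary enlarged-size map (1/1)
        out = self.block4(f1_sum, skip=u3)  # full-size fused map
        return self.head(out)

# f1_sum / f2_sum are the branch feature maps superposed over branches, e.g.:
# f1_sum, f2_sum = sum(f1s), sum(f2s)
```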
The electronic device outputs the brain pathology three-dimensional image with the segmented lesion to the terminal device for display, so that the user can conveniently perform further operations on the displayed segmentation result.
Based on the same inventive concept, the present application provides an apparatus for segmenting a target object in a three-dimensional image, as shown in fig. 17, the apparatus 1700 for segmenting a target object in a three-dimensional image may include: a branch feature extraction module 1701, a feature fusion module 1702 and a fusion magnification module 1703.
The branch feature extraction module 1701 is configured to perform feature extraction on three-dimensional images of a plurality of modality groups to be segmented respectively according to a plurality of branch feature three-dimensional networks of the multi-channel three-dimensional network model, so as to obtain branch feature maps of a plurality of branches. The multi-channel three-dimensional network model comprises a branch characteristic three-dimensional network group, a fusion characteristic three-dimensional network and a size amplification three-dimensional network which are sequentially cascaded; the branch characteristic three-dimensional network group comprises a plurality of parallel branch characteristic three-dimensional networks.
The feature fusion module 1702 is configured to perform feature extraction and fusion on the branch feature maps of the multiple branches according to the fusion feature three-dimensional network to obtain a fusion feature map.
The fusion and enlargement module 1703 is configured to fuse and size-enlarge the fusion feature map and the branch feature maps of the multiple branches according to the size-enlarged three-dimensional network, obtaining the three-dimensional image of the segmented target object.
Alternatively, as shown in fig. 18, the present application provides another segmentation apparatus 1700 for a target object in a three-dimensional image. In addition to the branch feature extraction module 1701, the feature fusion module 1702 and the fusion amplification module 1703, the feature fusion module 1702 of this apparatus includes: a feature fusion unit 17021 and a feature extraction unit 17022.
The feature fusion unit 17021 is configured to fuse the branch feature maps of the branches according to the first-level fusion feature three-dimensional sub-network, and perform feature extraction on the fused branch feature maps of the branches to obtain a first-level fusion feature map in the fusion feature map.
The feature extraction unit 17022 is configured to perform feature extraction on the first-level fusion feature map according to the second-level fusion feature three-dimensional sub-network, so as to obtain a second-level fusion feature map in the fusion feature map.
Optionally, the feature fusion unit 17021 is specifically configured to convolve the branch feature maps of the branches according to the fused three-dimensional convolution block, and to fuse the convolved branch feature maps of the branches to obtain an original fused feature map; and to perform feature extraction on the original fused feature map according to at least one three-dimensional convolution block to obtain a primary fusion feature map. The primary fusion feature three-dimensional sub-network comprises a cascaded fused three-dimensional convolution block and at least one three-dimensional convolution block.
Optionally, the feature fusion unit 17021 is specifically configured to convolve the branch feature map of each branch according to the three-dimensional convolution layer in each three-dimensional convolution unit of the fused three-dimensional convolution block, the three-dimensional convolution units being parallel to each other; and to connect the convolved branch feature maps of the branches in the channel dimension according to the connection layer in the fused three-dimensional convolution block, obtaining the original fused feature map, the connection layer being cascaded after the three-dimensional convolution units.
Optionally, the feature fusion unit 17021 is further configured to, before the convolved branch feature maps are connected in the channel dimension, sequentially normalize and non-linearly process the convolved branch feature map of each branch according to a batch normalization layer and an activation function layer sequentially cascaded in each three-dimensional convolution unit of the fused three-dimensional convolution block; to connect the branch feature maps that have been sequentially convolved, normalized and non-linearly processed in the channel dimension according to the connection layer, obtaining a connection feature map; and to smooth the connection feature map according to the three-dimensional convolution layer in the fused three-dimensional convolution block, obtaining the original fused feature map.
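A minimal sketch of the fused three-dimensional convolution block as just described, assuming PyTorch; the 3x3x3 kernels and the channel counts are assumptions, while the structure (parallel conv/BN/ReLU units, a channel-dimension connection layer, a smoothing convolution) follows the text.

```python
import torch
import torch.nn as nn

class FusedConv3dBlock(nn.Module):
    # parallel conv -> BN -> ReLU units (one per branch), channel-dimension
    # concatenation (the connection layer), then a smoothing convolution
    def __init__(self, num_branches=3, cin=64, cmid=64, cout=128):
        super().__init__()
        self.units = nn.ModuleList(
            nn.Sequential(
                nn.Conv3d(cin, cmid, kernel_size=3, padding=1),
                nn.BatchNorm3d(cmid),
                nn.ReLU(inplace=True),
            )
            for _ in range(num_branches)
        )
        self.smooth = nn.Conv3d(cmid * num_branches, cout, kernel_size=3, padding=1)

    def forward(self, branch_maps):
        convolved = [u(m) for u, m in zip(self.units, branch_maps)]
        connected = torch.cat(convolved, dim=1)   # connection feature map
        return self.smooth(connected)             # original fused feature map

# maps = [torch.randn(1, 64, 16, 32, 32) for _ in range(3)]
# out = FusedConv3dBlock()(maps)   # -> (1, 128, 16, 32, 32)
```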
Optionally, the feature fusion unit 17021 is specifically configured to, for each cascaded 3D GCN block, take the original fused feature map or the intermediate fused feature map output by the previous 3D GCN block as the input feature map; to perform feature extraction on the input feature map in the second and third dimensions according to the two-dimensional convolution unit branches of the 3D GCN block, obtaining second-dimension and third-dimension feature maps for each branch; to connect the second-dimension and third-dimension feature maps of the branches in parallel according to the parallel connection block of the 3D GCN block; and, according to the first-dimension convolution unit of the 3D GCN block, to perform feature extraction in the first dimension on the feature maps obtained by the parallel connection and superpose the input feature map, obtaining and outputting the intermediate fused feature map or the primary fusion feature map corresponding to the 3D GCN block.
The secondary fusion feature three-dimensional sub-network comprises a cascaded stepped three-dimensional convolution block and at least one three-dimensional convolution block; and/or the at least one three-dimensional convolution block comprises a plurality of cascaded three-dimensional convolution blocks. The three-dimensional convolution block includes at least one of a VoxDense block and a 3D GCN block. The 3D GCN block comprises parallel two-dimensional convolution unit branches, a parallel connection block cascaded after the two-dimensional convolution unit branches, and a first-dimension convolution unit cascaded after the parallel connection block; a two-dimensional convolution unit branch comprises a cascaded second-dimension convolution unit and third-dimension convolution unit, or a cascaded third-dimension convolution unit and second-dimension convolution unit.
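A minimal sketch of the 3D GCN block just described, assuming PyTorch and a volume laid out as (N, C, D, H, W) with D, H, W as the first, second and third dimensions. The kernel length k is an assumption, and the parallel connection is read here as an element-wise sum; channel-dimension concatenation followed by a 1x1x1 convolution would be an equally plausible reading.

```python
import torch.nn as nn

def axis_conv(c, axis, k=7):
    # 1-D convolution along a single spatial axis of a (N, C, D, H, W) volume;
    # axis 0/1/2 corresponds to the first/second/third dimension D/H/W
    ks = [1, 1, 1]; ks[axis] = k
    pad = [0, 0, 0]; pad[axis] = k // 2
    return nn.Conv3d(c, c, kernel_size=tuple(ks), padding=tuple(pad))

class GCN3DBlock(nn.Module):
    def __init__(self, channels=128, k=7):
        super().__init__()
        # branch A: second-dimension conv then third-dimension conv
        self.branch_a = nn.Sequential(axis_conv(channels, 1, k), axis_conv(channels, 2, k))
        # branch B: third-dimension conv then second-dimension conv
        self.branch_b = nn.Sequential(axis_conv(channels, 2, k), axis_conv(channels, 1, k))
        # first-dimension convolution unit after the parallel connection
        self.first_dim = axis_conv(channels, 0, k)

    def forward(self, x):
        y = self.branch_a(x) + self.branch_b(x)   # parallel connection: element-wise sum
        y = self.first_dim(y)                     # feature extraction in the first dimension
        return x + y                              # superpose the input feature map
```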
Alternatively, as shown in fig. 18, the branch feature extraction module 1701 in the embodiment of the present application includes: a primary branch feature extraction unit 17011 and a secondary branch feature extraction unit 17012.
The primary branch feature extraction unit 17011 is configured to perform feature extraction on the three-dimensional image of each modality group to be segmented according to the primary branch feature three-dimensional sub-network in each branch feature three-dimensional network, obtaining the primary branch feature map in the branch feature map of that branch; the branch feature three-dimensional network comprises a cascaded primary branch feature three-dimensional sub-network and secondary branch feature three-dimensional sub-network.
The secondary branch feature extraction unit 17012 is configured to perform feature extraction on the primary branch feature map according to the secondary branch feature three-dimensional sub-network in the branch feature three-dimensional network, obtaining the secondary branch feature map in the branch feature map of that branch.
Optionally, the secondary branch feature extraction unit 17012 is specifically configured to: perform feature extraction on the primary branch feature map in a first two-dimension according to the first two-dimensional convolution unit, obtaining a first two-dimensional feature map; perform primary superposition on the first two-dimensional feature map and the primary branch feature map, and perform feature extraction on the result in a second two-dimension according to the second two-dimensional convolution unit, obtaining a second two-dimensional feature map; perform secondary superposition on the second two-dimensional feature map, the first two-dimensional feature map and the primary branch feature map, and perform feature extraction on the result in a third two-dimension according to the third two-dimensional convolution unit, obtaining a third two-dimensional feature map; and superpose the third two-dimensional feature map, the first two-dimensional feature map and the primary branch feature map, obtaining the secondary branch feature map of the three-dimensional image to be segmented of the modality group.
The secondary branch feature three-dimensional sub-network comprises a 2D cascaded Dense block or cascaded VoxRes blocks; the 2D cascaded Dense block comprises a cascaded first two-dimensional convolution unit, second two-dimensional convolution unit and third two-dimensional convolution unit.
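A compact sketch of the 2D cascaded Dense block, assuming PyTorch. The first, second and third two-dimensional convolution units are read here as convolutions over three orthogonal planes of the volume; that plane assignment, the kernel size, and the omission of the stride and channel change between the primary (c=32, size 1/1) and secondary (c=64, size 1/2) stages are all assumptions.

```python
import torch.nn as nn

def plane_conv(c, plane, k=3):
    # 2D-style convolution inside a 3D volume: the kernel spans two axes and
    # has size 1 on the remaining one (plane 0/1/2 collapses D/H/W respectively)
    ks = [k, k, k]; ks[plane] = 1
    pad = [k // 2] * 3; pad[plane] = 0
    return nn.Sequential(
        nn.Conv3d(c, c, kernel_size=tuple(ks), padding=tuple(pad)),
        nn.BatchNorm3d(c),
        nn.ReLU(inplace=True),
    )

class Cascaded2DDenseBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.unit1 = plane_conv(channels, 0)   # first two-dimension (H-W plane)
        self.unit2 = plane_conv(channels, 1)   # second two-dimension (D-W plane)
        self.unit3 = plane_conv(channels, 2)   # third two-dimension (D-H plane)

    def forward(self, f1):                     # f1: primary branch feature map
        a = self.unit1(f1)                     # first two-dimensional feature map
        b = self.unit2(a + f1)                 # primary superposition, then second unit
        c = self.unit3(b + a + f1)             # secondary superposition, then third unit
        return c + a + f1                      # final superposition -> secondary branch map
```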
Optionally, as shown in fig. 18, the fusion amplification module 1703 according to an embodiment of the present application includes: a first enlargement superimposition unit 17031, a second enlargement superimposition unit 17032, a third enlargement superimposition unit 17033, and a fourth enlargement superimposition unit 17034.
The first enlarging and superposing unit 17031 is configured to perform channel number conversion and size enlargement on the secondary fusion feature map according to the first size-enlarged superposition block, obtaining a primary enlarged-size map; the size-enlarged three-dimensional network includes cascaded first to fourth size-enlarged superposition blocks.
The second enlarging and superposing unit 17032 is configured to, according to the second size-enlarged superposition block, perform channel number conversion on the primary fusion feature map, superpose it with the primary enlarged-size map, and size-enlarge the superposed three-dimensional image, obtaining a secondary enlarged-size map.
The third enlarging and superposing unit 17033 is configured to superpose the secondary branch feature maps of the branches; and, according to the third size-enlarged superposition block, perform channel number conversion on the superposed secondary branch feature map, superpose it with the secondary enlarged-size map, and size-enlarge the superposed three-dimensional image, obtaining a tertiary enlarged-size map.
The fourth enlarging and superposing unit 17034 is configured to superpose the primary branch feature maps of the branches; and, according to the fourth size-enlarged superposition block, perform channel number conversion on the superposed primary branch feature map and superpose it with the tertiary enlarged-size map, obtaining the three-dimensional image of the segmented target object.
Optionally, as shown in fig. 18, an embodiment of the present application provides another apparatus 1700 for segmenting a target object in a three-dimensional image, further including: a training module 1704.
The training module 1704 is configured to obtain the multi-channel three-dimensional network model by pre-training as follows: determining an extended sample set from the original sample set; dividing the extended sample set into a verification set and a training set; preliminarily training the original multi-channel three-dimensional network model with the training set to obtain a preliminarily trained multi-channel three-dimensional network model; performing target object segmentation for verification on the verification set with the preliminarily trained multi-channel three-dimensional network model to obtain segmentation results; determining the difficult-sample three-dimensional images from the segmentation results, where a difficult-sample three-dimensional image is a sample three-dimensional image in which the area occupied by the target object is smaller than an area threshold or the classification error rate of the target object is higher than an error-rate threshold; and training the preliminarily trained multi-channel three-dimensional network model on the difficult-sample three-dimensional images to obtain the selected multi-channel three-dimensional network model.
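A minimal sketch of this difficult-sample selection rule, assuming each sample carries a predicted mask and a ground-truth mask as PyTorch tensors; the dictionary keys, threshold values and the voxel-wise error-rate definition are assumptions.

```python
import torch

def select_hard_samples(samples, area_thresh, err_thresh):
    """Pick samples whose target region is small or whose prediction error is high."""
    hard = []
    for s in samples:
        area = s['gt_mask'].sum().item()                              # voxels occupied by the target
        err = (s['pred_mask'] != s['gt_mask']).float().mean().item()  # classification error rate
        if area < area_thresh or err > err_thresh:
            hard.append(s)
    return hard

# Example: a sample with a tiny target region is selected as difficult.
sample = {'gt_mask': torch.zeros(8, 8, 8), 'pred_mask': torch.zeros(8, 8, 8)}
sample['gt_mask'][0, 0, 0] = 1
print(len(select_hard_samples([sample], area_thresh=10, err_thresh=0.5)))  # 1
```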
Optionally, the training module 1704 is specifically configured to iteratively train the preliminarily trained multi-channel three-dimensional network model on the difficult-sample three-dimensional images until a preset convergence condition is met. One round of the iterative training comprises: inputting a current difficult-sample three-dimensional image into the multi-channel three-dimensional network model obtained by the previous round; outputting a current prediction result through the loss function layer of the multi-channel three-dimensional network model; determining the error between the current prediction result and the sample segmentation result corresponding to the current difficult-sample three-dimensional image; back-propagating the error to each hidden layer in the multi-channel three-dimensional network model and calculating the gradient of the back-propagated error at each hidden layer; and updating the parameters of the hidden layers according to the gradients to obtain the multi-channel three-dimensional network model of this round.
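A minimal sketch of this iterative training, assuming PyTorch and a generic model and difficult-sample loader; the Adam optimizer, learning rate and convergence test are assumptions, while the forward pass, error computation, back-propagation and parameter update follow the steps listed above.

```python
import torch

def train_on_hard_samples(model, hard_loader, loss_fn, max_epochs=50, tol=1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    prev = float('inf')
    for epoch in range(max_epochs):
        total = 0.0
        for volume, target in hard_loader:   # current difficult-sample 3D image
            pred = model(volume)             # forward through the multi-channel model
            loss = loss_fn(pred, target)     # error vs. the sample segmentation result
            opt.zero_grad()
            loss.backward()                  # back-propagate; gradients reach each hidden layer
            opt.step()                       # update hidden-layer parameters from the gradients
            total += loss.item()
        if abs(prev - total) < tol:          # a preset convergence condition (assumed)
            break
        prev = total
    return model
```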
During training, the multi-channel three-dimensional network model comprises a loss function layer cascaded after the size-enlarged three-dimensional network; the loss function layer includes a cross-entropy function and an auxiliary weighted loss function.
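One plausible reading of this loss-function layer, assuming PyTorch: a plain cross-entropy term plus a class-weighted auxiliary term mixed by a factor lam. The patent names only the two components; the weighting scheme and the mixing factor are assumptions.

```python
import torch.nn.functional as F

def segmentation_loss(logits, target, class_weights=None, lam=0.5):
    # logits: (N, num_classes, D, H, W); target: (N, D, H, W) integer labels
    ce = F.cross_entropy(logits, target)                          # cross-entropy term
    aux = F.cross_entropy(logits, target, weight=class_weights)   # weighted auxiliary term
    return ce + lam * aux
```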
The apparatus 1700 for segmenting a target object in a three-dimensional image of this embodiment may perform the method for segmenting a target object in a three-dimensional image according to any of the above embodiments or optional implementations of the present application; its implementation principle and beneficial technical effects are similar to those of the method and are not repeated here.
Based on the same inventive concept, an embodiment of the present application provides an electronic device. As shown in fig. 19, the electronic device 1900 includes a processor 1901 and a memory 1903, which are electrically coupled, for example via a bus 1902. Optionally, the electronic device 1900 further includes a network module 1904. It should be noted that in practical applications the network module 1904 is not limited to one, and the structure of the electronic device 1900 does not limit the embodiments of the present application.
The processor 1901 is applied in the embodiments of the present application to implement the functions of the modules of the apparatus for segmenting a target object in a three-dimensional image shown in fig. 17 or fig. 18.
The processor 1901 may be a CPU (central processing unit), a GPU (graphics processing unit), a general-purpose processor, a DSP (digital signal processor), an ASIC (application-specific integrated circuit), an FPGA (field-programmable gate array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and may implement or execute the various illustrative logical blocks, modules and circuits described in connection with this disclosure. The processor 1901 may also be a combination of computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The bus 1902 may include a path that conveys information between the aforementioned components. It may be a PCI (peripheral component interconnect) bus, an EISA (extended industry standard architecture) bus, or the like, and may be divided into an address bus, a data bus and a control bus. For ease of illustration, only one thick line is shown in fig. 19, but this does not mean that there is only one bus or one type of bus.
The memory 1903 may be, but is not limited to, a ROM (read-only memory) or other type of static storage device that can store static information and instructions, a RAM (random access memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (electrically erasable programmable read-only memory), a CD-ROM (compact disc read-only memory) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
Optionally, the memory 1903 is used to store the application code or operation instructions for executing the solutions of the present application, and their execution is controlled by the processor 1901. The processor 1901 is configured to execute the application code or operation instructions stored in the memory 1903 to implement the method for segmenting a target object in a three-dimensional image according to any embodiment or optional implementation of the present application, or to implement the apparatus for segmenting a target object in a three-dimensional image shown in fig. 17 or fig. 18; the implementation principle and beneficial technical effects are similar to those of the segmentation method of the present application and are not repeated here.
Based on the same inventive concept, embodiments of the present application provide a computer-readable storage medium storing at least one instruction, at least one program, a code set or an instruction set, which is loaded and executed by a processor to implement the method for segmenting a target object in a three-dimensional image according to any of the foregoing embodiments or optional implementations of the present application.
The computer-readable storage medium provided by this embodiment is suitable for the above method embodiments; its implementation principle and beneficial technical effects are similar to those of the method for segmenting a target object in a three-dimensional image of the present application and are not repeated here.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order; unless explicitly stated herein, their execution is not strictly ordered and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and not necessarily sequentially; they may be performed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and such modifications and refinements shall also fall within the protection scope of the present invention.

Claims (14)

1. A method for segmenting a target object in a three-dimensional image, comprising:
according to a plurality of branch feature three-dimensional networks of a multi-channel three-dimensional network model, respectively performing feature extraction on three-dimensional images of a plurality of modality groups to be segmented to obtain branch feature maps of a plurality of branches; the multi-channel three-dimensional network model comprises a branch feature three-dimensional network group, a fusion feature three-dimensional network and a size-enlarged three-dimensional network which are sequentially cascaded; the branch feature three-dimensional network group comprises the plurality of parallel branch feature three-dimensional networks;
according to the fusion feature three-dimensional network, performing feature extraction and fusion on the branch feature maps of the plurality of branches to obtain a fusion feature map;
according to the size-enlarged three-dimensional network, fusing and size-enlarging the fusion feature map and the branch feature maps of the plurality of branches to obtain a three-dimensional image of a segmented target object;
wherein the fusion feature three-dimensional network comprises a cascaded primary fusion feature three-dimensional sub-network and secondary fusion feature three-dimensional sub-network; the primary fusion feature three-dimensional sub-network comprises a cascaded fused three-dimensional convolution block and at least one three-dimensional convolution block; the secondary fusion feature three-dimensional sub-network comprises a cascaded stepped three-dimensional convolution block and at least one three-dimensional convolution block; the at least one three-dimensional convolution block comprises a plurality of cascaded three-dimensional convolution blocks; and the three-dimensional convolution block includes at least one of a deep stereo dense network block and a three-dimensional graph convolution network block;
wherein performing feature extraction and fusion on the branch feature maps of the plurality of branches according to the fusion feature three-dimensional network to obtain the fusion feature map comprises:
respectively convolving the branch feature maps of the branches according to the fused three-dimensional convolution block in the primary fusion feature three-dimensional sub-network, and fusing the convolved branch feature maps of the branches to obtain an original fused feature map;
performing feature extraction on the original fused feature map according to the at least one three-dimensional convolution block in the primary fusion feature three-dimensional sub-network to obtain a primary fusion feature map; and
performing feature extraction on the primary fusion feature map according to the secondary fusion feature three-dimensional sub-network to obtain a secondary fusion feature map in the fusion feature map.
2. The method according to claim 1, wherein convolving the branch feature maps of the branches according to the fused three-dimensional convolution block and fusing the convolved branch feature maps of the branches to obtain the original fused feature map comprises:
convolving the branch feature map of each branch according to the three-dimensional convolution layer in each three-dimensional convolution unit of the fused three-dimensional convolution block, wherein the three-dimensional convolution units are parallel; and
connecting the convolved branch feature maps of the branches in a channel dimension according to a connection layer in the fused three-dimensional convolution block to obtain the original fused feature map, wherein the connection layer is cascaded after the three-dimensional convolution units.
3. The method according to claim 2, wherein before the convolved branch feature maps of the branches are connected in the channel dimension, the method further comprises:
sequentially normalizing and non-linearly processing the convolved branch feature map of each branch according to a batch normalization layer and an activation function layer sequentially cascaded in each three-dimensional convolution unit of the fused three-dimensional convolution block;
and connecting the convolved branch feature maps of the branches in the channel dimension according to the connection layer in the fused three-dimensional convolution block to obtain the original fused feature map comprises:
connecting the branch feature maps of the branches that have been sequentially convolved, normalized and non-linearly processed in the channel dimension according to the connection layer to obtain a connection feature map; and
smoothing the connection feature map according to the three-dimensional convolution layer in the fused three-dimensional convolution block to obtain the original fused feature map.
4. The method of claim 1, wherein the secondary fusion feature three-dimensional sub-network comprises a cascaded stepped three-dimensional convolution block and at least one three-dimensional convolution block;
and/or the at least one three-dimensional convolution block comprises a plurality of cascaded three-dimensional convolution blocks; the three-dimensional convolution block includes at least one of a deep stereo dense network block and a three-dimensional graph convolution network block; the three-dimensional graph convolution network block comprises parallel two-dimensional convolution unit branches, a parallel connection block cascaded after the two-dimensional convolution unit branches, and a first-dimension convolution unit cascaded after the parallel connection block; and a two-dimensional convolution unit branch comprises a cascaded second-dimension convolution unit and third-dimension convolution unit, or a cascaded third-dimension convolution unit and second-dimension convolution unit.
5. The method according to claim 4, wherein performing feature extraction on the original fused feature map according to the at least one three-dimensional convolution block to obtain the primary fusion feature map comprises:
for each cascaded three-dimensional graph convolution network block, taking the original fused feature map or an intermediate fused feature map output by a previous three-dimensional graph convolution network block as an input feature map;
respectively performing feature extraction on the input feature map in a second dimension and a third dimension according to the two-dimensional convolution unit branches of the three-dimensional graph convolution network block to obtain second-dimension and third-dimension feature maps of each branch;
connecting the second-dimension and third-dimension feature maps of the branches in parallel according to the parallel connection block of the three-dimensional graph convolution network block; and
according to the first-dimension convolution unit of the three-dimensional graph convolution network block, performing feature extraction in the first dimension on the feature maps obtained by the parallel connection, and superposing the input feature map, to obtain and output the intermediate fused feature map or the primary fusion feature map corresponding to the three-dimensional graph convolution network block.
6. The method of claim 1, wherein the branch feature three-dimensional network comprises a cascaded primary branch feature three-dimensional sub-network and secondary branch feature three-dimensional sub-network;
and performing feature extraction respectively on the three-dimensional images of the plurality of modality groups to be segmented according to the plurality of branch feature three-dimensional networks to obtain the branch feature maps of the plurality of branches comprises:
performing feature extraction on the three-dimensional image of each modality group to be segmented according to the primary branch feature three-dimensional sub-network in each branch feature three-dimensional network to obtain a primary branch feature map in the branch feature map of that branch; and
performing feature extraction on the primary branch feature map according to the secondary branch feature three-dimensional sub-network in the branch feature three-dimensional network to obtain a secondary branch feature map in the branch feature map of that branch.
7. The method of claim 6, wherein the secondary branch feature three-dimensional sub-network comprises a two-dimensional cascaded dense block or cascaded deep stereo residual network blocks; the two-dimensional cascaded dense block comprises a cascaded first two-dimensional convolution unit, second two-dimensional convolution unit and third two-dimensional convolution unit;
and performing feature extraction on the primary branch feature map according to the secondary branch feature three-dimensional sub-network in the branch feature three-dimensional network to obtain the secondary branch feature map in the branch feature map of that branch comprises:
performing feature extraction on the primary branch feature map in a first two-dimension according to the first two-dimensional convolution unit to obtain a first two-dimensional feature map;
performing primary superposition on the first two-dimensional feature map and the primary branch feature map, and performing feature extraction on the feature map after the primary superposition in a second two-dimension according to the second two-dimensional convolution unit to obtain a second two-dimensional feature map;
performing secondary superposition on the second two-dimensional feature map, the first two-dimensional feature map and the primary branch feature map, and performing feature extraction on the feature map after the secondary superposition in a third two-dimension according to the third two-dimensional convolution unit to obtain a third two-dimensional feature map; and
superposing the third two-dimensional feature map, the first two-dimensional feature map and the primary branch feature map to obtain the secondary branch feature map of the three-dimensional image to be segmented of the modality group.
8. The method of claim 6, wherein the size-enlarged three-dimensional network comprises cascaded first to fourth size-enlarged superposition blocks;
and fusing and size-enlarging the fusion feature map and the branch feature maps of the plurality of branches according to the size-enlarged three-dimensional network to obtain the three-dimensional image of the segmented target object comprises:
performing channel number conversion and size enlargement on the secondary fusion feature map according to the first size-enlarged superposition block to obtain a primary enlarged-size map;
according to the second size-enlarged superposition block, performing channel number conversion on the primary fusion feature map, superposing it with the primary enlarged-size map, and size-enlarging the superposed three-dimensional image to obtain a secondary enlarged-size map;
superposing the secondary branch feature maps of the branches; according to the third size-enlarged superposition block, performing channel number conversion on the superposed secondary branch feature map, superposing it with the secondary enlarged-size map, and size-enlarging the superposed three-dimensional image to obtain a tertiary enlarged-size map; and
superposing the primary branch feature maps of the branches; and according to the fourth size-enlarged superposition block, performing channel number conversion on the superposed primary branch feature map and superposing it with the tertiary enlarged-size map to obtain the three-dimensional image of the segmented target object.
9. The method of claim 1, wherein the multi-channel three-dimensional network model is pre-trained by:
determining an extended sample set according to the original sample set;
dividing a verification set and a training set from the extended sample set;
carrying out preliminary training on the original multi-channel three-dimensional network model by using the training set to obtain a preliminarily trained multi-channel three-dimensional network model;
carrying out target object segmentation for verification on the verification set by using the preliminarily trained multi-channel three-dimensional network model to obtain a segmentation result;
determining difficult-sample three-dimensional images according to the segmentation result; wherein a difficult-sample three-dimensional image comprises a sample three-dimensional image in which the area occupied by the target object is smaller than an area threshold or the classification error rate of the target object is higher than an error-rate threshold; and
and training the preliminarily trained multi-channel three-dimensional network model according to the three-dimensional images of the difficult samples to obtain the selected multi-channel three-dimensional network model.
10. The method of claim 9, wherein during training the multi-channel three-dimensional network model comprises a loss function layer cascaded after the size-enlarged three-dimensional network; the loss function layer comprises a cross-entropy function and an auxiliary weighted loss function;
and training the preliminarily trained multi-channel three-dimensional network model according to the three-dimensional images of the difficult samples to obtain a selected multi-channel three-dimensional network model, wherein the training comprises the following steps:
performing iterative training on the preliminarily trained multi-channel three-dimensional network model according to the three-dimensional images of the difficult samples until a preset convergence condition is met; one of the iterative trainings comprises:
inputting a current difficult-sample three-dimensional image into the multi-channel three-dimensional network model obtained by the previous training, and outputting a current prediction result through the loss function layer of the multi-channel three-dimensional network model;
determining an error between the current prediction result and the sample segmentation result corresponding to the current difficult-sample three-dimensional image; back-propagating the error to each hidden layer in the multi-channel three-dimensional network model, and calculating the gradient of the back-propagated error at each hidden layer; and updating the parameters of the hidden layers in the multi-channel three-dimensional network model according to the gradients to obtain the multi-channel three-dimensional network model of this iteration.
11. The method of any one of claims 1-10, further comprising at least one of:
the three-dimensional images of the plurality of modality groups comprise medical three-dimensional images of three modality groups, the medical three-dimensional images of the three modality groups comprising: a computed tomography three-dimensional image and a four-dimensional perfusion-diffusion weighted imaging three-dimensional image of a first modality group, a cerebral blood flow three-dimensional image and a cerebral blood volume three-dimensional image of a second modality group, and a contrast-agent mean transit time three-dimensional image and a cerebral blood flow peak time three-dimensional image of a third modality group; the target object comprises one of a tissue, an internal organ and a lesion;
the size of the three-dimensional image of the segmented target object is consistent with the size of the three-dimensional images to be segmented;
the size enlargement comprises one of upsampling, deconvolution, and interpolation.
12. An apparatus for segmenting a target object in a three-dimensional image, comprising:
a branch feature extraction module, configured to respectively perform feature extraction on three-dimensional images of a plurality of modality groups to be segmented according to a plurality of branch feature three-dimensional networks of a multi-channel three-dimensional network model to obtain branch feature maps of a plurality of branches; the multi-channel three-dimensional network model comprises a branch feature three-dimensional network group, a fusion feature three-dimensional network and a size-enlarged three-dimensional network which are sequentially cascaded; the branch feature three-dimensional network group comprises the plurality of parallel branch feature three-dimensional networks;
a feature fusion module, configured to perform feature extraction and fusion on the branch feature maps of the branches according to the fusion feature three-dimensional network to obtain a fusion feature map; and
a fusion amplification module, configured to fuse and size-enlarge the fusion feature map and the branch feature maps of the plurality of branches according to the size-enlarged three-dimensional network to obtain a three-dimensional image of a segmented target object;
wherein the fusion feature three-dimensional network comprises a cascaded primary fusion feature three-dimensional sub-network and secondary fusion feature three-dimensional sub-network; the primary fusion feature three-dimensional sub-network comprises a cascaded fused three-dimensional convolution block and at least one three-dimensional convolution block; the secondary fusion feature three-dimensional sub-network comprises a cascaded stepped three-dimensional convolution block and at least one three-dimensional convolution block; the at least one three-dimensional convolution block comprises a plurality of cascaded three-dimensional convolution blocks; and the three-dimensional convolution block includes at least one of a deep stereo dense network block and a three-dimensional graph convolution network block;
wherein the feature fusion module comprises:
a feature fusion unit, configured to respectively convolve the branch feature maps of the branches according to the fused three-dimensional convolution block in the primary fusion feature three-dimensional sub-network, and to fuse the convolved branch feature maps of the branches to obtain an original fused feature map; the feature fusion unit is further configured to perform feature extraction on the original fused feature map according to the at least one three-dimensional convolution block in the primary fusion feature three-dimensional sub-network to obtain the primary fusion feature map; and
a feature extraction unit, configured to perform feature extraction on the primary fusion feature map according to the secondary fusion feature three-dimensional sub-network to obtain a secondary fusion feature map in the fusion feature map.
13. An electronic device, comprising:
a processor, a memory, and a bus;
the bus is used for connecting the processor and the memory;
the memory is used for storing operation instructions;
the processor is configured to execute the method for segmenting the target object in the three-dimensional image according to any one of claims 1 to 11 by calling the operation instruction.
14. A computer readable storage medium, characterized in that it stores at least one instruction, at least one program, a set of codes or a set of instructions, which is loaded and executed by a processor to implement a segmentation method of a target object in a three-dimensional image according to any one of claims 1 to 11.
CN201811603470.1A 2018-12-26 2018-12-26 Method and device for segmenting target object in three-dimensional image and electronic equipment Active CN111369567B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811603470.1A CN111369567B (en) 2018-12-26 2018-12-26 Method and device for segmenting target object in three-dimensional image and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811603470.1A CN111369567B (en) 2018-12-26 2018-12-26 Method and device for segmenting target object in three-dimensional image and electronic equipment

Publications (2)

Publication Number Publication Date
CN111369567A CN111369567A (en) 2020-07-03
CN111369567B true CN111369567B (en) 2022-12-16

Family

ID=71206268

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811603470.1A Active CN111369567B (en) 2018-12-26 2018-12-26 Method and device for segmenting target object in three-dimensional image and electronic equipment

Country Status (1)

Country Link
CN (1) CN111369567B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112560864B (en) * 2020-12-22 2024-06-18 苏州超云生命智能产业研究院有限公司 Image semantic segmentation method and device and training method of image semantic segmentation model
CN112686875A (en) * 2021-01-04 2021-04-20 浙江明峰智能医疗科技有限公司 Tumor prediction method of PET-CT image based on neural network and computer readable storage medium
CN113298786B (en) * 2021-05-26 2022-02-11 北京长木谷医疗科技有限公司 Image recognition and model training method, and true mortar position recognition method and device
CN113780364B (en) * 2021-08-18 2024-08-02 西安电子科技大学 SAR image target recognition method driven by combination of model and data
CN115170510B (en) * 2022-07-04 2023-04-07 北京医准智能科技有限公司 Focus detection method and device, electronic equipment and readable storage medium

Citations (1)

Publication number Priority date Publication date Assignee Title
CN108062756A (en) * 2018-01-29 2018-05-22 重庆理工大学 Image, semantic dividing method based on the full convolutional network of depth and condition random field

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
CN106909905B (en) * 2017-03-02 2020-02-14 中科视拓(北京)科技有限公司 Multi-mode face recognition method based on deep learning
CN107506797A (en) * 2017-08-25 2017-12-22 电子科技大学 One kind is based on deep neural network and multi-modal image alzheimer disease sorting technique
CN108268870B (en) * 2018-01-29 2020-10-09 重庆师范大学 Multi-scale feature fusion ultrasonic image semantic segmentation method based on counterstudy
CN108319977B (en) * 2018-01-30 2020-11-10 浙江大学 Cervical biopsy region identification method and device based on channel information multi-mode network
CN108629784A (en) * 2018-05-08 2018-10-09 上海嘉奥信息科技发展有限公司 A kind of CT image intracranial vessel dividing methods and system based on deep learning
CN108717568B (en) * 2018-05-16 2019-10-22 陕西师范大学 A kind of image characteristics extraction and training method based on Three dimensional convolution neural network


Also Published As

Publication number Publication date
CN111369567A (en) 2020-07-03


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40026152

Country of ref document: HK

GR01 Patent grant