CN113205523A - Medical image segmentation and identification system, terminal and storage medium with multi-scale representation optimization - Google Patents

Info

Publication number
CN113205523A
Authority
CN
China
Prior art keywords
medical image
segmentation
scale
prediction
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110475782.4A
Other languages
Chinese (zh)
Inventor
钟颖
沈海斌
黄科杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Zhejiang Lab
Original Assignee
Zhejiang University ZJU
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU, Zhejiang Lab filed Critical Zhejiang University ZJU
Priority to CN202110475782.4A priority Critical patent/CN113205523A/en
Publication of CN113205523A publication Critical patent/CN113205523A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10072 Tomographic images
    • G06T2207/10081 Computed x-ray tomography [CT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03 Recognition of patterns in medical or anatomical images

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a medical image segmentation and identification system with multi-scale representation optimization, a terminal and a storage medium. The characterization medical image preprocessing module preprocesses single-channel characterization medical images and outputs a normalized result; the segmentation recognition medical image preprocessing module preprocesses single-channel segmentation recognition medical images and outputs a normalized result; the multi-scale characterization learning deep convolutional neural network module comprises a deep convolutional neural network and a loss function, takes the normalized characterization medical image as input, and outputs the network initial weights learned by the deep convolutional neural network; the medical image segmentation and identification network module comprises a segmentation network and is used for training the segmentation network and predicting the segmentation result of an input image to be detected. The invention combines characterization learning layer by layer from the global level to the dense pixel level and uses the spatial information of each layer to improve segmentation performance. It is more effective on medical image segmentation tasks, avoids generating overly large tensors in the projection layer and the prediction layer, and ensures the scale invariance of the characterization.

Description

Medical image segmentation and identification system, terminal and storage medium with multi-scale representation optimization
Technical Field
The invention relates to an image processing system, a terminal and a storage medium in the field of medical image computer vision, and in particular to a medical image segmentation and recognition system, a terminal and a storage medium with multi-scale representation optimization.
Background
In the field of medical imaging, a crucial step is the accurate segmentation of biomedical objects, such as organs or tissues, for diagnosis, treatment planning, prognosis, etc. Manual segmentation by medical experts is time consuming and laborious and requires a high degree of expertise. Constructing an accurate and stable automatic segmentation system can therefore effectively reduce the burden on doctors and allow patients to be treated more quickly.
In addition to traditional machine learning methods, more and more deep learning-based methods achieve excellent performance in medical segmentation, among which methods based on Convolutional Neural Networks (CNN) are currently the best medical segmentation models. Although these networks greatly improve the performance of medical image segmentation, they all rely on large volumes of annotated medical data for fully supervised training. Since annotating medical images is very time consuming and requires a high degree of biomedical expertise, constructing large-scale annotated medical image datasets is a difficult task. Large amounts of unannotated medical data are much more readily available than annotated data.
As a method that does not require manual annotation, the emerging Self-supervised Learning (SSL) is expected to perform characterization learning (Representation Learning) using a large number of unannotated medical images. Most currently proposed self-supervised characterization learning methods learn a global characterization (Global Representation) from image classification tasks, which is not suitable for medical image segmentation, a task that requires pixel-level classification. Moreover, a characterization at any single scale, whether a global characterization, a local characterization (Local Representation) or a dense pixel characterization (Dense Representation), cannot meet the need for multiple characterization levels that arises from the diverse scales of organs and tissues in medical images. Therefore, the invention provides a multi-scale characterization learning technique for medical image segmentation, which simultaneously learns visual characterizations at multiple scales, from the global level to dense pixels, so as to improve medical image segmentation performance.
Disclosure of Invention
To address the defect that the prior art learns only a single global characterization, and the need of medical image segmentation for multi-level visual features, the invention provides a multi-scale representation optimized medical image segmentation recognition system, a terminal and a storage medium, which improve image segmentation performance by learning more effective multi-scale characterizations.
The technical scheme adopted by the invention is as follows:
The characterization medical image preprocessing module preprocesses the input single-channel characterization medical image data, for example by truncating values to a corresponding range and normalizing them to a normal distribution. Its input is single-channel characterization medical image data and its output is a normalized characterization medical image. The single-channel characterization medical image data are medical images without annotations (labels), for example CT images.
The segmentation recognition medical image preprocessing module preprocesses the input single-channel segmentation recognition medical image data, for example by truncating values to a corresponding range and normalizing them to a normal distribution. Its input is single-channel segmentation recognition medical image data and its output is a normalized segmentation recognition medical image. The single-channel segmentation recognition medical image data are annotated medical images, for example CT images.
The multi-scale characterization learning deep convolutional neural network module comprises a deep convolutional neural network for multi-scale visual characterization learning and its loss function, and is used to learn the network weights of the multi-scale characterization. Its input is the normalized characterization medical image, and its output is the network initial weights obtained after training the deep convolutional neural network.
The medical image segmentation recognition network module comprises a segmentation network and is used for training the segmentation network and predicting the segmentation result of an input image to be detected. The image to be detected is input into the segmentation network, which predicts its segmentation result. The segmentation network is initialized from the trained deep convolutional neural network and is trained in advance on the medical images and segmentation annotations of a training set.
When the medical image segmentation system is in the training mode, its inputs are the normalized segmentation recognition medical image, the segmentation annotation and the network initial weights, and it has no output.
When the medical image segmentation system is in the identification mode, its input is the normalized segmentation recognition medical image and its output is the medical target segmentation recognition result.
Optionally, as shown in fig. 2, the deep convolutional neural network is composed of a view generation module (View Generation Module), two twin networks (Siamese networks) with the same topology, an embedded pre-sampling module, a prediction layer, and a characterization consistency module (Representation Consistency Module). The medical image is input into the view generation module, which generates two partially overlapping views $x_a$ and $x_b$ together with the position correspondence information $L$ between them. The views $x_a$ and $x_b$ are input into a first branch and a second branch respectively. The first branch consists of a twin network and the prediction layer connected in sequence, and the second branch consists of a twin network only. The position correspondence information $L$ is processed by the embedded pre-sampling module and then input into the twin networks of the two branches. The outputs of the first branch and the second branch are processed by the characterization consistency module, which outputs the loss value.
Optionally, as shown in fig. 3, the view generation module provides canvas matching processing to calculate the position correspondence between the views. The canvas matching processing generates an initial coordinate matrix $C_{ori}$, applies the spatial transformation functions of the data enhancement to obtain the transformed coordinate matrices $C_a$ and $C_b$, and obtains the views $x_a$ and $x_b$ by coordinate interpolation. The coordinate matrices $C_a$ and $C_b$ are mapped onto a blank canvas $B$ to obtain the canvases of $x_a$ and $x_b$, and a bit operation between the two canvases yields the position correspondence information $L$.
The specific process of the canvas matching processing is as follows:
First, an initial coordinate matrix $C_{ori}$ is constructed. The size of $C_{ori}$ is the same as the size of the views, $(H_p, W_p)$, where $H_p$ and $W_p$ are the height and width of the views $x_a$ and $x_b$.
Each element of $C_{ori}$ is a pair $(A, B)$, where $A$ is the first element parameter and $B$ is the second element parameter. The first element parameter $A$ increases by integer steps from column to column, left to right, and is the same for all elements in the same column. The second element parameter $B$ increases by integer steps from row to row, top to bottom, and is the same for all elements in the same row. The element at the center position of $C_{ori}$ is $(0, 0)$.
Spatial transformation functions are then generated randomly and applied to the coordinate matrix $C_{ori}$ to obtain the coordinate matrices of the two views:
$$C_a = \mathcal{T}_a(C_{ori}), \qquad C_b = \mathcal{T}_b(C_{ori})$$
where $\mathcal{T}_a$ is the first spatial transformation function, for view $x_a$; $\mathcal{T}_b$ is the second spatial transformation function, for view $x_b$; and $C_a$, $C_b$ are the coordinate matrices of views $x_a$ and $x_b$.
Each spatial transformation function is generated randomly as one or a combination of rotation by an arbitrary angle, flipping, scaling and elastic deformation. The first spatial transformation function and the second spatial transformation function may be the same or different.
Then, nearest-neighbor interpolation mapping of the two coordinate matrices onto the original image gives the two enhanced views $x_a$ and $x_b$:
$$x_a = \mathrm{Map}(C_a, X), \qquad x_b = \mathrm{Map}(C_b, X)$$
where $\mathrm{Map}(\cdot)$ is the coordinate mapping function, $C_i$ is the coordinate matrix of view $x_i$ with $i = a$ or $b$, and $X$ is the input image.
The coordinate matrices of $x_a$ and $x_b$ are then mapped onto blank canvases, and a bit operation yields the position correspondence information $L$:
$$B_a\left[\mathrm{round}(C_a^{h}),\ \mathrm{round}(C_a^{w})\right] = N_p$$
$$B_b\left[\mathrm{round}(C_b^{h}),\ \mathrm{round}(C_b^{w})\right] = N_p$$
$$L = \mathrm{cat}(B_a, B_b)\left[\mathrm{bool}(B_a \,\&\, B_b)\right]$$
where $\mathrm{round}(\cdot)$ is the rounding operation; $C_i^{h}$ and $C_i^{w}$ are the height and width coordinate vectors of the coordinate matrix of view $x_i$; $N_p$ is the vector of natural numbers from zero to $H_p \times W_p$; $B_a$ and $B_b$ are the canvases of views $x_a$ and $x_b$; $\mathrm{bool}(\cdot)$ is a Boolean operation; $\mathrm{cat}(\cdot)$ is a concatenation operation; and $L$ is the position correspondence information. $B_i[\mathrm{round}(C_i^{h}), \mathrm{round}(C_i^{w})]$ denotes the pixels on the canvas of view $x_i$ at coordinate positions $(\mathrm{round}(C_i^{h}), \mathrm{round}(C_i^{w}))$, and $\mathrm{cat}(B_a, B_b)[\mathrm{bool}(B_a \,\&\, B_b)]$ extracts the intersection of canvas $B_a$ and canvas $B_b$.
The canvas matching processing can accurately obtain the position correspondence between the views for any complex enhancement transformation, such as rotation and elastic deformation.
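The canvas matching steps above can be sketched in two dimensions with NumPy. This is a minimal illustration under stated assumptions, not the patented implementation: the helper names (`make_coords`, `rotate`, `map_view`, `canvas`, `match`) are hypothetical, only rotation plus translation is used as the spatial transformation, and canvas entries store the view-pixel index plus one so that 0 can mean "not covered" (the text numbers pixels from zero).

```python
import numpy as np

def make_coords(h, w):
    """Centred coordinate matrix C_ori of shape (h, w, 2); each element is (A, B)."""
    a = np.arange(w) - w // 2          # A: increases left to right
    b = np.arange(h) - h // 2          # B: increases top to bottom
    A, B = np.meshgrid(a, b)           # both have shape (h, w)
    return np.stack([A, B], axis=-1).astype(float)

def rotate(coords, deg, shift=(0.0, 0.0)):
    """Example spatial transformation: rotation by `deg` degrees plus a shift."""
    t = np.deg2rad(deg)
    R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    return coords @ R.T + np.asarray(shift)

def to_pixels(coords, img_shape):
    """Round centred coordinates to integer (row, col) positions in the image."""
    H, W = img_shape
    cols = np.round(coords[..., 0]).astype(int) + W // 2
    rows = np.round(coords[..., 1]).astype(int) + H // 2
    valid = (rows >= 0) & (rows < H) & (cols >= 0) & (cols < W)
    return rows, cols, valid

def map_view(coords, image):
    """Nearest-neighbour Map(C, X): sample the image at the transformed coordinates."""
    rows, cols, valid = to_pixels(coords, image.shape)
    view = np.zeros(coords.shape[:2], dtype=image.dtype)
    view[valid] = image[rows[valid], cols[valid]]
    return view

def canvas(coords, img_shape):
    """Canvas B_i: each covered image pixel stores (flat view index + 1); 0 = uncovered."""
    rows, cols, valid = to_pixels(coords, img_shape)
    B = np.zeros(img_shape, dtype=np.int64)
    idx = np.arange(1, coords.shape[0] * coords.shape[1] + 1).reshape(coords.shape[:2])
    B[rows[valid], cols[valid]] = idx[valid]
    return B

def match(Ba, Bb):
    """Position correspondence: pairs of flat view indices that hit the same image pixel."""
    both = (Ba > 0) & (Bb > 0)
    return np.stack([Ba[both] - 1, Bb[both] - 1], axis=1)

# Usage: two overlapping 16 x 16 views of a 64 x 64 image with unique pixel values.
X = np.arange(64 * 64).reshape(64, 64)
C_ori = make_coords(16, 16)
Ca = rotate(C_ori, 0.0, shift=(3.0, -2.0))
Cb = rotate(C_ori, 30.0)
xa, xb = map_view(Ca, X), map_view(Cb, X)
L = match(canvas(Ca, X.shape), canvas(Cb, X.shape))
```

Because both views are produced from the same coordinate matrices that fill the canvases, every matched pair in `L` points at view pixels that were sampled from the same image location, whatever transformation was applied.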
Optionally, the twin network comprises an encoder, a decoder and projection layers connected in sequence, and the encoder and the decoder form a U-Net network. To match the size diversity of targets in medical images, the encoder and the decoder each comprise a plurality of scale stages connected in sequence. Each scale stage consists of two consecutive convolution blocks; a view is input to the first scale stage, and each scale stage outputs its own feature map. In the encoder, the resolution of the feature map output by each scale stage decreases stage by stage, for example by half each time. In the decoder, the resolution of the feature map output by each scale stage increases stage by stage, for example doubling each time, and each scale stage also receives the feature map of the same resolution output by the encoder. A global feature map is obtained by applying a global pooling operation to the feature map with the smallest resolution among the feature maps output by the scale stages of the decoder. The global feature map and all feature maps output by the scale stages of the decoder are used as the feature maps before projection. The number of projection layers equals the number of feature maps before projection, and each projection layer consists of three consecutive convolution blocks whose convolution kernels are the same. The number of prediction layers equals the number of projection layers in the twin network, and each prediction layer consists of two consecutive convolution blocks. Each convolution block consists of a 3D convolution layer, a normalization layer and an activation layer connected in sequence.
The global characterization adopted by the prior art applies global average pooling after the deepest convolution, which discards spatial information and yields a single vector (embedding) as the global characterization. This limits its usefulness for tasks that are sensitive to spatial information, such as segmentation. Moreover, organs and tissues in medical images span a wide range of scales: for example, the esophagus is small, the kidneys are somewhat larger, and the stomach occupies a large range of scales. A local or dense characterization at a single scale is therefore not complete enough, and the appropriate scale needs to be selected adaptively according to the specific size of each organ. The multi-scale characterization provided by the invention combines characterization learning layer by layer from the global level to the dense level and uses the spatial information of each layer to improve segmentation performance.
Optionally, an embedded pre-sampling module is added to the twin network to perform pre-sampling. According to the position correspondence information $L$, the embedded pre-sampling module extracts, from the pre-projection feature maps obtained in the twin networks of the first branch and the second branch, the regions indicated by $L$ and inputs them into the corresponding projection layers. The specific process of the embedded pre-sampling module is as follows:
The position correspondence information $L$ is randomly sampled to select $N$ pairs of matching positions:
$$(GC_a, GC_b) = \mathrm{random}(L, N)$$
$$P'_{a,i} = \mathrm{Samp}(P_{a,i}, GC_a), \qquad R'_{b,i} = \mathrm{Samp}(R_{b,i}, GC_b)$$
where $GC$ denotes the grid coordinates of the $N$ samples obtained by random sampling, and $\mathrm{Samp}(\cdot)$ samples the feature map at the grid coordinates by bilinear interpolation. The resulting sampled prediction characterization $P'_{a,i}$ and projection characterization $R'_{b,i}$ have tensor size $[B, 2048, 1, 1, N]$. Because the embedded pre-sampling module samples according to the position correspondence information $L$ before the projection layer, it avoids generating very large tensors in the projection layer and the prediction layer during the characterization computation and improves training efficiency.
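The pre-sampling step can be illustrated with a small NumPy sketch. It is a simplified 2D version under stated assumptions: the function names `bilinear_sample` and `presample` are hypothetical, the position correspondence is taken to be an array of matched flat view-pixel indices of shape (M, 2), and view coordinates are rescaled linearly to each feature map's grid (an assumption about how grid coordinates are derived for lower-resolution scales).

```python
import numpy as np

def bilinear_sample(feat, rows, cols):
    """Sample feat of shape (C, H, W) at fractional (rows, cols); returns (C, N)."""
    _, H, W = feat.shape
    r0 = np.clip(np.floor(rows).astype(int), 0, H - 2)
    c0 = np.clip(np.floor(cols).astype(int), 0, W - 2)
    dr = np.clip(rows - r0, 0.0, 1.0)
    dc = np.clip(cols - c0, 0.0, 1.0)
    top = feat[:, r0, c0] * (1 - dc) + feat[:, r0, c0 + 1] * dc
    bot = feat[:, r0 + 1, c0] * (1 - dc) + feat[:, r0 + 1, c0 + 1] * dc
    return top * (1 - dr) + bot * dr

def presample(L, view_hw, feat_a, feat_b, n, rng):
    """Pick n matched pairs from L and sample both feature maps at those positions."""
    pick = rng.choice(L.shape[0], size=n, replace=L.shape[0] < n)
    h, w = view_hw
    out = []
    for k, feat in enumerate((feat_a, feat_b)):
        r, c = np.divmod(L[pick, k], w)      # view-pixel row and column
        fh, fw = feat.shape[1:]
        # Rescale view coordinates to this scale's feature-map grid (assumed mapping).
        rows = r * (fh - 1) / (h - 1)
        cols = c * (fw - 1) / (w - 1)
        out.append(bilinear_sample(feat, rows, cols))
    return out                               # two arrays of shape (C, n)
```

Only the `n` sampled columns are passed on to the projection and prediction layers, so the wide hidden dimension is applied to `n` positions rather than to every voxel of the feature map.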
Optionally, the loss function of the deep convolutional neural network is implemented by the characterization consistency module, and training minimizes the loss value. In the characterization consistency module, the multi-scale characterization feature maps obtained by passing the pre-projection feature maps of the first branch through the projection layers and the prediction layers are taken as the first prediction maps $P_0 \sim P_{n-1}$, $P_g$, and the multi-scale characterization feature maps obtained by passing the pre-projection feature maps of the second branch through the projection layers are taken as the second prediction maps $R_0 \sim R_{n-1}$, $R_g$. The specific calculation process of the characterization consistency module is as follows:
Cosine similarity is computed between the first prediction map $P_g$ and the second prediction map $R_g$ corresponding to the global feature map to obtain the global loss $L_g$:
$$L_g = -\frac{P_g}{\lVert P_g \rVert_2} \cdot \frac{R_g}{\lVert R_g \rVert_2}$$
where $\lVert \cdot \rVert_2$ denotes the L2 norm.
To handle the case where the same medical image target appears at different sizes, each first prediction map $P_0 \sim P_{n-1}$ other than $P_g$ is compared by cosine similarity with every second prediction map $R_0 \sim R_{n-1}$ other than $R_g$ to obtain the similarity loss $L_{i,j}$:
$$L_{i,j} = -\frac{P_i}{\lVert P_i \rVert_2} \cdot \frac{R_j}{\lVert R_j \rVert_2}$$
$$\{S_{i,0}, S_{i,1}, \ldots, S_{i,n-1}\} = \mathrm{softmax}\left(k \cdot \{-L_{i,0}, -L_{i,1}, \ldots, -L_{i,n-1}\}\right)$$
$$L_i = \sum_{j=0}^{n-1} S_{i,j}\, L_{i,j}$$
where $k$ is a parameter that adjusts the amplitude; $\{-L_{i,0}, -L_{i,1}, \ldots, -L_{i,n-1}\}$ is the set of negated similarity losses at each scale; $\{S_{i,0}, S_{i,1}, \ldots, S_{i,n-1}\}$ is the set of softmax weights of the losses at each scale; and $\mathrm{softmax}(\cdot)$ is the softmax operation.
That is, the similarity losses $L_{i,j}$ of a first prediction map against all second prediction maps are converted into softmax weights $S_{i,j}$ and summed with those weights to obtain the local loss $L_i$ of that first prediction map. Finally, the local losses $L_0 \sim L_{n-1}$ of all first prediction maps and the global loss $L_g$ are added to obtain the total loss. During backpropagation, a stop-gradient (Stop-grad) operation cuts off the gradient of the similarity loss from flowing back into the second branch.
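The cross-scale weighting can be made concrete with a small NumPy sketch. It is illustrative only and rests on assumptions: the similarity loss is taken to be the negative cosine similarity averaged over the sampled positions (the sign convention is not stated explicitly in the text), the function names are hypothetical, and since NumPy has no automatic differentiation the stop-gradient on the second branch appears only as a comment.

```python
import numpy as np

def neg_cos(p, r):
    """Negative cosine similarity averaged over samples; p and r have shape (C, N)."""
    p = p / np.linalg.norm(p, axis=0, keepdims=True)
    r = r / np.linalg.norm(r, axis=0, keepdims=True)
    return -np.mean(np.sum(p * r, axis=0))

def softmax(x):
    e = np.exp(x - np.max(x))      # subtract the maximum for numerical stability
    return e / e.sum()

def consistency_loss(P, R, Pg, Rg, k=1.0):
    """Total loss = global loss + sum of softmax-weighted cross-scale local losses.

    P, R: lists of n per-scale characterizations, each of shape (C, N).
    In an autograd framework, R and Rg would be detached (stop-gradient).
    """
    n = len(P)
    total = neg_cos(Pg, Rg)                                   # global loss L_g
    for i in range(n):
        L_ij = np.array([neg_cos(P[i], R[j]) for j in range(n)])
        S = softmax(k * -L_ij)                                # weights S_{i,j}
        total += float(np.sum(S * L_ij))                      # local loss L_i
    return total
```

With a large `k` the weights concentrate on the best-matching scale, so each $L_i$ approaches $\min_j L_{i,j}$; with a small `k` all scales contribute almost equally.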
With the characterization consistency module, similarity is calculated across the feature layers of different scales, so that characterizations of different scales are matched automatically. This addresses the fact that the same medical target can appear at different scales because of differences between single-channel medical imaging instruments. The prediction characterization of each scale can be matched to the best-fitting projection characterization scale, and the final model is able to learn scale-invariant characterizations. Through training with multi-scale characterization and the characterization consistency loss, the neural network can learn rich visual characterizations from unannotated medical images.
Further, the characterization consistency module provides an SGD optimizer to perform optimization of network training.
The present invention also provides a terminal device, which includes: at least one processor, at least one memory, and computer program instructions stored in the memory that, when executed by the processor, implement the multi-scale representation-based learning-optimized medical image segmentation system.
The invention also provides a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the multi-scale representation learning optimization-based medical image segmentation system.
Compared with the prior art, the invention has the beneficial effects that:
the canvas matching processing provided by the invention can calculate the complex position corresponding relation, so that the view generation can use a more complex enhancement mode: the position correspondence between the views can be accurately obtained for any complex enhancement transformation, such as rotation and elastic deformation.
The embedded pre-sampling module provided by the invention avoids the generation of overlarge tensors in a projection layer and a prediction layer so as to carry out efficient and effective training: by sampling in advance according to the position corresponding information in front of the projection layer, the ultra-large tensor caused by the larger hidden layer dimension in the projection layer and the prediction layer is avoided, and the training process is more efficiently carried out.
The characterization consistency module provided by the invention enables multi-scale characterization to be automatically matched, and ensures the scale invariance of the characterization: and selecting the projection representation of the best matching scale for the prediction representation of each scale through the flexible maximum value weight of the cosine similarity.
The invention uses the deep convolutional neural network to extract characterizations of different sizes and spatial scales, which is more effective for medical image segmentation than the global or single-scale characterizations of the prior art: the global characterization adopted by the prior art is of limited use for segmentation, target sizes in medical images are diverse, and a local or dense characterization at a single scale is not complete enough. The multi-scale characterization provided by the invention combines characterization learning layer by layer from the global level to the dense level and uses the spatial information of each layer to improve segmentation performance.
In summary, the invention combines characterization learning layer by layer from the global level to the dense level, improves segmentation performance by using the spatial information of each layer, is more effective on medical image segmentation tasks, avoids generating overly large tensors in the projection layer and the prediction layer, and ensures the scale invariance of the characterization.
Drawings
Fig. 1 is a flow chart of the use of the method proposed by the present invention.
Fig. 2 is a structural diagram of a deep convolutional neural network proposed by the present invention.
FIG. 3 is a schematic diagram of the canvas matching process proposed by the present invention.
Fig. 4 is a comparison graph of the effect of the embedded pre-sampling module proposed by the present invention.
Fig. 5 is a flow chart of the operations of the characterization consistency module proposed by the present invention.
Detailed Description
The following describes embodiments of the present invention in further detail with reference to the accompanying drawings.
The embodiment of the invention and its implementation process are as follows:
As shown in fig. 1, the medical image segmentation system based on multi-scale characterization learning optimization is implemented as follows:
The characterization medical image preprocessing module preprocesses the input single-channel characterization medical image data. For an original input three-dimensional CT image X, the original size (D, H, W) is approximately (100, 512, 512), that is, 512 × 512 pixels in plane and about 100 slices out of plane. Because single-channel medical imaging devices such as CT form images from the X-ray attenuation rate, the effective range of pixel/voxel values is between -1000 and 325. The pixel/voxel values of the CT image are truncated to the range -1000 to 325 and normalized to a normal distribution, and the normalized characterization medical image is output.
The segmentation recognition medical image preprocessing module preprocesses the input single-channel segmentation recognition medical image data. Its input size, value range, processing and output are the same as those of the characterization medical image preprocessing module, and it outputs the normalized segmentation recognition medical image.
The multi-scale characterization learning deep convolutional neural network module is used to learn the network weights of the multi-scale characterization.
The medical image segmentation and identification network module is used for training the segmentation network and predicting the segmentation result of the input image to be detected. In the training mode, the module is initialized from the trained deep convolutional neural network and is trained in advance on the medical images and segmentation annotations of a training set to obtain the segmentation network. In the identification mode, the input image to be detected is fed into the segmentation model, which predicts the segmentation result of the image.
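The truncation and normalization performed by the two preprocessing modules can be sketched as follows. This is a minimal example, assuming that "normalized to a normal distribution" means subtracting the mean and dividing by the standard deviation of the truncated volume; the text does not say whether per-volume or dataset-wide statistics are used, and the function name `preprocess_ct` is hypothetical.

```python
import numpy as np

def preprocess_ct(volume, lo=-1000.0, hi=325.0):
    """Truncate CT values to [lo, hi], then standardize to zero mean and unit variance."""
    v = np.clip(volume.astype(np.float32), lo, hi)
    # Per-volume statistics are an assumption made for this sketch.
    return (v - v.mean()) / (v.std() + 1e-8)

# Usage on a small synthetic volume with values outside the effective range.
rng = np.random.default_rng(0)
ct = rng.integers(-2000, 2000, size=(4, 8, 8))
normalized = preprocess_ct(ct)
```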
In this embodiment, the twin network and the prediction layer of the deep convolutional neural network have the following specific structures:
1) the applied U-Net network is a 3D version, namely 3D convolution is adopted for feature extraction. The U-Net has 6 scale stages, from the original input cutting block size [48 × 192 × 192] to the lowest [6 × 6 × 6], which are respectively represented by subscripts 0 to 5, the step sizes among the stages are [1,2,2], [2,2,2] and [1,2,2], and the convolution kernels adopted by each stage are [1, 3, 3], [3, 3, 3 ]. The base number of channels for stage 0 is 32 and multiplied by 2 in order of increasing stages, the highest number of channels ending at 320, i.e., 32, 64, 128, 256, 320.
2) Each convolution block is formed by stacking a convolution layer-Normalization layer-activation function layer, wherein the Normalization layer adopts example Normalization (Instance Normalization), and the activation function adopts Leaky ReLU with negative slope of 0.01:
Figure BDA0003047346970000081
3) the projection layer is constructed using a 3D convolution block of (1 × 1 × 1) convolution kernels. The structure of the convolution block of the projection layer is similar to that of the convolution block in the U-Net, three layers of convolution blocks are adopted for stacking, and the Leaky ReLU activation function layer of the last convolution block is removed so as to keep the symmetry of the 0 value represented by the projection.
4) The basic convolution block of the prediction layer is the same as that of the projection layer, namely a 3D convolution block with a (1 × 1 × 1) convolution kernel. The prediction layer is a stack of two convolution blocks. The last block contains only a convolution layer, with the normalization and activation layers removed, to ensure a stable training process and a zero-symmetric output prediction representation space.
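The stage geometry described in item 1) can be checked with a short sketch. The text names only three stride values for the five stage transitions and five channel counts for the six stages, so the stride sequence and the repeated final channel count below are assumptions chosen to reproduce the stated input and bottleneck sizes; the function names are hypothetical.

```python
# Assumed per-transition strides (depth, height, width). The text lists only
# [1,2,2], [2,2,2] and [1,2,2]; repeating [2,2,2] three times is a guess that
# happens to map 48 x 192 x 192 onto 6 x 6 x 6.
ASSUMED_STRIDES = [(1, 2, 2), (2, 2, 2), (2, 2, 2), (2, 2, 2), (1, 2, 2)]
INPUT_PATCH = (48, 192, 192)   # input patch size from the embodiment
BASE_CHANNELS = 32             # channels at stage 0
MAX_CHANNELS = 320             # channel cap stated in the text


def stage_shapes(input_patch=INPUT_PATCH, strides=ASSUMED_STRIDES):
    """Return the feature-map size at every scale stage."""
    shapes = [tuple(input_patch)]
    for stride in strides:
        previous = shapes[-1]
        shapes.append(tuple(size // s for size, s in zip(previous, stride)))
    return shapes


def stage_channels(num_stages, base=BASE_CHANNELS, cap=MAX_CHANNELS):
    """Double the channel count per stage, capped at `cap` (assumed rule)."""
    return [min(base * 2 ** i, cap) for i in range(num_stages)]


def leaky_relu(x, negative_slope=0.01):
    """Scalar Leaky ReLU with the negative slope used in the embodiment."""
    return x if x >= 0 else negative_slope * x


shapes = stage_shapes()
channels = stage_channels(len(shapes))
for stage, (shape, ch) in enumerate(zip(shapes, channels)):
    print(f"stage {stage}: feature map {shape}, channels {ch}")
```

Under these assumptions the six stages end at a 6 × 6 × 6 feature map, which matches the lowest resolution given in the text.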
In this embodiment, the canvas matching process used by the view generation module is shown schematically in Fig. 3(a) and proceeds as follows:
1) Construct an initial coordinate matrix C_ori. The size of C_ori is the same as the size of the view, which is (192, 192) in this embodiment. In one embodiment, the corners of C_ori are: upper left (-96, -96), lower right (+96, +96), lower left (+96, -96), upper right (-96, +96).
2) The canvas matching process is illustrated in two dimensions and can be extended to three dimensions in the same manner. In one embodiment, the depth dimension perpendicular to the plane can be calculated from the offsets d_a and d_b relative to an arbitrary reference plane, as shown in Fig. 3(b).
3) The view generation module outputs two augmented views of size (48, 192, 192) whose overlap ratio is greater than 0.2, together with the position correspondence information between them.
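The overlap check can be sketched in two dimensions as follows. This is a simplified illustration restricted to pure translations, whereas the spatial transformations in the embodiment may be richer. The canvas side of 512, the shift values and the helper names are hypothetical; only the 192 × 192 view size and the 0.2 overlap threshold come from the text. Each canvas row is stored as an integer bitmask so that the overlap is obtained with a bitwise AND.

```python
VIEW = 192     # in-plane view size from the embodiment
CANVAS = 512   # hypothetical side length of the blank canvas


def view_canvas(shift_row, shift_col, view=VIEW, canvas=CANVAS):
    """Mark a translated view on a blank canvas; each row is an int bitmask."""
    top = canvas // 2 - view // 2 + shift_row
    left = canvas // 2 - view // 2 + shift_col
    if top < 0 or left < 0 or top + view > canvas or left + view > canvas:
        raise ValueError("view falls outside the canvas")
    row_mask = ((1 << view) - 1) << left   # `view` consecutive set bits
    rows = [0] * canvas
    for r in range(top, top + view):
        rows[r] = row_mask
    return rows


def overlap_ratio(canvas_a, canvas_b, view=VIEW):
    """Bitwise AND of two canvases; returns (shared area / view area, mask)."""
    overlap = [a & b for a, b in zip(canvas_a, canvas_b)]
    shared = sum(bin(row).count("1") for row in overlap)
    return shared / (view * view), overlap


canvas_a = view_canvas(-20, -30)   # hypothetical shift of the first view
canvas_b = view_canvas(40, 50)     # hypothetical shift of the second view
ratio, overlap = overlap_ratio(canvas_a, canvas_b)
accepted = ratio > 0.2             # threshold stated in the embodiment
print(f"overlap ratio = {ratio:.3f}, accepted = {accepted}")
```

The non-zero rows of `overlap` identify the shared region, which is the kind of position correspondence information the later sampling step relies on.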
In this embodiment, the effect of the embedded pre-sampling module used in the twin network is as follows, as shown in fig. 4:
1) The hidden-layer dimension N_hid of the projection layer and the prediction layer is large, 2048 in this embodiment. Therefore, if the training samples are sampled only after the representation has been computed, as shown in Fig. 4(a), an oversized tensor is produced at the higher-resolution scales. For example, R_0 would be a tensor of size 48 × 192 × 192 × 2048 × 4 bytes = 13.5 GB, which exceeds the video memory limit of typical computing devices. The embedded pre-sampling module instead samples N_samp sample points in advance, 32 in this embodiment, and maps them onto the output of the U-Net according to the position correspondence information to obtain the pre-sampled embedding.
2) The tensors involved in the subsequent operations are far smaller than in the original process. For example, after pre-sampling the size of R_0 is 1 × 1 × 32 × 2048 × 4 bytes = 256 KB. Embedded pre-sampling yields the same output as sampling afterwards while greatly reducing the computation, so that training proceeds smoothly and efficiently.
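The two tensor sizes can be reproduced with a few lines of arithmetic. The sketch assumes 4-byte (32-bit floating point) elements, as the "× 4 bytes" factor in the text suggests; the function name is hypothetical.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Memory footprint of a dense tensor with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


N_HID = 2048    # hidden-layer dimension from the embodiment
N_SAMP = 32     # number of pre-sampled points from the embodiment

dense = tensor_bytes((48, 192, 192, N_HID))     # sampling after projection
sampled = tensor_bytes((1, 1, N_SAMP, N_HID))   # sampling before projection

print(f"dense:   {dense / 2**30:.1f} GiB")
print(f"sampled: {sampled / 2**10:.0f} KiB")
print(f"reduction factor: {dense // sampled}")
```

With these figures the dense tensor comes to 13.5 GiB, while the pre-sampled tensor occupies 262,144 bytes, i.e. 256 KiB.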
In this embodiment, the calculation process of the characterization consistency module is shown in fig. 5, which is specifically as follows:
1) In one embodiment, the first prediction maps are {P_G, P_4, P_3, P_2, P_1, P_0} and the second prediction maps are {R_G, R_4, R_3, R_2, R_1, R_0}.
2) When the softmax weight set is computed from the cosine similarity losses, the amplitude adjustment parameter k is set to 5 in this implementation:
$$\{S_{i,0}, S_{i,1}, \dots, S_{i,n-1}\} = \mathrm{softmax}\left(5 \cdot \{-L_{i,0}, -L_{i,1}, \dots, -L_{i,n-1}\}\right)$$
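A minimal sketch of this weighting for a single first prediction map is given below. It assumes that the similarity loss is the negative cosine similarity between two representation vectors, which the text does not state explicitly, and it treats each prediction map as one flat vector; the function names are hypothetical.

```python
import math


def cosine_loss(p, r):
    """Assumed similarity loss: negative cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(p, r))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in r))
    return -dot / norm


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def local_loss(p_i, second_predictions, k=5.0):
    """Softmax-weighted sum of the losses between P_i and every R_j."""
    losses = [cosine_loss(p_i, r_j) for r_j in second_predictions]
    weights = softmax([k * -loss for loss in losses])   # S_{i,j}
    total = sum(w * loss for w, loss in zip(weights, losses))
    return total, weights


p_i = [1.0, 0.0]
second = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]   # toy second predictions
loss_i, weights_i = local_loss(p_i, second)
print(loss_i, weights_i)
```

Because the weights grow as the loss falls, the scale whose second prediction agrees best with P_i dominates the local loss, and a larger k sharpens that preference.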
In this embodiment, the coefficients of the SGD optimizer used with the characterization consistency module are as follows:
1) Training uses stochastic gradient descent (SGD) with Nesterov momentum and weight decay as the optimizer, with the momentum parameter set to 0.999 and the weight decay parameter set to 3e-5.
2) The network is trained for a total of 200 epochs, each consisting of 250 iterations. The initial learning rate lr is set to 0.01, and a poly learning rate decay strategy is used:
$$lr_{ep} = lr \times \left(1 - \frac{ep}{200}\right)^{\rho}$$
where ep is the current epoch number, lr_ep is the learning rate used in the current epoch, and ρ is the poly decay coefficient, set to 0.9 in this implementation, so that the learning rate decays slowly at first and faster as the epoch number increases.
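The schedule can be sketched as a small function, assuming the usual poly form in which the base learning rate is scaled by (1 - ep / total epochs) raised to the power ρ. The function name and the choice to evaluate it once per epoch are assumptions.

```python
def poly_lr(epoch, base_lr=0.01, max_epochs=200, rho=0.9):
    """Poly learning-rate decay with the values given in the embodiment."""
    return base_lr * (1 - epoch / max_epochs) ** rho


schedule = [poly_lr(ep) for ep in range(200)]
early_drop = schedule[0] - schedule[1]       # decay step at the start
late_drop = schedule[198] - schedule[199]    # decay step near the end
print(f"lr at epoch 0:   {schedule[0]:.5f}")
print(f"lr at epoch 100: {schedule[100]:.5f}")
print(f"lr at epoch 199: {schedule[199]:.5f}")
```

With ρ below 1, the per-epoch decrement near the end of training is larger than at the start, which is the slow-then-fast behaviour described above.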
It will be understood by those skilled in the art that all or part of the processes in the system implementing the embodiments described above may be implemented by hardware related to instructions of a computer program, which may be stored in a non-volatile computer readable storage medium, and when executed, may include the processes of the embodiments of the systems described above. Any reference to memory, storage, databases, or other media used in embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).
The above embodiments are intended only to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that modifications and equivalent substitutions may be made to the specific embodiments of the invention without departing from its spirit and scope, and that all such modifications and substitutions are intended to be covered by the claims.

Claims (8)

1. A multi-scale characterization optimized medical image segmentation recognition system, comprising:
the characterization medical image preprocessing module is used for preprocessing the input single-channel characterization medical image data and outputting a normalized characterization medical image;
the segmentation recognition medical image preprocessing module is used for preprocessing the input single-channel segmentation recognition medical image data and outputting a normalized segmentation recognition medical image;
the deep convolutional neural network module comprises a deep convolutional neural network for multi-scale visual representation learning and its loss function, and is used for learning the network weights of the multi-scale representation; its input is the normalized characterization medical image, and its output, after processing by the deep convolutional neural network, is the network initial weights;
the medical image segmentation recognition network module comprises a segmentation network, and is used for training the segmentation network and predicting the segmentation result of an input image to be detected, the input image to be detected is input into the segmentation network, and the segmentation result of the image is predicted and obtained, wherein the segmentation network is obtained by initializing by adopting a trained deep convolutional neural network and training by inputting a medical image and a segmentation annotation of a training set in advance.
2. The system of claim 1, wherein: the deep convolutional neural network is mainly composed of a view generation module, a twin network, an embedded pre-sampling module and a prediction layer; the medical image is input into the view generation module and processed to generate two partially overlapping views x_a and x_b and to obtain position correspondence information between the views x_a and x_b; the views x_a and x_b are input into a first branch and a second branch respectively, the first branch consisting of the twin network and the prediction layer connected in sequence, and the second branch consisting of the twin network only; the position correspondence information is processed by the embedded pre-sampling module and then input into the twin network of both branches; and the outputs of the first and second branches are processed by a loss function to output a loss value.
3. The system of claim 2, wherein:
in the view generation module, canvas matching processing is used: an initial coordinate matrix C_ori is generated and passed through two data-augmentation spatial transformation functions to obtain transformed coordinate matrices C_a and C_b, from which the views x_a and x_b are obtained by coordinate interpolation; the coordinate matrices C_a and C_b are mapped onto a blank canvas B to obtain the canvases of the views x_a and x_b, and a bitwise operation on these canvases yields the position correspondence information.
4. The multi-scale characterization optimized medical image segmentation recognition system of claim 2, wherein the twin network comprises an encoder, a decoder and a projection layer connected in sequence; the encoder and the decoder each comprise a plurality of scale stages connected in sequence, each scale stage corresponding to one feature-map size; each scale stage consists of two consecutive convolution blocks, the view is input into the first scale stage, and each scale stage outputs a feature map;
in the encoder, the resolution scales of the characteristic diagrams output at each scale stage are gradually reduced, in the decoder, the resolution scales of the characteristic diagrams output at each scale stage are gradually increased, the resolution scales of the characteristic diagrams output at each scale stage in the encoder correspond to the resolution scales of the characteristic diagrams output at each scale stage in the decoder one by one, and each scale stage in the decoder receives the characteristic diagrams with the same resolution scale output in the encoder; obtaining a global feature map after performing global pooling operation on the feature map with the minimum resolution scale in all the feature maps output by the decoder at each scale stage, wherein the global feature map and all the feature maps output by the decoder at each scale stage are used as pre-projection feature maps which are input into a projection layer d;
the number of projection layers d equals the number of pre-projection feature maps, and each projection layer is formed by connecting three consecutive convolution blocks;
the number of the prediction layers is consistent with that of the projection layers d in the twin network, and each prediction layer is formed by connecting two continuous convolution blocks; each convolution block is formed by sequentially connecting a 3D convolution layer, a normalization layer and an activation layer.
5. The system for multi-scale characterization-optimized medical image segmentation recognition according to claim 2, wherein an embedded pre-sampling module is added to the twin network for pre-sampling;
the embedded pre-sampling module, according to the position correspondence information, extracts from the pre-projection feature maps obtained by the twin networks of the first branch and the second branch the regions corresponding to the position correspondence information, and inputs these regions into the corresponding projection layers.
6. The system of claim 2, wherein the loss function of the deep convolutional neural network comprises a characterization consistency module trained with a loss minimization;
in the characterization consistency module, all feature maps output after the pre-projection feature maps in the first branch are processed by the projection layers and the prediction layers are taken as first prediction maps P_0 to P_{n-1} and P_g, and all feature maps output after the pre-projection feature maps in the second branch are processed by the projection layers are taken as second prediction maps R_0 to R_{n-1} and R_g;
the first prediction map P_g corresponding to the global feature map and the second prediction map R_g corresponding to the global feature map undergo cosine similarity calculation to obtain the global loss L_g;
each first prediction map P_0 to P_{n-1} other than P_g is compared with each second prediction map R_0 to R_{n-1} other than R_g, so as to eliminate the effect of the same medical image target appearing at different sizes: cosine similarity processing gives the similarity loss L_{i,j}; the similarity losses L_{i,j} corresponding to the second prediction maps are processed to obtain the softmax weights S_{i,j}; the similarity losses L_{i,j} are weighted by S_{i,j} and summed to obtain the local loss of the first prediction map; and finally the local losses L_0 to L_{n-1} of all first prediction maps and the global loss L_g are added to obtain the total loss.
7. A terminal device, characterized in that the terminal device comprises:
at least one processor, at least one memory, and computer program instructions stored in the memory, which when executed by the processor, implement the medical image segmentation system of any one of claims 1-6.
8. A computer-readable storage medium, wherein a computer program is stored on the storage medium, and when the computer program is executed by a processor, the computer program implements the functions of the medical image segmentation system as claimed in any one of claims 1 to 6.
CN202110475782.4A 2021-04-29 2021-04-29 Medical image segmentation and identification system, terminal and storage medium with multi-scale representation optimization Pending CN113205523A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110475782.4A CN113205523A (en) 2021-04-29 2021-04-29 Medical image segmentation and identification system, terminal and storage medium with multi-scale representation optimization


Publications (1)

Publication Number Publication Date
CN113205523A true CN113205523A (en) 2021-08-03

Family

ID=77029456


Country Status (1)

Country Link
CN (1) CN113205523A (en)


Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180315193A1 (en) * 2017-04-27 2018-11-01 Retinopathy Answer Limited System and method for automated funduscopic image analysis
KR20190119261A (en) * 2018-04-12 2019-10-22 가천대학교 산학협력단 Apparatus and method for segmenting of semantic image using fully convolutional neural network based on multi scale image and multi scale dilated convolution
CN109345538A (en) * 2018-08-30 2019-02-15 华南理工大学 A kind of Segmentation Method of Retinal Blood Vessels based on convolutional neural networks
CN109711413A (en) * 2018-12-30 2019-05-03 陕西师范大学 Image, semantic dividing method based on deep learning
CN109858575A (en) * 2019-03-19 2019-06-07 苏州市爱生生物技术有限公司 Data classification method based on convolutional neural networks
US10482603B1 (en) * 2019-06-25 2019-11-19 Artificial Intelligence, Ltd. Medical image segmentation using an integrated edge guidance module and object segmentation network
CN110609320A (en) * 2019-08-28 2019-12-24 电子科技大学 Pre-stack seismic reflection pattern recognition method based on multi-scale feature fusion
CN111325750A (en) * 2020-02-25 2020-06-23 西安交通大学 Medical image segmentation method based on multi-scale fusion U-shaped chain neural network
CN111882560A (en) * 2020-06-16 2020-11-03 北京工业大学 Lung parenchymal CT image segmentation method based on weighted full-convolution neural network
CN111798462A (en) * 2020-06-30 2020-10-20 电子科技大学 Automatic delineation method for nasopharyngeal carcinoma radiotherapy target area based on CT image
CN112330682A (en) * 2020-11-09 2021-02-05 重庆邮电大学 Industrial CT image segmentation method based on deep convolutional neural network
AU2020103901A4 (en) * 2020-12-04 2021-02-11 Chongqing Normal University Image Semantic Segmentation Method Based on Deep Full Convolutional Network and Conditional Random Field
CN112381846A (en) * 2020-12-11 2021-02-19 江南大学 Ultrasonic thyroid nodule segmentation method based on asymmetric network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DAVID W.G MONTGOMERY等: "Fully automated segmentation of oncological PET volumes using a combined multiscale and statistical model", 《MEDICAL PHYSICS》 *
陈庚峰: "基于深度学习的医学图像分割方法的研究", 《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113821661B (en) * 2021-08-30 2024-04-02 上海明略人工智能(集团)有限公司 Image retrieval method, system, storage medium and electronic device
CN113821661A (en) * 2021-08-30 2021-12-21 上海明略人工智能(集团)有限公司 Image retrieval method, system, storage medium and electronic device
CN113902674A (en) * 2021-09-02 2022-01-07 北京邮电大学 Medical image segmentation method and electronic equipment
CN113902674B (en) * 2021-09-02 2024-05-24 北京邮电大学 Medical image segmentation method and electronic equipment
CN114049359A (en) * 2021-11-22 2022-02-15 北京航空航天大学 Medical image organ segmentation method
CN114049359B (en) * 2021-11-22 2024-04-16 北京航空航天大学 Medical image organ segmentation method
WO2023119078A1 (en) * 2021-12-20 2023-06-29 International Business Machines Corporation Unified framework for multigrid neural network architecture
US11983920B2 (en) 2021-12-20 2024-05-14 International Business Machines Corporation Unified framework for multigrid neural network architecture
CN114792315A (en) * 2022-06-22 2022-07-26 浙江太美医疗科技股份有限公司 Medical image visual model training method and device, electronic equipment and storage medium
CN114842003A (en) * 2022-07-04 2022-08-02 杭州健培科技有限公司 Medical image follow-up target pairing method, device and application
CN117438023B (en) * 2023-10-31 2024-04-26 灌云县南岗镇卫生院 Hospital information management method and system based on big data
CN117438023A (en) * 2023-10-31 2024-01-23 灌云县南岗镇卫生院 Hospital information management method and system based on big data
CN117635942A (en) * 2023-12-05 2024-03-01 齐鲁工业大学(山东省科学院) Cardiac MRI image segmentation method based on edge feature enhancement
CN117635942B (en) * 2023-12-05 2024-05-07 齐鲁工业大学(山东省科学院) Cardiac MRI image segmentation method based on edge feature enhancement


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210803