CN113205523A

CN113205523A - Medical image segmentation and identification system, terminal and storage medium with multi-scale representation optimization

Info

Publication number: CN113205523A
Application number: CN202110475782.4A
Authority: CN
Inventors: 钟颖; 沈海斌; 黄科杰
Original assignee: Zhejiang University ZJU; Zhejiang Lab
Current assignee: Zhejiang University ZJU; Zhejiang Lab
Priority date: 2021-04-29
Filing date: 2021-04-29
Publication date: 2021-08-03

Abstract

The invention discloses a medical image segmentation and recognition system with multi-scale representation optimization, together with a terminal and a storage medium. The characterization medical image preprocessing module preprocesses single-channel characterization medical images and outputs a normalized result; the segmentation recognition medical image preprocessing module preprocesses single-channel segmentation recognition medical images and outputs a normalized result; the multi-scale characterization learning deep convolutional neural network module comprises a deep convolutional neural network and a loss function, takes the normalized characterization medical image as input, and outputs initial network weights after processing by the deep convolutional neural network; the medical image segmentation and recognition network module comprises a segmentation network and is used to train the segmentation network and to predict the segmentation result of an input image under test. The invention combines representation learning level by level from global to dense, improves segmentation performance by incorporating the spatial information of each level, is more effective on medical image segmentation tasks, avoids generating oversized tensors in the projection and prediction layers, and ensures the scale invariance of the representations.

Description

Medical image segmentation and identification system, terminal and storage medium with multi-scale representation optimization

Technical Field

The invention belongs to an image processing system, a terminal and a storage medium in the field of medical image computer vision, and particularly relates to a multi-scale representation optimized medical image segmentation recognition system, a terminal and a storage medium.

Background

In the field of medical imaging, a crucial step is the accurate segmentation of biomedical objects, such as organs or tissues, for diagnosis, treatment planning, prognosis, and so on. Manual segmentation by medical experts is time-consuming and laborious and requires a high degree of expertise. Building an accurate and stable automatic segmentation system can therefore effectively reduce the burden on doctors and allow patients to be treated sooner.

Beyond traditional machine learning methods, a growing number of deep learning methods achieve excellent performance in medical segmentation, and methods based on Convolutional Neural Networks (CNNs) are currently the best-performing medical segmentation models. Although these networks greatly improve the performance of medical image segmentation, they all rely on large volumes of annotated medical data for fully supervised training. Since annotating medical images is very time-consuming and requires a high degree of biomedical expertise, constructing large-scale annotated medical image datasets is difficult. Unannotated medical data, by contrast, is far more readily available than annotated data.

As a method that does not require manual annotation, the emerging technique of Self-supervised Learning (SSL) is expected to enable Representation Learning from large numbers of unannotated medical images. Most self-supervised representation learning methods proposed so far learn a Global Representation from image classification tasks, which is not suitable for medical image segmentation, a task that requires pixel-level classification. Moreover, a representation at a single scale, whether a Global Representation, a Local Representation, or a Dense (pixel-level) Representation, cannot satisfy the demands that the diverse sizes of organs and tissues in medical images place on the representation hierarchy. The invention therefore provides a multi-scale representation learning technique for medical image segmentation that simultaneously learns visual representations at multiple scales, from global to dense pixels, to improve medical image segmentation performance.

Disclosure of Invention

To address the shortcoming that the prior art learns only a single global representation, and the need of medical image segmentation for multi-level visual features, the invention provides a multi-scale representation optimized medical image segmentation recognition system, a terminal, and a storage medium, which improve image segmentation performance by learning more effective multi-scale representations.

The technical scheme adopted by the invention is as follows:

The characterization medical image preprocessing module preprocesses the input single-channel characterization medical image data, for example by truncating values to a corresponding range and normalizing them to a normal distribution. It takes single-channel characterization medical image data as input and outputs a normalized characterization medical image. Single-channel characterization medical image data refers to medical images without annotation (that is, without labels), for example CT images.

The segmentation recognition medical image preprocessing module preprocesses the input single-channel segmentation recognition medical image data, for example by truncating values to a corresponding range and normalizing them to a normal distribution. It takes single-channel segmentation recognition medical image data as input and outputs a normalized segmentation recognition medical image. Single-channel segmentation recognition medical image data refers to annotated medical images, for example CT images.

The multi-scale characterization learning deep convolutional neural network module comprises a deep convolutional neural network for multi-scale visual representation learning and its loss function, and is used to learn the network weights of the multi-scale representation. Its input is the normalized characterization medical image, and its output, after processing by the deep convolutional neural network, is the initial network weights.

The medical image segmentation recognition network module comprises a segmentation network and is used to train the segmentation network and to predict the segmentation result of an input image under test: the image under test is input to the segmentation network, which predicts its segmentation result. The segmentation network is initialized with the trained deep convolutional neural network and trained in advance on the medical images and segmentation annotations of a training set.

When the medical image segmentation system is in training mode, the inputs are the normalized segmentation recognition medical image, the segmentation annotation, and the initial network weights, and there is no output.

When the medical image segmentation system is in recognition mode, the input is the normalized segmentation recognition medical image, and the output is the segmentation recognition result for the medical target.

Optionally, as shown in fig. 2, the deep convolutional neural network consists of a View Generation Module, two twin networks (Siamese networks) with the same topology, an embedded pre-sampling module, a prediction layer, and a characterization consistency module (Representation Consistency Module). The medical image is input to the view generation module, which generates two partially overlapping views $x_a$ and $x_b$ together with the position correspondence information $L$ between them. The views $x_a$ and $x_b$ are input to a first branch and a second branch respectively: the first branch consists of a twin network and the prediction layer connected in sequence, while the second branch consists of a twin network only. The position correspondence information $L$ is processed by the embedded pre-sampling module and then input to the twin networks of both branches. The outputs of the first and second branches are processed by the characterization consistency module to output the loss value.

Optionally, as shown in fig. 3, the view generation module uses canvas matching to compute the position correspondence between the views. Canvas matching generates an initial coordinate matrix $C_{ori}$, applies the spatial transformation functions of the data augmentation to obtain the transformed coordinate matrices $C_a$ and $C_b$, and obtains the views $x_a$ and $x_b$ by coordinate interpolation. The coordinate matrices $C_a$ and $C_b$ are then mapped onto blank canvases to obtain the canvases of $x_a$ and $x_b$, and a bitwise operation between the two canvases yields the position correspondence information $L$.

The specific process of the canvas matching processing is as follows:

First, an initial coordinate matrix $C_{ori}$ is constructed. Its size is the same as the view size $(H_p, W_p)$, where $H_p$ and $W_p$ denote the height and width of the views $x_a$ and $x_b$.

Each element of $C_{ori}$ is a pair $(A, B)$, where $A$ is the first element parameter and $B$ is the second element parameter. The first parameter $A$ increases in integer steps from column to column, left to right, and is the same for all elements in one column; the second parameter $B$ increases in integer steps from row to row, top to bottom, and is the same for all elements in one row. The element at the center of $C_{ori}$ is $(0, 0)$.

Next, spatial transformation functions are generated randomly and applied to the coordinate matrix $C_{ori}$ to obtain the coordinate matrices of the two views: the first spatial transformation function, for view $x_a$, yields $C_a$, and the second spatial transformation function, for view $x_b$, yields $C_b$.

Each spatial transformation function is generated randomly as one or a combination of rotation by an arbitrary angle, flipping, scaling, and elastic deformation. The first and second spatial transformation functions may be the same or different.

Then, nearest-neighbor interpolation of the original image at the coordinate matrices of the two views gives the two augmented views $x_a$ and $x_b$, and coordinate mapping with a bitwise operation on blank canvases gives their position correspondence:

$$x_a = \mathrm{Map}(C_a, X), \qquad x_b = \mathrm{Map}(C_b, X)$$

where $\mathrm{Map}(\cdot)$ denotes the coordinate mapping function, $C_i$ denotes the coordinate matrix of view $x_i$ with $i \in \{a, b\}$, and $X$ denotes the input image.

The position correspondence information is then obtained from the coordinate matrices of the views $x_a$ and $x_b$. For each view $x_i$, the height coordinate vector and the width coordinate vector of its coordinate matrix are rounded with $\mathrm{round}(\cdot)$ to give positions on the canvas $B_i$, and the vector of natural numbers from zero to $H_p \times W_p$ identifies, at each such position, the pixel of view $x_i$ whose coordinates fall there. Here $B_a$ and $B_b$ denote the canvases of $x_a$ and $x_b$. The position correspondence information $L$ is

$$L = \mathrm{cat}(B_a, B_b)\big[\mathrm{bool}(B_a \,\&\, B_b)\big]$$

where $\mathrm{bool}(\cdot)$ denotes a Boolean operation and $\mathrm{cat}(\cdot)$ denotes a concatenation operation; that is, entries are extracted over the intersection of canvas $B_a$ and canvas $B_b$.

The canvas matching processing can accurately obtain the position correspondence between the views for any complex enhancement transformation, such as rotation and elastic deformation.
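The canvas matching steps above can be illustrated with a minimal two-dimensional sketch. It is not the patented implementation: the sparse dictionary canvas, the function names, and the example rotate-then-shift transform are all assumptions chosen for illustration, and the standard library stands in for a tensor framework.

```python
import math

def make_coords(h, w):
    """Centered coordinate grid (stand-in for C_ori), row-major list of (row, col)."""
    return [(r - h // 2, c - w // 2) for r in range(h) for c in range(w)]

def rotate_shift(coords, angle_deg, shift):
    """Hypothetical spatial transform: rotate about the origin, then translate."""
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    return [(r * cos_t - c * sin_t + shift[0], r * sin_t + c * cos_t + shift[1])
            for r, c in coords]

def paint_canvas(coords):
    """Write each view pixel's flat index at its rounded source position.

    The canvas is a sparse dict (an assumption); if two view pixels round to
    the same source position, the later one overwrites the earlier one.
    """
    canvas = {}
    for idx, (r, c) in enumerate(coords):
        canvas[(round(r), round(c))] = idx
    return canvas

def match(canvas_a, canvas_b):
    """Pairs (index in view a, index in view b) that sample the same source pixel."""
    shared = canvas_a.keys() & canvas_b.keys()  # intersection of the two canvases
    return sorted((canvas_a[p], canvas_b[p]) for p in shared)

c_ori = make_coords(4, 4)
c_a = rotate_shift(c_ori, 0, (0, 0))    # view a: identity transform
c_b = rotate_shift(c_ori, 90, (0, 2))   # view b: 90 degree rotation plus a shift
pairs = match(paint_canvas(c_a), paint_canvas(c_b))
```

Because the correspondence is read off the canvases rather than derived analytically, the same lookup works for any transform that can be expressed as a coordinate matrix, which is the property the text attributes to canvas matching.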

Optionally, the twin network comprises an encoder, a decoder, and projection layers connected in sequence, where the encoder and the decoder form a U-Net network. To match the size diversity of targets in medical images, the encoder and the decoder each comprise several scale stages connected in sequence, with target sizes corresponding to scale stages. Each scale stage consists of two consecutive convolution blocks; the view is input to the first scale stage, and every scale stage outputs its own feature map. In the encoder, the resolution of the feature map output by each successive scale stage decreases, for example by half each time. In the decoder, the resolution of the feature map output by each successive scale stage increases, for example doubling each time, and each decoder scale stage also receives the encoder feature map of the same resolution.

A global feature map is obtained by applying a global pooling operation to the lowest-resolution feature map among those output by the decoder scale stages. The global feature map and all feature maps originally output by the decoder scale stages together serve as the pre-projection feature maps. The number of projection layers equals the number of pre-projection feature maps, and each projection layer consists of three consecutive convolution blocks whose convolution kernels are the same. The number of prediction layers equals the number of projection layers in the twin network, and each prediction layer consists of two consecutive convolution blocks. Each convolution block consists of a 3D convolution layer, a normalization layer, and an activation layer connected in sequence.

The global representation used in the prior art applies global average pooling after the deepest convolution, which discards spatial information and yields a single vector, or embedding, as the global representation; this limits its usefulness for tasks such as segmentation that are sensitive to spatial information. Moreover, organs and tissues in medical images span a wide range of sizes: the esophagus is small, the kidney is larger, and the stomach occupies a large range of scales. A local or dense representation at a single scale is therefore incomplete, and the appropriate scale needs to be matched adaptively to the specific size of each organ. The multi-scale representation provided by the invention combines representation learning level by level from global to dense and incorporates the spatial information of each level to improve segmentation performance.

Optionally, an embedded pre-sampling module is added to the twin network to perform pre-sampling. According to the position correspondence information $L$, the embedded pre-sampling module extracts the regions indicated by $L$ from the pre-projection feature maps obtained in the twin networks of the first and second branches, and inputs these regions to the corresponding projection layers. The specific process of the embedded pre-sampling module is as follows.

The position correspondence information $L$ is randomly sampled to select pairs of matching positions:

$$P'_{a,i} = \mathrm{Samp}(P_{a,i}, GC_a), \qquad R'_{b,i} = \mathrm{Samp}(R_{b,i}, GC_b)$$

where $GC$ denotes the grid coordinates of the $N$ randomly sampled points, and $\mathrm{Samp}(\cdot)$ samples the feature map at the grid coordinates by bilinear interpolation. The resulting sampled prediction representation $P'_{a,i}$ and projection representation $R'_{b,i}$ have tensor size $[B, 2048, 1, 1, N]$. Because the embedded pre-sampling module samples according to the position correspondence information $L$ before the projection layer, the projection and prediction layers never produce oversized tensors during representation computation, which improves training efficiency.
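As a rough illustration of what a sampling function such as Samp() does, the following sketch bilinearly interpolates a small 2D feature map at fractional grid coordinates. The function names, the clamping at the border, and the single-channel list-of-lists layout are assumptions for illustration; the embodiment operates on multi-channel 3D feature maps in a tensor framework.

```python
import math

def bilinear_sample(fmap, y, x):
    """Bilinearly interpolate a 2D map (list of rows) at fractional position (y, x)."""
    h, w = len(fmap), len(fmap[0])
    # Clamp to the valid range (assumed border handling).
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = fmap[y0][x0] * (1 - wx) + fmap[y0][x1] * wx
    bottom = fmap[y1][x0] * (1 - wx) + fmap[y1][x1] * wx
    return top * (1 - wy) + bottom * wy

def pre_sample(fmap, grid_coords):
    """Sample the map only at the N selected grid coordinates."""
    return [bilinear_sample(fmap, y, x) for y, x in grid_coords]

fmap = [[0.0, 1.0],
        [2.0, 3.0]]
samples = pre_sample(fmap, [(0.0, 0.0), (0.5, 0.5), (1.0, 0.25)])
```

Only the N sampled values are passed on, so any later layer with a large hidden dimension sees N positions instead of the full feature map.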

Optionally, the loss function of the deep convolutional neural network is provided by the characterization consistency module, and training minimizes the loss value. In the characterization consistency module, the multi-scale feature maps output when the pre-projection feature maps of the first branch pass through the projection layers and the prediction layers are taken as the first prediction maps $P_0 \sim P_{n-1}$ and $P_g$, and the multi-scale feature maps output when the pre-projection feature maps of the second branch pass through the projection layers are taken as the second prediction maps $R_0 \sim R_{n-1}$ and $R_g$. The characterization consistency module is computed as follows:

The first prediction map $P_g$ and the second prediction map $R_g$, both corresponding to the global feature map, are processed by a cosine similarity calculation to obtain the global loss $L_g$, where $\|\cdot\|_2$ denotes the L2 norm.

To handle the case where the same medical image target appears at different sizes, each first prediction map $P_0 \sim P_{n-1}$ (every first prediction map other than $P_g$) is compared by cosine similarity with each of the second prediction maps $R_0 \sim R_{n-1}$ (every second prediction map other than $R_g$), giving the similarity losses $L_{i,j}$:

$$\{S_{i,0}, S_{i,1}, \ldots, S_{i,n-1}\} = \mathrm{softmax}\big(k \cdot \{-L_{i,0}, -L_{i,1}, \ldots, -L_{i,n-1}\}\big)$$

where $k$ is a parameter that adjusts the amplitude, $\{-L_{i,0}, -L_{i,1}, \ldots, -L_{i,n-1}\}$ is the set of negated similarity losses across scales, $\{S_{i,0}, S_{i,1}, \ldots, S_{i,n-1}\}$ is the set of softmax weights of the per-scale losses, and $\mathrm{softmax}(\cdot)$ denotes the softmax operation.

The similarity losses $L_{i,j}$ of the second prediction maps thus yield the softmax weights $S_{i,j}$. The similarity losses $L_{i,j}$ are then summed with the weights $S_{i,j}$ to obtain the local loss $L_i$ of each first prediction map. Finally, the local losses $L_0 \sim L_{n-1}$ of all first prediction maps and the global loss $L_g$ are added to obtain the total loss. During backpropagation, a stop-gradient (Stop-grad) operation on the similarity loss blocks the gradient from propagating back into the second branch.
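The weighting scheme described above can be sketched on plain vectors as follows. Two points are assumptions rather than statements from the text: the similarity loss is taken to be the negative cosine similarity, and each prediction map is reduced to a single vector. The default k = 5 follows the value given for the embodiment. There is no automatic differentiation here, so the stop-gradient step is only noted in a comment.

```python
import math

def neg_cosine(p, r):
    """Assumed similarity loss: negative cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(p, r))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_r = math.sqrt(sum(b * b for b in r))
    return -dot / (norm_p * norm_r)

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def consistency_loss(preds, projs, pred_g, proj_g, k=5.0):
    """Global loss plus softmax-weighted cross-scale local losses.

    In a framework with autograd, projs and proj_g would be detached
    (stop-gradient) before this computation.
    """
    total = neg_cosine(pred_g, proj_g)                          # global loss L_g
    for p in preds:
        losses = [neg_cosine(p, r) for r in projs]              # L_{i,j} over all scales j
        weights = softmax([k * -l for l in losses])             # S_{i,j}
        total += sum(s * l for s, l in zip(weights, losses))    # local loss L_i
    return total

loss = consistency_loss(
    preds=[[1.0, 0.0], [0.0, 1.0]],
    projs=[[1.0, 0.0], [0.0, 1.0]],
    pred_g=[1.0, 1.0],
    proj_g=[2.0, 2.0],
)
```

A larger k concentrates each prediction's weight on its best-matching projection scale, while a smaller k spreads the weight more evenly across scales.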

The characterization consistency module addresses the fact that the same medical target can appear at different scales because of differences between single-channel medical imaging instruments: by computing similarity across feature layers of different scales, it automatically matches representations of different scales. The prediction representation at each scale is matched to the best projection representation scale, so the final model is able to learn scale-invariant representations. Through multi-scale representation and characterization consistency loss training, the neural network can learn rich visual representations from medical images without annotation.

Further, the characterization consistency module provides an SGD optimizer to perform optimization of network training.

The present invention also provides a terminal device, which includes: at least one processor, at least one memory, and computer program instructions stored in the memory that, when executed by the processor, implement the multi-scale representation-based learning-optimized medical image segmentation system.

The invention also provides a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the multi-scale representation learning optimization-based medical image segmentation system.

Compared with the prior art, the invention has the beneficial effects that:

the canvas matching processing provided by the invention can calculate the complex position corresponding relation, so that the view generation can use a more complex enhancement mode: the position correspondence between the views can be accurately obtained for any complex enhancement transformation, such as rotation and elastic deformation.

The embedded pre-sampling module provided by the invention avoids the generation of overlarge tensors in a projection layer and a prediction layer so as to carry out efficient and effective training: by sampling in advance according to the position corresponding information in front of the projection layer, the ultra-large tensor caused by the larger hidden layer dimension in the projection layer and the prediction layer is avoided, and the training process is more efficiently carried out.

The characterization consistency module provided by the invention matches multi-scale representations automatically and ensures the scale invariance of the representations: through the softmax weights of the cosine similarity, the projection representation of the best-matching scale is selected for the prediction representation of each scale.

The invention uses the deep convolutional neural network to extract representations of different sizes and spatial scales, which is more effective for medical image segmentation than the global or single-scale representations of the prior art: the global representation adopted by the prior art is limiting for segmentation, target sizes in medical images are diverse, and a local or dense representation at a single scale is incomplete. The multi-scale representation provided by the invention combines representation learning level by level from global to dense and incorporates the spatial information of each level to improve segmentation performance.

In summary, the invention combines representation learning level by level from global to dense, improves segmentation performance by incorporating the spatial information of each level, is more effective on medical image segmentation tasks, avoids generating oversized tensors in the projection and prediction layers, and ensures the scale invariance of the representations.

Drawings

Fig. 1 is a flow chart of the use of the method proposed by the present invention.

Fig. 2 is a structural diagram of a deep convolutional neural network proposed by the present invention.

FIG. 3 is a schematic diagram of the canvas matching process proposed by the present invention.

Fig. 4 is a comparison graph of the effect of the embedded pre-sampling module proposed by the present invention.

FIG. 5 is a flow chart of the operations of the characterization consistency module according to the present invention.

Detailed Description

The following describes embodiments of the present invention in further detail with reference to the accompanying drawings.

The embodiment of the invention and the implementation process thereof are as follows:

As shown in fig. 1, the medical image segmentation system based on multi-scale characterization learning optimization is implemented as follows:

The characterization medical image preprocessing module preprocesses the input single-channel characterization medical image data. For an original input three-dimensional CT image $X$, the original size $(D, H, W)$ is approximately (100, 512, 512); that is, the in-plane height and width are 512 × 512 pixels and there are about 100 out-of-plane slices. Because single-channel medical imaging equipment such as CT forms images from the X-ray attenuation rate, the effective range of pixel/voxel values is between -1000 and 325. The pixel/voxel values of the CT image are therefore truncated to the range -1000 to 325 and normalized to a normal distribution, and the normalized characterization medical image is output.

The segmentation recognition medical image preprocessing module preprocesses the input single-channel segmentation recognition medical image data. Its input size, value range, processing, and output are the same as those of the characterization medical image preprocessing module, and it outputs the normalized segmentation recognition medical image.
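A minimal sketch of this preprocessing step is given below. The truncation range of -1000 to 325 comes from the text; interpreting "normalized to a normal distribution" as a per-volume z-score (zero mean, unit standard deviation) is an assumption, as are the function name and the flat list of voxel values.

```python
import statistics

def preprocess(voxels, lo=-1000.0, hi=325.0):
    """Clip voxel values to [lo, hi], then z-score normalize (assumed normalization)."""
    clipped = [min(max(float(v), lo), hi) for v in voxels]
    mean = statistics.fmean(clipped)
    std = statistics.pstdev(clipped)
    if std == 0:
        std = 1.0  # avoid division by zero on a constant volume
    return [(v - mean) / std for v in clipped]

normalized = preprocess([-2000, -1000, 0, 325, 3000])
```

Values below -1000 or above 325 collapse onto the range limits before normalization, so outliers such as metal artifacts do not dominate the statistics.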

And the multi-scale representation learning depth convolution neural network module is used for learning the network weight of the multi-scale representation.

And the medical image segmentation and identification network module is used for training a segmentation network and predicting the segmentation result of the input image to be detected. The module is set to be a training mode, a trained deep convolution neural network is adopted for initialization, and medical images and segmentation annotations of a training set are input in advance for training to obtain a segmentation network. And setting the module into an identification mode, inputting the input image to be detected into the segmentation model, and predicting to obtain the segmentation result of the image.

In this embodiment, the twin network and the prediction layer of the deep convolutional neural network have the following specific structures:

1) The U-Net network used is a 3D version; that is, 3D convolution is used for feature extraction. The U-Net has 6 scale stages, indexed by subscripts 0 to 5, going from the original input patch size [48 × 192 × 192] down to the lowest resolution [6 × 6 × 6]. The strides between stages are [1, 2, 2], [2, 2, 2] and [1, 2, 2], and the convolution kernels used in the stages are [1, 3, 3] and [3, 3, 3]. The base number of channels at stage 0 is 32 and doubles with each successive stage, capped at a maximum of 320, i.e., 32, 64, 128, 256, 320.

2) Each convolution block is a stack of a convolution layer, a normalization layer, and an activation function layer, where the normalization layer uses Instance Normalization and the activation function is Leaky ReLU with a negative slope of 0.01:

$$\mathrm{LeakyReLU}(x) = \max(0, x) + 0.01 \cdot \min(0, x)$$

3) The projection layer is built from 3D convolution blocks with (1 × 1 × 1) convolution kernels. Its convolution blocks are similar in structure to those in the U-Net; three convolution blocks are stacked, and the Leaky ReLU activation function layer of the last convolution block is removed so that the projected representation remains symmetric about zero.

4) The basic convolution block of the prediction layer is the same as that of the projection layer, using 3D convolution with (1 × 1 × 1) convolution kernels. The prediction layer is a stack of two convolution blocks; the last convolution block contains only a convolution layer, with the normalization layer and the activation function layer removed, to ensure the stability of the training process and an output prediction representation space that is symmetric about zero.
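The stage sizes in item 1) can be checked with a short calculation. The text lists only three stride vectors and five channel counts for six stages, so the stride order used here, and the repetition of 320 channels at the last stage, are assumptions chosen so that the patch shrinks from [48 × 192 × 192] to [6 × 6 × 6]; the function name is hypothetical.

```python
def stage_shapes(patch=(48, 192, 192), strides=None, base=32, cap=320):
    """Feature-map size and channel count at each U-Net encoder scale stage."""
    if strides is None:
        # Assumed order of the five stage-to-stage strides.
        strides = [(1, 2, 2), (2, 2, 2), (2, 2, 2), (2, 2, 2), (1, 2, 2)]
    shapes, channels = [tuple(patch)], [base]
    for s in strides:
        shapes.append(tuple(d // k for d, k in zip(shapes[-1], s)))
        channels.append(min(channels[-1] * 2, cap))  # double, capped at 320
    return shapes, channels

shapes, channels = stage_shapes()
for stage, (shape, ch) in enumerate(zip(shapes, channels)):
    print(stage, shape, ch)
```

Under these assumptions the depth axis is halved three times and the in-plane axes five times, which is one way to reconcile the anisotropic input patch with an isotropic deepest stage.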

In this embodiment, a specific schematic diagram of the canvas matching process adopted by the view generation module is shown in fig. 3(a), and specifically follows:

1) An initial coordinate matrix $C_{ori}$ is constructed with the same size as the view, which is (192, 192) in this embodiment. In one embodiment, the corners of the coordinate matrix $C_{ori}$ are: upper left (-96, -96), lower right (+96, +96), lower left (+96, -96), and upper right (-96, +96).

2) The canvas matching process is illustrated in two-dimensional form and can be extended to three dimensions in the same manner. In one embodiment, the out-of-plane depth dimension may be computed from the offsets $d_a$, $d_b$ relative to an arbitrary reference plane, as shown in fig. 3(b).

3) The view generation module outputs two augmented views of size (48, 192, 192) whose overlapping proportion is greater than 0.2, together with the corresponding position correspondence information.

In this embodiment, the effect of the embedded pre-sampling module used in the twin network is as follows, as shown in fig. 4:

1) The hidden layer dimension $N_{hid}$ of the projection layer and the prediction layer is large, 2048 in this embodiment. If sampling were performed after the representation computation is complete, as shown in fig. 4(a), oversized tensors would be generated at the higher-resolution scales; for example, $R_0$ would produce a tensor of size 48 × 192 × 192 × 2048 × 4 bytes = 13.5 GB, which exceeds the video memory limit of typical computing devices. The embedded pre-sampling module instead samples $N_{samp}$ sample points in advance, 32 in this embodiment, and maps the sample points onto the output of the U-Net according to the position correspondence information to obtain the pre-sampled embeddings.

2) The tensors involved in the subsequent computation are far smaller than in the original process; for example, after pre-sampling, $R_0$ has size 1 × 1 × 32 × 2048 × 4 bytes = 256 KB. Embedded pre-sampling gives the same output as sampling afterwards while greatly reducing the computation, so that training proceeds smoothly and efficiently.
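The memory figures can be reproduced with a few lines of arithmetic. The helper name is hypothetical, and 4 bytes per element corresponds to 32-bit floats as stated in the text; the results are expressed in binary units (GiB and KiB).

```python
def tensor_bytes(*dims, bytes_per_element=4):
    """Number of bytes occupied by a dense tensor with the given dimensions."""
    n = bytes_per_element
    for d in dims:
        n *= d
    return n

dense = tensor_bytes(48, 192, 192, 2048)     # sampling after the projection layer
sampled = tensor_bytes(1, 1, 32, 2048)       # 32 pre-sampled points

dense_gib = dense / 2**30
sampled_kib = sampled / 2**10
reduction = dense // sampled
```

With these dimensions the dense tensor is 13.5 GiB and the pre-sampled tensor is 256 KiB, a reduction by a factor equal to the number of voxels divided by the number of sample points.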

In this embodiment, the calculation process of the characterization consistency module is shown in fig. 5, which is specifically as follows:

1) In one embodiment, the first prediction maps are $\{P_G, P_4, P_3, P_2, P_1, P_0\}$ and the second prediction maps are $\{R_G, R_4, R_3, R_2, R_1, R_0\}$.

2) When computing the softmax weight set from the cosine similarities, the amplitude parameter $k$ is set to 5:

$$\{S_{i,0}, S_{i,1}, \ldots, S_{i,n-1}\} = \mathrm{softmax}\big(5 \cdot \{-L_{i,0}, -L_{i,1}, \ldots, -L_{i,n-1}\}\big)$$

In this embodiment, the SGD optimizer provided for the characterization consistency module uses the following settings:

1) Training uses stochastic gradient descent (SGD) with Nesterov momentum and weight decay as the optimizer, where the momentum parameter is 0.999 and the weight decay parameter is 3e-5.

2) The network is trained for a total of 200 epochs, each comprising 250 iterations. The initial learning rate $lr$ is set to 0.01, and a poly learning rate decay strategy is used:

$$lr_{ep} = lr \times \left(1 - \frac{ep}{ep_{\max}}\right)^{\rho}$$

where $ep$ is the current epoch number, $ep_{\max}$ is the total number of epochs (200 here), $lr_{ep}$ is the learning rate used in the current epoch, and $\rho$ is the poly decay coefficient, set to 0.9 in this implementation; that is, the learning rate decays slowly at first and faster as the epoch number increases.
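The schedule can be sketched as follows, assuming the commonly used poly form in which the initial rate is multiplied by (1 - ep / total epochs) raised to the power ρ. The function and parameter names are hypothetical; the values 0.01, 200, and 0.9 are the ones given for this embodiment.

```python
def poly_lr(ep, base_lr=0.01, max_ep=200, rho=0.9):
    """Poly learning-rate decay (assumed standard form): base_lr * (1 - ep/max_ep)**rho."""
    return base_lr * (1 - ep / max_ep) ** rho

schedule = [poly_lr(ep) for ep in range(200)]
early_drop = poly_lr(0) - poly_lr(1)      # decrease over the first epoch
late_drop = poly_lr(198) - poly_lr(199)   # decrease over the last epoch
```

With ρ below 1 the per-epoch decrease grows toward the end of training, which matches the description of decay going from slow to fast.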

It will be understood by those skilled in the art that all or part of the processes in the system implementing the embodiments described above may be implemented by hardware related to instructions of a computer program, which may be stored in a non-volatile computer readable storage medium, and when executed, may include the processes of the embodiments of the systems described above. Any reference to memory, storage, databases, or other media used in embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).

The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that modifications and equivalent substitutions may be made to the specific embodiments of the invention without departing from its spirit and scope, and all such modifications and substitutions are intended to be covered by the claims.

Claims

1. A multi-scale characterization optimized medical image segmentation recognition system, comprising:

the characterization medical image preprocessing module is used for preprocessing the input single-channel characterization medical image data and outputting a normalized characterization medical image;

the segmentation recognition medical image preprocessing module is used for preprocessing the input single-channel segmentation recognition medical image data and outputting a normalized segmentation recognition medical image;

2. The system of claim 1, wherein: the deep convolutional neural network is mainly composed of a view generation module, a twin network, an embedded pre-sampling module and a prediction layer; the medical image is input into the view generation module and processed to generate two partially overlapping first views x_a and x_b, and to obtain the position correspondence information between the first view x_a and the first view x_b

Composition, position corresponding information

the outputs of the first branch and the second branch are processed by the loss function to output a loss value.

3. The system of claim 2, wherein:

in the view generation module, canvas matching processing is adopted: an initial coordinate matrix C_ori is generated and passed through two data-augmentation spatial transformation functions to obtain transformed coordinate matrices C_a and C_b; the first view x_a and the first view x_b are obtained by coordinate interpolation; the coordinate matrices C_a and C_b are mapped onto a blank canvas B to obtain the canvases of the first view x_a and the first view x_b, and a bit operation is performed on these canvases to obtain the position correspondence information.
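The canvas-based bit operation can be sketched as follows. This is a deliberately simplified illustration, not the claimed implementation: it assumes 2D data, axis-aligned crops as the only spatial transformation, and a bitwise AND as the bit operation; the helper names `view_canvas` and `overlap_in_view` are hypothetical.

```python
import numpy as np


def view_canvas(canvas_shape, top_left, size):
    """Mark the pixels covered by one axis-aligned view on a blank canvas."""
    canvas = np.zeros(canvas_shape, dtype=bool)
    r, c = top_left
    h, w = size
    canvas[r:r + h, c:c + w] = True
    return canvas


def overlap_in_view(overlap, top_left, size):
    """Express the overlap mask in the local coordinates of one view."""
    r, c = top_left
    h, w = size
    return overlap[r:r + h, c:c + w]


canvas_shape = (8, 8)                      # blank canvas B
a_top_left, b_top_left, size = (0, 0), (2, 3), (5, 5)

canvas_a = view_canvas(canvas_shape, a_top_left, size)
canvas_b = view_canvas(canvas_shape, b_top_left, size)
overlap = canvas_a & canvas_b              # bit operation on the two canvases

mask_a = overlap_in_view(overlap, a_top_left, size)  # overlap as seen from x_a
mask_b = overlap_in_view(overlap, b_top_left, size)  # overlap as seen from x_b
```

Here `mask_a` and `mask_b` play the role of the position correspondence information: they mark, in each view's own coordinates, the region shared with the other view.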

4. The multi-scale characterization optimized medical image segmentation recognition system of claim 2, wherein the twin network comprises an encoder, a decoder and a projection layer connected in sequence; in the encoder and the decoder, each feature map size corresponds to one scale stage, and both comprise a plurality of scale stages connected in sequence; each scale stage is composed of two consecutive convolution blocks; the view is input into the first scale stage, and each scale stage outputs a feature map;

in the encoder, the resolution scale of the feature maps output by the scale stages decreases stage by stage; in the decoder, it increases stage by stage; the resolution scales of the feature maps output by the scale stages of the encoder correspond one to one with those of the decoder, and each scale stage of the decoder receives the feature map of the same resolution scale output by the encoder; a global feature map is obtained by performing a global pooling operation on the feature map with the smallest resolution scale among all feature maps output by the scale stages of the decoder; the global feature map and all feature maps output by the scale stages of the decoder serve as pre-projection feature maps, which are input into the projection layers d;

the number of projection layers d is the same as the number of pre-projection feature maps, and each projection layer is formed by connecting three consecutive convolution blocks;

the number of prediction layers is the same as the number of projection layers d in the twin network, and each prediction layer is formed by connecting two consecutive convolution blocks; each convolution block is formed by sequentially connecting a 3D convolution layer, a normalization layer and an activation layer.

5. The system for multi-scale characterization-optimized medical image segmentation recognition according to claim 2, wherein an embedded pre-sampling module is added to the twin network for pre-sampling;

the embedded pre-sampling module, according to the position correspondence information, crops the regions corresponding to the position correspondence information from the pre-projection feature maps obtained by the twin networks of the first branch and the second branch, and inputs these regions into the corresponding projection layers.
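A minimal sketch of such cropping is given below. It assumes the overlap is an axis-aligned box in input coordinates and that each pre-projection feature map is downsampled by an integer stride relative to the input; both the box representation and the rounding rule are illustrative assumptions, and `crop_overlap` is a hypothetical helper.

```python
import numpy as np


def crop_overlap(feature_map, box, stride):
    """Crop the overlap box from a feature map at 1/stride input resolution.

    feature_map: array of shape (channels, rows, cols)
    box: (r0, r1, c0, c1) in input coordinates, end-exclusive
    """
    r0, r1, c0, c1 = box
    fr0, fc0 = r0 // stride, c0 // stride      # floor the start indices
    fr1 = max(-(-r1 // stride), fr0 + 1)       # ceil the end, keep >= 1 row
    fc1 = max(-(-c1 // stride), fc0 + 1)       # ceil the end, keep >= 1 col
    return feature_map[..., fr0:fr1, fc0:fc1]


# Pre-projection feature maps of one branch at three scale stages.
features = {1: np.zeros((1, 16, 16)), 2: np.zeros((2, 8, 8)), 4: np.zeros((4, 4, 4))}
box = (4, 12, 0, 8)
crops = {s: crop_overlap(f, box, s) for s, f in features.items()}
```

Only the cropped regions are then passed to the projection layers, so the tensors entering the projection and prediction layers stay small.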

6. The system of claim 2, wherein the loss function of the deep convolutional neural network comprises a characterization consistency module, and training is performed by minimizing the loss;

in the characterization consistency module, all feature maps output after the pre-projection feature maps of the first branch are processed by the projection layers and the prediction layers are taken as first prediction maps P_0 to P_(n-1) and P_g, and all feature maps output after the pre-projection feature maps of the second branch are processed by the projection layers are taken as second prediction maps R_0 to R_(n-1) and R_g;

the first prediction map P_g corresponding to the global feature map and the second prediction map R_g corresponding to the global feature map are subjected to cosine similarity calculation to obtain the global loss L_g;

for each of the first prediction maps P_0 to P_(n-1) other than the first prediction map P_g corresponding to the global feature map, in order to eliminate the effect of the same medical image target having different sizes, cosine similarity processing is performed between that first prediction map and each of the second prediction maps R_0 to R_(n-1) other than the second prediction map R_g corresponding to the global feature map, to obtain similarity losses L_i,j; the similarity losses L_i,j corresponding to the second prediction maps are then processed to obtain softmax (flexible maximum) weights S_i,j; the similarity losses L_i,j are weighted by the softmax weights S_i,j and summed to obtain the local loss of that first prediction map; finally, the local losses L_0 to L_(n-1) of all first prediction maps and the global loss L_g are added to obtain the total loss.
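The loss composition can be sketched in NumPy as follows. Several details are assumptions made for illustration only: each prediction map is flattened to a vector of equal length, the similarity loss is taken as 1 − cosine similarity, and the softmax weights S_i,j are computed over j directly from L_i,j; the function names are hypothetical.

```python
import numpy as np


def cosine_loss(p, r, eps=1e-8):
    """Assumed similarity loss: 1 - cosine similarity of the flattened maps."""
    p, r = np.ravel(p), np.ravel(r)
    cos = np.dot(p, r) / (np.linalg.norm(p) * np.linalg.norm(r) + eps)
    return 1.0 - cos


def softmax(x):
    """Numerically stable softmax over a 1D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def total_loss(P_local, R_local, P_g, R_g):
    """Global loss L_g plus softmax-weighted local losses L_0 .. L_(n-1)."""
    loss_g = cosine_loss(P_g, R_g)
    local_losses = []
    for p in P_local:                                   # first prediction map P_i
        l_ij = np.array([cosine_loss(p, r) for r in R_local])  # L_i,j over all R_j
        s_ij = softmax(l_ij)                            # assumed weights S_i,j
        local_losses.append(float(np.sum(s_ij * l_ij)))  # local loss L_i
    return loss_g + sum(local_losses), local_losses, loss_g
```

Comparing every P_i against every R_j, rather than only against the map of the same scale, is what lets a target that appears at different sizes in the two views still contribute a matching pair.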

7. A terminal device, characterized in that the terminal device comprises:

at least one processor, at least one memory, and computer program instructions stored in the memory, which when executed by the processor, implement the medical image segmentation system of any one of claims 1-6.

8. A computer-readable storage medium, wherein a computer program is stored on the storage medium, and when the computer program is executed by a processor, the computer program implements the functions of the medical image segmentation system as claimed in any one of claims 1 to 6.