CN117952995A - Cardiac image segmentation system capable of focusing, prompting and optimizing - Google Patents


Info

Publication number
CN117952995A
Authority
CN
China
Prior art keywords
prompt
image
layer
encoder
focusing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410353490.7A
Other languages
Chinese (zh)
Other versions
CN117952995B (en
Inventor
张彩明
房乐鑫
李瑛
李雪梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN202410353490.7A priority Critical patent/CN117952995B/en
Publication of CN117952995A publication Critical patent/CN117952995A/en
Application granted granted Critical
Publication of CN117952995B publication Critical patent/CN117952995B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/002D [Two Dimensional] image generation
    • G06T11/003Reconstruction from projections, e.g. tomography
    • G06T11/005Specific pre-processing for tomographic reconstruction, e.g. calibration, source positioning, rebinning, scatter correction, retrospective gating
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10081Computed x-ray tomography [CT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30048Heart; Cardiac

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of image segmentation and discloses a cardiac image segmentation system with focused prompt tuning, which acquires a cardiac image to be segmented, performs image enhancement processing on it, and performs image blocking processing on the enhanced image; the blocked image is input into a trained cardiac image segmentation model to obtain a segmented cardiac region image; prompt encoding is performed on the enhanced image to obtain encoded prompts; task focusing is performed on the encoded prompts and the blocked image to generate task-focused prompts; a key feature map is extracted from the blocked image based on the task-focused prompts; and the segmentation prediction head segments the key feature map to obtain the segmented cardiac region image. The invention provides a parameter- and computation-efficient cardiac image segmentation system that promotes the early diagnosis, treatment planning and efficacy evaluation of cardiac disease, and is suitable for clinical environments with limited computing resources.

Description

Cardiac image segmentation system capable of focusing, prompting and optimizing
Technical Field
The invention relates to the technical field of image segmentation, and in particular to a cardiac image segmentation system with focused prompt tuning.
Background
The statements in this section merely relate to the background of the present disclosure and may not necessarily constitute prior art.
With the rapid development of medical imaging technology, computed tomography (CT) has become an indispensable tool in clinical diagnosis and disease research, especially in cardiology. CT images provide high-resolution views of cardiac structures, which are critical for the diagnosis, treatment planning, and efficacy assessment of cardiac disease. However, cardiac segmentation remains a challenging task owing to the complexity of cardiac structures and the variable contrast between cardiac tissue and surrounding tissue in CT images.
In recent years, with the rapid improvement of computer hardware performance and the development of deep learning, ViT, as the first work to apply a pure Transformer architecture to image classification, has provided a new approach to building cardiac segmentation systems. Work building on ViT, such as application publication numbers CN116228791A and CN116630234A, has demonstrated significant progress. However, as ViT performance has continued to increase, the parameter counts of the corresponding models have grown sharply, from the initial 5M of ViT-S/16 to 1843M of ViT-G/14, and the computational cost has grown from 2.0 GFLOPs of ViT-S/28 to 2859.9 GFLOPs of ViT-G/14. The enormous numbers of parameters and computations place a great burden on the training process of a cardiac segmentation system, which is a considerable challenge for clinical environments with limited computational power.
Against this background, developing a parameter-efficient and computation-efficient CT cardiac image segmentation system has important research and application value. First, parameter efficiency means the model can reduce its number of parameters while maintaining high segmentation accuracy, which facilitates rapid training and updating of the model and reduces storage requirements. Second, computational efficiency means the model can process images quickly in the inference phase, which is crucial for improving the efficiency and response speed of clinical workflows, especially when diagnostic decisions must be made quickly in emergencies. In addition, an efficient cardiac segmentation system can facilitate early diagnosis and treatment planning of cardiac disease: through accurate reconstruction and analysis of cardiac structures, doctors can better understand the extent of lesions and the effect of treatment. However, developing such a system currently faces the following problems: (1) ViTs have strong capabilities in general visual feature extraction, but their performance advantages derive from pre-training on large amounts of natural image data. There are large differences between cardiac CT images and natural images, including differences in color, texture, and contrast, so directly using the feature representations of pre-trained models is often unsuitable for cardiac segmentation tasks. (2) Existing parameter-efficient transfer learning methods can only reduce the number of trainable parameters and cannot achieve efficient computation, resulting in a large computational burden. (3) Existing parameter-efficient transfer learning methods cannot focus and align efficiently with respect to the huge differences between cardiac images and natural images, resulting in low performance.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a cardiac image segmentation system with focused prompt tuning, realizing a parameter-efficient and computation-efficient CT cardiac image segmentation system that promotes the early diagnosis, treatment planning and efficacy evaluation of cardiac disease while being suited to clinical environments with limited computing resources.
A cardiac image segmentation system with focused prompt tuning, comprising: a preprocessing module configured to: acquire a cardiac image to be segmented, perform image enhancement processing on the cardiac image to be segmented, and perform image blocking processing on the enhanced image; an image segmentation module configured to: input the blocked image into a trained cardiac image segmentation model to obtain a segmented cardiac region image; wherein the trained cardiac image segmentation model comprises: N sequentially connected encoders, the input end of each encoder connected to the output end of a prompt focusing module, and the output end of the last encoder connected to the input end of a segmentation prediction head module; the prompt focusing module configured to: perform prompt encoding on the enhanced image to obtain encoded prompts, and perform task focusing on the encoded prompts and the blocked image to generate task-focused prompts; an encoder configured to: extract a key feature map from the blocked image based on the task-focused prompts; the segmentation prediction head module configured to: segment the key feature map to obtain the segmented cardiac region image.
The above technical scheme has the following advantages or beneficial effects: the invention essentially provides an efficient focused prompt tuning technique designed specifically for the CT cardiac image segmentation task.
(1) Overcoming the inherent difference between cardiac CT images and natural images: the invention effectively addresses the significant domain gap between cardiac CT images and natural images. This is achieved through the cooperation of the prompt focusing module and the prompt-global interaction mechanism. Through the prompt focusing module, the system generates prompts tied to the cardiac CT image data. Through the prompt-global interaction mechanism, parameters irrelevant to cardiac segmentation among those obtained by natural image pre-training can be excluded, reducing the possibility of negative transfer and allowing the model to better adapt to the characteristics of cardiac CT images.
(2) Parameter efficiency: unlike the traditional training mode of initializing with pre-trained parameters and retraining all of them, the method generates prompts for the specific medical image segmentation task through the prompt focusing module; during model training, only the prompt parameters are trained while the pre-trained parameters are frozen, allowing the model to greatly reduce its parameter count without sacrificing segmentation accuracy. This optimization not only accelerates the training and updating of the model, but also significantly reduces its storage requirements, making it better suited to resource-limited clinical environments.
(3) Computational efficiency: by introducing the prompt-global interaction module, the method significantly improves the speed at which the model processes images in the inference phase. Through an efficient interaction mechanism between global information and prompt information, the module streamlines the model's recognition and learning of cardiac CT image features, enabling it to complete the image segmentation task in a short time.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
Fig. 1 is a functional block diagram of a system according to the present invention.
Fig. 2 is an internal structure diagram of a heart image segmentation model according to the present invention.
Fig. 3 is a block diagram showing the internal functions of the prompt focusing module according to the present invention.
Fig. 4 is an internal structure diagram of a hint encoding unit according to the present invention.
Fig. 5 is a schematic diagram of the internal structure of a first encoder according to the present invention.
Fig. 6 is an internal structure diagram of a prompt and global feature interaction layer proposed by the present invention.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
In a first embodiment, as shown in fig. 1, the present embodiment provides a heart image segmentation system with focusing prompt tuning, including: a preprocessing module configured to: acquiring a heart image to be segmented, performing image enhancement processing on the heart image to be segmented, and performing image blocking processing on the enhanced image; an image segmentation module configured to: and inputting the segmented image into a trained heart image segmentation model to obtain a segmented heart region image.
As shown in fig. 2, the trained cardiac image segmentation model comprises: N sequentially connected encoders, the input end of each encoder connected to the output end of the prompt focusing module, and the output end of the last encoder connected to the input end of the segmentation prediction head module; the prompt focusing module configured to: perform prompt encoding on the enhanced image to obtain encoded prompts, and perform task focusing on the encoded prompts and the blocked image to generate task-focused prompts; an encoder configured to: extract a key feature map from the blocked image based on the task-focused prompts; the segmentation prediction head module configured to: segment the key feature map to obtain the segmented cardiac region image.
Further, acquiring the cardiac image to be segmented means collecting CT image data from medical imaging equipment. These images provide detailed views of the heart and its surrounding structures, forming the basis for subsequent analysis.
Further, the image enhancement processing of the cardiac image to be segmented specifically includes: for each input CT image $x$ and its corresponding label $y$, it is first determined whether to perform random rotation and flipping. The decision is controlled by a random number $p$, where $p$ is a randomly generated number between 0 and 1. If $p$ is greater than the set threshold, a combined random rotation and flipping operation is performed: $(x', y') = \mathcal{T}_{rf}(x, y)$; otherwise, only a random rotation is applied to the image and label: $(x', y') = \mathcal{T}_{r}(x, y)$; where $\mathcal{T}_{rf}$ denotes the random rotation-and-flip operation and $\mathcal{T}_{r}$ denotes the random rotation operation.
The beneficial effects of the technical scheme are as follows: the image is enhanced by random rotation or flipping to enhance the contrast between the heart region and surrounding tissue.
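The enhancement step above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the rotation being restricted to multiples of 90 degrees, the flip axis, and the 0.5 threshold are all assumptions for the sake of a runnable example.

```python
import numpy as np

def augment(image: np.ndarray, label: np.ndarray, rng: np.random.Generator):
    """Apply the same random rotation (and possibly a flip) to an
    image/label pair so they stay aligned; thresholds are assumptions."""
    p = rng.random()                       # random number in [0, 1)
    k = int(rng.integers(0, 4))            # rotate by k * 90 degrees (assumption)
    image, label = np.rot90(image, k), np.rot90(label, k)
    if p > 0.5:                            # threshold 0.5 is an assumption
        image, label = np.flip(image, axis=1), np.flip(label, axis=1)
    return image.copy(), label.copy()
```

Because rotation and flipping merely permute pixels, the augmented image contains exactly the same intensity values as the original, which makes the transform safe for both the image and its segmentation label.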
Further, the image blocking processing of the enhanced image specifically includes: in blocking the cardiac CT image, a convolution operation is used to realize the partition. Assume the input image $x \in \mathbb{R}^{C \times H \times W}$, where $C$ is the number of channels and $H$ and $W$ are the height and width of the image; a convolution layer with kernel size $k \times k$ and stride 16 is used to extract the global encoding tokens of the image blocks: $T = \mathrm{Conv}(x; W_{p})$, where $T$ denotes the global encoding tokens of the image blocks, $\mathrm{Conv}$ denotes the convolution operation, and $W_{p}$ denotes the parameters to be learned by the convolution layer.
The beneficial effects of the technical scheme are as follows: the enhanced image is segmented into image blocks for input into a subsequent model.
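A strided convolution whose kernel size equals its stride is equivalent to cutting the image into non-overlapping blocks and linearly projecting each block. The sketch below shows this equivalence in plain NumPy; the 16-pixel block size follows the stride stated above, while the weight shape and token dimension are illustrative assumptions.

```python
import numpy as np

def patch_embed(x: np.ndarray, weight: np.ndarray, patch: int = 16) -> np.ndarray:
    """Split an image (C, H, W) into non-overlapping patch x patch blocks and
    project each flattened block with `weight` of shape (D, C*patch*patch) --
    the same computation as a conv with kernel size = stride = patch."""
    c, h, w = x.shape
    gh, gw = h // patch, w // patch
    blocks = (x[:, :gh * patch, :gw * patch]
              .reshape(c, gh, patch, gw, patch)
              .transpose(1, 3, 0, 2, 4)        # (gh, gw, C, patch, patch)
              .reshape(gh * gw, -1))           # one row per image block
    return blocks @ weight.T                   # tokens: (num_blocks, D)
```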
Further, the N sequentially connected encoders are twelve encoders, namely a first encoder through a twelfth encoder, connected in sequence.
Further, as shown in fig. 3, the prompt focusing module includes: the prompt coding unit and the prompt generating unit are connected in sequence; the prompt coding unit is used for realizing the prompt coding of the enhanced image to obtain a coded prompt; the prompt generation unit is used for carrying out task focusing on the coded prompts and the segmented images and generating task focusing prompts.
It will be appreciated that the prompt encoding unit encodes prompts associated with the input data to enable the model to better understand and adapt to the medical image characteristics. The prompt generation unit is used for generating a prompt focusing on the heart segmentation task, and reducing the interference of irrelevant information by focusing on the prompt of the specific task so as to improve the performance of the model on the specific task.
Further, as shown in fig. 4, the prompt encoding unit comprises: a first lightweight convolution layer, an asymmetric convolution layer and a second lightweight convolution layer connected in sequence; wherein the first lightweight convolution layer and the second lightweight convolution layer are implemented by depthwise separable convolution; a depthwise separable convolution comprises a depthwise convolution and a pointwise convolution.
First, given an input image $x$, pooling and feature extraction are performed by the first lightweight convolution layer to obtain the extracted features $F_1$: $F_1 = \mathrm{LConv}(x; W_1, k_1)$, where $\mathrm{LConv}$ is the lightweight convolution operation, $W_1$ its convolution parameters, and $k_1$ its kernel size and stride. Next, the asymmetric convolution layer is applied to capture the spatial features: $F_2 = \mathrm{AConv}(F_1; W_h, W_v)$, where $W_h$ and $W_v$ are the parameters of the $1 \times k$ and $k \times 1$ convolution kernels, respectively. Finally, the prompt encoding $P$ is obtained through the second lightweight convolution layer: $P = \mathrm{LConv}(F_2; W_2, k_2)$, where $W_2$ is its convolution parameters and $k_2$ its kernel size and stride.

This series of operations converts the input image $x$ into a set of consecutive $d$-dimensional prompt encodings $P = \{p_1, p_2, \dots, p_L\}$, each encoding representing a portion of the image features and serving as a data-driven prompt for the downstream task. The kernel sizes $k_1$ and $k_2$ and the prompt length $L$ are set as hyperparameters.
It should be appreciated that depthwise separable convolution is a well-known technique in the art and will not be described further herein. The beneficial effects of this technical scheme are as follows: in the data-dependent prompt encoding stage, lightweight convolution blocks are used to initialize the downstream data-driven prompts. Compared with a random initialization strategy, this approach yields prompt encodings associated with the input cardiac data.
It should be appreciated that the asymmetric convolution layer captures spatial features by using convolution kernels of different shapes while reducing the number of parameters and computational complexity of the model. Asymmetric convolution includes convolution kernels in the horizontal and vertical directions, e.g., 1 x 5 and 5 x 1, which can capture features in the horizontal and vertical directions, respectively.
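The building blocks of the prompt encoding unit can be sketched as follows. This is a didactic NumPy version under simplifying assumptions: "valid" (unpadded) convolutions, the 1×5/5×1 kernel pair mentioned above, and sequential application of the two asymmetric passes (the patent names the kernel shapes but not their exact composition).

```python
import numpy as np

def depthwise_conv(x, k):
    """Per-channel 'valid' 2-D correlation: x is (C, H, W), k is (C, kh, kw)."""
    c, h, w = x.shape
    kh, kw = k.shape[1:]
    out = np.zeros((c, h - kh + 1, w - kw + 1))
    for ch in range(c):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[ch, i, j] = np.sum(x[ch, i:i + kh, j:j + kw] * k[ch])
    return out

def pointwise_conv(x, w):
    """1x1 convolution mixing channels: x is (C, H, W), w is (D, C)."""
    return np.einsum('dc,chw->dhw', w, x)

def separable_conv(x, k_dw, w_pw):
    """Depthwise separable convolution = depthwise pass then pointwise pass."""
    return pointwise_conv(depthwise_conv(x, k_dw), w_pw)

def asymmetric_conv(x, k_row, k_col):
    """Asymmetric convolution: a 1xk pass followed by a kx1 pass
    (sequential composition is an assumption)."""
    c = x.shape[0]
    x = depthwise_conv(x, np.tile(k_row, (c, 1)).reshape(c, 1, -1))
    return depthwise_conv(x, np.tile(k_col, (c, 1)).reshape(c, -1, 1))
```

The parameter saving is the point: a 1×5 plus a 5×1 kernel costs 10 weights per channel versus 25 for a full 5×5 kernel, while still covering both spatial directions.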
Further, the prompt generation unit is configured to perform task focusing on the encoded prompts and the blocked image and to generate the task-focused prompts, specifically including: the input global encoding tokens of the $l$-th layer, of length $M$, are expressed as $T^l = \{t^l_1, t^l_2, \dots, t^l_M\}$, where $l \in \mathbb{N}$; $t^l_i$ denotes the $i$-th token of the $l$-th layer, $t^l_i \in \mathbb{R}^{d}$; $\mathbb{R}^{d}$ denotes the real space of dimension $d$; $d$ is the feature dimension of a token; and $\mathbb{N}$ denotes the set of natural numbers.

An attention-based focusing strategy is employed to facilitate the interaction between the input prompts $P$ and the input tokens $T^l$, thereby generating the task-focused prompts $P^l_f$.

The task-focused prompts $P^l_f$ are calculated as follows: $P^l_f = \mathrm{Attention}(P, T^l)$, where $\mathrm{Attention}$ denotes the attention operation.

In this way, visual features associated with the downstream task are learned from the pre-trained model.
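The focusing operation above can be sketched as scaled dot-product cross-attention in which the prompts act as queries over the layer tokens. The exact attention variant is an assumption; the patent only states that the strategy is attention-based.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def focus_prompts(prompts: np.ndarray, tokens: np.ndarray) -> np.ndarray:
    """Cross-attention: prompts (L, d) attend over tokens (M, d), producing
    task-focused prompts as attention-weighted mixtures of token features."""
    d = prompts.shape[-1]
    scores = prompts @ tokens.T / np.sqrt(d)   # (L, M) prompt-token affinities
    return softmax(scores) @ tokens            # (L, d) focused prompts
```

Each focused prompt is a convex combination of the layer's tokens, which is how the prompts absorb task-relevant visual features from the (frozen) pre-trained representation.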
Further, the N encoders connected in sequence are respectively: the first encoder, the second encoder, the third encoder, the fourth encoder, the fifth encoder, the sixth encoder, the seventh encoder, the eighth encoder, the ninth encoder, the tenth encoder, the eleventh encoder and the twelfth encoder are sequentially connected; the internal structure of the above twelve encoders is the same.
As shown in fig. 5, the first encoder comprises: a first normalization layer, a prompt and global feature interaction layer, a first adder, a second normalization layer, a multi-layer perceptron and a second adder connected in sequence; the input end of the first normalization layer receives the global encoding tokens $T^l$, which are also fed to the input end of the first adder; the input end of the prompt and global feature interaction layer receives the task-focused prompts $P^l_f$; the input end of the second adder is connected to the output end of the first adder, and the output end of the second adder is the output end of the first encoder.
Further, the first normalization layer normalizes the input global code token to obtain a normalized global code token, inputs the normalized global code token to a prompt and global feature interaction layer, and carries out interactive learning on heart image features and task focusing prompt features to obtain interactive features; summing the interactive features and the global code token to obtain a first summation result; normalizing the first summation result to obtain a normalized summation result; carrying out nonlinear expression on the normalized summation result by adopting a multi-layer perceptron to obtain a nonlinear expression result; and adding the nonlinear expression result with the first summation result again to obtain a second summation result, and taking the second summation result as an output value of the first encoder.
Further, the $l$-th encoder is configured to: receive the global encoding tokens $T^l$ and the task-focused prompts $P^l_f$, where $l$ ranges from 1 to 12. The global encoding tokens pass through the first normalization layer: $\hat{T}^l = \mathrm{LN}_1(T^l)$, where $\mathrm{LN}_1$ denotes the first normalization layer. The obtained feature vectors $\hat{T}^l$ and the task-focused prompts $P^l_f$ are input to the prompt and global feature interaction layer, whose working process is: $Q = P^l_f W_Q$; $K = \hat{T}^l W_K$; $V = P^l_f W_V$; where $W_Q$, $W_K$, $W_V$ are the weight matrices of the queries, keys and values, $\hat{T}^l$ denotes the output value of the normalization layer, and $P'^l$ is the output of the task-focused prompts after interaction with the global encoding tokens.

The output $P'^l$ of the prompt and global feature interaction layer is residually connected with the global encoding tokens $T^l$ and input to the second normalization layer $\mathrm{LN}_2$: $Z^l = \mathrm{LN}_2(P'^l + T^l)$, where $Z^l$ denotes the output value of the second normalization layer and $\mathrm{LN}_2$ denotes the normalization layer. It is then processed by the multi-layer perceptron: $M^l = \mathrm{MLP}(Z^l)$, where $M^l$ denotes the output value of the multi-layer perceptron. $M^l$ and $P'^l + T^l$ are residually connected to obtain the encoder output: $T^{l+1} = M^l + P'^l + T^l$.
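The data flow of one encoder block, as just described, can be summarized in a short sketch. The layer norm and ReLU MLP are minimal stand-ins (the activation choice is an assumption), and the interaction layer is abstracted as a callable that returns a token-shaped output.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each token over its feature dimension."""
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def mlp(x, w1, w2):
    """Two-layer perceptron; ReLU activation is an assumption."""
    return np.maximum(x @ w1, 0.0) @ w2

def encoder_layer(tokens, prompts, interact, w1, w2):
    """One encoder block: LN -> prompt/global interaction -> residual add ->
    LN -> MLP -> residual add, mirroring the data flow described above."""
    s = interact(layer_norm(tokens), prompts) + tokens  # first summation result
    return mlp(layer_norm(s), w1, w2) + s               # second summation result
```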
It should be appreciated that, unlike the global feature exchange implemented by the multiple attention layers in the Vision Transformer (ViT), the participation of pre-trained parameters unrelated to the downstream task in the model learning process is minimized, thereby reducing negative transfer from the natural image domain to the cardiac image domain. In a specific implementation, a computation-efficient prompt and global feature interaction layer is provided in place of the ViT attention interaction layer.
Further, as shown in fig. 6, the prompt and global feature interaction layer includes: a first linear layer, a second linear layer, and a third linear layer in parallel; the input end of the first linear layer is used for inputting a task focusing prompt; the input end of the second linear layer is used for inputting a task focusing prompt; the input end of the third linear layer is used for inputting a global coding token; the output end of the first linear layer is connected with the input end of the first multiplier; the output end of the second linear layer and the output end of the third linear layer are connected with the input end of the second multiplier; the output end of the second multiplier is connected with the input end of the activation function layer; the output end of the activation function layer is connected with the input end of the first multiplier; the output of the first multiplier outputs the interaction characteristic.
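One reading of the wiring just described (two linear branches fed by the prompts, one by the tokens, a multiplier plus activation forming a gate, and a final multiplier) is a gated interaction of the following form. This is an interpretation of Fig. 6, not a verbatim implementation; the weight shapes and the sigmoid activation are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def prompt_global_interaction(tokens, prompts, w1, w2, w3):
    """Gated prompt-global interaction: two linear maps of the prompts
    (w1, w2), one of the tokens (w3); the token-prompt product is squashed
    by the activation and gates the first prompt branch."""
    a = prompts @ w1                 # first linear layer (prompts)
    b = prompts @ w2                 # second linear layer (prompts)
    c = tokens @ w3                  # third linear layer (tokens)
    gate = sigmoid(c @ b.T)          # second multiplier + activation, (M, L)
    return gate @ a                  # first multiplier: token-shaped output
```

Note the cost: the gate is an $M \times L$ matrix with $L \ll M$, so the interaction is linear in the number of tokens, unlike the $M \times M$ attention map of a standard ViT layer — which is the computational-efficiency argument made above.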
The working process of the segmentation prediction head module includes: $\hat{y} = \sigma(\mathrm{Conv}(\mathrm{Up}(F; W_{up}); W_c))$, where $F$ is the key feature map output by the last encoder, $W_{up}$ denotes the parameters of the upsampling convolution, $\mathrm{Up}$ denotes the upsampling operation, $W_c$ denotes the convolution layer parameters, $\mathrm{Conv}$ denotes the convolution operation, and $\sigma$ denotes the activation function; the segmentation prediction head module includes multiple convolution layers and upsampling operations to match its output size to the input image.
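As an illustrative sketch of the prediction head, the following uses nearest-neighbour upsampling and a single 1×1 convolution in place of the patent's upsampling-convolution stack; the ×16 factor matches the stride-16 blocking, but all shapes here are assumptions.

```python
import numpy as np

def upsample_nn(x, factor=2):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def seg_head(feat, w_mix, factor=16):
    """Upsample encoder features back to image resolution, mix channels
    with a 1x1 convolution, and squash to per-pixel probabilities."""
    up = upsample_nn(feat, factor)
    logits = np.einsum('dc,chw->dhw', w_mix, up)   # 1x1 convolution
    return 1.0 / (1.0 + np.exp(-logits))           # sigmoid activation
```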
During training, the pre-trained parameters of the twelve encoders are frozen, and only the small number of parameters of the prompt encoding unit, the prompt generation unit and the segmentation prediction head module are kept trainable; segmentation model training runs until the error converges.
Further, for the step of inputting the blocked image into the trained cardiac image segmentation model to obtain the segmented cardiac region image, the training process of the model includes: constructing a training set and a test set, both consisting of cardiac images with known segmentation results; inputting the training set into the cardiac image segmentation model and training it to obtain a preliminarily trained model; and inputting the test set into the preliminarily trained model to test it. When all test evaluation indexes fall within the set threshold ranges, training and testing are complete, yielding the trained cardiac image segmentation model.
Further, the test evaluation indexes specifically include: the DICE coefficient, the mean intersection over union (mIoU), the 95% Hausdorff distance (95HD) and the average surface distance (ASD).
The following are the definitions and calculation methods of these indexes. The DICE coefficient (also called the F1 score) measures the similarity of two samples and is commonly used to evaluate the accuracy of image segmentation. For the cardiac segmentation task, the DICE coefficient is calculated as: $\mathrm{DICE} = \frac{2\,|P \cap G|}{|P| + |G|}$, where $P$ denotes the predicted segmentation result, $G$ denotes the true segmentation label, $|P \cap G|$ is the number of pixels in their intersection, and $|P|$ and $|G|$ are the numbers of pixels in the prediction and the true segmentation, respectively.
The mean intersection over union is another important index for evaluating model performance in image segmentation tasks; it computes the average ratio of intersection to union between the predicted and true segmentations over all classes: $\mathrm{mIoU} = \frac{1}{K}\sum_{k=1}^{K} \frac{|P_k \cap G_k|}{|P_k \cup G_k|}$, where $K$ is the number of classes and $P_k$ and $G_k$ are the predicted segmentation result and the true label for the $k$-th class, respectively.
The Hausdorff distance measures the maximum possible distance between two point sets. In image segmentation, the 95% Hausdorff distance (95HD) is typically used to measure the distance between the predicted and true boundaries, ignoring the largest 5% of outliers to reduce the effect of noise. It is calculated as: $\mathrm{95HD} = \max\{\, d_{95}(P, G),\ d_{95}(G, P) \,\}$, where $d(p, G) = \min_{g \in G} \lVert p - g \rVert$ is the Euclidean distance from a point $p$ to the point set $G$, and $d_{95}$ denotes the 95th percentile of these minimum distances.
Further, the average surface distance (ASD) is the average of the surface distances between the estimated segmentation and the true segmentation, i.e., the mean of the point-to-surface distances taken over both surfaces: $\mathrm{ASD} = \frac{1}{2}\left( \frac{1}{|S_P|} \sum_{p \in S_P} d(p, S_G) + \frac{1}{|S_G|} \sum_{g \in S_G} d(g, S_P) \right)$, where $S_P$ and $S_G$ are the surface point sets of the prediction and the ground truth.
the indexes comprehensively evaluate the performance of the model on the cardiac CT image segmentation task, including accuracy, consistency and similarity between the prediction boundary and the real boundary.
Model training is performed with part of the pre-trained parameters frozen. This retains the general knowledge of the pre-trained model while reducing the number of trainable parameters, accelerating the training process, and lowering the risk of overfitting.
The segmentation result map is generated and stored, and the evaluation indexes are displayed. This intuitively presents the model's segmentation effect so that doctors can conveniently make diagnosis and treatment decisions, while also providing a quantitative evaluation of model performance to support further optimization and adjustment.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A focused prompt optimizing heart image segmentation system, comprising:
A preprocessing module configured to: acquiring a heart image to be segmented, performing image enhancement processing on the heart image to be segmented, and performing image blocking processing on the enhanced image;
An image segmentation module configured to: inputting the segmented image into a trained heart image segmentation model to obtain a segmented heart region image;
Wherein the trained cardiac image segmentation model comprises: the N encoders are sequentially connected, the input end of each encoder is connected with the output end of the prompt focusing module, and the output end of the last encoder is connected with the input end of the segmentation prediction head module;
A prompt focusing module configured to: performing prompt coding on the enhanced image to obtain a coded prompt; task focusing is carried out on the coded prompt and the segmented image, and a task focusing prompt is generated;
An encoder configured to: extracting a key feature map from the segmented image based on the task focusing prompt;
A segmentation prediction head module configured to: segment the key feature map to obtain the segmented heart region image.
2. The heart image segmentation system with focusing prompt tuning as claimed in claim 1, wherein the image enhancement processing is performed on the heart image to be segmented, specifically comprising:
For each input cardiac image x and corresponding label y, a random number r, drawn uniformly between 0 and 1, first determines whether to perform random rotation and flipping. If r satisfies the set threshold condition, random rotation and flipping operations are performed: (x, y) = T_rf(x, y); otherwise, only a random rotation operation is performed on the image and the label: (x, y) = T_r(x, y); where T_rf denotes the combined random rotation and flip operation and T_r denotes the random rotation operation.
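A minimal sketch of this two-branch augmentation (the 0.5 threshold and 90-degree rotation angles are our assumptions; the claim only fixes the two branches and that image and label are transformed together):

```python
import numpy as np

def augment(image, label, rng):
    """Randomly rotate image and label together; on one branch
    (threshold assumed to be 0.5) additionally flip both along a random axis."""
    k = int(rng.integers(0, 4))               # rotate by k * 90 degrees
    image, label = np.rot90(image, k), np.rot90(label, k)
    if rng.random() > 0.5:                    # rotation-and-flip branch
        axis = int(rng.integers(0, 2))
        image, label = np.flip(image, axis), np.flip(label, axis)
    return image, label
```

Applying the identical transform to the image and its label is the essential point: any mismatch would corrupt the supervision signal.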
3. The cardiac image segmentation system with focus-able prompt tuning as set forth in claim 1, wherein the image blocking processing of the enhanced image specifically comprises:
In the process of partitioning the heart image into blocks, a convolution operation is used to achieve the image blocking;
Assume the input image X has size C × H × W, where C is the number of channels and H and W are the height and width of the image. A convolution layer with a 16 × 16 kernel and a stride of 16 is used to extract the globally encoded tokens of the image blocks: X_p = Conv(X; W_p); where X_p denotes the globally encoded tokens of the image blocks, Conv(·) denotes the convolution operation, and W_p denotes the parameters to be learned by the convolution layer.
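A stride-16 convolution with a kernel the size of its stride is equivalent to cutting the image into non-overlapping 16 × 16 patches and applying one linear projection per patch. A sketch (shapes and names are ours, for illustration only):

```python
import numpy as np

def patch_tokens(x, w):
    """x: (C, H, W) image; w: (D, C*16*16) projection matrix.
    Returns (H//16 * W//16, D) globally encoded tokens, one per patch."""
    C, H, W = x.shape
    ph, pw = H // 16, W // 16
    patches = (x.reshape(C, ph, 16, pw, 16)   # split H and W into 16-blocks
                .transpose(1, 3, 0, 2, 4)     # group by patch position
                .reshape(ph * pw, C * 16 * 16))
    return patches @ w.T
```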
4. The cardiac image segmentation system with focus-able prompt tuning of claim 1, wherein the prompt focusing module comprises: the prompt coding unit and the prompt generating unit are connected in sequence;
The prompt coding unit is used for realizing the prompt coding of the enhanced image to obtain a coded prompt;
the prompt generation unit is used for carrying out task focusing on the coded prompts and the segmented images and generating task focusing prompts.
5. The cardiac image segmentation system with focused prompt tuning as recited in claim 4 in which the prompt encoding unit comprises: the first light-weight convolution layer, the asymmetric convolution layer and the second light-weight convolution layer are sequentially connected;
First, given an input image X, pooling and feature extraction are performed by the first lightweight convolution layer to obtain the extracted feature F_1: F_1 = LWConv(X; W_1, k_1, s_1);
where LWConv(·) is the lightweight convolution operation, W_1 is its convolution parameter, and k_1 and s_1 are its convolution kernel size and stride;
Next, the asymmetric convolution layer is applied to capture the spatial features F_2: F_2 = Conv_{1×k}(Conv_{k×1}(F_1; W_a); W_b);
where W_a and W_b are the parameters of the k × 1 and 1 × k convolution kernels, respectively;
Finally, the prompt encoding P is obtained through the second lightweight convolution layer: P = LWConv(F_2; W_2, k_2, s_2);
where W_2 is its convolution parameter and k_2 and s_2 are its convolution kernel size and stride.
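The asymmetric step replaces one k × k convolution with a k × 1 pass followed by a 1 × k pass, covering roughly the same receptive field with far fewer parameters. A minimal 2-D sketch (the ordering of the two passes and the 'same' padding scheme are our assumptions):

```python
import numpy as np

def conv1d_same(x, k, axis):
    """'Same'-padded 1-D convolution applied along one axis of a 2-D map."""
    pad = len(k) // 2
    x = np.moveaxis(x, axis, -1)
    xp = np.pad(x, [(0, 0), (pad, pad)])
    out = np.stack([np.convolve(row, k, mode="valid") for row in xp])
    return np.moveaxis(out, -1, axis)

def asymmetric_conv(feat, k_col, k_row):
    """k x 1 pass (down columns) followed by 1 x k pass (along rows)."""
    return conv1d_same(conv1d_same(feat, k_col, axis=0), k_row, axis=1)
```

For a k × k kernel, the separable form needs 2k parameters per channel instead of k², which is the lightweight motivation behind the claim.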
6. The cardiac image segmentation system with focus-able prompt tuning as set forth in claim 4, wherein the prompt generation unit is configured to perform task focusing on the encoded prompt and the segmented image to generate a task focusing prompt, specifically comprising:
First, the global encoding tokens input to the l-th layer, of length n, are expressed as X^l = {x^l_1, x^l_2, …, x^l_n} ∈ R^{n×d}, where x^l_i denotes the i-th token of the l-th layer, R^{n×d} denotes the real space of dimension n × d, d is the feature dimension of a token, and N denotes the set of natural numbers;
An attention-based focusing strategy is employed to enable interaction between the input prompt P and the input tokens X^l, thereby generating the task focusing prompt P^l_f;
The task focusing prompt P^l_f is calculated as: P^l_f = Concat(P, A(P, X^l));
where Concat(·) denotes the concatenation operation and A(·) denotes the attention-based interaction between the prompt and the tokens.
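One plausible reading of this Concat-based focusing (the exact attention form is not spelled out in the claim, so this sketch is an assumption): the prompts attend over the layer's tokens, and the attended context is concatenated to the original prompt.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def task_focus_prompt(prompt, tokens):
    """prompt: (m, d) encoded prompts; tokens: (n, d) layer tokens.
    Returns (m, 2d): each prompt concatenated with its attended context."""
    attn = softmax(prompt @ tokens.T / np.sqrt(prompt.shape[-1]))
    return np.concatenate([prompt, attn @ tokens], axis=-1)
```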
7. The cardiac image segmentation system with focusing prompt tuning as set forth in claim 1, wherein the N encoders connected in sequence are respectively: the first encoder, the second encoder, the third encoder, the fourth encoder, the fifth encoder, the sixth encoder, the seventh encoder, the eighth encoder, the ninth encoder, the tenth encoder, the eleventh encoder and the twelfth encoder are sequentially connected; the internal structures of the twelve encoders are the same, and the first encoder comprises:
The first encoder comprises a first normalization layer, a prompt and global feature interaction layer, a first adder, a second normalization layer, a multi-layer perceptron, and a second adder, connected in sequence; the input end of the first normalization layer receives the global encoding token X^l, the input end of the first adder also receives the global encoding token X^l, and the input end of the prompt and global feature interaction layer receives the task focusing prompt P^l_f; the input end of the second adder is connected to the output end of the first adder, and the output end of the second adder is the output end of the first encoder.
8. The focus-able, prompt-optimized heart image segmentation system as set forth in claim 7, wherein the l-th encoder is configured to:
Receive the global encoding token X^l and the task focusing prompt P^l_f, where l ranges from 1 to 12;
Wherein the global encoding token first passes through the first normalization layer: X̂^l = LN_1(X^l);
where LN_1(·) denotes the first normalization layer;
The obtained feature vector X̂^l and the task focusing prompt P^l_f are input into the prompt and global feature interaction layer;
The working process of the prompt and global feature interaction layer is: Z^l = Attention(Q, K, V), with Q = X̂^l W_Q, K = Concat(P^l_f, X̂^l) W_K, V = Concat(P^l_f, X̂^l) W_V;
where W_Q, W_K, W_V are the weight matrices of the queries, keys, and values, and X̂^l denotes the output value of the first normalization layer;
the task focusing prompt is output after interacting with the global encoding tokens;
The output Z^l of the prompt and global feature interaction layer is residual-connected with the global encoding token X^l and then input into the second normalization layer: X̃^l = LN_2(Z^l + X^l);
where X̃^l denotes the output value of the second normalization layer and LN_2(·) denotes the second normalization layer;
It is then processed by the multi-layer perceptron: M^l = MLP(X̃^l);
where M^l denotes the output value of the multi-layer perceptron;
M^l and Z^l + X^l are residual-connected to obtain the encoder output: X^{l+1} = M^l + Z^l + X^l.
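The encoder block described above (normalize, prompt-steered attention over the tokens, residual, normalize, MLP, residual) can be sketched as follows; the single-head attention, where the prompt is appended to the keys/values, and the ReLU in the MLP are our assumptions for illustration:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def encoder_block(x, prompt, Wq, Wk, Wv, W1, W2):
    """x: (n, d) tokens; prompt: (m, d) task focusing prompt.
    Returns (n, d): the next layer's tokens."""
    h = layer_norm(x)                                # X_hat = LN1(X)
    kv = np.concatenate([prompt, h], axis=0)         # tokens also attend to prompt
    q, k, v = h @ Wq, kv @ Wk, kv @ Wv
    z = softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v  # interaction layer output
    r = x + z                                        # first residual
    m = np.maximum(layer_norm(r) @ W1, 0.0) @ W2     # MLP (ReLU assumed)
    return r + m                                     # second residual
```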
9. A heart image segmentation system with focused prompt tuning as in claim 7 or 8, wherein the prompt and global feature interaction layer comprises:
A first linear layer, a second linear layer, and a third linear layer in parallel; the input end of the first linear layer is used for inputting a task focusing prompt; the input end of the second linear layer is used for inputting a task focusing prompt; the input end of the third linear layer is used for inputting a global coding token; the output end of the first linear layer is connected with the input end of the first multiplier; the output end of the second linear layer and the output end of the third linear layer are connected with the input end of the second multiplier; the output end of the second multiplier is connected with the input end of the activation function layer; the output end of the activation function layer is connected with the input end of the first multiplier; the output of the first multiplier outputs the interaction characteristic.
10. The heart image segmentation system with focusing prompt optimization as set forth in claim 1, wherein the segmentation prediction head module computes the segmentation result as: S = σ(Conv(Up(F; W_up); W_c));
where F denotes the key feature map output by the last encoder, W_up denotes the parameters of the upsampling convolution, Up(·) denotes the upsampling operation, W_c denotes the convolution layer parameters, Conv(·) denotes the convolution operation, and σ(·) denotes the activation function.
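A minimal stand-in for such a prediction head (nearest-neighbour upsampling, a 1 × 1 convolution as the per-pixel linear map, and a softmax as the activation are our choices for illustration):

```python
import numpy as np

def seg_head(feat, w_cls, scale=16):
    """feat: (D, h, w) feature map; w_cls: (num_classes, D).
    Returns (num_classes, h*scale, w*scale) per-pixel class probabilities."""
    up = feat.repeat(scale, axis=1).repeat(scale, axis=2)  # nearest-neighbour upsample
    logits = np.einsum('dhw,cd->chw', up, w_cls)           # 1x1 convolution
    e = np.exp(logits - logits.max(axis=0, keepdims=True)) # stable softmax
    return e / e.sum(axis=0, keepdims=True)
```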
CN202410353490.7A 2024-03-27 2024-03-27 Cardiac image segmentation system capable of focusing, prompting and optimizing Active CN117952995B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410353490.7A CN117952995B (en) 2024-03-27 2024-03-27 Cardiac image segmentation system capable of focusing, prompting and optimizing


Publications (2)

Publication Number Publication Date
CN117952995A true CN117952995A (en) 2024-04-30
CN117952995B CN117952995B (en) 2024-06-11

Family

ID=90800426

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410353490.7A Active CN117952995B (en) 2024-03-27 2024-03-27 Cardiac image segmentation system capable of focusing, prompting and optimizing

Country Status (1)

Country Link
CN (1) CN117952995B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102855633A (en) * 2012-09-05 2013-01-02 山东大学 Anti-noise quick fuzzy-clustering digital image segmentation method
CN109389608A (en) * 2018-10-19 2019-02-26 山东大学 There is the fuzzy clustering image partition method of noise immunity using plane as cluster centre
CN116958827A (en) * 2023-06-16 2023-10-27 广东省国土资源测绘院 Deep learning-based abandoned land area extraction method
CN116958549A (en) * 2023-07-24 2023-10-27 浙江大学 Mobile terminal directional segmentation method based on visual large model
CN116993985A (en) * 2023-08-15 2023-11-03 国网湖南省电力有限公司 Method for realizing Zero-Shot automatic cutting of safety belt based on CLIP


Non-Patent Citations (1)

Title
王海鸥;刘慧;郭强;邓凯;张彩明;: "面向医学图像分割的超像素U-Net网络设计", 计算机辅助设计与图形学学报, no. 06, 15 June 2019 (2019-06-15) *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant