CN117314932A

CN117314932A - Token pyramid-based pancreatic bile duct segmentation method, model and storage medium

Info

Publication number: CN117314932A
Application number: CN202311169169.5A
Authority: CN
Inventors: 曾宪晖; 蒋卫丽; 袁湘蕾; 李佳文
Original assignee: West China No 4 Hospital Of Sichuan University West China Occupational Hospital Of Sichuan University
Current assignee: West China No 4 Hospital Of Sichuan University West China Occupational Hospital Of Sichuan University
Priority date: 2023-09-12
Filing date: 2023-09-12
Publication date: 2023-12-29
Anticipated expiration: 2043-09-12
Also published as: CN117314932B

Abstract

The invention discloses a token pyramid-based pancreatic bile duct segmentation method, model and storage medium. The method comprises: constructing and augmenting a pancreatic bile duct data set; training a pre-constructed pancreatic bile duct segmentation model; evaluating the model on the pancreatic bile duct data during training; and applying the final pancreatic bile duct segmentation model to new pancreatic bile duct data to obtain the final segmentation result. From an image-processing point of view, the invention provides a novel feature pyramid structure that dynamically integrates local and global dependencies and guides the neural network to output scale-aware pancreatic bile duct features more accurately, which further improves the generalization ability of the model and helps physicians cope with the problem of "blind" intubation.

Description

Token pyramid-based pancreatic bile duct segmentation method, model and storage medium

Technical Field

The invention relates to the technical field of deep-learning image processing, and in particular to a token pyramid-based pancreatic bile duct segmentation method, model and storage medium.

Background

Endoscopic retrograde cholangiopancreatography (ERCP) is an important tool for the treatment of biliary and pancreatic diseases. During ERCP, the endoscope is advanced to the duodenum to locate the duodenal papilla, and a sphincterotome or catheter is used to intubate the common bile duct or the pancreatic duct.

However, ERCP is difficult to perform and takes a long time to learn; a beginner usually has to operate under the supervision and guidance of an experienced physician. The critical step is intubation, and correct intubation is essential to the success of the procedure. However, anatomical variation of the papilla, uncertainty about the course of the bile and pancreatic ducts behind the papilla, and the fact that the physician inserts blindly during intubation (only the papilla is visible under direct endoscopic view, and fluoroscopy is used for confirmation only after insertion) lead to "blind" intubation problems such as prolonged intubation, repeated intubation attempts and entry into the wrong duct (e.g., the pancreatic duct is entered and injected with contrast agent when the bile duct was intended). These problems are highly correlated with the occurrence of surgical complications.

Disclosure of Invention

The invention aims to provide a token pyramid-based pancreatic bile duct segmentation method, model and storage medium. The network segments the pancreatic bile duct and thereby gives physicians a basis for judgment.

In order to achieve the above purpose, the present invention provides the following technical solutions:

In a first aspect, a token pyramid-based pancreatic bile duct segmentation method includes:

collecting pancreatic bile duct data, and preprocessing the pancreatic bile duct data to obtain a pancreatic bile duct data set;

training a pre-constructed pancreatic bile duct segmentation model with the pancreatic bile duct data set;

inputting the image of the pancreatic bile duct to be segmented into the trained pancreatic bile duct segmentation model to obtain a pancreatic bile duct segmentation map.

In a preferred embodiment, the pancreatic bile duct data preprocessing comprises three-dimensional labeling of the pancreatic bile duct data, and normalization processing and data augmentation are carried out on the labeled pancreatic bile duct data.

In a preferred embodiment, training a pre-constructed pancreatic bile duct segmentation model with a pancreatic bile duct dataset comprises:

generating a series of local scale tokens from the pancreatic bile duct data set, assimilating all scale tokens to obtain a feature pyramid, and extracting global scale-aware semantics G from the feature pyramid;

generating local attention features from the local scale tokens through a convolution layer and batch normalization;

generating semantic weights from the global scale-aware semantics G by passing it sequentially through up-sampling, convolution, batch normalization and a sigmoid layer;

multiplying the local attention features and the semantic weights element by element to obtain the gating integration feature $F^i$;

integrating the gating integration features $F^i$ of different scales: after the high-scale feature information is adjusted to be consistent with the low-scale feature information, the final pancreatic bile duct segmentation map is obtained through convolution and normalization.

In a preferred embodiment, assimilating all scale tokens to obtain a feature pyramid comprises:

average-pooling a series of local scale tokens $\{T^1, T^2, \ldots, T^N\}$ to the same target size;

modulating the multi-scale features with a dynamic filter $L^i$ to obtain the modulated feature $T^{i'}$ of the $i$-th layer, calculated as $T^{i'} = \mathrm{pool}(T^i) \odot L^i$, where pool is average pooling and $L^i$ is a learnable filter;

obtaining the feature pyramid $Z$ by concatenated multi-scale dynamic feature aggregation, calculated as $Z = \mathrm{Concat}(T^{1'}, T^{2'}, \ldots, T^{N'})$, where $T^{1'}, T^{2'}, \ldots, T^{N'}$ are the 1st, 2nd, …, N-th modulated features and Concat denotes concatenation of the features.

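In a preferred embodiment, the gating integration feature $F^i$ is calculated as follows:

$$F^{i'} = \mathrm{Conv\_bn}(T^i) \odot \mathrm{Sigmoid}(\mathrm{Conv\_bn}(\mathrm{Upsampling}(G)))$$

$$F^i = F^{i'} \oplus \mathrm{Conv\_bn}(\mathrm{Upsampling}(G))$$

where $\{T^1, T^2, \ldots, T^N\}$ are the local scale tokens, $G$ is the global semantics, Conv_bn is a 1 × 1 × 1 convolution layer followed by batch normalization, Upsampling is an up-sampling operation, $F^{i'}$ is the local feature modulated by the semantic weight, $\odot$ is element-wise multiplication, and $\oplus$ is element-wise addition.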

In a preferred embodiment, after the final pancreatic bile duct segmentation map is output, the model is trained with a Dice loss and a cross-entropy loss function to obtain the final pancreatic bile duct segmentation model.

In a second aspect, a token pyramid-based pancreatic bile duct segmentation model includes:

a token pyramid Transformer module, used for extracting and concatenating feature layers of different scales and capturing context-related scale-aware information, and comprising a plurality of encoder blocks and a plurality of stacked Transformer blocks, wherein each encoder block comprises a plurality of three-dimensional convolutions and at least one max pooling operation;

a pancreatic bile duct gating integration module, used for dynamically fusing the local tokens and the global semantic information through a gating structure;

and a pancreatic bile duct global integration segmentation module, used for fusing the segmentation results of decoding blocks of different scales.

In a third aspect, an electronic device comprises a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the pancreatic bile duct segmentation method described above when executing the computer program.

In a fourth aspect, a computer readable storage medium stores a computer program which, when executed by a processor, implements the steps of the pancreatic bile duct segmentation method described above.

Compared with the prior art, the invention has the following beneficial effects:

(1) By introducing the token pyramid method, the invention enriches the features of the pancreatic bile duct data and fuses tokens from different scales into the input, thereby obtaining rich scale-aware semantic information.

(2) The invention uses the Transformer's strength in long-range self-attention to build a strong hierarchical feature representation, which plays an important role in pancreatic bile duct segmentation.

(3) The invention introduces a pancreatic bile duct gating integration module that controls the direction of information flow through a gating function. The module also fuses the local and global features at each scale, thereby avoiding the loss of feature information.

(4) From an image-processing point of view, the invention provides a novel feature pyramid structure that dynamically integrates local and global dependencies and guides the neural network to output scale-aware pancreatic bile duct features more accurately. This further improves the generalization ability of the model and helps physicians cope with the challenge of "blind" intubation.

(5) Once trained, the model segments the pancreatic bile duct quickly and accurately. This saves the manpower and material resources required to annotate the pancreatic bile duct, improves the efficiency of learning endoscopic retrograde cholangiopancreatography (ERCP), raises the intubation success rate and reduces the incidence of surgical complications.

Drawings

To describe the embodiments of the invention or the prior art more clearly, the drawings needed for the description are briefly introduced below. The drawings described below show only some embodiments of the invention, and a person skilled in the art can obtain further drawings from them without inventive effort:

Fig. 1 is a flowchart of a pancreatic bile duct segmentation method according to the present embodiment.

Fig. 2 is a schematic diagram of a token pyramid Transformer module provided in this embodiment.

Fig. 3 is a schematic diagram of a pancreatic bile duct gating integration module according to the present embodiment.

Fig. 4 is a schematic diagram of a pancreatic bile duct global integration segmentation module provided in the present embodiment.

Fig. 5 is a schematic diagram of an electronic device according to the present embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to Figs. 1 to 3. The described embodiments should not be construed as limiting the present invention, and all other embodiments obtained by those skilled in the art without inventive effort fall within the scope of protection of the present invention.

Fig. 1 is a flow chart of a token pyramid-based pancreatic bile duct segmentation method. The method comprises the following steps:

Step one: collecting pancreatic bile duct data, labeling the data, and obtaining a pancreatic bile duct data set by random rotation, random flipping and random contrast enhancement;

Step two: constructing a pancreatic bile duct segmentation model of a token pyramid-based Transformer gating network, wherein the model comprises a token pyramid Transformer module, a pancreatic bile duct gating integration module and a pancreatic bile duct global integration segmentation module;

Step three: automatically segmenting the pancreatic bile duct with the pancreatic bile duct segmentation model of the token pyramid-based Transformer gating network constructed in steps one and two.

In a preferred embodiment, in step one, the data set comprises pancreatic bile duct images and their labels. The three-dimensional labels of the pancreatic bile duct are annotated manually, and the pancreatic bile duct data are read, normalized and augmented; the augmentation includes random rotation, random flipping and random contrast enhancement.

In a preferred embodiment, in step two, the token pyramid Transformer module extracts and concatenates feature layers of different scales and uses the Transformer structure to capture context-related scale-aware information; the pancreatic bile duct gating integration module dynamically fuses local tokens and global semantic information through a gating structure; and the pancreatic bile duct global integration segmentation module fuses the segmentation results of decoding blocks of different scales.

In a preferred embodiment, the specific construction process of the segmentation model is as follows:

Step 1: The pancreatic bile duct data set is preprocessed by random rotation, random flipping, random contrast enhancement and the like to enlarge the training data, which reduces overfitting and improves the generalization of the model; the data are then fed into the segmentation network.
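The augmentation in Step 1 can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation: the function names `augment` and `normalize`, the restriction of rotations to multiples of 90 degrees, the flip probability of 0.5 and the contrast range of 0.75 to 1.25 are all assumptions chosen for the example.

```python
import numpy as np

def augment(volume, label, rng):
    """Apply the same random rotation/flip to a 3D volume and its label,
    then a random contrast change to the volume only (assumed parameters)."""
    # Random rotation by a multiple of 90 degrees in the last two axes.
    k = int(rng.integers(0, 4))
    volume = np.rot90(volume, k, axes=(1, 2))
    label = np.rot90(label, k, axes=(1, 2))
    # Random flip along each axis with probability 0.5.
    for axis in range(3):
        if rng.random() < 0.5:
            volume = np.flip(volume, axis=axis)
            label = np.flip(label, axis=axis)
    # Random contrast: scale deviations from the mean intensity.
    factor = rng.uniform(0.75, 1.25)
    mean = volume.mean()
    volume = (volume - mean) * factor + mean
    return np.ascontiguousarray(volume), np.ascontiguousarray(label)

def normalize(volume):
    """Min-max normalize intensities to the range [0, 1]."""
    lo, hi = volume.min(), volume.max()
    return (volume - lo) / (hi - lo + 1e-8)

rng = np.random.default_rng(0)
vol = rng.random((8, 8, 8)).astype(np.float32)
lab = (vol > 0.5).astype(np.uint8)
aug_vol, aug_lab = augment(vol, lab, rng)
norm_vol = normalize(aug_vol)
```

Rotations and flips are applied identically to the image and its label so that the annotation stays aligned, while the contrast change touches only the image intensities.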

Step 2: The token pyramid module contains 4 encoder blocks, each consisting of 2 three-dimensional convolutions and one max pooling operation. The convolutional outputs of the 4 blocks are concatenated to form a token pyramid, which yields rich scale-aware semantic information. Specifically, taking the pancreatic bile duct image X as input, the token pyramid first passes the image through the 4 encoder blocks and generates a series of scale tokens $\{T^1, T^2, \ldots, T^N\}$, where N is the total number of encoder blocks. The scale tokens are then average-pooled (pool) to the same target size. Because the encoder blocks at different scales contribute differently to the segmentation, a learnable dynamic filter $L^i$ modulates the multi-scale features, and the final feature pyramid Z is obtained by concatenated multi-scale dynamic feature aggregation. The calculation formulas are as follows:

$$T^{i'} = \mathrm{pool}(T^i) \odot L^i$$

$$Z = \mathrm{Concat}(T^{1'}, T^{2'}, \ldots, T^{N'})$$
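The pooling, modulation and concatenation of Step 2 can be sketched in NumPy as below. This is an illustration under stated assumptions: the helper names `avg_pool_to` and `build_pyramid`, the channel counts (4, 8, 16, 32), the spatial sizes and the per-channel filter shape `(C, 1, 1, 1)` are hypothetical, since the source does not specify them.

```python
import numpy as np

def avg_pool_to(x, target):
    """Average-pool a (C, D, H, W) array to spatial size `target`.
    Each spatial dimension must be an integer multiple of its target."""
    c, d, h, w = x.shape
    td, th, tw = target
    blocks = x.reshape(c, td, d // td, th, h // th, tw, w // tw)
    return blocks.mean(axis=(2, 4, 6))

def build_pyramid(tokens, filters, target):
    """Z = Concat(pool(T^i) * L^i) along the channel axis."""
    modulated = [avg_pool_to(t, target) * f for t, f in zip(tokens, filters)]
    return np.concatenate(modulated, axis=0)

# Four scale tokens with assumed channel counts and spatial sizes.
tokens = [np.full((c, s, s, s), 3.0) for c, s in [(4, 16), (8, 8), (16, 4), (32, 2)]]
# One learnable scalar per channel, fixed to 2.0 here for illustration.
filters = [np.full((t.shape[0], 1, 1, 1), 2.0) for t in tokens]
z = build_pyramid(tokens, filters, (2, 2, 2))
pooled = avg_pool_to(np.arange(8.0).reshape(1, 2, 2, 2), (1, 1, 1))
```

Pooling every token to one target size is what makes channel-wise concatenation possible; in a trained model the filters would be optimized together with the rest of the network.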

step 3: feature pyramids are fed into several stacked transfomer blocks to extract a global scale-aware semantic extractor, in the present invention the number of transfomers is 2. The transducer consists of a multi-head attention module, a feed forward network and a recursive connection.

Step 4: The pancreatic bile duct gating integration module takes as input the local scale tokens $\{T^1, T^2, \ldots, T^N\}$ obtained from the token pyramid and the global semantics G obtained from the Transformer. Each local token $T^i$ passes through a 1 × 1 × 1 convolution layer and batch normalization to generate the local attention feature. The global semantics are up-sampled and fed into a 1 × 1 × 1 convolution layer, followed by a batch normalization layer and a sigmoid layer, to generate the semantic weight. The semantic weight and the local attention feature are multiplied element by element; the global semantics also pass through a 1 × 1 × 1 convolution layer and batch normalization to give the global attention feature, which is added element by element to the product. This yields the gating integration feature $F^i$, which carries both local and global dependencies. The calculation formulas are as follows:

$$F^{i'} = \mathrm{Conv\_bn}(T^i) \odot \mathrm{Sigmoid}(\mathrm{Conv\_bn}(\mathrm{Upsampling}(G)))$$

$$F^i = F^{i'} \oplus \mathrm{Conv\_bn}(\mathrm{Upsampling}(G))$$

where Conv_bn is a 1 × 1 × 1 convolution layer followed by batch normalization, Upsampling is an up-sampling operation, $F^{i'}$ is the local feature modulated by the semantic weight, $\odot$ is element-wise multiplication, and $\oplus$ is element-wise addition.
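The gating described in Step 4 can be sketched in NumPy as follows. Several points are assumptions of this example rather than statements of the source: batch normalization is reduced to per-channel standardization without learnable scale and shift, up-sampling is nearest-neighbour, the global branch is up-sampled before its convolution so that shapes match, and the names `conv_bn`, `upsample` and `gated_integration` as well as all tensor sizes are hypothetical.

```python
import numpy as np

def conv_bn(x, w, eps=1e-5):
    """1x1x1 convolution (channel mixing) followed by per-channel standardization.
    x: (C_in, D, H, W), w: (C_out, C_in)."""
    y = np.einsum("oc,cdhw->odhw", w, x)
    mu = y.mean(axis=(1, 2, 3), keepdims=True)
    var = y.var(axis=(1, 2, 3), keepdims=True)
    return (y - mu) / np.sqrt(var + eps)

def upsample(x, factor):
    """Nearest-neighbour up-sampling of the three spatial axes."""
    for axis in (1, 2, 3):
        x = np.repeat(x, factor, axis=axis)
    return x

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_integration(t_i, g, w_local, w_gate, w_global):
    """Local feature times sigmoid gate from G, plus a global feature from G."""
    g_up = upsample(g, t_i.shape[1] // g.shape[1])
    local = conv_bn(t_i, w_local)               # local attention feature
    weight = sigmoid(conv_bn(g_up, w_gate))     # semantic weight in (0, 1)
    modulated = local * weight                  # element-wise multiplication
    return modulated + conv_bn(g_up, w_global)  # element-wise addition

rng = np.random.default_rng(0)
t_i = rng.normal(size=(4, 8, 8, 8))     # one local scale token
g = rng.normal(size=(6, 2, 2, 2))       # global semantics
w_local = rng.normal(size=(16, 4))
w_gate = rng.normal(size=(16, 6))
w_global = rng.normal(size=(16, 6))
f_i = gated_integration(t_i, g, w_local, w_gate, w_global)
# With a zero global branch, the output is the gated local feature alone.
f_gate_only = gated_integration(t_i, g, w_local, w_gate, np.zeros((16, 6)))
local_only = conv_bn(t_i, w_local)
```

The sigmoid keeps each weight between 0 and 1, so the gate can only attenuate a local response, while the additive branch lets global context contribute even where the local feature is suppressed.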

Step 5: To bridge the gaps between the multi-scale gating integration features, the invention feeds the gating integration features of different scales into the pancreatic bile duct global integration segmentation module. The module up-samples the high-scale feature information to match the low-scale feature information, and then passes the result through two 3 × 3 convolution layers and batch normalization to obtain the final segmentation map.
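A rough NumPy sketch of Step 5 follows. The source does not name the operator that combines the scales, so summation after up-sampling is an assumption here, as are the 3 × 3 × 3 kernel shape for volumetric data, the equal channel count across scales, the simplified normalization and the function names.

```python
import numpy as np

def upsample_to(x, size):
    """Nearest-neighbour up-sampling of (C, D, H, W) to spatial `size`."""
    for axis, target in zip((1, 2, 3), size):
        x = np.repeat(x, target // x.shape[axis], axis=axis)
    return x

def conv3(x, w):
    """3x3x3 convolution with zero padding. x: (C, D, H, W), w: (O, C, 3, 3, 3)."""
    _, d, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], d, h, wd))
    for i in range(3):
        for j in range(3):
            for k in range(3):
                patch = xp[:, i:i + d, j:j + h, k:k + wd]
                out += np.einsum("oc,cdhw->odhw", w[:, :, i, j, k], patch)
    return out

def batch_norm(x, eps=1e-5):
    """Per-channel standardization (no learnable scale or shift)."""
    mu = x.mean(axis=(1, 2, 3), keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=(1, 2, 3), keepdims=True) + eps)

def global_integration(features, w1, w2):
    """Up-sample coarser features to the finest size, sum, then two conv + norm layers."""
    finest = features[0].shape[1:]
    fused = sum(upsample_to(f, finest) for f in features)
    hidden = batch_norm(conv3(fused, w1))
    return batch_norm(conv3(hidden, w2))

rng = np.random.default_rng(0)
features = [rng.normal(size=(4, s, s, s)) for s in (8, 4, 2)]   # fine to coarse
w1 = rng.normal(size=(8, 4, 3, 3, 3))
w2 = rng.normal(size=(1, 8, 3, 3, 3))
seg = global_integration(features, w1, w2)
ones_out = conv3(np.ones((2, 4, 4, 4)), np.ones((1, 2, 3, 3, 3)))
```

In practice the final map would normally be passed through a sigmoid or softmax before thresholding; that step is left out because the source does not describe it.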

Step 6: The model is trained on the output segmentation map with a Dice loss and a cross-entropy loss function (the Dice loss measures the similarity between two segmentation samples, and the cross-entropy loss measures the per-pixel classification accuracy between the two samples).
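The combined loss can be sketched as below for the binary case. This is an illustrative NumPy version: equal weighting of the two terms, the smoothing constants and the function names are assumptions, since the source does not state how the two losses are combined.

```python
import numpy as np

def dice_loss(prob, target, eps=1e-6):
    """1 - Dice coefficient between predicted probabilities and a binary mask."""
    inter = (prob * target).sum()
    return float(1.0 - (2.0 * inter + eps) / (prob.sum() + target.sum() + eps))

def cross_entropy_loss(prob, target, eps=1e-7):
    """Mean binary cross-entropy over all voxels."""
    p = np.clip(prob, eps, 1.0 - eps)
    return float(-(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)).mean())

def combined_loss(prob, target, w_dice=1.0, w_ce=1.0):
    """Weighted sum of the two terms (weights are hypothetical)."""
    return w_dice * dice_loss(prob, target) + w_ce * cross_entropy_loss(prob, target)

rng = np.random.default_rng(0)
mask = (rng.random((8, 8, 8)) > 0.7).astype(np.float64)
loss_perfect = combined_loss(mask, mask)          # prediction equals the mask
loss_inverted = combined_loss(1.0 - mask, mask)   # prediction is the complement
```

The Dice term rewards overlap of the whole predicted region, which helps with small structures such as ducts, while the cross-entropy term supplies a per-voxel gradient.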

Step 7: At prediction time, the trained model automatically produces the pancreatic bile duct segmentation result.

In a preferred embodiment, the fourth step specifically includes the following steps:

step 41, data collection and augmentation;

step 42, network training;

step 43, new data prediction and model evaluation.

In step 41, the pancreatic bile duct data are augmented by preprocessing methods such as random rotation, random flipping and random contrast enhancement.

In step 42, patches of 128 × 128 are randomly cropped from the augmented pancreatic bile duct images, normalized and fed into the model. The model is trained with an SGD optimizer with an initial learning rate of 0.001 and a weight decay of 2e-4. The invention uses the ReduceLROnPlateau mechanism with a factor of 0.5, a patience and a cooldown of 3, and a minimum learning rate of 1e-8. During training, the batch size is set to 2 and num_workers to 12, and each experiment runs for 120 training iterations (epochs). After each iteration, the segmentation model is evaluated; if the current error is smaller than that of the previous iteration, the current segmentation model is saved, and training continues until the maximum number of iterations is reached.
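The learning-rate schedule and checkpoint rule of step 42 can be illustrated without any deep-learning framework. The class below is a simplified re-implementation of the plateau rule written for this example; it is not PyTorch's `torch.optim.lr_scheduler.ReduceLROnPlateau`, although it takes the same `factor`, `patience`, `cooldown` and `min_lr` values named in the text and omits the improvement threshold. The checkpoint rule is interpreted here as "save when the error beats the best error so far", which is an assumption.

```python
class PlateauScheduler:
    """Halve the learning rate when the monitored error stops improving."""

    def __init__(self, lr=1e-3, factor=0.5, patience=3, cooldown=3, min_lr=1e-8):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.cooldown, self.min_lr = cooldown, min_lr
        self.best = float("inf")
        self.bad_epochs = 0
        self.cooldown_left = 0

    def step(self, error):
        if error < self.best:
            self.best, self.bad_epochs = error, 0
        else:
            self.bad_epochs += 1
        if self.cooldown_left > 0:          # ignore bad epochs while cooling down
            self.cooldown_left -= 1
            self.bad_epochs = 0
        if self.bad_epochs > self.patience:
            self.lr = max(self.lr * self.factor, self.min_lr)
            self.cooldown_left = self.cooldown
            self.bad_epochs = 0
        return self.lr

def run_training(errors, scheduler):
    """Replay per-epoch validation errors; return the epoch whose model is kept."""
    best_error, best_epoch = float("inf"), None
    for epoch, error in enumerate(errors):
        if error < best_error:              # a checkpoint would be saved here
            best_error, best_epoch = error, epoch
        scheduler.step(error)
    return best_epoch, scheduler.lr

sched = PlateauScheduler()
for err in [1.0, 0.9, 0.9, 0.9, 0.9]:
    sched.step(err)
lr_before = sched.lr          # three non-improving epochs: still within patience
sched.step(0.9)
lr_after = sched.lr           # fourth non-improving epoch triggers the reduction
best_epoch, _ = run_training([0.5, 0.4, 0.45, 0.3, 0.35], PlateauScheduler())
```

With a patience of 3, the rate is reduced on the fourth consecutive epoch without improvement, and the cooldown then suppresses further reductions for three epochs.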

In step 43, multiple losses are combined for evaluation, and the segmentation model with the best evaluation index is saved.

It should be understood that the sequence numbers of the steps in the foregoing embodiment do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and the sequence numbers should not limit the implementation of the embodiments of the present application in any way.

Figs. 2 to 4 are schematic diagrams of the token pyramid Transformer module, the pancreatic bile duct gating integration module and the pancreatic bile duct global integration segmentation module provided in this embodiment. The pancreatic bile duct segmentation model includes: a token pyramid Transformer module, used for extracting and concatenating feature layers of different scales and capturing context-related scale-aware information, and comprising a plurality of encoder blocks and a plurality of stacked Transformer blocks, wherein each encoder block comprises a plurality of three-dimensional convolutions and at least one max pooling operation;

a pancreatic bile duct gating integration module, used for dynamically fusing the local tokens and the global semantic information through a gating structure; and a pancreatic bile duct global integration segmentation module, used for fusing the segmentation results of decoding blocks of different scales.

This embodiment provides a pancreatic bile duct segmentation model of a Transformer gating network built from a token pyramid module, a Transformer module, a pancreatic bile duct gating integration module and a pancreatic bile duct global integration segmentation module. The token pyramid quickly generates pyramid features from multi-scale features and thereby obtains scale-aware semantics, which provide more discriminative three-dimensional features for the pancreatic bile duct. The Transformer module takes the scale-aware semantic information as input, learns long-range information and improves the capture of global information about the pancreatic bile duct. The pancreatic bile duct gating integration module provides gating information according to the segmentation target and controls the information flow, so that information useful for segmentation reaches the segmentation module. The pancreatic bile duct global integration segmentation module outputs segmentations at different scales and integrates the segmentation information step by step to improve the generalization of the model.

Fig. 5 is a schematic diagram of an electronic device according to the present embodiment; the electronic device comprises a memory, a processor, and a computer program stored in the memory and capable of running on the processor, wherein the processor executes the computer program to realize the steps of the pancreatic bile duct segmentation method; alternatively, the processor may perform the functions of the modules/units in the above-described apparatus embodiments when executing the computer program.

The processor may be a central processing unit (Central Processing Unit, CPU) or other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like.

The memory may be an internal storage unit of the electronic device, for example, a hard disk or a memory of the electronic device. The memory may also be an external storage device of the electronic device, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device. The memory may also include both internal storage units and external storage devices of the electronic device. The memory is used to store computer programs and other programs and data required by the electronic device.

In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other manners as well. The apparatus embodiments described above are merely illustrative, for example, of the flowcharts and block diagrams in the figures that illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In addition, functional modules in the embodiments of the present invention may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.

The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention. It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.

The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A pancreatic bile duct segmentation method based on a token pyramid is characterized by comprising the following steps of

2. The method of claim 1, wherein the pre-processing of the pancreatic bile duct data comprises three-dimensional labeling of the pancreatic bile duct data, normalizing the labeled pancreatic bile duct data and augmenting the data.

3. The method of pancreatic bile duct segmentation according to claim 1, wherein training the pre-constructed pancreatic bile duct segmentation model with a pancreatic bile duct dataset comprises:

4. The method of pancreatic bile duct segmentation according to claim 3, wherein assimilating all scale tokens to obtain a feature pyramid comprises:

modulating the multi-scale features with a dynamic filter $L^i$ to obtain the modulated feature $T^{i'}$ of the $i$-th layer, calculated as $T^{i'} = \mathrm{pool}(T^i) \odot L^i$, where pool is average pooling and $L^i$ is a learnable filter;

obtaining the feature pyramid $Z$ by concatenated multi-scale dynamic feature aggregation, calculated as $Z = \mathrm{Concat}(T^{1'}, T^{2'}, \ldots, T^{N'})$, where $T^{1'}, T^{2'}, \ldots, T^{N'}$ are the 1st, 2nd, …, N-th modulated features.

5. The method of claim 3, wherein the gating integration feature $F^i$ is calculated as follows:

$$F^{i'} = \mathrm{Conv\_bn}(T^i) \odot \mathrm{Sigmoid}(\mathrm{Conv\_bn}(\mathrm{Upsampling}(G)))$$

$$F^i = F^{i'} \oplus \mathrm{Conv\_bn}(\mathrm{Upsampling}(G))$$

where $\{T^1, T^2, \ldots, T^N\}$ are the local scale tokens, $G$ is the global semantics, Conv_bn is a 1 × 1 × 1 convolution layer followed by batch normalization, Upsampling is an up-sampling operation, $F^{i'}$ is the local feature modulated by the semantic weight, $\odot$ is element-wise multiplication, and $\oplus$ is element-wise addition.

6. The method of claim 3, wherein after the final pancreatic bile duct segmentation map is output, the model is trained with a Dice loss and a cross-entropy loss function to obtain the final pancreatic bile duct segmentation model.

7. A segmentation model for a token pyramid-based segmentation method of the pancreatic bile duct according to any of claims 1-6, comprising

8. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 7 when the computer program is executed.

9. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 7.