CN112734748B

CN112734748B - Image segmentation system for hepatobiliary and biliary calculi

Info

Publication number: CN112734748B
Application number: CN202110083764.1A
Authority: CN
Inventors: 蔡念; 陈芝涛; 罗智浩; 何兆泉; 王平; 王晗; 陈梅云
Original assignee: Guangdong University of Technology
Current assignee: Guangdong University of Technology
Priority date: 2021-01-21
Filing date: 2021-01-21
Publication date: 2022-05-17
Anticipated expiration: 2041-01-21
Also published as: CN112734748A

Abstract

The application discloses an image segmentation system for hepatobiliary and biliary calculi.A small target area is subjected to feature learning in a down-sampling stage and feature fusion in an up-sampling stage in a first encoder-decoder, and low-level multi-scale time domain context features are extracted in the down-sampling stage and are used for subsequent layer-by-layer feature fusion; in the second encoder-decoder module, the low-layer multi-scale time domain context characteristics and the high-layer bidirectional time domain context characteristics are fused in the up-sampling stage, and a prediction sequence diagram of the original resolution size is obtained, so that more accurate position and classification information is obtained; the segmentation network model is trained through an improved loss function, so that the attention of time domain context information among the sequence image slices is increased, and the sequence image slices are more suitable for processing the sequence images. Therefore, the technical problem that the prior art cannot simultaneously consider the precision and the high efficiency for segmenting the hepatobiliary calculus and the biliary calculus in the CT image of the hepatobiliary calculus is solved.

Description

Image segmentation system for hepatobiliary and biliary calculi

Technical Field

The application relates to the technical field of medical image segmentation, in particular to an image segmentation system for hepatobiliary and biliary calculi.

Background

Hepatobiliary lithiasis is a common and frequently encountered disease in hepatobiliary surgery, and is well developed in east Asia regions. Surgical treatment of hepatobiliary stones presents a number of difficulties and challenges, including difficulty in removing the hepatobiliary stones, among others. The calculus removing treatment of hepatobiliary calculus mainly depends on minimally invasive surgery. Lithotomy depends largely on the preoperative analysis of CT scan slices of a patient. The most important link is to segment the bile duct and the gall-stone in the CT image so as to know the specific distribution of the bile duct and the gall-stone and facilitate the surgical intervention.

At present, the technologies for segmenting the CT image of hepatobiliary lithiasis mainly include: by means of full convolution neural networks, such as U-Net networks and M-Net networks; or the CT image is segmented by a 3D convolutional network.

However, since neither the U-Net network nor the M-Net network can directly process the 3D medical image, but the 3D image is cut into a plurality of 2D image slices and then respectively sent to the convolution network, so that the related information between the slices in the 3D image is ignored, the spatial structure information in the CT data is lost, under-segmentation is easily caused, and the hepatobiliary duct and biliary duct stones in the CT image are not accurately segmented. The 3D convolution network directly processes the voxels in the CT image, so that the number of parameters in the whole network model is extremely large, the calculation load is greatly increased, and the segmentation efficiency of hepatobiliary stones and biliary calculi in the CT image is remarkably reduced.

Disclosure of Invention

The embodiment of the application provides an image segmentation system for hepatobiliary and biliary calculi, which solves the technical problem that in the prior art, the accuracy and the high efficiency cannot be considered simultaneously for hepatobiliary and biliary calculi segmentation in a CT image of hepatobiliary calculi.

In view of the above, the present application provides, in a first aspect, an image segmentation system for hepatobiliary and biliary stones, the system comprising:

segmenting the network model; the segmentation network model includes: a first encoder-decoder module, a second encoder-decoder module, an optimization module;

the first encoder-decoder module is to:

respectively extracting the features of each image of the CT sequence image to obtain a plurality of first multi-scale feature maps and high-level feature maps; the context information features of the first multi-scale feature map are subjected to learning processing again to generate a multi-scale context feature map; performing time domain context information learning on the high-level feature map to generate a high-level bidirectional context feature map; splicing the multi-scale context feature map and the high-level bidirectional context feature map to generate a first context fusion feature map; the CT sequence images are continuous and adjacent CT sequence images of hepatobiliary lithiasis;

the second encoder-decoder module is to:

splicing the multi-scale context feature map and the first context fusion feature map to generate a second context fusion feature map; performing time domain context information learning on the second context fusion feature map to generate a high-level bidirectional context fusion feature map; after the multi-scale context feature map and the high-level bidirectional context fusion feature map are connected, a sequence probability map is output;

the optimization module is configured to:

and calculating a loss function value of the sequence probability chart and a label sequence chart corresponding to the sequence probability chart based on a preset loss function, judging whether the loss function value is smaller than a preset value, if so, obtaining an optimal segmentation network model, otherwise, after updating the characteristic parameters of the segmentation network model according to the loss function value, triggering the first encoder-decoder module and the second encoder-decoder module.

Optionally, the first encoder-decoder module specifically includes: the device comprises a first encoder, a ConvLSTM module, a first BiConvLSTM module and a first decoder;

the first encoder is to: respectively extracting features of small target areas in each image of the CT sequence image to obtain a plurality of first multi-scale feature maps and high-level feature maps, sending the first multi-scale feature maps to the ConvLSTM module, and sending the high-level feature maps to the first BiConvLSTM module;

the ConvLSTM module is configured to: relearning the context information features of the first multi-scale feature map to generate a multi-scale context feature map, and sending the multi-scale context feature map to the first decoder and the second encoder-decoder module;

the first BiConvLSTM module is configured to: performing two times of time domain context information learning in opposite directions on the high-level feature map to generate a high-level bidirectional context feature map, and sending the high-level bidirectional context feature map to the first decoder;

the first decoder is to: skip-splicing the multi-scale context feature map and the high-level bi-directional context feature map to generate a first context fusion feature map, and sending the first context fusion feature map to the second encoder-decoder module.

Optionally, the second encoder-decoder module specifically includes: the second encoder, the second BiConvLSTM module and the second decoder;

the second encoder is to: performing skip splicing on the multi-scale context feature map and the first context fusion feature map to generate a second context fusion feature map, and sending the second context fusion feature map to the second BiConvLSTM module;

the second BiConvLSTM module is configured to: performing two times of time domain context information learning in opposite directions on the second context fusion feature map to generate a high-level bidirectional context fusion feature map, and sending the high-level bidirectional context fusion feature map to the second decoder;

the second decoder is to: and after the multi-scale context feature map and the high-level bidirectional context fusion feature map are connected, a sequence probability map is output through an activation function.

Optionally, the activation function is: sigmoid function.

Optionally, the method further comprises: an input module;

the input module is used for: controlling the number of sequences of the CT sequence images of consecutive and adjacent hepatobiliary lithiasis such that the CT sequence images are input to the first encoder-decoder module.

Optionally, the first encoder is composed of a number of first convolution layers and a first pooling layer, wherein a size of a gaussian convolution kernel of the first convolution layer is 1 × 1.

Optionally, the first decoder is composed of a number of first feature fusion layers, a second convolution layer, and an upsampling layer, wherein a size of a gaussian convolution kernel of the second convolution layer is 3 × 3.

Optionally, the second encoder is composed of a number of second feature fusion layers, a third convolution layer, and a second pooling layer, wherein the size of the sparse convolution kernel of the third convolution layer is 5 × 5.

Optionally, the second decoder is composed of several feature fusion layers, a fourth convolution layer, and an upper convolution layer, wherein the size of the sparse convolution kernel of the fourth convolution layer is 7 × 7.

Optionally, the preset loss function is:

wherein the right side of the preset loss function equation is divided into two parts;

the method comprises the following steps of firstly, obtaining a common bootstrap cross entropy loss function, wherein N and K respectively represent the number of image pixels and the number of pixel categories; in parentheses, y_iJ denotes this condition belonging to class j, p_ijRepresents the measured probability of the ith pixel belonging to the jth class, where t is a threshold value with a value range of (0, 1)](ii) a Condition (y) in parentheses_i＝j)∩(p_ij≤t_j) When it is established, 1{ (y)_i＝j)∩(p_ij≤t_j) Equal to 1, otherwiseEqual to 0;

secondly, a weighted value based on the relevance, namely w;

wherein n represents the number of slice frames (positive integer) of the input sequence images, and s represents the slice position at the current time (s ≦ n); c represents the similarity between two different image slices, such as the common cosine similarity and euclidean distance similarity.

According to the technical scheme, the method has the following advantages:

the application provides an image segmentation system for hepatobiliary and biliary calculi. Due to the close information connection between different image slices of the same case, aiming at the characteristic; the application designs a novel time domain context information correlation mechanism which mainly comprises a first coder-decoder module of low-level multi-scale time domain context information, a second coder-decoder module of high-level bidirectional time domain context information and an improved loss function based on sequence image slice correlation. In a first encoder-decoder, feature learning is carried out on a small target region in a down-sampling stage, feature fusion is carried out in an up-sampling stage, and low-layer multi-scale time domain context features are extracted in the down-sampling stage and are used for subsequent layer-by-layer feature fusion; in the second encoder-decoder module, the low-layer multi-scale time domain context features and the high-layer bidirectional time domain context features are fused in the up-sampling stage, and a prediction sequence diagram of the original resolution size is obtained, so that the image receptive field is increased, and more accurate position and classification information is obtained; and finally, training the segmentation network model through an improved loss function to increase the attention of time domain context information among sequence image slices, and performing numerical weight analysis on the relevance degree of the time domain context information to make the time domain context information more suitable for processing sequence images. Compared with the existing hepatobiliary duct and calculus segmentation network, the segmentation system has higher segmentation precision and efficiency, so that the technical problem that the accuracy and the high efficiency cannot be simultaneously considered for hepatobiliary duct and biliary tract calculus segmentation in a CT image of hepatobiliary duct lithiasis in the prior art is solved.

Drawings

Fig. 1 is a system architecture diagram of an image segmentation system for hepatobiliary and biliary calculi provided in an embodiment of the present application;

fig. 2 is a block diagram of a ConvLSTM module provided in an embodiment of the present application.

Detailed Description

In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Referring to fig. 1, fig. 1 is a system architecture diagram of an image segmentation system for hepatobiliary and biliary calculi according to an embodiment of the present application.

An image segmentation system for hepatobiliary and biliary calculi provided in an embodiment of the present application includes: segmenting the network model; the segmentation network model comprises: a first encoder-decoder module, a second encoder-decoder module, an optimization module.

In order to improve the accuracy of segmentation of gallstones and hepatobiliary ducts and effectively utilize context information among CT image slices, the invention provides a segmentation method based on deep sequence learning, and designs a novel time domain context information correlation mechanism (segmentation network model).

It should be noted that the segmentation network model is composed of two encoders and two decoders, and the optimization module is a virtual function module in the segmentation network model that calculates the loss value based on the improved loss function. As shown in fig. 1, the arrows in different gray levels in the figure are represented as: upper convolution, pooling, sparse convolution, one-way convolution, two-way convolution, feature map fusion, and the like.

The specific engineering process for segmenting each module in the network model is as follows:

the first encoder-decoder module is to:

respectively extracting the features of each image of the CT sequence image to obtain a plurality of first multi-scale feature maps and high-level feature maps; the context information features of the first multi-scale feature map are subjected to learning processing again to generate a multi-scale context feature map; performing time domain context information learning on the high-level feature map to generate a high-level bidirectional context feature map; splicing the multi-scale context feature graph and the high-level bidirectional context feature graph to generate a first context fusion feature graph; wherein, the CT sequence images are continuous and adjacent CT sequence images of hepatobiliary lithiasis.

The second encoder-decoder module is to:

splicing the multi-scale context feature graph and the first context fusion feature graph to generate a second context fusion feature graph; performing time domain context information learning on the second context fusion feature graph to generate a high-level bidirectional context fusion feature graph; and after connecting the multi-scale context feature map and the high-level bidirectional context fusion feature map, outputting a sequence probability map.

The optimization module is to:

and calculating a sequence probability graph and a loss function value of a label sequence graph corresponding to the sequence probability graph based on a preset loss function, judging whether the loss function value is smaller than a preset value, if so, obtaining an optimal segmentation network model, otherwise, after updating the characteristic parameters of the segmentation network model according to the loss function value, triggering a first coder-decoder module and a second coder-decoder module.

It should be noted that, the process of updating the feature parameters of the split network model is automatically completed by the split network model, specifically, each time the loss function value is calculated, the split network model aims to reduce the loss function value, and the Adam optimizer and the like calculate the parameter value which is larger and larger each time, and actively update the model parameters.

For the task of the sequence image, the common loss function only focuses on the information inside the 2D image slice, and gives the same weight to the image context information at different slice distances, which is not in accordance with the distribution rule of the context information of the sequence image slice. Aiming at the defects of the existing loss function, a relevance weighted bootstrap cross entropy function is designed, namely the preset loss function of the application, so that the relevance weighted bootstrap cross entropy function is more suitable for processing sequence tasks, a sequence probability graph and a loss function value optimization model of a label sequence graph corresponding to the sequence probability graph are calculated through the preset loss function, preset values which are set by technicians in the field are obtained, and therefore the optimal segmentation network model is obtained and used for segmenting hepatobiliary stones and biliary calculi in images.

Wherein the preset loss function is:

wherein, the right side of the preset loss function equation is divided into two parts;

the method comprises the following steps that (1) a common bootstrap cross entropy loss function is adopted, wherein N and K respectively represent the number of image pixels and the number of pixel categories; in parentheses, y_iJ denotes this condition belonging to class j, p_ijRepresents the measured probability of the ith pixel belonging to the jth class, where t is a threshold value with a value range of (0, 1)](ii) a Condition (y) in parentheses_i＝j)∩(p_ij≤t_j) When it is established, 1{ (y)_i＝j)∩(p_ij≤t_j) Equal to 1, otherwise equal to 0;

secondly, a weighted value based on the relevance, namely w;

The application provides an image segmentation system for hepatobiliary and biliary calculi. Due to the close information relation among different case image slices, aiming at the characteristic; the application designs a novel time domain context information correlation mechanism which mainly comprises a first coder-decoder module of low-level multi-scale time domain context information, a second coder-decoder module of high-level bidirectional time domain context information and an improved loss function based on sequence image slice correlation. In a first encoder-decoder, feature learning is carried out on a small target region in a down-sampling stage, feature fusion is carried out in an up-sampling stage, and low-layer multi-scale time domain context features are extracted in the down-sampling stage and are used for subsequent layer-by-layer feature fusion; in the second encoder-decoder module, the low-layer multi-scale time domain context features and the high-layer bidirectional time domain context features are fused in the up-sampling stage, and a prediction sequence diagram of the original resolution size is obtained, so that the image receptive field is increased, and more accurate position and classification information is obtained; the segmentation network model is trained through an improved loss function, so that the attention of time domain context information among sequence image slices is increased, and the degree of relevance of the space information is subjected to numerical weight analysis, so that the method is more suitable for processing sequence images. Compared with the existing hepatobiliary duct and calculus segmentation network, the segmentation system has higher segmentation precision and efficiency, so that the technical problem that the accuracy and the high efficiency cannot be simultaneously considered for hepatobiliary duct and biliary tract calculus segmentation in a CT image of hepatobiliary duct lithiasis in the prior art is solved.

Further, the first encoder-decoder module in the present application specifically includes: the device comprises a first encoder, a ConvLSTM module, a first BiConvLSTM module and a first decoder.

Since the CT image is a series of 2D series images, ConvLSTM has a greater adaptability to 2D image sequence data. Compared with the standard LSTM, ConvLSTM replaces matrix multiplication with convolution operation, which is very effective for solving the problem of sequence image prediction and is beneficial to learning the temporal context information in the sequence images. The structure of ConvLSTM is shown in FIG. 2, and the expression is as follows:

i_t＝σ(x_t*W_xi+h_t-1*W_hi+b_i)

f_t＝σ(x_t*W_xf+h_t-1*W_hf+b_f)

c_t＝c_t-1⊙f_t+i_t⊙tanh(x_t*W_xo+h_t-1*W_ho+b_o)

o_t＝σ(x_t*W_xo+h_t-1*W_ho+b_o)

h_t＝o_t⊙tanh(c_t)；

wherein, the operation is a convolution operation,

hadamard product operation, namely multiplication of corresponding elements of a matrix; w is the network parameter and b is the bias term. One ConvLSTM has three gates in total, input gate i_tForgetting door f_tAnd an output gate o_t。x_t、c_t、h_tAre the input, the active state and the hidden state at time t.

The operation of each specific block in the first encoder-decoder is as follows:

the first encoder is for: respectively extracting features of small target areas in each image of the CT sequence image to obtain a plurality of first multi-scale feature maps and high-level feature maps, sending the first multi-scale feature maps to a ConvLSTM module, and sending the high-level feature maps to a first BiConvLSTM module.

It should be noted that, the first encoder of the embodiment of the present application includes several first convolution layers and a first pooling layer, where the size of the gaussian convolution kernel of the first convolution layer is 1 × 1.

The ConvLSTM module is used for: and the context information characteristic of the first multi-scale characteristic diagram is subjected to learning processing again to generate a multi-scale context characteristic diagram, and the multi-scale context characteristic diagram is sent to the first decoder and the second encoder-decoder module.

It should be noted that, for the extraction of spatial information features between CT sequence image slices, the present application utilizes the characteristics of ConvLSTM to accomplish this task. Since the difference between image slice frames is different at different scales, the application uses ConvLSTM at different scales to obtain multi-scale temporal context information. And flowing the information rich in the edge details and the multi-scale time domain context information layer by layer, and fusing the information rich in the edge details and the multi-scale time domain context information with the subsequent information features.

The first BiConvLSTM module is used for: and performing two times of time domain context information learning in opposite directions on the high-level feature map to generate a high-level bidirectional context feature map, and sending the high-level bidirectional context feature map to a first decoder.

It should be noted that, in general, the feature map of higher layer in the network is considered to contain the high-level semantic information of the image, and it is often the global feature of the image that is captured, which means that the high-level feature semantic information is very rich, and the application uses biconvlst, which is composed of two ConvLSTM with opposite directions, which allows the network model to learn more temporal context information. Connecting the results of the forward and backward ConvLSTM learning, and then applying the tanh function to obtain the corresponding result.

The first decoder is for: and carrying out skip splicing on the multi-scale context feature map and the high-level bidirectional context feature map to generate a first context fusion feature map, and sending the first context fusion feature map to a second encoder-decoder module.

It should be noted that the first decoder in the embodiment of the present application is composed of several first feature fusion layers, a second convolution layer, and an upsampling layer, where the size of the gaussian convolution kernel of the second convolution layer is 3 × 3.

Further, the second encoder-decoder module of the present application specifically includes: a second encoder, a second BiConvLSTM module and a second decoder.

The operation of each of the second encoder-decoder modules is as follows:

the second encoder is for: and carrying out skip splicing on the multi-scale context feature map and the first context fusion feature map to generate a second context fusion feature map, and sending the second context fusion feature map to a second BiConvLSTM module.

It should be noted that the second encoder according to the embodiment of the present application is composed of a plurality of second feature fusion layers, a third convolution layer, and a second pooling layer, where the size of the sparse convolution kernel of the third convolution layer is 5 × 5.

The second BiConvLSTM module is used for: and performing two times of time domain context information learning in opposite directions on the second context fusion characteristic graph to generate a high-level bidirectional context fusion characteristic graph, and sending the high-level bidirectional context fusion characteristic graph to a second decoder.

The second decoder is for: and after connecting the multi-scale context feature map and the high-level bidirectional context fusion feature map, outputting a sequence probability map by an activation function.

It should be noted that the second decoder in the embodiment of the present application is composed of several feature fusion layers, a fourth convolution layer, and an upper convolution layer, where the size of the sparse convolution kernel of the fourth convolution layer is 7 × 7. Meanwhile, in the embodiment of the present application, the activation function is set as a Sigmoid function, and a person skilled in the art can set the activation function according to an actual situation, which is not limited herein.

Furthermore, when the image is segmented, the numerical value of the segmentation network model for controlling the input of the CT sequence image is divided, and the image segmentation system for the hepatobiliary duct and the biliary tract calculus is also provided with an input module: the input module is used for: the number of sequences of CT sequence images of consecutive and adjacent hepatobiliary calculosis is controlled such that the CT sequence images are input to the first encoder-decoder module.

It can be understood that the general idea of the image segmentation system for hepatobiliary and biliary calculi of the present application is as follows: the method comprises the steps of training a novel segmentation network model of a time domain context information correlation mechanism designed by the application by taking CT sequence images of hepatobiliary lithiasis as training data, calculating loss values through a loss function improved by the application, and updating model parameters until parameters meeting preset conditions are obtained, so that an optimal segmentation network model is obtained and is used for segmenting hepatobiliary calculosis and biliary calculi of hepatobiliary calculosis images.

The terms "first," "second," "third," "fourth," and the like in the description of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

It should be understood that in the present application, "at least one" means one or more, "a plurality" means two or more. "and/or" for describing an association relationship of associated objects, indicating that there may be three relationships, e.g., "a and/or B" may indicate: only A, only B and both A and B are present, wherein A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of single item(s) or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b, c may be single or plural.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims

1. An image segmentation system for hepatobiliary and biliary calculi, comprising: segmenting the network model; the segmentation network model includes: a first encoder-decoder module, a second encoder-decoder module, an optimization module;

the first encoder-decoder module is to:

the second encoder-decoder module is to:

splicing the multi-scale context feature map and the first context fusion feature map to generate a second context fusion feature map; performing time domain context information learning on the second context fusion characteristic diagram to generate a high-level bidirectional context fusion characteristic diagram; after the multi-scale context feature map and the high-level bidirectional context fusion feature map are connected, a sequence probability map is output;

the optimization module is configured to:

2. The system of claim 1, wherein the first encoder-decoder module comprises: the device comprises a first encoder, a ConvLSTM module, a first BiConvLSTM module and a first decoder;

the first encoder is to: respectively extracting features of a target area in each image of the CT sequence image to obtain a plurality of first multi-scale feature maps and high-level feature maps, sending the first multi-scale feature maps to the ConvLSTM module, and sending the high-level feature maps to the first BiConvLSTM module;

the first decoder is to: skip-splicing the multi-scale context feature map and the high-level bidirectional context feature map to generate a first context fusion feature map, and sending the first context fusion feature map to the second encoder-decoder module.

3. The system of claim 2, wherein the second encoder-decoder module comprises: the second encoder, the second BiConvLSTM module and the second decoder;

the second decoder is to: and after the multi-scale context feature map and the high-level bidirectional context fusion feature map are connected, outputting a sequence probability map through an activation function.

4. The system of claim 3, wherein the activation function is: sigmoid function.

5. The system for image segmentation of hepatobiliary and biliary stones of claim 1, further comprising: an input module;

6. The system of claim 2, wherein the first encoder is comprised of a number of first convolution layers and a first pooling layer, wherein the first convolution layers have a gaussian convolution kernel size of 1 x 1.

7. The system of claim 2, wherein the first decoder comprises a plurality of first feature fusion layers, a second convolution layer and an upsampling layer, and wherein the second convolution layer has a gaussian convolution kernel size of 3 x 3.

8. The system of claim 3, wherein the second encoder is comprised of a plurality of second feature fusion layers, a third convolution layer, and a second pooling layer, wherein the size of the sparse convolution kernel of the third convolution layer is 5 x 5.

9. The system of image segmentation of hepatobiliary and biliary stones of claim 3, wherein the second decoder is composed of several feature fusion layers, a fourth convolution layer, and an upper convolution layer, wherein the size of the sparse convolution kernel of the fourth convolution layer is 7 x 7.

10. The system of claim 1, wherein the preset loss function is:

the method comprises the following steps of firstly, obtaining a common bootstrap cross entropy loss function, wherein N and K respectively represent the number of image pixels and the number of pixel categories; in parentheses, y_iJ denotes this condition belonging to class j, p_ijRepresenting the measured probability of the ith pixel belonging to the jth class, where t_jIs a threshold value with a value range of (0, 1)](ii) a When the condition (y) is in parentheses_i＝j)∩(p_ij≤t_j) When established, 1{ (y)_i＝j)∩(p_ij≤t_j) Equal to 1, otherwise equal to 0, y_iIn order for the value of the tag to be true,

a predicted value of the segmentation network model;

secondly, a weighted value based on the relevance, namely w;

wherein n represents the number of slice frames of the input sequence image, n is a positive integer, s represents the slice position at the current moment, and s is less than or equal to n; c represents the similarity between two different image slices.