CN115187571A - Medical image segmentation method for introducing prior knowledge based on reward function - Google Patents


Info

Publication number
CN115187571A
CN115187571A
Authority
CN
China
Prior art keywords
reward function
reward
segmentation
introducing
pixel
Prior art date
Legal status (assumption, not a legal conclusion): Pending
Application number
CN202210893407.6A
Other languages
Chinese (zh)
Inventor
李雄飞
杨飞扬
张小利
黄萨
朱芮
于爽
宋涵
矫鑫瑶
Current Assignee: Jilin University
Original Assignee
Jilin University
Priority date (assumption, not a legal conclusion)
Filing date
Publication date
Application filed by Jilin University
Priority to CN202210893407.6A
Publication of CN115187571A

Classifications

    • G06T7/0012 Biomedical image inspection
    • G06N3/08 Neural networks; learning methods
    • G06T7/11 Region-based segmentation
    • G06T7/194 Foreground-background segmentation
    • G06V10/26 Segmentation of patterns in the image field
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G06V10/764 Classification using machine learning
    • G06V10/82 Recognition using neural networks
    • G06T2207/10081 Computed x-ray tomography [CT]
    • G06T2207/20081 Training; learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30084 Kidney; renal


Abstract

The invention provides a medical image segmentation method that introduces prior knowledge through the reward function, comprising the following steps: acquiring a public medical CT image dataset; designing a reward function based on the dataset; introducing the foreground-background ratio of the CT images into the reward function to obtain an improved reward function; introducing the approximation difficulty of instances as prior knowledge into the improved reward function to obtain the final reward function; and replacing the reward function of a reinforcement learning network with the final reward function, training the reinforcement network, and finally outputting the segmentation result map through the segmentation network. By building external prior knowledge into the reward function, the model's learning process is strongly steered in a beneficial direction: each pixel in the image considers not only its own state but also the states of its neighboring pixels, the policy is updated in the constrained direction, and the segmentation result closely matches the real lesion.

Description

Medical image segmentation method for introducing prior knowledge based on reward function
Technical Field
The invention belongs to the technical field of medical image segmentation, and particularly relates to a medical image segmentation method based on introduction of prior knowledge into a reward function.
Background
Medical images play a crucial role in the diagnosis of many serious diseases, including structural and functional analysis, diagnosis, and treatment, where tumor lesion segmentation is of particular significance. From the perspective of clinical practice, medical images serve to assist physicians in diagnosis. However, a model that can only predict whether a CT image contains a lesion does not satisfy a doctor's requirements: the slice location of the lesion, the confidence of the single-slice segmentation, and whether benign and malignant tumors can be distinguished without a puncture biopsy also need to be presented. Traditional manual segmentation and annotation is inefficient at delineating lesions in medical images, and its performance depends heavily on the annotator's experience and medical-domain knowledge.
In recent years, deep learning has risen rapidly in computer vision and has become closely tied to medical image segmentation. Medical image segmentation, however, requires not only computer-vision expertise but also medical prior knowledge as theoretical support for model design. Most existing methods are borrowed from natural image segmentation and ignore medical characteristics. Furthermore, the teams annotating medical images include experts, trained personnel, and laypersons, which affects the reliability of the annotations and limits the effectiveness of segmentation models in real scenarios. Aiming at these problems, the invention provides a medical image segmentation method that introduces prior knowledge through the reward function.
Disclosure of Invention
To solve the above technical problems, the invention provides a medical image segmentation method that introduces prior knowledge through the reward function. By building external prior knowledge into the reward function, the method strongly steers the model's learning process in a beneficial direction: each pixel in an image considers not only its own state but also the states of neighboring pixels, so that each pixel, treated as an agent, behaves more like a simulated expert's judgment and updates its policy in the constrained direction.
In order to achieve the above object, the present invention provides a medical image segmentation method based on a reward function and introducing prior knowledge, comprising the following steps:
s1, acquiring a public medical CT image data set;
s2, designing and obtaining a reward function based on the image data set;
s3, introducing a foreground-background ratio of the CT image in the image data set into the reward function to obtain an improved reward function;
s4, introducing the approximate difficulty of a priori knowledge example into the improved reward function to obtain a final reward function;
and S5, replacing the reward function in the reinforcement learning network with the final reward function, carrying out reinforcement network training, and finally outputting a final segmentation result graph through the segmentation network.
Preferably, in S2, in designing and obtaining the reward function, the reward P_i is designed as the difference between the predicted value and the actual label for each pixel in the image dataset (formula shown as an image in the original).
preferably, the method for designing and obtaining the reward function comprises:
treating each pixel in the image dataset as an agent;
designing and obtaining local rewards of the single agents based on cross entropy gain;
deriving the reward function based on the local reward.
Preferably, the feedback r_i of a single agent is defined by a formula (shown as an image in the original), where γ denotes the discount factor, V(s^(t)) denotes the expected total reward for network input s^(t), R_i denotes the initial reward of a single agent, P_i denotes the redefined reward, y_i ∈ {-1, +1} denotes the pixel label, t denotes step t, and r_i is the final actual reward of each agent.
Preferably, in S3, the reward function after introducing the foreground-background ratio is improved based on a formula (shown as an image in the original) in which R^(t) denotes the average of the total expected reward over all pixels, the step-t reward appears in the sum, and N denotes the total number of pixels;
since medical image segmentation can be regarded as binary pixel classification, the number of pixels in the region of interest is far smaller than elsewhere, and the foreground-to-background ratio affects the final segmentation quality, the formula is replaced by an improved reward function formula (also shown as an image), where α ∈ (-1, 0) is a hyperparameter, N_Front denotes the foreground region of the prediction map, and N_Back denotes the background region of the prediction map.
Preferably, in S4, the approximation difficulty of instances is introduced as prior knowledge based on the design of the reinforcement learning action space, the action set being {mask, retain}, so as to cover parts that adversely affect the segmentation, retain parts that benefit the segmentation, and allow the agent to adjust its action selection to different degrees under different conditions.
Preferably, the method for introducing the instance approximation difficulty into the improved reward function comprises:
setting the image dataset D = {(x_1, y_1), (x_2, y_2), …, (x_N, y_N)}, where x_i denotes an input pixel and y_i ∈ {-1, +1} denotes the pixel label; each input pixel x_i (i = 1, 2, …, N) corresponds to a local receptive window region F_i = {(x_p, y_p) | p = 1, 2, …, P}; the ADI function is then expressed by a formula (shown as an image in the original) in which |F_i| denotes the number of pixels in the local receptive field; if the labels of all instances in F_i are the same as that of x_i, one limiting case of the ADI term holds, and if the class labels of all instances in F_i differ from that of x_i, the opposite limiting case holds; therefore, the step-t reward term is replaced accordingly (formula shown as an image), where γ denotes the discount factor and V(s^(t)) denotes the expected total reward for network input s^(t).
Preferably, in S5, the method for obtaining the final segmentation result map includes:
taking the output of the reinforcement network training as the input of the segmentation network to form a segmentation model;
and, within the segmentation model, combining a CNN-Transformer network as the encoder with a cascaded upsampler to locate and segment the CT image, obtaining the final segmentation result map.
Compared with the prior art, the invention has the following advantages and technical effects:
the method adopts the design of introducing external prior knowledge into the reward function, so that the reward function greatly influences the model learning process towards a beneficial direction, and each pixel in the image not only considers the state of the pixel, but also considers the states of other adjacent pixels; each pixel considered as an agent is more like an analog expert judgment and the strategy is updated in the direction of constraint, and the segmentation result of the method is similar to the real lesion.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application. In the drawings:
FIG. 1 is a flow chart of a medical image segmentation method based on reward function to introduce prior knowledge according to the present invention;
FIG. 2 is a schematic diagram of a segmentation model network structure according to the present invention;
FIG. 3 is a schematic diagram of a verification experiment process according to the present invention.
Detailed Description
It should be noted that, in the present application, the embodiments and features of the embodiments may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.
According to fig. 1 and fig. 2, the present embodiment provides a medical image segmentation method based on a reward function and introducing a priori knowledge, including the following steps:
step one, a public medical CT image data set (such as a kidney cancer CT data set KiTS 19) is obtained.
And step two, designing and obtaining a reward function based on the image data set of the previous step.
In this embodiment, in designing the reward function, each pixel is regarded as an agent, and the local reward design logic for a single agent is based on the cross-entropy gain.
Specifically, the reward P_i is designed as the difference between the pixel's predicted value and its actual label (formula shown as an image in the original).
To update the model in the direction of the constraint, the above score is compared against a benchmark: when the agent's predicted probability is closer to the true pixel label, the agent obtains a positive reward, and vice versa. The feedback r_i of each agent is then defined by formulas (shown as images in the original), where γ denotes the discount factor, V(s^(t)) denotes the expected total reward for network input s^(t), R_i denotes the initial reward of a single agent, P_i denotes the redefined reward, y_i ∈ {-1, +1} denotes the pixel label, t denotes step t, and r_i is the final actual reward of each agent.
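The two formulas above appear only as images in the original text, so the sketch below is an assumed reading rather than the patent's exact definitions: `pixel_reward` is a hypothetical cross-entropy-gain-style local reward P_i that is positive when the predicted probability is close to the true label, and `agent_feedback` combines it with the critic's value estimates in a TD-advantage form under the discount factor γ.

```python
import numpy as np

def pixel_reward(pred, label):
    # Hypothetical local reward P_i: pred is the predicted foreground
    # probability in [0, 1], label is the true binary label {0, 1}.
    # +1 for a perfect prediction, -1 for the worst possible one.
    return 1.0 - 2.0 * abs(pred - label)

def agent_feedback(p_reward, v_now, v_next, gamma=0.95):
    # Assumed TD-advantage style feedback r_i for one agent, combining
    # the local reward with the critic's value estimates V(s^(t)) and
    # V(s^(t+1)) under the discount factor gamma.
    return p_reward + gamma * v_next - v_now
```

A prediction of 0.9 for a foreground pixel thus yields a positive local reward, while the same prediction for a background pixel yields a negative one, which matches the "closer to the true pixel label → positive reward" behavior described above.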
And step three, introducing the foreground-background ratio of the CT image in the image data set into the reward function to obtain an improved reward function.
Rather than modifying the loss function during network fitting, this embodiment introduces external knowledge into the feedback (reward) design of the reinforcement learning network. To alleviate the area imbalance between the foreground and background regions, the foreground-background ratio of the medical image is introduced into the global reward function as prior knowledge. The global reward is first written as a formula (shown as an image in the original) in which R^(t) denotes the average of the total expected reward over all pixels, the step-t reward appears in the sum, and N denotes the total number of pixels.
Since medical image segmentation can be regarded as binary pixel classification, the number of pixels in the region of interest is far smaller than elsewhere, and the foreground-to-background ratio affects the final segmentation quality. Based on this, the formula is replaced by an improved reward function formula (also shown as an image), where α ∈ (-1, 0) is a hyperparameter and the function f(x) = x + 0.2/x is used; f is unimodal for x > 0 with its minimum at x = √0.2 ≈ 0.447 (minimum value 2√0.2 ≈ 0.894). N_Front denotes the foreground region of the prediction map and N_Back its background region; the term encourages the lesion area to account for roughly 40%-50% of the image, and as the pixel-level RL (reinforcement learning) representation masks patches that are unhelpful for segmentation or irrelevant to the task, the foreground-background ratio of the prediction map gradually approaches the ratio most favorable to segmentation.
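A minimal sketch of the improved global reward, assuming the combination is "mean per-pixel reward plus α·f(N_Front/N_Back)" with f(x) = x + 0.2/x as stated above; the exact formula is given only as an image, so `improved_reward` and its additive form are illustrative assumptions, not the patent's definition.

```python
import numpy as np

def improved_reward(step_rewards, pred_mask, alpha=-0.5):
    # Assumed improved global reward R^(t): the mean per-pixel reward
    # plus a foreground-background penalty alpha * f(N_front / N_back),
    # with f(x) = x + 0.2 / x, minimized at x = sqrt(0.2) ~ 0.447.
    n_front = float(np.count_nonzero(pred_mask))
    n_back = float(np.asarray(pred_mask).size - n_front)
    base = float(np.mean(step_rewards))
    if n_front == 0 or n_back == 0:
        return base  # degenerate mask: no ratio term
    x = n_front / n_back
    return base + alpha * (x + 0.2 / x)
```

Because α is negative and f is smallest near x ≈ 0.447, predictions whose foreground-to-background ratio approaches that value are penalized least, which is the "encourage a lesion share of roughly 40%-50%" effect described above.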
and step four, introducing the approximate difficulty of the prior knowledge example into the improved reward function to obtain a final reward function.
In this embodiment, based on the design of the reinforcement learning action space, the approximation difficulty of instances is introduced as prior knowledge to control how strongly neighboring pixel values are taken into account, so that each agent attends not only to itself but also to the policies and values of other agents in its neighborhood.
The action space is set to {mask, retain}: parts that harm segmentation are covered, parts that help segmentation are retained, and the agent is allowed to adjust its action selection to different degrees under different conditions. For example, when the predicted value of the current patch is closer to the label, a better action may be selected.
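The {mask, retain} action space described above can be sketched as follows; the integer encoding of the actions and the fill value used for masked pixels are assumptions for illustration.

```python
import numpy as np

MASK, RETAIN = 0, 1  # assumed encoding of the two-action space

def apply_actions(image, actions, fill=0.0):
    # Apply per-pixel actions: pixels whose agent chooses MASK are
    # covered with a fill value, pixels whose agent chooses RETAIN
    # are kept unchanged.
    return np.where(actions == RETAIN, image, fill)
```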
Secondly, the instance approximation difficulty is introduced into the reward function as follows:
set the CT image dataset D = {(x_1, y_1), (x_2, y_2), …, (x_N, y_N)}, where x_i denotes an input pixel and y_i ∈ {-1, +1} denotes the pixel label; each input pixel x_i (i = 1, 2, …, N) corresponds to a local window region F_i = {(x_p, y_p) | p = 1, 2, …, P}. The ADI function is expressed by a formula (shown as an image in the original) in which |F_i| denotes the number of pixels in the local receptive field. Obviously, if the labels of all instances in F_i are the same as that of x_i, one limiting case of the ADI term holds; if the class labels of all instances in F_i differ from that of x_i, the opposite limiting case holds. Therefore, this embodiment replaces the step-t reward term accordingly (formula shown as an image), where γ denotes the discount factor and V(s^(t)) denotes the expected total reward for network input s^(t).
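One plausible reading of the ADI function — the exact formula is shown only as an image — is the fraction of labels inside each pixel's local window F_i that disagree with the pixel's own label: 0 when every neighbor agrees (an easy instance) and largest when all disagree (a hard one). The sketch below implements that reading; the window size and edge handling are assumptions.

```python
import numpy as np

def adi_map(labels, window=3):
    # Assumed Approximation Difficulty of Instance (ADI): for each
    # pixel, the fraction of labels in its local window F_i that
    # differ from the pixel's own label.
    h, w = labels.shape
    r = window // 2
    padded = np.pad(labels, r, mode='edge')
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            win = padded[i:i + window, j:j + window]
            out[i, j] = np.mean(win != labels[i, j])
    return out
```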
and step five, replacing the reward function in the reinforcement learning network with the final reward function, performing reinforcement network training, and finally outputting a final segmentation result graph through the segmentation network.
In this embodiment, the reward function of the reinforcement learning network is replaced by the final reward function carrying external prior knowledge, this reward function guides the training of the reinforcement network, and the segmentation network then outputs the final segmentation result map.
Specifically, after an image is input and passed through the reinforcement network, its output is fed as the input of the segmentation network. The segmentation model combines a CNN-Transformer network as the encoder with a cascaded upsampler to achieve accurate localization and segmentation of the lesion, yielding the final segmentation result map; the structure of the segmentation model is shown in FIG. 2.
As shown in fig. 3, a validation test of a medical image segmentation method based on a reward function and introducing a priori knowledge is provided for the embodiment, and the validation test includes:
step one, selecting a data set. Two-dimensional renal tumor CT data sets are selected for experiments, namely KiTS19 and JLUKT data sets.
KiTS19 challenge data: 210 cases of renal tumors with clinical background, CT semantic segmentation, and surgical outcomes. All patients who underwent nephrectomy or radical nephrectomy at the University of Minnesota Medical Center during 2010-2018 were included in the database, from which the comprehensive clinical outcomes of 300 nephrectomy patients were randomly drawn. We segment only the whole kidney tumor; the number of training samples is 168 and the number of test samples is 42.
JLUKT renal tumor data: during 2009-2020, the Bethune Second Clinical Hospital of Jilin University performed contrast-enhanced CT examinations on renal cancer patients and included their data in the database. CT data for 61 renal tumors are provided; the number of training samples is 49 and the number of test samples is 12.
Step two, data preprocessing. In the acquired two-dimensional renal tumor CT dataset, each case comprises many consecutive slices, most of which contain no lesion while only a few do; in addition, each CT slice has one corresponding label. Before the experiment, the dataset must be preprocessed and augmented to ensure higher accuracy, including image binarization, dilation and erosion, histogram equalization, and random rotation and translation.
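The preprocessing steps listed above can be sketched as follows; the threshold, structuring element, bin count, and augmentation ranges are illustrative assumptions, and the slice is assumed to have intensities scaled to [0, 1].

```python
import numpy as np
from scipy import ndimage

def preprocess_slice(ct, threshold=0.5, seed=0):
    # Preprocessing/augmentation sketch for one CT slice scaled to [0, 1]:
    # binarization, morphological opening (erosion then dilation),
    # histogram equalization, and a random rotation + translation.
    rng = np.random.default_rng(seed)
    # 1. binarization
    binary = ct > threshold
    # 2. erosion then dilation to suppress speckle noise
    opened = ndimage.binary_dilation(ndimage.binary_erosion(binary))
    # 3. histogram equalization via the intensity CDF
    hist, bins = np.histogram(ct.ravel(), bins=256, range=(0.0, 1.0))
    cdf = hist.cumsum() / hist.sum()
    equalized = np.interp(ct.ravel(), bins[:-1], cdf).reshape(ct.shape)
    # 4. random rotation and translation (data augmentation)
    angle = rng.uniform(-10, 10)
    shift = rng.uniform(-2, 2, size=2)
    augmented = ndimage.shift(
        ndimage.rotate(equalized, angle, reshape=False), shift)
    return opened, augmented
```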
Step three, experimental setup. The experiment is run on a hardware platform with the Ubuntu operating system and an NVIDIA RTX 3090 graphics card, using the PyTorch deep learning framework together with packages such as matplotlib, re, and pydicom, with PyCharm as the development environment. The parameters in the experiment were set as follows:
Iterations: 100 epochs
Optimizer: Adam
Learning rate: 10⁻²
Batch size: 10
Number of rounds: 50
Length per round t_max: 4
Action space n_action: 2
Discount factor γ: 0.95
Parameters of an initially trained pure segmentation model are used to initialize the reinforcement model in the experiment, so that the model converges faster. The down-sampling part of the segmentation model is designed around a Transformer encoder, and the up-sampling part outputs the final segmentation mask by decoding the hidden features in multiple steps; skip connections are also used to form a U-shaped structure, so that the Transformer-based MSA structure and the CNN structure are combined to bring out the best of both. For different datasets, an NVIDIA RTX 3090 GPU is used and training time varies from several hours to three days. In our model, the inference time of each prediction step is 422 ms.
and step four, evaluating indexes. The Dice similarity coefficient is mainly used as a main evaluation index of the tumor segmentation performance of the kidney cancer CT image, and the Hausdorff coefficient is also used as an auxiliary evaluation index of the kidney cancer CT image.
The essence of the Dice coefficient is to measure the similarity of two sets; it is a set-similarity function. It originated in binary classification and was later widely adopted for medical image segmentation, where it measures the overlap between the final segmentation result and the real annotated region (Ground Truth). Its range is 0-1: 1 in the best case and 0 in the worst. If the region enclosed by the segmentation result is denoted A and the region enclosed by the Ground Truth is denoted B, the Dice coefficient can be expressed as:
Dice(A, B) = 2|A ∩ B| / (|A| + |B|)
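The Dice coefficient for binary masks can be computed directly from the set-overlap definition above (the small epsilon guarding against empty masks is an implementation convenience, not part of the definition):

```python
import numpy as np

def dice_coefficient(pred, gt, eps=1e-7):
    # Dice = 2|A ∩ B| / (|A| + |B|) for binary masks A (prediction)
    # and B (Ground Truth): 1 is a perfect overlap, 0 is no overlap.
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + eps)
```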
The Hausdorff distance is a measure of the similarity between two point sets. The Dice coefficient is sensitive to the interior filling of the segmentation, whereas the Hausdorff distance is sensitive to its boundary; the smaller the value, the better. It is defined as:
H(A, B) = max( sup_{a∈A} inf_{b∈B} d(a, b), sup_{b∈B} inf_{a∈A} d(a, b) )
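The symmetric Hausdorff distance between two binary masks can be computed from their foreground point sets; all foreground pixels are used here for simplicity, while restricting to boundary pixels is a common refinement.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hausdorff_distance(mask_a, mask_b):
    # Symmetric Hausdorff distance: the maximum of the two directed
    # distances between the masks' foreground point sets.
    # Assumes both masks contain at least one foreground pixel.
    pts_a = np.argwhere(mask_a)
    pts_b = np.argwhere(mask_b)
    return max(directed_hausdorff(pts_a, pts_b)[0],
               directed_hausdorff(pts_b, pts_a)[0])
```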
Step five, evaluation of segmentation results. The training losses of different models are compared. ADI is found to accelerate the early convergence of the model, since it takes the states of neighboring pixels into account. Combining the foreground-background ratio with ADI enlarges the model's exploration space; this slows convergence early in training, but improves the experimental segmentation results. Reinforcement learning with ADI outperforms the version without ADI because the average reward of each agent is redefined: after the ADI function is introduced, the model's local reward becomes clearer, and the score increases as external knowledge is embedded, meaning the final segmentation result is more accurate and better fits the real lesion.
The above description is only for the preferred embodiments of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (8)

1. A medical image segmentation method for introducing prior knowledge based on a reward function is characterized by comprising the following steps:
s1, acquiring a public medical CT image data set;
s2, designing and obtaining a reward function based on the image data set;
s3, introducing a foreground-background ratio of the CT image in the image data set into the reward function to obtain an improved reward function;
s4, introducing the approximate difficulty of a priori knowledge example into the improved reward function to obtain a final reward function;
and S5, replacing the reward function in the reinforcement learning network with the final reward function, carrying out reinforcement network training, and finally outputting a final segmentation result graph through the segmentation network.
2. The medical image segmentation method for introducing prior knowledge based on a reward function according to claim 1, wherein
in S2, in designing and obtaining the reward function, the reward P_i is designed as the difference between the predicted value and the actual label for each pixel in the image dataset (formula shown as an image in the original).
3. The medical image segmentation method for introducing prior knowledge based on a reward function according to claim 2, wherein
the method for designing and obtaining the reward function comprises:
treating each pixel in the image dataset as an agent;
designing the local reward of a single agent based on cross-entropy gain;
deriving the reward function based on the local rewards.
4. The medical image segmentation method for introducing prior knowledge based on a reward function according to claim 3, wherein
the feedback r_i of a single agent is defined by a formula (shown as an image in the original), where γ denotes the discount factor, V(s^(t)) denotes the expected total reward for network input s^(t), R_i denotes the initial reward of a single agent, P_i denotes the redefined reward, y_i ∈ {-1, +1} denotes the pixel label, t denotes step t, and r_i is the final actual reward of each agent.
5. The medical image segmentation method for introducing prior knowledge based on a reward function according to claim 1, wherein in S3, the reward function after introducing the foreground-background ratio is improved based on the following formula:

R^(t) = (1/N) · Σ_{i=1}^{N} r_i^(t)

where R^(t) represents the average of the total expected reward over all pixels, r_i^(t) represents the reward at step t, and N represents the total number of pixels.

Since medical image segmentation can be regarded as pixel-wise binary classification, the number of pixels in the region of interest is far smaller than that of the remaining parts, and the foreground-background ratio affects the final segmentation result; the above formula is therefore replaced by the improved reward function formula
(formula FDA0003768456090000024, published as an image)
where α is a hyper-parameter in (-1, 0), N_Front represents the foreground region of the prediction map, and N_Back represents the background region of the prediction map.
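The exact improved formula in claim 5 is published only as an image, so the sketch below shows one plausible reading, which is an assumption: the plain average R^(t) = (1/N) Σ r_i^(t) is scaled by (N_Front / N_Back)^α with α ∈ (-1, 0), so that a tiny predicted foreground (the usual case for a region of interest in CT) amplifies the reward signal:

```python
import numpy as np

def plain_average_reward(rewards):
    # R^(t) = (1/N) * sum_i r_i^(t)
    return float(rewards.mean())

def ratio_weighted_reward(rewards, pred_mask, alpha=-0.5):
    # assumed improved form: scale the average by (N_Front / N_Back)^alpha;
    # with alpha < 0, a small foreground gives a weight > 1, countering
    # the foreground/background class imbalance
    n_front = int(pred_mask.sum())
    n_back = pred_mask.size - n_front
    return (max(n_front, 1) / max(n_back, 1)) ** alpha * float(rewards.mean())

rewards = np.full((8, 8), 0.1)
small_fg = np.zeros((8, 8), dtype=bool); small_fg[3:5, 3:5] = True  # 4 of 64 px
balanced = np.zeros((8, 8), dtype=bool); balanced[:, :4] = True     # 32 of 64 px
```

Under this assumed weighting, the same per-pixel rewards yield a larger aggregate reward when the predicted foreground is small than when foreground and background are balanced.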
6. The medical image segmentation method for introducing prior knowledge based on a reward function according to claim 1, wherein in S4, the approximate difficulty of instance introduced as prior knowledge is set based on the reinforcement learning action space, with the action set defined as {mask, retain}, so as to mask parts that do not benefit the segmentation, retain parts that are beneficial to the segmentation, and allow the agent to adjust its action selection to different degrees under different conditions.
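The {mask, retain} action space of claim 6 can be sketched as each pixel-agent either masking its pixel (covering a part that does not help the segmentation) or retaining it. The binary-action encoding and the neutral fill value below are assumptions:

```python
import numpy as np

MASK, RETAIN = 0, 1

def apply_actions(image, actions, fill=0.0):
    # each pixel-agent chooses MASK or RETAIN: masked pixels are
    # overwritten with a neutral fill value, retained pixels pass through
    return np.where(actions == RETAIN, image, fill)

img = np.array([[10.0, 20.0], [30.0, 40.0]])
acts = np.array([[RETAIN, MASK], [MASK, RETAIN]])
out = apply_actions(img, acts)
# out == [[10., 0.], [0., 40.]]
```

In training, the final reward function would then score the segmentation produced from the action-modified image, letting agents learn which regions to cover and which to keep.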
7. The medical image segmentation method for introducing prior knowledge based on a reward function according to claim 6, wherein the approximate difficulty of instance is introduced into the improved reward function as follows:
let the image data set be D = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)}, where x_i represents an input pixel and y_i ∈ {-1, +1} represents the pixel label; each input pixel x_i (i = 1, 2, ..., N) corresponds to a local receptive window region F_i = {(x_p, y_p) | p = 1, 2, ..., P}; the ADI function is then expressed as
(formula FDA0003768456090000031, published as an image)
where |F_i| represents the number of pixels in the local receptive field
(formula FDA0003768456090000032, published as an image);
if the labels of all instances in F_i are the same as that of x_i, then
(formula FDA0003768456090000033, published as an image)
holds; if the class labels of all instances in F_i differ from that of x_i, then
(formula FDA0003768456090000034, published as an image)
holds; therefore,
(formula FDA0003768456090000035, published as an image)
is replaced by
(formula FDA0003768456090000036, published as an image)
where γ represents the discount factor, r^(t) represents the reward of step t, and V(s^(t)) represents the expected total reward for the network input s^(t).
8. The medical image segmentation method for introducing prior knowledge based on a reward function according to claim 1, wherein in S5, the final segmentation result map is obtained as follows:
continuously feeding the output of the reinforcement learning network training into the segmentation network as input, forming a segmentation model;
building the segmentation model from a CNN-Transformer network as the encoder, cascaded with an upsampler, to localize and segment the CT image and obtain the final segmentation result map.
CN202210893407.6A 2022-07-27 2022-07-27 Medical image segmentation method for introducing prior knowledge based on reward function Pending CN115187571A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210893407.6A CN115187571A (en) 2022-07-27 2022-07-27 Medical image segmentation method for introducing prior knowledge based on reward function

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210893407.6A CN115187571A (en) 2022-07-27 2022-07-27 Medical image segmentation method for introducing prior knowledge based on reward function

Publications (1)

Publication Number Publication Date
CN115187571A true CN115187571A (en) 2022-10-14

Family

ID=83521105

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210893407.6A Pending CN115187571A (en) 2022-07-27 2022-07-27 Medical image segmentation method for introducing prior knowledge based on reward function

Country Status (1)

Country Link
CN (1) CN115187571A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117036910A (en) * 2023-09-28 2023-11-10 合肥千手医疗科技有限责任公司 Medical image training method based on multi-view and information bottleneck
CN117036910B (en) * 2023-09-28 2024-01-12 合肥千手医疗科技有限责任公司 Medical image training method based on multi-view and information bottleneck

Similar Documents

Publication Publication Date Title
Yeung et al. Unified focal loss: Generalising dice and cross entropy-based losses to handle class imbalanced medical image segmentation
Sun et al. Segmentation of the multimodal brain tumor image used the multi-pathway architecture method based on 3D FCN
Ranjbarzadeh et al. Brain tumor segmentation of MRI images: A comprehensive review on the application of artificial intelligence tools
Budak et al. Cascaded deep convolutional encoder-decoder neural networks for efficient liver tumor segmentation
Xie et al. A context hierarchical integrated network for medical image segmentation
Wang et al. Multimodal brain tumor image segmentation using WRN-PPNet
Rai et al. Automatic and accurate abnormality detection from brain MR images using a novel hybrid UnetResNext-50 deep CNN model
Zhang et al. Cerebrovascular segmentation from TOF-MRA using model-and data-driven method via sparse labels
Lv et al. 2.5 D lightweight RIU-Net for automatic liver and tumor segmentation from CT
JP2023540910A (en) Connected Machine Learning Model with Collaborative Training for Lesion Detection
Samudrala et al. Semantic segmentation of breast cancer images using DenseNet with proposed PSPNet
Zhang et al. Intra-domain task-adaptive transfer learning to determine acute ischemic stroke onset time
Hsiao et al. A deep learning-based precision and automatic kidney segmentation system using efficient feature pyramid networks in computed tomography images
Chen et al. Efficient two-step liver and tumour segmentation on abdominal CT via deep learning and a conditional random field
Banerjee et al. Glioma classification using deep radiomics
CN113706486A (en) Pancreas tumor image segmentation method based on dense connection network migration learning
Chen et al. Generative adversarial network based cerebrovascular segmentation for time-of-flight magnetic resonance angiography image
CN115187571A (en) Medical image segmentation method for introducing prior knowledge based on reward function
Yuan et al. ResD-Unet research and application for pulmonary artery segmentation
Yang et al. A shape-guided deep residual network for automated CT lung segmentation
Kong et al. Data enhancement based on M2-Unet for liver segmentation in Computed Tomography
Kumar et al. Development of an enhanced U-Net model for brain tumor segmentation with optimized architecture
Murmu et al. A novel Gateaux derivatives with efficient DCNN-Resunet method for segmenting multi-class brain tumor
Tian et al. Fully-automated functional region annotation of liver via a 2.5 D class-aware deep neural network with spatial adaptation
Zhang et al. Dial/Hybrid cascade 3DResUNet for liver and tumor segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination