CN114897922A - Histopathology image segmentation method based on deep reinforcement learning - Google Patents

Histopathology image segmentation method based on deep reinforcement learning

Info

Publication number
CN114897922A
Authority
CN
China
Prior art keywords
segmentation
image
algorithm
network
value
Prior art date
Legal status
Granted
Application number
CN202210352213.5A
Other languages
Chinese (zh)
Other versions
CN114897922B (en)
Inventor
姚孟佼
高翔
Current Assignee
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202210352213.5A priority Critical patent/CN114897922B/en
Publication of CN114897922A publication Critical patent/CN114897922A/en
Application granted granted Critical
Publication of CN114897922B publication Critical patent/CN114897922B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/136 Segmentation; Edge detection involving thresholding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/13 Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20092 Interactive image processing based on input by user
    • G06T2207/20104 Interactive definition of region of interest [ROI]

Abstract

The invention discloses a histopathology image segmentation method based on deep reinforcement learning, which comprises the following steps: inputting a histopathology image; preprocessing the histopathology image; acquiring a segmentation result probability map of the histopathology image with a segmentation algorithm based on a fully convolutional network (FCN); setting an initial segmentation threshold for the histopathology image as the initial action of the DQN algorithm; taking the current state as the input of a deep Q network and the Q values corresponding to all actions as the output; iteratively optimizing the coarse segmentation result probability map of the histopathology image; and, after the optimal segmentation result is obtained, evaluating the segmentation performance of the algorithm with the evaluation indexes IoU, Recall and F-score. The invention improves the convergence rate of the algorithm and the segmentation precision of histopathology images, thereby improving the efficiency and accuracy of cancer diagnosis by pathologists.

Description

Histopathology image segmentation method based on deep reinforcement learning
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a histopathology image segmentation method.
Background
Histopathological image segmentation is an important step in the automatic diagnosis, treatment and prognosis of diseases such as cancer. By extracting disease features from histopathological images through segmentation, a pathologist can be assisted in judging whether disease is present in a sample and in determining its type and severity. Currently, the segmentation of histopathological images still presents the following challenges: (1) the amount of data is small, as pathological images manually annotated by experts are scarce; (2) histopathology images are large and of high resolution, which places demands on the processing speed of the model; (3) histopathology images have blurred boundaries and complex gradients, so the segmentation accuracy required of the algorithm is high.
The document "Chinese patent with application publication No. CN 101042771A" discloses a medical image segmentation method based on reinforcement learning. According to the method, a Q-learning algorithm is used for segmenting the cell nucleus image, the edge ratio and the area ratio of a target area in the image and a corresponding area in manual labeling are jointly used as states, the characteristics of the image are learned by continuously adjusting a threshold value, only a small amount of training data are needed, interaction with a user is not needed, and a good effect is achieved. However, the method has limitations because the traditional algorithm is used for segmenting the gray level image of the target object; while the Q-learning algorithm uses tables to store Q values, there are problems with using tables to store Q values when the problem is very complex and there are countless multiple states and behaviors: (1) insufficient memory; (2) looking up states in a large table is very time consuming and the algorithm converges slowly.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a histopathology image segmentation method based on deep reinforcement learning, which comprises the following steps: inputting a histopathology image; preprocessing the histopathology image; acquiring a segmentation result probability map of the histopathology image with a segmentation algorithm based on a fully convolutional network (FCN); setting an initial segmentation threshold for the histopathology image as the initial action of the DQN algorithm; taking the current state as the input of a deep Q network and the Q values corresponding to all actions as the output; iteratively optimizing the coarse segmentation result probability map of the histopathology image; and, after the optimal segmentation result is obtained, evaluating the segmentation performance of the algorithm with the evaluation indexes IoU, Recall and F-score. The invention improves the convergence rate of the algorithm and the segmentation precision of histopathology images, thereby improving the efficiency and accuracy of cancer diagnosis by pathologists.
The technical scheme adopted by the invention for solving the technical problem comprises the following steps:
step 1: preprocessing the labeled histopathology images, and performing color standardization by using uniform mean values and standard deviations aiming at the histopathology images in the same data set so as to convert all the images in the data set into the same RGB color space distribution;
step 2: training the tissue pathology image by using a segmentation algorithm based on a full convolution neural network (FCN), and then acquiring a neural network rough segmentation result probability map of the tissue pathology image by using the trained full convolution neural network (FCN);
and step 3: setting an initial segmentation threshold value to be 128 as an initial action of the deep reinforcement learning DQN algorithm;
setting a threshold action set A, selecting corresponding actions according to an epsilon-greedy strategy, and performing iterative optimization on a tissue pathology image neural network rough segmentation result probability graph: selecting an integer in the action set A as the change amount of the division threshold value, wherein the result of the change of the division threshold value must be in the range of 0 to 255; using an intersection ratio IoU of the segmentation result image and the manual annotation image to represent return, calculating the return difference between the segmentation result image of the segmentation threshold value of the next state and the segmentation result image of the segmentation threshold value of the current state, wherein the return difference value is greater than 0 to represent that the segmentation precision is improved;
and 4, step 4: calculating the intersection ratio IoU of the edge of the segmentation result characteristic diagram of the histopathology image under the current threshold value and the Sobel operator edge detection extraction edge and the intersection ratio IoU of the segmentation result under the current threshold value and the area of the region of interest extracted by OTSU algorithm threshold value segmentation, and taking IoU of the edge and IoU of the area as the input of a depth Q network in the DQN algorithm;
and 5: taking IoU of the edge and IoU of the area obtained in the step 4 as the input of the depth Q network, and taking the Q values corresponding to all the thresholds in the threshold action set A as the output; randomly extracting a part of samples from the experience playback set, and calculating a target Q value through an updating formula of a Q-learning algorithm;
the deep Q network serves as the network model of the action value function, with network weight parameters θ; Q(s, a; θ) is used to approximate the optimal action value function Q*(s, a), where s denotes the current state and a denotes the current action, i.e.:
Q(s, a; θ) ≈ Q*(s, a)   (1)
the experience replay set stores the samples (s_j, a_j, r_j, s_{j+1}, is_end_j) obtained from the interaction of the agent with the environment at each time step of the DQN algorithm; training randomly draws a batch of samples from the experience replay set for the update, where the vector s_j denotes the current state, s_{j+1} denotes the next state, the scalars a_j and r_j denote the action and the return respectively, and the Boolean is_end_j indicates whether the state is a terminal state;
step 6: calculating the loss function from the mean square error MSE between the target Q value and the Q value output by the deep Q network; the weight parameters in the deep Q network are randomly initialized with a truncated normal distribution, the hidden layer is activated by the ReLU activation function, and gradient updates are performed with the Adam optimizer for 1000 iterations to obtain the optimal segmentation result; the segmentation performance of the algorithm is then evaluated with the evaluation indexes IoU, Recall and F-score of the segmentation algorithm.
Further, the dataset is the multi-organ nuclei segmentation MoNuSeg dataset.
Further, the formula for obtaining the target Q value from the update formula of the Q-learning algorithm is:
y_j = r_j, if is_end_j is true; otherwise y_j = r_j + γ · max_{a′} Q(s_{j+1}, a′; θ)   (2)
where y_j denotes the target Q value, r_j denotes the return, γ denotes the discount factor, and max_{a′} Q(s_{j+1}, a′; θ) denotes the Q value of the next state and action;
further, in step 6, the DQN algorithm uses the Q value calculated by the depth Q network as a prediction, and calculates the loss function L (θ) using the mean square error MSE, which is expressed as follows:
Figure BDA0003581149130000033
the invention has the following beneficial effects:
the invention divides the segmentation task of the histopathology image into two stages. In the first stage, a neural network rough segmentation result probability map of the histopathology image is obtained by using a segmentation algorithm based on a Full Convolution Network (FCN); in the second stage, the deep reinforcement learning DQN algorithm is used for carrying out iterative optimization on the probability graph of the result of the coarse segmentation of the neural network, the neural network is used for replacing a Q table to carry out dimensionality reduction, the convergence speed of the algorithm and the segmentation precision of the histopathology image are improved, and the purpose of improving the cancer diagnosis efficiency and accuracy of a pathologist is achieved.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a diagram of the overall architecture of the two-stage segmentation model in the present invention.
Fig. 3 is a block diagram of a deep Q network according to the present invention.
Fig. 4 shows the segmentation results for a histopathological image in an embodiment of the invention.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
The invention provides a histopathology image segmentation method based on deep reinforcement learning. The neural network coarse segmentation result probability map is iteratively optimized by the deep reinforcement learning DQN algorithm: the Q value obtained from the target Q-learning calculation serves as the training label of the neural network, and the deep Q network is trained with the ideas of value function approximation and experience replay, improving the segmentation precision of the target region. Meanwhile, the DQN algorithm replaces the Q table with a neural network for dimensionality reduction, improving the convergence rate and segmentation efficiency over the Q-learning algorithm.
As shown in fig. 2, the invention comprises two processing stages: in the first stage, a neural network coarse segmentation result probability map of the histopathology image is acquired with a segmentation algorithm based on a fully convolutional network (FCN); in the second stage, the neural network coarse segmentation result probability map is iteratively optimized with the DQN algorithm.
As shown in fig. 1, a histopathology image segmentation method based on deep reinforcement learning includes the following steps:
step 1: preprocessing the labeled histopathology images, and performing color standardization by using uniform mean values and standard deviations aiming at the histopathology images in the same data set so as to convert all the images in the data set into the same RGB color space distribution;
step 2: training the tissue pathology image by using a segmentation algorithm based on a full convolution neural network (FCN), and then acquiring a neural network rough segmentation result probability map of the tissue pathology image by using the trained full convolution neural network (FCN);
and step 3: setting an initial segmentation threshold value to be 128 as an initial action of the deep reinforcement learning DQN algorithm;
setting a threshold action set A, selecting corresponding actions according to an epsilon-greedy strategy, and performing iterative optimization on a tissue pathology image neural network rough segmentation result probability graph: selecting an integer in the action set A as the change amount of the division threshold value, wherein the result of the change of the division threshold value must be in the range of 0 to 255; using an intersection ratio IoU of the segmentation result image and the manual annotation image to represent return, calculating the return difference between the segmentation result image of the segmentation threshold value of the next state and the segmentation result image of the segmentation threshold value of the current state, wherein the return difference value is greater than 0 to represent that the segmentation precision is improved;
and 4, step 4: calculating the intersection ratio IoU of the edge of the segmentation result characteristic diagram of the histopathology image under the current threshold value and the Sobel operator edge detection extraction edge and the intersection ratio IoU of the segmentation result under the current threshold value and the area of the region of interest extracted by OTSU algorithm threshold value segmentation, and taking IoU of the edge and IoU of the area as the input of a depth Q network in the DQN algorithm;
and 5: using IoU of the edge and IoU of the area obtained in the step 4 as the input of the depth Q network, and using Q values corresponding to all thresholds in the threshold action set A as the output; randomly extracting a part of samples from the experience playback set, and calculating a target Q value through an updating formula of a Q-learning algorithm;
the deep Q network serves as the network model of the action value function, with network weight parameters θ; Q(s, a; θ) is used to approximate the optimal action value function Q*(s, a), where s denotes the current state and a denotes the current action, i.e.:
Q(s, a; θ) ≈ Q*(s, a)   (1)
the experience replay set stores the samples (s_j, a_j, r_j, s_{j+1}, is_end_j) obtained from the interaction of the agent with the environment at each time step of the DQN algorithm; training randomly draws a batch of samples from the experience replay set for the update, where the vector s_j denotes the current state, s_{j+1} denotes the next state, the scalars a_j and r_j denote the action and the return respectively, and the Boolean is_end_j indicates whether the state is a terminal state;
step 6: calculating the loss function from the mean square error MSE between the target Q value and the Q value output by the deep Q network; the weight parameters in the deep Q network are randomly initialized with a truncated normal distribution, the hidden layer is activated by the ReLU activation function, and gradient updates are performed with the Adam optimizer for 1000 iterations to obtain the optimal segmentation result; the segmentation performance of the algorithm is then evaluated with the evaluation indexes IoU, Recall and F-score of the segmentation algorithm.
Further, the dataset is the multi-organ nuclei segmentation MoNuSeg dataset.
Further, the formula for obtaining the target Q value from the update formula of the Q-learning algorithm is:
y_j = r_j, if is_end_j is true; otherwise y_j = r_j + γ · max_{a′} Q(s_{j+1}, a′; θ)   (2)
where y_j denotes the target Q value, r_j denotes the return, γ denotes the discount factor, and max_{a′} Q(s_{j+1}, a′; θ) denotes the Q value corresponding to the next state and action;
further, the deep Q network structure in the DQN algorithm in step 5 is two layers of fully connected neural networks, w1 and w2 represent the weight dimensions of the first and second layers of the network, as shown in fig. 3. Wherein the weight dimension of the first layer of the network is [ state _ dim,20] and the weight dimension of the second layer is [20, action _ dim ]. state _ dim represents the dimension of the network input state, and the size of the state _ dim which can be defined by the state is 2; action _ dim represents the dimension of all actions output by the network, and the action _ dim is defined by the action as the number of elements | A | of the action set A.
Further, in step 6, the DQN algorithm takes the Q value calculated by the deep Q network as the prediction and calculates the loss function L(θ) with the mean square error MSE, expressed as:
L(θ) = (1/m) · Σ_{j=1..m} ( y_j − Q(s_j, a_j; θ) )²   (3)
where m is the number of samples drawn from the experience replay set.
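Combining formulas (2) and (3), a single DQN update step could be sketched as follows (PyTorch; the batching details and variable names are assumptions):

    import numpy as np
    import torch
    import torch.nn.functional as F

    def dqn_update(q_net, optimizer, batch, gamma: float = 0.9):
        """One gradient step: build target Q values per formula (2), minimize the MSE of formula (3)."""
        s, a, r, s_next, is_end = zip(*batch)
        s = torch.as_tensor(np.stack(s), dtype=torch.float32)
        s_next = torch.as_tensor(np.stack(s_next), dtype=torch.float32)
        a = torch.as_tensor(a, dtype=torch.int64)
        r = torch.as_tensor(r, dtype=torch.float32)
        done = torch.as_tensor(is_end, dtype=torch.float32)

        # y_j = r_j for terminal states, else r_j + gamma * max_a' Q(s_{j+1}, a'; theta)
        with torch.no_grad():
            y = r + gamma * (1.0 - done) * q_net(s_next).max(dim=1).values

        q_pred = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q(s_j, a_j; theta)
        loss = F.mse_loss(q_pred, y)  # L(theta): mean squared error of formula (3)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()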
the specific embodiment is as follows:
In the embodiment of the invention, histopathology images in the multi-organ nuclei segmentation (MoNuSeg) dataset are taken as the research object. The first-stage image segmentation algorithm based on the fully convolutional network (FCN) adopts a full-resolution convolutional neural network (FullNet) to obtain the neural network coarse segmentation result probability maps of the histopathology images in the MoNuSeg dataset; in the second stage, the FullNet coarse segmentation result probability map is iteratively optimized with the DQN algorithm.
1. The histopathology images in the MoNuSeg dataset are preprocessed: color standardization is performed with a uniform mean and standard deviation, the means of the three RGB channels being [0.74408994, 0.53806349, 0.66497889] and the standard deviations [0.15811703, 0.19705941, 0.15046222], and morphological opening with a 5 × 5 filter is applied for denoising;
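As a minimal sketch of this preprocessing step (the helper names and the use of OpenCV/NumPy are assumptions beyond the patent text, which specifies only the statistics and the 5 × 5 opening):

    import cv2
    import numpy as np

    # Dataset-wide RGB statistics given in the embodiment.
    MEAN = np.array([0.74408994, 0.53806349, 0.66497889])
    STD = np.array([0.15811703, 0.19705941, 0.15046222])

    def standardize_color(img_rgb: np.ndarray) -> np.ndarray:
        """Per-channel standardization into the dataset's common color distribution."""
        x = img_rgb.astype(np.float32) / 255.0
        return (x - MEAN) / STD

    def denoise_mask(mask: np.ndarray) -> np.ndarray:
        """Morphological opening with a 5x5 kernel to remove small spurious regions."""
        kernel = np.ones((5, 5), np.uint8)
        return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)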
2. In the first stage, the FullNet network is trained on the histopathology images in the MoNuSeg dataset, and the trained model is then used to obtain the neural network coarse segmentation result probability maps of the histopathology images in the dataset;
3. The initial segmentation threshold T is set to 128 as the initial action of the deep reinforcement learning DQN algorithm. A threshold action set A = [-50, -10, -5, -1, 0, 1, 5, 10, 50] is set, an action is selected according to the epsilon-greedy strategy, and the FullNet coarse segmentation result probability map is iteratively optimized. An integer is selected from the action set A as the increment applied to the current segmentation threshold, and the adjusted threshold must remain in the range 0 to 255. The return Reward is represented by the intersection over union (IoU) between the segmentation result image under the current threshold T and the manual annotation image, and is measured jointly with the return difference R between the threshold next_T of the next state and the threshold T of the current state, as shown in the following formula. If the difference is greater than 0, the algorithm advances in the direction of improving segmentation precision and can converge faster;
R=Reward[next_T]-Reward[T] (4)
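A sketch of this threshold-action loop, assuming an 8-bit probability map and illustrative helper names, might read:

    import random
    import numpy as np

    ACTIONS = [-50, -10, -5, -1, 0, 1, 5, 10, 50]  # threshold action set A from the embodiment

    def iou(pred: np.ndarray, gt: np.ndarray) -> float:
        """Intersection over union of two binary masks."""
        union = np.logical_or(pred, gt).sum()
        return np.logical_and(pred, gt).sum() / union if union > 0 else 0.0

    def select_action(q_values: np.ndarray, epsilon: float) -> int:
        """Epsilon-greedy: explore with probability epsilon, else take the argmax Q."""
        if random.random() < epsilon:
            return random.randrange(len(ACTIONS))
        return int(np.argmax(q_values))

    def step(prob_map: np.ndarray, gt: np.ndarray, T: int, action_idx: int):
        """Apply a threshold increment, clip to [0, 255], and return (next_T, R)
        with R = Reward[next_T] - Reward[T] as in formula (4)."""
        next_T = int(np.clip(T + ACTIONS[action_idx], 0, 255))
        reward_T = iou(prob_map >= T, gt)
        reward_next = iou(prob_map >= next_T, gt)
        return next_T, reward_next - reward_T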
4. The IoU between the edge of the FullNet segmentation result feature map under the current threshold and the edge extracted by Sobel operator edge detection, and the IoU between the area of the segmentation result under the current threshold and the area of the region of interest extracted by OTSU threshold segmentation, are calculated; the edge IoU and the area IoU together serve as the input of the deep Q network in the DQN algorithm, and the iterative optimization continues;
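The two-dimensional state could be computed roughly as below (a sketch assuming OpenCV; the patent specifies Sobel edge detection and OTSU thresholding but not the exact edge-extraction or binarization details, so those choices here are assumptions):

    import cv2
    import numpy as np

    def state_features(prob_map: np.ndarray, T: int) -> np.ndarray:
        """Return the 2-D state [edge IoU vs. Sobel edges, area IoU vs. OTSU region]."""
        gray = prob_map if prob_map.dtype == np.uint8 else (prob_map * 255).astype(np.uint8)
        seg = (gray >= T).astype(np.uint8)

        def iou(a: np.ndarray, b: np.ndarray) -> float:
            union = np.logical_or(a, b).sum()
            return np.logical_and(a, b).sum() / union if union > 0 else 0.0

        # Boundary of the current segmentation vs. Sobel edges of the probability map.
        seg_edges = cv2.morphologyEx(seg, cv2.MORPH_GRADIENT, np.ones((3, 3), np.uint8)) > 0
        sobel = np.hypot(cv2.Sobel(gray, cv2.CV_64F, 1, 0), cv2.Sobel(gray, cv2.CV_64F, 0, 1))
        sobel_edges = sobel > sobel.mean()  # simple binarization of the gradient magnitude

        # Region of interest extracted by OTSU thresholding.
        _, otsu = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

        return np.array([iou(seg_edges, sobel_edges), iou(seg > 0, otsu > 0)], dtype=np.float32)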
5. The agent in the DQN algorithm first keeps exploring the environment; training of the deep Q network begins once the experience pool has accumulated to a certain size. The edge IoU and area IoU under the current threshold T serve as the input of the deep Q network, and the Q values corresponding to all thresholds in the threshold action set A serve as the output. A small batch of samples is randomly drawn from the experience replay set, and the target Q value is calculated with the update formula of the Q-learning algorithm;
The training parameters of the deep Q network are set as follows; these settings are wired together in the sketch after this list:
(1) the number of episodes is set to 1000, and the number of action steps per episode is set to 100;
(2) the discount factor γ is set to 0.9, and the size replay_size of the experience replay set is set to 10000;
(3) the number of samples batch_size randomly drawn from the experience replay set is set to 32;
(4) for the algorithm to converge well, the exploration rate epsilon must decrease as iteration progresses, from an initial value of 0.5 to a final value of 0.3.
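Under the stated parameter values, these pieces might be wired together as in the following sketch; it reuses DeepQNetwork, ReplayBuffer, ACTIONS, state_features, step, select_action and dqn_update from the earlier sketches, and assumes prob_map (the FullNet probability map) and gt (the manual annotation) are given:

    import torch

    # Hyperparameters from the embodiment.
    EPISODES = 1000        # number of episodes
    STEPS = 100            # action steps per episode
    GAMMA = 0.9            # discount factor
    REPLAY_SIZE = 10000    # experience replay capacity
    BATCH_SIZE = 32        # mini-batch size
    EPS_START, EPS_END = 0.5, 0.3  # decaying exploration rate

    q_net = DeepQNetwork(state_dim=2, action_dim=len(ACTIONS))
    optimizer = torch.optim.Adam(q_net.parameters())
    buffer = ReplayBuffer(REPLAY_SIZE)

    for episode in range(EPISODES):
        # Linearly decay the exploration rate from 0.5 to 0.3.
        eps = EPS_START + (EPS_END - EPS_START) * episode / (EPISODES - 1)
        T = 128  # initial segmentation threshold
        for _ in range(STEPS):
            s = state_features(prob_map, T)
            with torch.no_grad():
                q = q_net(torch.as_tensor(s).unsqueeze(0)).squeeze(0).numpy()
            a = select_action(q, eps)
            next_T, R = step(prob_map, gt, T, a)
            buffer.push(s, a, R, state_features(prob_map, next_T), False)
            T = next_T
            if len(buffer) >= BATCH_SIZE:
                dqn_update(q_net, optimizer, buffer.sample(BATCH_SIZE), GAMMA)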
6. The loss function is computed from the mean square error (MSE) between the target Q value obtained from experience replay and the Q value output by the Q network. The weight parameters in the deep Q network are randomly initialized with a truncated normal distribution, the hidden layer is activated by the ReLU activation function, and gradient updates are performed with the Adam optimizer for 1000 iterations to obtain the optimal segmentation result threshold T_a. The neural network segmentation results before and after iterative optimization with the DQN algorithm are evaluated with the evaluation indexes IoU, Recall and F-score, where T_b denotes the segmentation threshold of the FullNet coarse segmentation result before DQN iterative optimization, with a value of 128. The experimental results are shown in Table 1.
TABLE 1. Experimental results
The results show that the histopathology image segmentation method based on deep reinforcement learning proposed by the invention achieves a certain improvement in the IoU, Precision and Dice coefficients. The visual segmentation results are shown in fig. 4; by observation, after the first-stage FullNet segmentation result is iteratively optimized with the DQN algorithm, the segmentation precision of the histopathology image is clearly improved and is closer to the pathologist's manual annotation, achieving the goal of improving the efficiency and accuracy of cancer diagnosis by pathologists.

Claims (4)

1. A histopathology image segmentation method based on deep reinforcement learning is characterized by comprising the following steps:
step 1: preprocessing the labeled histopathology images, and performing color standardization by using uniform mean values and standard deviations aiming at the histopathology images in the same data set so as to convert all the images in the data set into the same RGB color space distribution;
and 2, step: training the tissue pathology image by using a segmentation algorithm based on a full convolution neural network (FCN), and then acquiring a neural network coarse segmentation result probability map of the tissue pathology image by using the trained full convolution neural network (FCN);
and step 3: setting an initial segmentation threshold value to be 128 as an initial action of the deep reinforcement learning DQN algorithm;
setting a threshold action set A, selecting corresponding actions according to an epsilon-greedy strategy, and performing iterative optimization on a tissue pathology image neural network rough segmentation result probability graph: selecting an integer in the action set A as the change amount of the division threshold value, wherein the result of the change of the division threshold value must be in the range of 0 to 255; using an intersection ratio IoU of the segmentation result image and the manual annotation image to represent return, calculating the return difference between the segmentation result image of the segmentation threshold value of the next state and the segmentation result image of the segmentation threshold value of the current state, wherein the return difference value is greater than 0 to represent that the segmentation precision is improved;
and 4, step 4: calculating the intersection ratio IoU of the edge of the segmentation result characteristic diagram of the histopathology image under the current threshold value and the Sobel operator edge detection extraction edge and the intersection ratio IoU of the segmentation result under the current threshold value and the area of the region of interest extracted by OTSU algorithm threshold value segmentation, and taking IoU of the edge and IoU of the area as the input of a depth Q network in the DQN algorithm;
and 5: taking IoU of the edge and IoU of the area obtained in the step 4 as the input of the depth Q network, and taking the Q values corresponding to all the thresholds in the threshold action set A as the output; randomly extracting a part of samples from the experience playback set, and calculating a target Q value through an updating formula of a Q-learning algorithm;
the deep Q network serves as the network model of the action value function, with network weight parameters θ; Q(s, a; θ) is used to approximate the optimal action value function Q*(s, a), where s denotes the current state and a denotes the current action, i.e.:
Q(s, a; θ) ≈ Q*(s, a)   (1)
the experience replay set stores the samples (s_j, a_j, r_j, s_{j+1}, is_end_j) obtained from the interaction of the agent with the environment at each time step of the DQN algorithm; training randomly draws a batch of samples from the experience replay set for the update, where the vector s_j denotes the current state, s_{j+1} denotes the next state, the scalars a_j and r_j denote the action and the return respectively, and the Boolean is_end_j indicates whether the state is a terminal state;
step 6: calculating the loss function from the mean square error MSE between the target Q value and the Q value output by the deep Q network; the weight parameters in the deep Q network are randomly initialized with a truncated normal distribution, the hidden layer is activated by the ReLU activation function, and gradient updates are performed with the Adam optimizer for 1000 iterations to obtain the optimal segmentation result; the segmentation performance of the algorithm is then evaluated with the evaluation indexes IoU, Recall and F-score of the segmentation algorithm.
2. The histopathological image segmentation method based on deep reinforcement learning of claim 1, wherein the dataset is the multi-organ nuclei segmentation MoNuSeg dataset.
3. The histopathological image segmentation method based on deep reinforcement learning of claim 1, wherein the formula for obtaining the target Q value from the update formula of the Q-learning algorithm is:
y_j = r_j, if is_end_j is true; otherwise y_j = r_j + γ · max_{a′} Q(s_{j+1}, a′; θ)   (2)
where y_j denotes the target Q value, r_j denotes the return, γ denotes the discount factor, and max_{a′} Q(s_{j+1}, a′; θ) denotes the Q value for the next state and action.
4. The histopathological image segmentation method based on deep reinforcement learning of claim 1, wherein the DQN algorithm in step 6 takes the Q value calculated by the deep Q network as the prediction and calculates the loss function L(θ) with the mean square error MSE, expressed as:
L(θ) = (1/m) · Σ_{j=1..m} ( y_j − Q(s_j, a_j; θ) )²   (3)
where m is the number of samples drawn from the experience replay set.
CN202210352213.5A 2022-04-03 2022-04-03 Tissue pathology image segmentation method based on deep reinforcement learning Active CN114897922B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210352213.5A CN114897922B (en) 2022-04-03 2022-04-03 Tissue pathology image segmentation method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210352213.5A CN114897922B (en) 2022-04-03 2022-04-03 Tissue pathology image segmentation method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN114897922A true CN114897922A (en) 2022-08-12
CN114897922B CN114897922B (en) 2024-04-26

Family

ID=82714469

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210352213.5A Active CN114897922B (en) 2022-04-03 2022-04-03 Tissue pathology image segmentation method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN114897922B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171711A (en) * 2018-01-17 2018-06-15 深圳市唯特视科技有限公司 A kind of infant's brain Magnetic Resonance Image Segmentation method based on complete convolutional network
US10304193B1 (en) * 2018-08-17 2019-05-28 12 Sigma Technologies Image segmentation and object detection using fully convolutional neural network
CN111311607A (en) * 2020-01-21 2020-06-19 北京贝叶科技有限公司 Psoriasis histopathological section segmentation method and system based on deep learning
CN111784652A (en) * 2020-06-24 2020-10-16 西安电子科技大学 MRI segmentation method based on reinforcement learning multi-scale neural network
CN111784671A (en) * 2020-06-30 2020-10-16 天津大学 Pathological image focus region detection method based on multi-scale deep learning
CN111798464A (en) * 2020-06-30 2020-10-20 天津深析智能科技有限公司 Lymphoma pathological image intelligent identification method based on deep learning
AU2020103901A4 (en) * 2020-12-04 2021-02-11 Chongqing Normal University Image Semantic Segmentation Method Based on Deep Full Convolutional Network and Conditional Random Field
WO2021184817A1 (en) * 2020-03-16 2021-09-23 苏州科技大学 Method for segmenting liver and focus thereof in medical image
WO2022037696A1 (en) * 2020-08-21 2022-02-24 张逸凌 Bone segmentation method and system based on deep learning

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171711A (en) * 2018-01-17 2018-06-15 深圳市唯特视科技有限公司 A kind of infant's brain Magnetic Resonance Image Segmentation method based on complete convolutional network
US10304193B1 (en) * 2018-08-17 2019-05-28 12 Sigma Technologies Image segmentation and object detection using fully convolutional neural network
CN111311607A (en) * 2020-01-21 2020-06-19 北京贝叶科技有限公司 Psoriasis histopathological section segmentation method and system based on deep learning
WO2021184817A1 (en) * 2020-03-16 2021-09-23 苏州科技大学 Method for segmenting liver and focus thereof in medical image
CN111784652A (en) * 2020-06-24 2020-10-16 西安电子科技大学 MRI segmentation method based on reinforcement learning multi-scale neural network
CN111784671A (en) * 2020-06-30 2020-10-16 天津大学 Pathological image focus region detection method based on multi-scale deep learning
CN111798464A (en) * 2020-06-30 2020-10-20 天津深析智能科技有限公司 Lymphoma pathological image intelligent identification method based on deep learning
WO2022037696A1 (en) * 2020-08-21 2022-02-24 张逸凌 Bone segmentation method and system based on deep learning
AU2020103901A4 (en) * 2020-12-04 2021-02-11 Chongqing Normal University Image Semantic Segmentation Method Based on Deep Full Convolutional Network and Conditional Random Field

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Tang Siyuan; Xing Junfeng; Yang Min: "A new medical image segmentation method based on BP neural networks", Computer Science, no. 1, 15 June 2017 (2017-06-15) *
Zhan Shu; Liang Zhicheng; Xie Dongdong: "Deconvolutional neural network method for prostate magnetic resonance image segmentation", Journal of Image and Graphics, no. 04, 16 April 2017 (2017-04-16) *

Also Published As

Publication number Publication date
CN114897922B (en) 2024-04-26

Similar Documents

Publication Publication Date Title
CN108364288B (en) Segmentation method and device for breast cancer pathological image
CN107016681B (en) Brain MRI tumor segmentation method based on full convolution network
CN111488921B (en) Intelligent analysis system and method for panoramic digital pathological image
CN111798464A (en) Lymphoma pathological image intelligent identification method based on deep learning
CN115661144B (en) Adaptive medical image segmentation method based on deformable U-Net
CN109033978B (en) Error correction strategy-based CNN-SVM hybrid model gesture recognition method
CN111650453A (en) Power equipment diagnosis method and system based on windowing characteristic Hilbert imaging
CN116630971B (en) Wheat scab spore segmentation method based on CRF_Resunate++ network
CN112750106A (en) Nuclear staining cell counting method based on incomplete marker deep learning, computer equipment and storage medium
CN112434172A (en) Pathological image prognosis feature weight calculation method and system
CN113705655A (en) Full-automatic classification method for three-dimensional point cloud and deep neural network model
CN114897922B (en) Tissue pathology image segmentation method based on deep reinforcement learning
CN116543414A (en) Tongue color classification and tongue redness and purple quantification method based on multi-model fusion
CN116228795A (en) Ultrahigh resolution medical image segmentation method based on weak supervised learning
CN108182684B (en) Image segmentation method and device based on weighted kernel function fuzzy clustering
CN116486150A (en) Uncertainty perception-based regression error reduction method for image classification model
CN111178174B (en) Urine formed component image identification method based on deep convolutional neural network
CN112686912B (en) Acute stroke lesion segmentation method based on gradual learning and mixed samples
CN115170956A (en) Posterior probability hyperspectral image classification method based on multi-scale entropy rate superpixel
CN115661029A (en) Pulmonary nodule detection and identification system based on YOLOv5
CN112950655A (en) Land use information automatic extraction method based on deep learning
CN112102336A (en) Image segmentation method based on user interaction and deep neural network
CN116486273B (en) Method for extracting water body information of small sample remote sensing image
Wu et al. Embryo zebrafish segmentation using an improved hybrid method
Hanmandlu et al. Fuzzy co-clustering of medical images using Bacterial Foraging

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant