CN101042771A - Medical image segmentation method based on reinforcement learning - Google Patents

Medical image segmentation method based on reinforcement learning

Info

Publication number
CN101042771A
CN101042771A (application CN200710021810.5A)
Authority
CN
China
Prior art keywords
image
threshold
segmentation
current
threshold value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN200710021810.5A
Other languages
Chinese (zh)
Inventor
Gao Yang (高阳)
Zhu Liang (朱亮)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN200710021810.5A priority Critical patent/CN101042771A/en
Publication of CN101042771A publication Critical patent/CN101042771A/en
Pending legal-status Critical Current

Links

Images

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a medical image segmentation method based on reinforcement learning. Using different image samples and the defined states, actions and rewards, the method learns through the interaction between the reinforcement learning agent and its environment, acquiring the optimal behavior policy by trial and error. The result is a new reinforcement-learning-based image segmentation method that uses the learned knowledge to segment similar medical images. The advantages of the invention are that it can effectively distinguish the cell nucleus from the cytoplasm, learn adaptively and incrementally, and segment images accurately.

Description

Medical Image Segmentation Method Based on Reinforcement Learning

Technical Field

The invention relates to an incremental medical image segmentation method based on reinforcement learning.

Background Art

Accurate segmentation of image objects is one of the important foundations of medical image pattern recognition. In lung cancer cell recognition, most lung cancer cell images show low contrast between the nucleus and the cytoplasm, and the boundary between the nucleus edge and the background is blurred; together with the noise introduced by background impurities, this makes accurate segmentation of lung cancer cell images difficult.

Traditional image segmentation methods fall mainly into threshold-based and gradient-based segmentation. The former cannot accurately extract the target region from images with multi-peak gray-level histograms, while the latter likewise fails when the gray levels of the target and the background are close. In addition, image acquisition for lung cancer cytopathological images varies greatly, so ordinary image segmentation methods have difficulty adapting to such a complex environment.

At present, reinforcement learning is widely used in prediction, intelligent control, image processing and many other fields. Compared with traditional image segmentation methods, a reinforcement-learning-based medical image segmentation method has incremental learning ability, can adapt to complex environments, and can segment medical lung cancer cell images correctly.

Summary of the Invention

Purpose of the invention: the purpose of the present invention is to address the deficiencies of the prior art and provide a reinforcement-learning-based medical image segmentation method that can segment medical images correctly.

Technical solution: using different image samples and the defined states, actions and rewards, the present invention learns through the interaction between the reinforcement learning agent and its environment, acquiring the optimal behavior policy by trial and error. The result is a new reinforcement-learning-based image segmentation method that uses the learned knowledge to segment similar medical images. The method comprises the following steps:
(1) Initialize the Q matrix. The Q matrix records, as a two-dimensional array, the cumulative reward obtained by selecting actions according to policy π in the current state and all subsequent states.
(2) Perform Sobel edge detection (the Sobel operator is a commonly used edge detection operator) on the new sample image to obtain an edge image.
(3) Segment the new sample image with the maximum between-class variance method to obtain a binary image containing the nucleus and cytoplasm.
(4) Define the state as the ratio of overlap between the object contour of the current threshold segmentation and the edges found by Sobel edge detection, together with the ratio of overlap between the object area of the current threshold segmentation and the object area obtained by the maximum between-class variance method. Define an action as increasing or decreasing the current threshold by the gray level represented by action A_i, with A = [-30, -10, -5, -1, 0, 1, 5, 10, 30]. Define the reward as the degree of agreement between the currently segmented object region and the actual optimal segmentation of the image.
(5) Compute the state and reward for each candidate segmentation threshold from 0 to 255, so that every threshold has a corresponding state and reward.
(6) Repeat step (7) until the mean squared difference between the Q matrix before and after updating, averaged over the last 10 updates, is less than 0.005.
(7) Given an initial threshold, repeat steps (8) to (10) until the threshold reaches the optimal segmentation threshold.
(8) Determine the current state from the current threshold.
(9) Select an action with the ε-greedy policy (which chooses the action with the largest value in the Q matrix with probability 1-ε and another action with probability ε) and change the threshold accordingly.
(10) Obtain the feedback reward for the new threshold and update the Q matrix.
(11) Repeat steps (2) to (10) for each new sample image.
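As a concrete illustration of how the state and action spaces can be represented in a tabular setting, the following Python sketch lists the action set A and maps the two overlap ratios E and F to a discrete state index. The description writes the state as S = (E × F); treating the two ratios as a jointly binned pair, and the 10×10 binning itself, are assumptions made for this sketch rather than details fixed by the invention.

```python
import numpy as np

# Action set from the method: the amount added to the current threshold.
ACTIONS = np.array([-30, -10, -5, -1, 0, 1, 5, 10, 30])

# The state combines the edge-overlap ratio E and the area-overlap ratio F,
# both in [0, 1]. A tabular Q matrix needs a finite state index; the 10x10
# binning below is an assumption, not something the description specifies.
N_BINS = 10
N_STATES = N_BINS * N_BINS

def state_index(E: float, F: float, n_bins: int = N_BINS) -> int:
    """Map the continuous pair (E, F) to a single discrete state index."""
    e_bin = min(int(E * n_bins), n_bins - 1)
    f_bin = min(int(F * n_bins), n_bins - 1)
    return e_bin * n_bins + f_bin
```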

Beneficial effects: the significant advantages of the present invention are that it can effectively distinguish the cell nucleus from the cytoplasm, learn adaptively and incrementally, and segment images accurately.

Brief Description of the Drawings

Figure 1 is the framework model of the method of the present invention.

Figure 2 is a structural diagram of the modules of the method of the present invention.

Figure 3 is a flowchart of the method of the present invention.

Figure 4 is the edge image obtained by edge detection with the Sobel operator.

Figure 5 is the result image of segmentation by the maximum between-class variance method.

Figure 6 is the optimal segmentation result.

Detailed Description of the Embodiments

Figure 1 shows the framework model of the method of the present invention.

As shown in Figure 2, the method of the present invention comprises a state perception module, an action selection module, a policy update module, a reward perception module and an image segmentation module.

The flow of the method of the present invention is shown in Figure 3 and is described in detail below.

Step 1. Initialize the Q matrix. The Q matrix records, as a two-dimensional array, the cumulative reward obtained by selecting actions according to policy π in the current state and all subsequent states.
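A minimal sketch of this initialization, reusing the state discretization and action set assumed above:

```python
import numpy as np

# One row per discrete state, one column per action in A; every entry starts
# at zero. The state count of 100 follows the assumed 10x10 binning of (E, F).
N_STATES, N_ACTIONS = 100, 9
Q = np.zeros((N_STATES, N_ACTIONS))
```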

Step 2. Perform edge detection on the new sample image with the Sobel operator (a commonly used edge detection operator) to obtain an edge image. The edge detection result is shown in Figure 4.
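A possible implementation of this step, using SciPy's `ndimage.sobel`; how the gradient magnitude is binarised is not stated in the description, so the fractional threshold below is an assumption.

```python
import numpy as np
from scipy import ndimage

def sobel_edges(gray: np.ndarray, edge_frac: float = 0.2) -> np.ndarray:
    """Binary edge map from Sobel gradients (step 2).

    `edge_frac` (fraction of the maximum gradient magnitude used as the
    edge threshold) is an assumed parameter.
    """
    g = gray.astype(float)
    gx = ndimage.sobel(g, axis=1)   # horizontal gradient
    gy = ndimage.sobel(g, axis=0)   # vertical gradient
    mag = np.hypot(gx, gy)          # gradient magnitude
    return mag >= edge_frac * mag.max()
```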

Step 3. Segment the new sample image with the maximum between-class variance method to obtain a binary image containing the nucleus and cytoplasm. The segmentation result is shown in Figure 5.
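The maximum between-class variance criterion can be written out directly. The sketch below searches all 256 candidate thresholds for the one maximising the between-class variance; whether the foreground is the darker or brighter side of the threshold depends on the staining of the sample and is an assumption here.

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Threshold maximising the between-class variance (step 3)."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist.astype(float) / hist.sum()
    levels = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()        # class probabilities
        if w0 == 0.0 or w1 == 0.0:
            continue
        mu0 = (levels[:t] * p[:t]).sum() / w0    # class means
        mu1 = (levels[t:] * p[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

# otsu_mask = gray < otsu_threshold(gray)
# Here the darker pixels (nucleus and cytoplasm in a typical stained cell
# image) are taken as foreground; this orientation is an assumption.
```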

Step 4. Define the state S as the ratio E of overlap between the object contour of the current threshold segmentation and the edges found by Sobel edge detection, together with the ratio F of overlap between the object area of the current threshold segmentation and the object area obtained by the maximum between-class variance method, i.e. S = (E × F). Define an action as increasing or decreasing the current threshold by the gray level represented by action A_i, with A = [-30, -10, -5, -1, 0, 1, 5, 10, 30]. Define the reward R as the degree of agreement between the currently segmented object region and the actual optimal segmentation of the image.

$E = \dfrac{|\mathrm{Edge}_T \cap \mathrm{Edge}_S|}{|\mathrm{Edge}_S|}$   (1)

Edge_T is the edge of the current segmentation, and Edge_S is the edge extracted by edge detection.

$F = \dfrac{|\mathrm{Front}_T \cap \mathrm{Front}_{OSTU}|}{|\mathrm{Front}_{OSTU}|}$   (2)

Front_T is the object region of the current segmentation, and Front_OSTU is the object region segmented by the maximum between-class variance method (Otsu's method).

$R = 100 \times \dfrac{|B_O \cap B_T| + |F_O \cap F_T|}{|B_O + F_O|}$   (3)

B_O is the background of the optimal segmentation and F_O is its foreground object; B_T is the background of the current segmentation and F_T is its foreground object.
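Equations (1) to (3) operate on boolean masks and can be computed directly. In the sketch below, `opt_mask` is the foreground of the reference ("actual optimal") segmentation, which is assumed to accompany each training sample.

```python
import numpy as np

def state_and_reward(cur_mask, cur_edge, sobel_edge, otsu_mask, opt_mask):
    """E and F of eqs. (1)-(2) and the reward R of eq. (3) for one threshold.

    All arguments are boolean arrays with the image shape.
    """
    E = (cur_edge & sobel_edge).sum() / max(sobel_edge.sum(), 1)   # eq. (1)
    F = (cur_mask & otsu_mask).sum() / max(otsu_mask.sum(), 1)     # eq. (2)
    b_opt, b_cur = ~opt_mask, ~cur_mask                            # backgrounds
    # |B_O + F_O| covers the whole image, i.e. the total pixel count.
    R = 100.0 * ((b_opt & b_cur).sum() + (opt_mask & cur_mask).sum()) / opt_mask.size
    return E, F, R
```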

Step 5. According to equations (1), (2) and (3), compute the state and reward for each segmentation threshold from 0 to 255, so that every threshold has a corresponding state and reward.
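The per-threshold table of step 5 then follows by evaluating the helper above for every candidate threshold; extracting the object contour via binary erosion is an assumed detail that the description does not fix.

```python
from scipy import ndimage

def threshold_table(gray, sobel_edge, otsu_mask, opt_mask):
    """(E, F, R) for every candidate threshold 0-255 (step 5)."""
    table = {}
    for t in range(256):
        cur_mask = gray < t                 # current threshold segmentation
        # Contour: mask pixels removed by one erosion step (assumed method).
        cur_edge = cur_mask & ~ndimage.binary_erosion(cur_mask)
        table[t] = state_and_reward(cur_mask, cur_edge, sobel_edge,
                                    otsu_mask, opt_mask)
    return table
```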

Step 6. Repeat step 7 until the mean squared difference between the Q matrix before and after updating, averaged over the last 10 updates, is less than 0.005.
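One reading of this stopping rule as a small check, averaging the squared change of the Q matrix over the last 10 updates:

```python
import numpy as np

def converged(q_history, window=10, tol=0.005):
    """True once the mean squared change of Q over the last `window`
    updates drops below `tol` (step 6)."""
    if len(q_history) < window + 1:
        return False
    diffs = [np.mean((q_history[-i] - q_history[-i - 1]) ** 2)
             for i in range(1, window + 1)]
    return float(np.mean(diffs)) < tol
```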

Step 7. Given an initial threshold, repeat steps 8 to 10 until the threshold reaches the optimal segmentation threshold.

Step 8. Determine the current state from the current threshold.

Step 9. Select an action with the ε-greedy policy (which chooses the action with the largest value in the Q matrix with probability 1-ε and another action with probability ε) and change the segmentation threshold accordingly.
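A sketch of the ε-greedy selection; ε = 0.1 is an assumed value, and the exploratory branch samples uniformly over all actions, a common simplification of "choose another action with probability ε".

```python
import numpy as np

def epsilon_greedy(Q, s, epsilon=0.1, rng=None):
    """Greedy action for state s with probability 1 - epsilon, random otherwise."""
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.integers(len(Q[s])))   # explore
    return int(np.argmax(Q[s]))               # exploit
```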

Step 10. Obtain the feedback reward for the new threshold and update the Q matrix according to equation (4), where s is the current state, a is the action taken in s, s' is the next state after executing action a, and a' is an action available in s'.

$Q(s, a) \leftarrow Q(s, a) + \alpha\left[r + \gamma \max_{a'} Q(s', a') - Q(s, a)\right]$   (4)
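Equation (4) is the standard Q-learning update. The learning rate α = 0.1 and discount factor γ = 0.9 below are assumed values, since the description does not state them.

```python
ALPHA, GAMMA = 0.1, 0.9   # learning rate and discount factor (assumed)

def q_update(Q, s, a, r, s_next):
    """One Q-learning update per equation (4); Q is a NumPy array."""
    td_target = r + GAMMA * Q[s_next].max()
    Q[s, a] += ALPHA * (td_target - Q[s, a])
```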

Step 11. Repeat steps 2 to 10 for each new sample image.
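Putting steps 7 to 10 together, the sketch below runs one training episode. It reuses `state_index`, `epsilon_greedy`, `q_update`, `ACTIONS` and the per-threshold `table` from the earlier sketches; taking the optimal threshold as the one with the highest reward R in the table, and the step cap, are assumptions.

```python
import numpy as np

def run_episode(Q, table, t_start, epsilon=0.1, max_steps=500):
    """One episode of steps 7-10: walk the threshold from t_start until it
    reaches the optimal segmentation threshold."""
    t_opt = max(table, key=lambda t: table[t][2])    # threshold with highest R
    t = t_start
    for _ in range(max_steps):
        if t == t_opt:
            break
        E, F, _ = table[t]
        s = state_index(E, F)                        # step 8
        a = epsilon_greedy(Q, s, epsilon)            # step 9
        t_new = int(np.clip(t + ACTIONS[a], 0, 255))
        E2, F2, r = table[t_new]                     # reward of the new threshold
        q_update(Q, s, a, r, state_index(E2, F2))    # step 10, eq. (4)
        t = t_new
    return Q
```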

Claims (1)

1. A medical image segmentation method based on reinforcement learning, characterized in that the method comprises the following steps:
(1) initializing a Q matrix, the Q matrix recording, as a two-dimensional array, the cumulative reward obtained by selecting actions according to policy π in the current state and all subsequent states;
(2) performing edge detection on a new sample image with the Sobel operator to obtain an edge image;
(3) segmenting the new sample image with the maximum between-class variance method to obtain a binary image containing the nucleus and cytoplasm;
(4) defining the state S as the ratio E of overlap between the object contour of the current threshold segmentation and the edges found by Sobel edge detection, together with the ratio F of overlap between the object area of the current threshold segmentation and the object area obtained by the maximum between-class variance method, i.e. S = (E × F); defining an action as increasing or decreasing the current threshold by the gray level represented by action A_i, with A = [-30, -10, -5, -1, 0, 1, 5, 10, 30]; and defining the reward R as the degree of agreement between the currently segmented object region and the actual optimal segmentation of the image;
(5) computing the state and reward for each segmentation threshold from 0 to 255, so that every threshold has a corresponding state and reward;
(6) repeating step (7) until the mean squared difference between the Q matrix before and after updating, averaged over the last 10 updates, is less than 0.005;
(7) given an initial threshold, repeating steps (8) to (10) until the threshold reaches the optimal segmentation threshold;
(8) determining the current state from the current threshold;
(9) selecting an action with the ε-greedy policy and changing the segmentation threshold;
(10) obtaining the feedback reward for the new threshold and updating the Q matrix;
(11) repeating steps (2) to (10) for each new sample image.
CN200710021810.5A 2007-04-29 2007-04-29 Medical image segmentation method based on reinforcement learning Pending CN101042771A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200710021810.5A CN101042771A (en) 2007-04-29 2007-04-29 Medicine image segmentation method based on intensification learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200710021810.5A CN101042771A (en) 2007-04-29 2007-04-29 Medicine image segmentation method based on intensification learning

Publications (1)

Publication Number Publication Date
CN101042771A true CN101042771A (en) 2007-09-26

Family

ID=38808265

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200710021810.5A Pending CN101042771A (en) 2007-04-29 2007-04-29 Medicine image segmentation method based on intensification learning

Country Status (1)

Country Link
CN (1) CN101042771A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101694721B (en) * 2009-10-01 2011-11-30 厦门大学 Method for locating circular algae in microimage
WO2011156948A1 (en) * 2010-06-13 2011-12-22 Nanjing University Reconstruction of overlapped objects in image
CN101404085B (en) * 2008-10-07 2012-05-16 华南师范大学 Segmentation Method and Application of Interactive 3D Volume Segmentation Sequence Images
CN101661614B (en) * 2009-08-10 2012-11-14 浙江工业大学 Segmentation method of cell nucleolus and cell membrane based on mixed contour model
CN105074586A (en) * 2013-03-26 2015-11-18 西门子公司 Method for the computerized control and/or regulation of a technical system


Similar Documents

Publication Publication Date Title
KR102631031B1 (en) Method for detecting defects in semiconductor device
CN112750106B (en) A nuclear staining cytometry method, computer equipment and storage medium based on deep learning of incomplete labeling
CN111507993A (en) Image segmentation method and device based on generation countermeasure network and storage medium
CN101042771A (en) Medicine image segmentation method based on intensification learning
CN110163069B (en) Lane line detection method for driving assistance
CN108960245A (en) The detection of tire-mold character and recognition methods, device, equipment and storage medium
CN112200800B (en) A Defect Detection Method for Electrowetting Displays Based on Grayscale Histogram
CN112069874A (en) Method, system, equipment and storage medium for identifying cells in embryo optical lens image
CN115641443B (en) Method for training image segmentation network model, method for processing image and product
CN113392915A (en) Industrial part defect detection method based on deep learning
CN117173201B (en) Second order differential image segmentation method, system, medium and device
CN109635653A (en) A kind of plants identification method
CN104867130A (en) Self-adaptive segmentation method based on crack image subarea gray scale mean value
CN112507943A (en) Visual positioning navigation method, system and medium based on multitask neural network
CN117314940B (en) Laser cutting part contour rapid segmentation method based on artificial intelligence
CN110286765B (en) Intelligent experiment container and using method thereof
CN112712526B (en) Retina blood vessel segmentation method based on asymmetric convolutional neural network double channels
CN108182374A (en) A kind of picking point recognition methods for fruit string
CN104794483A (en) Image division method based on inter-class maximized PCM (Pulse Code Modulation) clustering technology
CN110111307B (en) Immune system feedback simulation system and method for immune teaching
CN116310358B (en) Method, storage medium and equipment for detecting bolt loss of railway wagon
CN112907503A (en) Penaeus vannamei Boone quality detection method based on adaptive convolutional neural network
CN108845075B (en) Real-time prediction method of compost maturity based on deep learning network
CN109753888A (en) A method for identifying the chirality of solar dimples in astronomical images
CN116152093A (en) Mapping precision improving method, device, equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication