CN113312978A - Method and system for accurately identifying and segmenting target under microscopic image - Google Patents

Method and system for accurately identifying and segmenting target under microscopic image

Info

Publication number
CN113312978A
CN113312978A
Authority
CN
China
Prior art keywords
target
microscopic image
semantic segmentation
module
loss function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110482525.3A
Other languages
Chinese (zh)
Inventor
肖立 (Xiao Li)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN202110482525.3A priority Critical patent/CN113312978A/en
Publication of CN113312978A publication Critical patent/CN113312978A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/11 - Region-based segmentation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10056 - Microscopic image
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30004 - Biomedical image processing
    • G06T2207/30096 - Tumor; Lesion

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and a system for accurately identifying and segmenting targets under microscopic images, wherein the method comprises the following steps: step one, identifying a target in a microscopic image based on a Focal Loss target detection algorithm to obtain a target substance; step two, performing semantic segmentation on the target substance based on a semantic segmentation algorithm, and acquiring an image area containing only the target substance from the microscopic image; and step three, visualizing the image area. Based on deep learning target detection and semantic segmentation, the method identifies single targets in the microscopic image and reproduces them at high quality, solving the problem that manual target identification and fine judgment in microscopic images are time-consuming and labor-intensive.

Description

Method and system for accurately identifying and segmenting target under microscopic image
Technical Field
The invention relates to the technical fields of computer vision image processing, microscopic image analysis and the like, in particular to a method and a system for accurately identifying and segmenting a target under a microscopic image.
Background
Object recognition in microscopic images has important applications in biology and medicine. Existing microscopic image analysis mainly depends on manual judgment and is time-consuming and labor-intensive. For example, karyotyping requires identifying and arranging all 46 chromosomes, while blood cell examination requires identifying and counting all classes of bone marrow cells across multiple fields of view.
Manual analysis is the most direct method; however, since a single field of view under a microscope usually contains tens to hundreds of targets, each analysis typically requires counting multiple fields of view to achieve statistical accuracy, which is time-consuming and labor-intensive. Moreover, because targets tend to cluster together, some subtle lesion features are difficult to find effectively.
These problems in the prior art urgently need to be solved.
Disclosure of Invention
The invention aims to provide a method and a system for accurately identifying and segmenting a target under a microscopic image, which identify single targets in the microscopic image and reproduce them at high quality based on deep learning target detection and semantic segmentation, solving the problem that manual microscopic image target identification and fine judgment are time-consuming and labor-intensive.
In order to achieve the above object, the present invention provides a method for accurately identifying and segmenting a target under a microscopic image, comprising:
step one, identifying a target in a microscopic image based on a Focal Loss target detection algorithm to obtain a target substance;
step two, performing semantic segmentation on the target substance based on a semantic segmentation algorithm, and acquiring an image area containing only the target substance from the microscopic image; and
step three, visualizing the image area.
The method described above, wherein before step one is executed, the method further comprises: performing unified preprocessing on the microscopic image.
The method, wherein the unified preprocessing processes the microscopic image with random flipping at a probability of 50%, random rotation between -5° and 5°, and random variation of brightness and contrast.
In the first step, the step of identifying the target in a microscopic image based on the Focal Loss target detection algorithm includes:
constructing a network model for target area detection;
identifying the microscopic image according to the network model to obtain a candidate rectangular frame;
and obtaining a rectangular frame corresponding to the target substance according to the candidate rectangular frame.
In the second step, the semantic segmentation of the target substance based on the semantic segmentation algorithm includes:
constructing a semantic segmentation network by adding deep supervision into a standard UNet;
combining the cross entropy loss function with the bias-dice loss function to obtain a loss function;
and performing semantic segmentation on the target substance according to the semantic segmentation network and the loss function.
In order to achieve the above object, the present invention further provides a system for accurately identifying and segmenting a target under a microscopic image, comprising:
the target identification module is used for identifying a target in a microscopic image based on a Focal Loss target detection algorithm to obtain a target substance;
the semantic segmentation module is used for performing semantic segmentation on the target substance based on a semantic segmentation algorithm and acquiring an image area only containing the target substance from the microscopic image; and
and the visualization module is used for visualizing the image area.
The system, wherein the system further comprises: a preprocessing module for uniformly preprocessing the microscopic image.
The system, wherein the preprocessing module processes the microscopic image using random flipping with a probability of 50%, random rotation between -5° and 5°, and random brightness and contrast variation.
The system, wherein the object recognition module further comprises:
the model construction module is used for constructing a network model for target area detection;
the area prediction module is used for identifying the microscopic image according to the network model to obtain a candidate rectangular frame;
and the target area module is used for obtaining a rectangular frame corresponding to the target substance according to the candidate rectangular frame.
The system, wherein the semantic segmentation module further comprises:
the segmentation network module is used for building a semantic segmentation network by adding deep supervision into the standard UNet;
the loss function module is used for combining the cross entropy loss function with the bias-dice loss function to obtain a loss function;
and the segmentation module is used for performing semantic segmentation on the target substance according to the semantic segmentation network and the loss function.
Compared with the prior art, the invention has the following technical effects:
1. The invention utilizes a Focal Loss-based target detection algorithm to locate targets in the microscopic image and judge their types, thereby solving the problem of sample imbalance degrading detection precision.
2. The invention extracts the image area corresponding to each rectangular frame and uses a semantic segmentation algorithm to segment out the image area containing only the target substance for visualization, which spares doctors the manual matting operation and saves their time and energy. Presenting each target individually also makes it easier for a doctor to observe subtle lesions on the target, improving diagnostic precision.
Drawings
FIG. 1 is a flow chart of a method for accurately identifying and segmenting a target under a microscopic image according to the present invention;
FIG. 2 is a diagram of the precise identification and segmentation of a target under a microscopic image according to the present invention;
FIG. 3 is a schematic diagram of a standard UNet incorporating the deep supervision structure of the present invention;
FIG. 4 is a block diagram of a system for accurately identifying and segmenting objects under a microscopic image in accordance with the present invention.
Detailed Description
The invention is described in detail below with reference to the drawings and specific examples, but the invention is not limited thereto.
Fig. 1 is a flow chart of the method for accurately identifying and segmenting the target under the microscopic image according to the present invention.
Identification and fine segmentation in microscopic images is a common requirement in the study of biomedical images. For example, research on cell biological phenomena requires finely identifying organelles and judging abnormal conditions; karyotype analysis requires counting and finely segmenting single chromosomes for presentation in a report; and bone marrow cell analysis in blood examination requires counting cell categories within a single field of view.
These processes currently rely primarily on manual analysis and are inefficient. Research shows they can be automated with deep learning detection and analysis methods. In investigating deep learning algorithms, the Focal Loss combines well with current mainstream deep learning target detection models and resolves the influence of sample imbalance on detection precision. In designing the loss function of the segmentation model, the visualization effect can be improved by adjusting how strongly the precision and recall of targets in the microscopic image are penalized.
The method applies the Focal Loss within a deep learning target detection framework, ensuring that the model can accurately identify samples even under class imbalance. It also applies the Bce-bias-dice loss so that the model adds a penalty for pixels missed in the prediction, thereby retaining as much of the sample's original information as possible in the visualization.
Therefore, the microscopic image diagnosis method based on target detection and semantic segmentation algorithms provided by the invention mainly aims to use deep learning technology to help doctors complete the complex work of detecting and segmenting target substances in microscopic images, saving their time and energy. The flow chart shown in fig. 1 includes the following steps:
step 10, identifying a target in a microscopic image based on a Focal Loss target detection algorithm to obtain a target substance;
in one embodiment, this step requires that the input microscopic image first be uniformly preprocessed, after which a Focal Loss-based target detection algorithm accurately identifies the targets in the microscopic image. The target detection algorithm locates the position and size of each target substance and determines the category to which it belongs.
Step 20, performing semantic segmentation on the target substance based on a semantic segmentation algorithm, and acquiring an image area only containing the target substance from the microscopic image;
in one embodiment, after the target detection process is completed, a rectangular box has been detected for each target substance. The invention extracts the image region corresponding to each rectangular frame and uses a semantic segmentation algorithm to segment out the image region containing only the target substance for subsequent visualization, which spares doctors the manual matting operation and saves their time and energy. Presenting each target individually also makes it easier for a doctor to observe subtle lesions on the target, improving diagnostic precision.
Step 30, visualizing the image area.
The method first completes the training of the target detection algorithm and the semantic segmentation algorithm, and then applies the trained models to predict the category of each target substance in a microscopic image and to segment it, as shown in FIG. 2.
The method for accurately identifying and segmenting the target under the microscopic image is further described below through an embodiment, with reference to FIGS. 1 and 2, and mainly comprises the following four steps:
step 100, training data is preprocessed.
For all pictures, Gaussian filtering is first applied to filter out some of the noise, reducing the error introduced by image noise. For training pictures, the invention applies random flipping with a probability of 50%, random rotation between -5° and 5°, and random brightness and contrast changes, achieving the effect of data augmentation.
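The following is a minimal sketch of such a preprocessing pipeline; the choice of OpenCV and torchvision, the kernel size, and the jitter strengths are assumptions, since the patent does not name an implementation.

```python
import cv2
import torchvision.transforms as T
from PIL import Image

def preprocess(path, train=True):
    img = cv2.imread(path)                          # BGR uint8
    img = cv2.GaussianBlur(img, (3, 3), 0)          # Gaussian filtering to suppress noise
    img = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    if train:                                       # augmentation only for training pictures
        aug = T.Compose([
            T.RandomHorizontalFlip(p=0.5),          # random flip with 50% probability
            T.RandomRotation(degrees=5),            # random rotation in [-5°, 5°]
            T.ColorJitter(brightness=0.2, contrast=0.2),  # random brightness/contrast change
        ])
        img = aug(img)
    return img
```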
step 200, constructing a target detection model.
(201) VarifocalNet network
A VarifocalNet network, called VFNet for short, is a network structure for target region detection based on an FPN model. The model uses a VarifocalNet network to predict candidate rectangular boxes (bounding boxes) of the target region.
When calculating the degree of approximation between the candidate rectangular frame and the gt box (ground truth), i.e. the prediction accuracy, the Intersection over Union (IoU) is adopted, computed as follows:
$$\mathrm{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
wherein IoU(A, B) indicates the degree of overlap between A and B, reflecting the accuracy of the model prediction; A and B correspond to the predicted candidate rectangular box region and the gt box region, respectively.
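The IoU can be computed directly for axis-aligned boxes; a small sketch follows, where the corner format (x1, y1, x2, y2) and the helper name iou are illustrative assumptions.

```python
def iou(a, b):
    # a, b: boxes as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])      # intersection corners
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)  # |A∩B| / |A∪B|
```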
The classification scores of traditional target detection methods often fail to accurately measure prediction quality: a candidate box whose prediction is accurate may have a low confidence S and therefore be discarded during Non-Maximum Suppression (NMS).
The VarifocalNet therefore combines the quality prediction of the target position with the classification score, using the IoU-aware classification score (IACS) as the detection score.
The loss function adopted for network training is the Varifocal Loss, which evolved from the Focal Loss. The Focal Loss treats positive and negative samples identically; the VarifocalNet modifies it into the following form:
$$\mathrm{VFL}(p, q) = \begin{cases} -q\left(q\log p + (1 - q)\log(1 - p)\right) & \text{if } q > 0 \\ -\alpha\, p^{\gamma}\log(1 - p) & \text{if } q = 0 \end{cases}$$
where VFL(p, q) is the loss of the VarifocalNet, p is the predicted IACS score, α is the weighting applied to negative samples, and q is the IoU score of the target: in positive samples q is the IoU between the gt box and the predicted candidate rectangular box, while in negative samples q is 0; γ is the attenuation coefficient of the Focal Loss, typically greater than 1. In the Varifocal Loss, the p^γ attenuation is applied only to negative samples, so the supervision signal of the positive samples can be fully utilized, effectively mitigating the scarcity of positive samples among all samples.
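A sketch of the Varifocal Loss under the formulation above; the tensor interface, the reduction by summation, and the default α and γ values are assumptions.

```python
import torch

def varifocal_loss(p, q, alpha=0.75, gamma=2.0):
    # p: predicted IACS in (0, 1); q: target IoU for positives, 0 for negatives
    bce = -(q * torch.log(p) + (1 - q) * torch.log(1 - p))
    pos = (q > 0).float()
    # positives are weighted by their IoU q; only negatives get the p**gamma attenuation
    weight = pos * q + (1 - pos) * alpha * p.pow(gamma)
    return (weight * bce).sum()
```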
In the model, the candidate rectangular frame specifically adopts the star-shaped bounding box representation for predicting the target region, with 9 fixed sampling points. Specifically, given a sampling point (x, y), where x and y are its coordinates, a 3 × 3 convolution regression yields a candidate rectangular box; the star bounding box is encoded as a four-dimensional vector (l', t', r', b') representing the distances from the sampling point to the four edges, and the 9 points (x, y), (x, y + b'), (x, y - t'), (x + r', y), (x + r', y + b'), (x + r', y - t'), (x - l', y), (x - l', y + b'), (x - l', y - t') are then heuristically selected. Because these points are obtained by manual setting and direct calculation, they add no prediction burden and are computationally efficient.
Updating the star bounding box is a residual learning problem. Scaling factors (Δl, Δt, Δr, Δb) for the 4 distances are learned during training and used to adjust the extent of the initial star bounding box; concretely, the update produces the four-dimensional vector (l, t, r, b) = (Δl × l', Δt × t', Δr × r', Δb × b'), again representing the distances from the sampling point to the four edges, so that the updated star bounding box is closer to the gt box.
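The nine sampling points can be enumerated directly from the point (x, y) and the four predicted distances; a sketch (the helper name star_points is illustrative):

```python
def star_points(x, y, l, t, r, b):
    # the 9 heuristic sampling points of a star-shaped bounding box:
    # all combinations of {x, x + r, x - l} with {y, y + b, y - t}
    return [(px, py) for px in (x, x + r, x - l) for py in (y, y + b, y - t)]
```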
(202) Fast R-CNN network
The invention is compatible with various target detection networks; the target detection network may also employ the Faster R-CNN network, which can be viewed as a combination of an RPN and a Fast R-CNN network.
The Faster R-CNN network uses an RPN (Region Proposal Network) in place of the Selective Search method to generate rectangular frames, improving computation speed by a factor of about 10. Each feature point on the feature map obtained after a series of convolutional and pooling layers is mapped back to the center of its receptive field in the original image as a reference point. Multiple candidate boxes of different areas and aspect ratios are selected around each reference point: 3 areas, {128², 256², 512²}, and 3 aspect ratios, {1:1, 1:2, 2:1}. Combining the two preliminarily yields 9 candidate frames of different scales for each reference point, as sketched below.
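A sketch of this 9-anchor enumeration per reference point; the corner-format output and the helper name make_anchors are illustrative assumptions.

```python
def make_anchors(cx, cy):
    # 3 areas x 3 aspect ratios = 9 anchors centered at (cx, cy)
    anchors = []
    for area in (128 ** 2, 256 ** 2, 512 ** 2):
        for w_ratio, h_ratio in ((1, 1), (1, 2), (2, 1)):    # w:h aspect ratios
            h = (area * h_ratio / w_ratio) ** 0.5            # solve w*h = area, h/w = ratio
            w = area / h
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors
```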
The sampling of training data in the Faster R-CNN network is based on IoU scores. Samples are divided into positive and negative samples: positive samples are candidate rectangular boxes whose IoU with a gt box exceeds 0.7 (if no candidate exceeds 0.7, the highest-scoring one is taken as positive); negative samples are candidate rectangular boxes whose IoU with every gt box is below 0.3. All other samples are discarded.
Training the RPN network involves a classification task and a regression task; the total loss is a linear weighted sum of the two partial loss functions.
The loss function of the classification task takes the form of the Focal Loss:
$$L_{cls} = -\frac{1}{N}\sum_{i}\left[\alpha\, q_i\, (1 - p_i)^{\gamma} \log p_i + (1 - \alpha)(1 - q_i)\, p_i^{\gamma} \log(1 - p_i)\right]$$
where L_cls is the loss function of the classification task, i indexes the template boxes (anchors), N is the total number of template boxes, α is the weighting factor, p_i is the prediction probability for the i-th template box, q_i is the real label of the i-th template box, and γ is the attenuation coefficient.
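A sketch of this binary focal classification loss; the mean reduction and the default hyperparameter values are assumptions.

```python
import torch

def focal_cls_loss(p, q, alpha=0.25, gamma=2.0):
    # p: predicted foreground probability per template box; q: 1 for positives, 0 otherwise
    p_t = q * p + (1 - q) * (1 - p)                 # probability assigned to the true class
    alpha_t = q * alpha + (1 - q) * (1 - alpha)     # class-balancing weight
    return (-alpha_t * (1 - p_t).pow(gamma) * torch.log(p_t)).mean()
```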
The loss function of the regression task takes the form:

$$L_{reg}(t^{u}, v) = \sum_{i \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}\left(t_i^{u} - v_i\right)$$

where, for any x,

$$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5x^{2} & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases}$$

Here L_reg is the loss function of the regression task, t_i^u is the i-th coordinate value of the predicted rectangular frame for a target of class u, and v_i is the corresponding i-th coordinate value of the real target rectangular frame. The regression loss uses the smooth-L1 function commonly used in object detection models. For the template frame (anchor), x, y, w and h denote the center-point coordinates and the length and width, and x*, y*, w*, h* denote the center-point coordinates and the length and width of the real target rectangular frame. To ensure translation invariance and length-width consistency of the coordinates, they are parameterized into a 4-dimensional vector t = {t_x, t_y, t_w, t_h}. The specific calculation is:

$$t_x = \frac{x - x_a}{w_a},\qquad t_y = \frac{y - y_a}{h_a},\qquad t_w = \log\frac{w}{w_a},\qquad t_h = \log\frac{h}{h_a}$$

$$t_x^{*} = \frac{x^{*} - x_a}{w_a},\qquad t_y^{*} = \frac{y^{*} - y_a}{h_a},\qquad t_w^{*} = \log\frac{w^{*}}{w_a},\qquad t_h^{*} = \log\frac{h^{*}}{h_a}$$

where the subscript a denotes the template box (anchor) and the superscript * denotes the real target rectangular box; t_x, t_y, t_w, t_h are the parameterized coordinates of the template frame, and t_x*, t_y*, t_w*, t_h* are the parameterized coordinates of the real target rectangular frame.
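A sketch of this parameterization under the center-size box convention; the function name encode is an illustrative assumption.

```python
import math

def encode(box, anchor):
    # box, anchor: (center_x, center_y, w, h); returns t = (tx, ty, tw, th)
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))
```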
The complete loss function of the RPN network is a weighted sum of the two task loss functions, which is specifically expressed as follows:
$$L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}}\sum_{i} L_{cls}(p_i, p_i^{*}) + \lambda_1 \frac{1}{N_{reg}}\sum_{i} p_i^{*}\, L_{reg}(t_i, t_i^{*})$$

where L({p_i}, {t_i}) is the weighted sum, L_cls is the loss function of the classification task, L_reg is the loss function of the regression task, t_i is the parameterized coordinate vector of the i-th template frame, t_i* is the parameterized coordinate vector of the real target rectangular frame corresponding to the i-th template frame, p_i is the predicted probability of the i-th template box, p_i* is the real label corresponding to the i-th template frame (1 if the template frame is a positive sample, 0 otherwise), N_cls is the number of all template boxes, N_reg is the number of template boxes entering the regression term, {p_i} is the set of prediction probabilities of all template boxes, {t_i} is the set of predicted coordinate vectors of all template boxes, and λ_1 is a weighting coefficient, which defaults to 1 in the present invention.
In the Fast R-CNN part of the network, the candidate rectangular frames predicted by the RPN are pooled to features of the same size, specifically a 7 × 7 feature map, through RoI Pooling (Region of Interest Pooling); the feature map is flattened and then passed through a series of fully connected layers to obtain the prediction results. Classification prediction and regression prediction are performed on the resulting feature vectors, similarly to the classification and regression tasks of the RPN network. The specific loss function is:
$$L(p, u, t^{u}, v) = L_{cls}(p, u) + \lambda_2\,[u \geq 1]\, L_{reg}(t^{u}, v)$$

where L(p, u, t^u, v) is the total loss function, L_cls(p, u) is the loss function of the classification task, L_reg(t^u, v) is the loss function of the regression task, p is the softmax probability distribution predicted by the classifier, u is the true classification label of the corresponding target, t^u = (t_x^u, t_y^u, t_w^u, t_h^u) is the coordinate vector of class u predicted by the regressor for the predicted rectangular box, v = (v_x, v_y, v_w, v_h) is the coordinate vector of the real target rectangular box, and λ_2 is a weighting coefficient, which defaults to 1 in the present invention.
Since both the RPN and the Fast R-CNN network need a CNN to extract image features, in the Faster R-CNN network the two parts share one feature extraction network, i.e. the CNN weights are shared.
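A sketch of the shared-feature Fast R-CNN head using torchvision's RoI pooling; the channel count, the stride-8 spatial scale, and the example box are assumptions.

```python
import torch
from torchvision.ops import roi_pool

features = torch.randn(1, 256, 64, 64)                 # feature map from the shared CNN
boxes = torch.tensor([[0, 16.0, 16.0, 240.0, 180.0]])  # (batch_idx, x1, y1, x2, y2) in image coords
pooled = roi_pool(features, boxes, output_size=(7, 7), spatial_scale=1 / 8)  # 7x7 per RoI
flat = pooled.flatten(1)                               # flattened, ready for the FC layers
```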
The above completes step 200, identifying the target in a microscopic image based on the Focal Loss target detection algorithm. In actual target detection, the VarifocalNet mainly reduces the weight of negative samples through the Focal Loss and is used in the present invention mainly to detect single-class targets; for multi-class target detection, the invention adopts the Focal Loss-based Faster R-CNN network, which performs better.
step 300, constructing a semantic segmentation network.
The semantic segmentation part of the invention mainly improves on UNet and designs a new loss function.
(301) UNet with deep supervision
In the standard UNet, the model is divided into two parts: an encoder and a decoder. The encoder receives the original image and performs convolution and downsampling operations layer by layer, encoding the original image into a set of feature maps. The decoder receives the encoder's results, fuses and upsamples them layer by layer, and finally produces the segmentation result. With this encoder-decoder model, the network can process multi-scale features and produce segmentation results with smooth edges.
The invention adds a deep supervision structure to the standard UNet: the output of each decoder layer is upsampled and convolved to form a predicted segmentation result, the loss of each prediction is computed against the label, and the loss values of all layers are summed to obtain the final loss value. The structure is shown in FIG. 3.
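A sketch of how the per-layer losses could be summed under this deep supervision scheme; the interpolation mode and the criterion interface are assumptions.

```python
import torch.nn.functional as F

def deep_supervision_loss(stage_logits, labels, criterion):
    # stage_logits: one (N, C, h, w) prediction per decoder layer, coarsest to finest
    total = 0.0
    for logits in stage_logits:
        up = F.interpolate(logits, size=labels.shape[-2:],
                           mode='bilinear', align_corners=False)  # upsample to label size
        total = total + criterion(up, labels)                     # per-layer loss vs. the label
    return total                                                  # summed over all layers
```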
In order to quantify the segmentation result, the invention uses a mean dice index to evaluate the result, and the calculation formula is as follows:
$$\mathrm{dice}(A, B) = \frac{2|A \cap B|}{|A| + |B|}$$
here A, B correspond to the predicted pixel region and the true labeled pixel region, respectively.
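A small sketch of this dice computation on binary masks; the smoothing constant eps is an assumption added for numerical stability.

```python
import torch

def dice(pred, target, eps=1e-6):
    # pred, target: binary masks of the same shape
    inter = (pred * target).sum()
    return (2 * inter + eps) / (pred.sum() + target.sum() + eps)  # 2|A∩B| / (|A| + |B|)
```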
In the implementation, the segmentation network receives 512 × 512 images as input. Each downsampling layer applies convolution and pooling to halve the length and width of the feature map and double the number of channels, gradually increasing the receptive field. Each upsampling layer merges the feature map passed from the encoder and uses convolution and bilinear interpolation to double the length and width of the feature map, gradually restoring the original image size. The dimensions of each layer's feature map are as follows:
Name         Shape
Input        512*512*3
Downsample1  256*256*128
Downsample2  128*128*256
Downsample3  64*64*512
Downsample4  32*32*1024
Upsample1    64*64*512
Upsample2    128*128*256
Upsample3    256*256*128
Output       512*512*N
where Downsample1 to Downsample4 are the feature layer sizes after the four downsampling stages, and Upsample1 to Upsample3 are the feature layer sizes after the three upsampling stages, as shown in fig. 3.
By analyzing multiple feature layers at different scales, the model can extract features of different scales more effectively, improving segmentation accuracy.
(302) Bce-bias-dice loss function
The Bias-dice loss function is as follows:
$$L_{\mathrm{bias\text{-}dice}} = 1 - \frac{2\sum_{i=1}^{N} p_i g_i + \epsilon}{2\sum_{i=1}^{N} p_i g_i + \sum_{i=1}^{N} p_i (1 - g_i) + \beta \sum_{i=1}^{N} g_i (1 - p_i) + \epsilon}$$
In the Bias-dice loss function, p_i represents the probability of predicting class i, g_i represents whether the ground truth is class i, N is the total number of classes, β is a number greater than 1 used to increase the model's false-negative penalty, and ε is a very small number that prevents the numerator and denominator from being zero. Then p_i g_i indicates a true positive (TP), p_i(1 - g_i) a false positive (FP), and g_i(1 - p_i) a false negative (FN). In this formula, adjusting β controls how strongly FN influences the overall loss: as β increases, the FN contribution grows, reducing false negatives and increasing recall, and thus reducing the probability of missing pixels in the prediction.
Although the Bias-dice loss function lets the model retain more foreground information and enhances the visualization effect, it deliberately reduces the penalty on false positives; moreover, in most microscopic image analysis the background occupies the majority of the image, so using the Bias-dice loss alone under-penalizes background prediction errors, leading to unstable training and large fluctuations in the loss value. To address this, the invention combines the cross-entropy loss function with the Bias-dice loss function to form the Bce-bias-dice loss function, resolving the instability of the Bias-dice loss. The final loss function is as follows:
$$L_{\mathrm{Bce\text{-}bias\text{-}dice}} = L_{\mathrm{ce}} + L_{\mathrm{bias\text{-}dice}}$$
The Bce-bias-dice loss proposed by the present invention serves as the loss function of the UNet with the deep supervision structure.
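A sketch of the Bce-bias-dice loss for the binary case, following the reconstruction above; the sigmoid formulation and the default β value are assumptions.

```python
import torch
import torch.nn.functional as F

def bias_dice_loss(p, g, beta=2.0, eps=1e-6):
    tp = (p * g).sum()          # true positives:  p_i * g_i
    fp = (p * (1 - g)).sum()    # false positives: p_i * (1 - g_i)
    fn = (g * (1 - p)).sum()    # false negatives: g_i * (1 - p_i), up-weighted by beta
    return 1 - (2 * tp + eps) / (2 * tp + fp + beta * fn + eps)

def bce_bias_dice_loss(logits, target, beta=2.0):
    p = torch.sigmoid(logits)
    # cross entropy stabilizes training; the bias-dice term raises the FN penalty
    return F.binary_cross_entropy_with_logits(logits, target) + bias_dice_loss(p, target, beta)
```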
The technical effects of the method are shown in the following aspects:
1. The method utilizes a Focal Loss-based target detection algorithm to locate targets in the microscopic image and judge their types. Experiments were performed on the Kaggle 2018 Data Science Bowl nucleus segmentation public dataset, which contains 670 pictures. The images were split 6:4 into a training set and a test set, yielding 402 images as training data and 268 images as test data. The mAP reached 66.68 on the test set.
2. The invention uses a deep learning semantic segmentation algorithm to semantically segment the detected targets. The original images were cropped to obtain 41363 sub-images, which were split 6:4 into a training set and a test set, yielding 29372 images as training data and 11991 images as test data. The mean dice on the test set reached 93.08.
Fig. 4 is a schematic diagram of a system for accurately identifying and segmenting an object under a microscopic image according to the present invention.
In fig. 4, a system 400 includes:
the target identification module 41 is used for identifying a target in a microscopic image based on a Focal Loss target detection algorithm to obtain a target substance;
the semantic segmentation module 42 is configured to perform semantic segmentation on the target substance based on a semantic segmentation algorithm, and acquire an image region only including the target substance from the microscopic image; and
a visualization module 43 for visualizing the image region.
Further, the system 400 further comprises: a preprocessing module 40 for uniformly preprocessing the microscopic image, namely processing it with random flipping at a probability of 50%, random rotation between -5° and 5°, and random brightness and contrast changes.
Further, the object recognition module 41 includes:
a model building module 411, configured to build a network model for target area detection;
the area prediction module 412 is configured to identify the microscopic image according to the network model to obtain a candidate rectangular frame;
and a target area module 413, configured to obtain a rectangular frame corresponding to the target substance according to the candidate rectangular frame.
The target recognition described in the above embodiment of accurately identifying and segmenting the target under the microscopic image also applies to the target recognition module 41.
Further, the semantic segmentation module 42 includes:
the segmentation network module 421 is configured to build a semantic segmentation network by adding deep supervision to the standard UNet;
a loss function module 422, configured to combine the cross entropy loss function with the bias-dice loss function to obtain a loss function;
a segmentation module 423, configured to perform semantic segmentation on the target substance according to the semantic segmentation network and the loss function.
The semantic segmentation described in the above embodiment of accurately identifying and segmenting the target under the microscopic image also applies to the semantic segmentation module 42.
Compared with the prior art, the invention provides a method for accurately identifying and segmenting targets under a microscopic image: a general deep-learning-based microscopic image analysis method that, through deep learning target detection and semantic segmentation, can identify single targets in the microscopic image and reproduce them at high quality, solving the problem that manual microscopic image target identification and fine judgment are time-consuming and labor-intensive.
The present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof, and it should be understood that various changes and modifications can be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. A method for accurately identifying and segmenting a target under a microscopic image is characterized by comprising the following steps:
step one, identifying a target in a microscopic image based on a Focal Loss target detection algorithm to obtain a target substance;
step two, performing semantic segmentation on the target substance based on a semantic segmentation algorithm, and acquiring an image area containing only the target substance from the microscopic image; and
step three, visualizing the image area.
2. The method of claim 1, further comprising, before step one is performed: performing unified preprocessing on the microscopic image.
3. The method of claim 2, wherein the unified preprocessing processes the microscopic image using random flipping with a probability of 50%, random rotation between -5° and 5°, and random brightness and contrast variation.
4. The method according to claim 1, wherein in step one, the step of identifying the target in a microscopic image based on the Focal Loss target detection algorithm comprises:
constructing a network model for target area detection;
identifying the microscopic image according to the network model to obtain a candidate rectangular frame;
and obtaining a rectangular frame corresponding to the target substance according to the candidate rectangular frame.
5. The method according to claim 1, wherein in the second step, the step of semantically segmenting the target substance based on the semantic segmentation algorithm comprises:
constructing a semantic segmentation network by adding deep supervision into a standard UNet;
combining the cross entropy loss function with the bias-dice loss function to obtain a loss function;
and performing semantic segmentation on the target substance according to the semantic segmentation network and the loss function.
6. A system for accurately identifying and segmenting a target under a microscopic image is characterized by comprising:
the target identification module is used for identifying a target in a microscopic image based on a Focal Loss target detection algorithm to obtain a target substance;
the semantic segmentation module is used for performing semantic segmentation on the target substance based on a semantic segmentation algorithm and acquiring an image area only containing the target substance from the microscopic image; and
and the visualization module is used for visualizing the image area.
7. The system of claim 6, further comprising:
and the preprocessing module is used for uniformly preprocessing the microscopic image.
8. The system of claim 7, wherein the preprocessing module processes the microscopic images using random flipping with a probability of 50%, random rotation between -5° and 5°, and random brightness and contrast variation.
9. The system of claim 6, wherein the object recognition module further comprises:
the model construction module is used for constructing a network model for target area detection;
the area prediction module is used for identifying the microscopic image according to the network model to obtain a candidate rectangular frame;
and the target area module is used for obtaining a rectangular frame corresponding to the target substance according to the candidate rectangular frame.
10. The system of claim 6, wherein the semantic segmentation module further comprises:
the segmentation network module is used for building a semantic segmentation network by adding deep supervision into the standard UNet;
the loss function module is used for combining the cross entropy loss function with the bias-dice loss function to obtain a loss function;
and the segmentation module is used for performing semantic segmentation on the target substance according to the semantic segmentation network and the loss function.
CN202110482525.3A 2021-04-30 2021-04-30 Method and system for accurately identifying and segmenting target under microscopic image Pending CN113312978A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110482525.3A CN113312978A (en) 2021-04-30 2021-04-30 Method and system for accurately identifying and segmenting target under microscopic image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110482525.3A CN113312978A (en) 2021-04-30 2021-04-30 Method and system for accurately identifying and segmenting target under microscopic image

Publications (1)

Publication Number Publication Date
CN113312978A true CN113312978A (en) 2021-08-27

Family

ID=77371338

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110482525.3A Pending CN113312978A (en) 2021-04-30 2021-04-30 Method and system for accurately identifying and segmenting target under microscopic image

Country Status (1)

Country Link
CN (1) CN113312978A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110443808A (en) * 2019-07-04 2019-11-12 杭州深睿博联科技有限公司 Medical image processing method and device, equipment, storage medium for the detection of brain middle line
US20190385021A1 (en) * 2018-06-18 2019-12-19 Drvision Technologies Llc Optimal and efficient machine learning method for deep semantic segmentation
CN110874860A (en) * 2019-11-21 2020-03-10 哈尔滨工业大学 Target extraction method of symmetric supervision model based on mixed loss function
CN111402254A (en) * 2020-04-03 2020-07-10 杭州华卓信息科技有限公司 CT image pulmonary nodule high-performance automatic detection method and device
CN112116599A (en) * 2020-08-12 2020-12-22 南京理工大学 Sputum smear tubercle bacillus semantic segmentation method and system based on weak supervised learning
CN112200091A (en) * 2020-10-13 2021-01-08 深圳市悦动天下科技有限公司 Tongue region identification method and device and computer storage medium
CN112258488A (en) * 2020-10-29 2021-01-22 山西三友和智慧信息技术股份有限公司 Medical image focus segmentation method
CN112634261A (en) * 2020-12-30 2021-04-09 上海交通大学医学院附属瑞金医院 Stomach cancer focus detection method and device based on convolutional neural network

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190385021A1 (en) * 2018-06-18 2019-12-19 Drvision Technologies Llc Optimal and efficient machine learning method for deep semantic segmentation
CN110443808A (en) * 2019-07-04 2019-11-12 杭州深睿博联科技有限公司 Medical image processing method and device, equipment, storage medium for the detection of brain middle line
CN110874860A (en) * 2019-11-21 2020-03-10 哈尔滨工业大学 Target extraction method of symmetric supervision model based on mixed loss function
CN111402254A (en) * 2020-04-03 2020-07-10 杭州华卓信息科技有限公司 CT image pulmonary nodule high-performance automatic detection method and device
CN112116599A (en) * 2020-08-12 2020-12-22 南京理工大学 Sputum smear tubercle bacillus semantic segmentation method and system based on weak supervised learning
CN112200091A (en) * 2020-10-13 2021-01-08 深圳市悦动天下科技有限公司 Tongue region identification method and device and computer storage medium
CN112258488A (en) * 2020-10-29 2021-01-22 山西三友和智慧信息技术股份有限公司 Medical image focus segmentation method
CN112634261A (en) * 2020-12-30 2021-04-09 上海交通大学医学院附属瑞金医院 Stomach cancer focus detection method and device based on convolutional neural network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
HAOYANG ZHANG et al.: "VarifocalNet: An IoU-aware Dense Object Detector", arXiv
YIFEI XU et al.: "FFU-Net: Feature Fusion U-Net for Lesion Segmentation of Diabetic Retinopathy", BioMed Research International
胡满满 (Hu Manman) et al.: "基于动态采样和迁移学习的疾病预测模型 (A disease prediction model based on dynamic sampling and transfer learning)", 《计算机学报》 (Chinese Journal of Computers)
高智勇 (Gao Zhiyong) et al.: "基于特征金字塔网络的肺结节检测 (Pulmonary nodule detection based on feature pyramid network)", 《计算机应用》 (Journal of Computer Applications)

Similar Documents

Publication Publication Date Title
CN110210463B (en) Precise ROI-fast R-CNN-based radar target image detection method
CN110428432B (en) Deep neural network algorithm for automatically segmenting colon gland image
CN108364288B (en) Segmentation method and device for breast cancer pathological image
CN111524138B (en) Microscopic image cell identification method and device based on multitask learning
CN109389129B (en) Image processing method, electronic device and storage medium
CN111524137B (en) Cell identification counting method and device based on image identification and computer equipment
CN111462076B (en) Full-slice digital pathological image fuzzy region detection method and system
Ismael et al. Medical image classification using different machine learning algorithms
CN112819821B (en) Cell nucleus image detection method
CN110796661B (en) Fungal microscopic image segmentation detection method and system based on convolutional neural network
CN110135271A (en) A kind of cell sorting method and device
CN113096096B (en) Microscopic image bone marrow cell counting method and system fusing morphological characteristics
WO2022167005A1 (en) Deep neural network-based method for detecting living cell morphology, and related product
CN111951288A (en) Skin cancer lesion segmentation method based on deep learning
CN112365497A (en) High-speed target detection method and system based on Trident Net and Cascade-RCNN structures
CN112348059A (en) Deep learning-based method and system for classifying multiple dyeing pathological images
CN113902945A (en) Multi-modal breast magnetic resonance image classification method and system
CN114445356A (en) Multi-resolution-based full-field pathological section image tumor rapid positioning method
CN114783604A (en) Method, system and storage medium for predicting sentinel lymph node metastasis of breast cancer
CN115909006A (en) Mammary tissue image classification method and system based on convolution Transformer
CN114332473A (en) Object detection method, object detection device, computer equipment, storage medium and program product
WO2023220913A1 (en) Cell image processing method, electronic device and storage medium
CN113312978A (en) Method and system for accurately identifying and segmenting target under microscopic image
CN115131628A (en) Mammary gland image classification method and equipment based on typing auxiliary information
CN114897823A (en) Cytology sample image quality control method, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination