Disclosure of Invention
The present invention aims to overcome the above problems of existing cervical cell quantitative analysis and auxiliary diagnosis. Building on existing domain knowledge of cervical cytology, it uses deep learning to learn and extract the key features of cervical cells and automatically segments and identifies the cancerous regions and cell types on a liquid-based smear. This shortens smear reading time and reduces the rates of missed diagnosis and misdiagnosis, provides a solution for artificial-intelligence-assisted reading of cervical cell liquid-based smears, and achieves a methodological breakthrough in the quantitative evaluation and intelligent auxiliary diagnosis of cervical cells.
In order to achieve the above object, the present invention provides an artificial-intelligence-assisted smear reading system for cervical cell liquid-based smears, which comprises:
The cell image acquisition module is used for scanning cell images in an overlapped mode with an automatic smear scanner and storing them;
The cell image preprocessing module is used for preprocessing the cell image;
The cell image detection and segmentation module is used for automatically detecting the different cell components in the cell image and automatically segmenting the cell nucleus, the cytoplasm and the background within the same cell component; detection uses an improved active contour model and a fast regional convolutional neural network, and algorithms based on cell features and multi-scale levels are used to fine-tune and optimize the segmentation results;
The cell rapid grading identification module is used for identifying each segmented image and classifying it as a single cell or a cell cluster; single cells are graded separately by a dual-stream convolutional neural network with additional domain knowledge and by a constructed cell knowledge graph, yielding a first grading result and a second grading result; non-separable cell clusters are identified by a dual-stream convolutional neural network model for cell clusters; and
The interpretation and post-processing module is used for jointly interpreting the first grading result and the second grading result of a single cell and performing conflict processing to obtain the grading result of the single cell; when different features point to different interpretation results, the conflict processing integrates the various factors, eliminates the conflict and produces a clear and reliable interpretation; the knowledge graph and class activation mapping are then used to make the cervical cell identification process readable and its results interpretable.
In the above technical solution, the cell image acquisition module is implemented as follows: scanning uses the forty-fold magnification of an ocular lens, the scanning path is rectangular, and the scanning mode is overlapped scanning, so that the scanning range completely covers the range of the liquid-based smear cells.
In the above technical solution, the preprocessing includes: denoising the image with a bilateral filter, which consists of two functions, one determining the filter coefficients from the geometric spatial distance and the other determining the filter coefficients from the pixel difference; then repairing the image edges by morphological processing, filling holes and removing fine connections; and finally applying histogram equalization to increase the contrast between the nucleus and the cytoplasm.
In the above technical solution, the cell image detection and segmentation module is implemented by the following steps:
Step S1) performing a rough foreground/background segmentation of the preprocessed cell image and extracting the regions that belong to cells;
Step S2) detecting and segmenting the cell components of the roughly segmented cell image, using a fast regional convolutional neural network to separate cells of different types;
Step S3) detecting and segmenting the cervical cell nuclei;
Step S4) screening the cell nuclei according to their characteristic parameters to obtain the final candidate cell nuclei;
Step S5) judging whether the cell type obtained in step S2) is a cell cluster; if not, segmenting the cytoplasm region using the active contour model and the prior template; otherwise, going to step S6);
Step S6) post-processing the nucleus and cytoplasm segmentation results together with the domain knowledge to complete the effective segmentation of the whole cervical cell.
In the above technical solution, the fast regional convolutional neural network adopts the VGG16 convolutional neural network structure, the input image size is 515 × 512, and the cell components are finally detected in 5 categories: squamous cells, glandular cells, cervical cells, metaplastic cells and background diathesis; the cell components other than squamous cells and background diathesis are defined as non-separable cell clusters.
In the above technical solution, the specific process of segmenting the cytoplasmic region by using the active contour model and the prior template in step S5) includes:
An improved active contour model is adopted, an energy function and shape prior information are added, contour optimization is carried out iteratively, and accurate boundaries of cytoplasm are obtained;
The energy function $E(u)$ is:
$$E(u) = \lambda_1 E_s(u) + R(u)$$
where $E_s(u)$ is the shape prior, $R(u)$ is a regularization term that ensures smoothness of the segmentation boundary, and $\lambda_1$ is a learnable parameter; the shape prior $E_s(u)$ is:
where $H$ is the Hessian matrix.
In the above technical solution, the cell rapid grading identification module is implemented by the following steps:
Step 1) preprocessing the segmented cervical cell image;
Step 2) judging whether the preprocessed cell image is a single cell; if so, going to step 3); otherwise the image is a non-separable cell cluster and the process goes to step 6);
Step 3) determining the calculable cell parameters and then calculating the cell parameter features;
Step 4) establishing a cell knowledge graph inference and judgment model, and inputting the cell parameter features into the model to obtain a first grading result of the single cell;
Step 5) constructing a dual-stream convolutional neural network model with additional domain knowledge, and inputting the cell parameter features and the cell image into the model to obtain a second grading result of the single cell;
Step 6) constructing a dual-stream convolutional neural network model for cell clusters, and using the model to grade the non-separable cell clusters to obtain the grading result of the cell clusters.
In the above technical solution, one input of the dual-stream convolutional neural network with additional domain knowledge is the cell parameter features obtained in step 3), and the other input is the single cell image, whose size is uniformly normalized to 256 × 256 pixels; the cell image features are implicitly extracted through 5 cascaded convolution-pooling modules; the main convolution operation has a kernel size of 7 × 7, a stride of 1 and 96 feature maps, and the convolution operation is as follows:
In the above formula, $M$ represents the set of selected input feature maps, $w_{ij}$ represents a weight, and $b_j$ is an additional bias added to each output feature map; the extracted 1096-dimensional features are then concatenated with the 20-dimensional features computed from cell domain knowledge and input to the fully connected layer and the classification layer of the dual-stream convolutional neural network.
In the above technical solution, one input of the dual-stream convolutional neural network for cell clusters is the cell parameters, for example whether the cell nuclei are regularly arranged; the other input is the cervical cell cluster image corresponding to the cell parameters, whose size is uniformly normalized to 512 × 512 pixels; the cell image features are implicitly extracted through 8 cascaded convolution-pooling modules; the main convolution operation has a kernel size of 5 × 5, a stride of 2 and 108 feature maps.
The invention has the beneficial effects that:
1. The system of the invention has high sensitivity for pathological cervical cells and high specificity for normal cervical cells; the whole assisted smear reading system requires no manual participation, which greatly reduces the workload of smear readers.
Detailed Description
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that the advantages and features of the invention can be more easily understood by those skilled in the art and the scope of protection of the invention can be defined more clearly.
As shown in fig. 1, an artificial-intelligence-assisted smear reading system for cervical cell liquid-based smears comprises:
The cell image acquisition module is used for scanning cell images in an overlapped mode with an automatic smear scanner and storing them; scanning uses the forty-fold magnification of an ocular lens, the scanning path is rectangular, and the scanning mode is overlapped scanning, so that the scanning range completely covers the range of the liquid-based smear cells;
For example, 20 scanned images of cervical cells can be obtained for one image of 20,000 × 20,000 pixels.
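The overlapped rectangular scan can be illustrated with a short sketch that computes tile origins so that neighbouring tiles share a margin and the whole extent is covered. The tile size and overlap used here are assumptions for illustration; the text does not specify them, and `tile_origins` and `scan_grid` are hypothetical helper names.

```python
def tile_origins(extent, tile, overlap):
    """Start offsets of tiles of length `tile` covering [0, extent),
    with at least `overlap` pixels shared between neighbouring tiles."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    if extent <= tile:
        return [0]
    step = tile - overlap
    origins = list(range(0, extent - tile, step))
    origins.append(extent - tile)  # last tile sits flush with the far edge
    return origins


def scan_grid(width, height, tile, overlap):
    """Row-by-row (rectangular path) list of (x, y) tile origins."""
    return [(x, y)
            for y in tile_origins(height, tile, overlap)
            for x in tile_origins(width, tile, overlap)]


# Assumed example values: a 1000 x 1000 region, 400-pixel tiles, 100-pixel overlap.
grid = scan_grid(1000, 1000, tile=400, overlap=100)
```

Placing the last tile flush with the far edge guarantees full coverage even when the extent is not a multiple of the step, at the cost of a slightly larger overlap for that tile.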
The cell image preprocessing module is used for preprocessing the cell image, which comprises: denoising the image with a bilateral filter, which consists of two functions, one determining the filter coefficients from the geometric spatial distance and the other determining the filter coefficients from the pixel difference; then repairing the image edges by morphological processing, filling holes and removing fine connections; and finally applying histogram equalization to increase the contrast between the nucleus and the cytoplasm.
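A minimal NumPy sketch of two of these preprocessing steps follows: a bilateral filter whose weight is the product of a spatial-distance function and a pixel-difference function, and a histogram equalization for 8-bit images. The window half-width and the two sigmas are assumed values, and the function names are hypothetical; a production system would more likely call an image-processing library.

```python
import numpy as np


def bilateral_filter(img, half_width=2, sigma_space=2.0, sigma_range=0.1):
    """Bilateral filter for a 2-D float image (values assumed in [0, 1])."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    r = half_width
    padded = np.pad(img, r, mode="edge")
    acc = np.zeros_like(img)
    norm = np.zeros_like(img)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            shifted = padded[r + dy:r + dy + h, r + dx:r + dx + w]
            # Function 1: coefficient from the geometric spatial distance.
            spatial = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_space ** 2))
            # Function 2: coefficient from the pixel-value difference.
            similarity = np.exp(-((shifted - img) ** 2) / (2.0 * sigma_range ** 2))
            weight = spatial * similarity
            acc += weight * shifted
            norm += weight
    return acc / norm


def equalize_histogram(img_u8):
    """Histogram equalization of a uint8 image via its cumulative histogram."""
    hist = np.bincount(img_u8.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    denom = cdf[-1] - cdf_min
    if denom == 0:  # constant image: nothing to stretch
        return img_u8.copy()
    lut = np.round((cdf - cdf_min) / denom * 255).clip(0, 255).astype(np.uint8)
    return lut[img_u8]
```

Because the pixel-difference term collapses to almost zero across a strong edge, the filter smooths within the nucleus or cytoplasm while leaving their boundary sharp, which is the property the later segmentation steps rely on.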
The cell image detection and segmentation module is used for automatically detecting the different cell components in the cell image and automatically segmenting the cell nucleus, the cytoplasm and the background within the same cell component; detection uses an improved active contour model and a fast regional convolutional neural network, and algorithms based on cell features and multi-scale levels are used to fine-tune and optimize the segmentation results;
As shown in fig. 2, the specific implementation steps are as follows:
Step S1) performing a rough foreground/background segmentation of the preprocessed cell image and extracting the regions that belong to cells;
First, noise is removed from the whole cell image using 9 threshold values. Then SIFT edge detection and a multi-scale watershed algorithm are used to obtain the foreground region; SIFT is invariant to rotation, scaling and brightness change, and remains reasonably stable under viewpoint change, affine transformation and noise. Finally, the segmented regions are fine-tuned by merging adjacent pixels with a K-means clustering algorithm whose number of clusters is 3, corresponding to the three categories cytoplasm, nucleus and background.
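The final clustering step can be sketched as a one-dimensional K-means over pixel intensities with 3 clusters. This is a simplified illustration under stated assumptions: clustering on grey level alone, evenly spaced initial centres, and a fixed iteration count are not taken from the text, and `kmeans_intensity` is a hypothetical name.

```python
import numpy as np


def kmeans_intensity(img, k=3, iters=20):
    """Cluster pixel intensities into k groups; returns (label image, centres).

    Centres are initialised evenly between the minimum and maximum intensity,
    so labels come out ordered from darkest (0) to brightest (k - 1).
    """
    img = np.asarray(img, dtype=np.float64)
    pixels = img.ravel()
    centers = np.linspace(pixels.min(), pixels.max(), k)
    for _ in range(iters):
        # Assign every pixel to its nearest centre.
        labels = np.argmin(np.abs(pixels[:, None] - centers[None, :]), axis=1)
        # Move each centre to the mean of its members (skip empty clusters).
        for c in range(k):
            members = pixels[labels == c]
            if members.size:
                centers[c] = members.mean()
    labels = np.argmin(np.abs(pixels[:, None] - centers[None, :]), axis=1)
    return labels.reshape(img.shape), centers


# Assumed toy data: dark nucleus, mid-grey cytoplasm, bright background.
toy = np.array([[0.10, 0.12, 0.50],
                [0.52, 0.90, 0.88]])
toy_labels, toy_centers = kmeans_intensity(toy)
```

Reading label 0 as nucleus, 1 as cytoplasm and 2 as background assumes the usual staining order in which nuclei are darkest; that mapping would need checking against real smear images.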
Step S2) detecting and segmenting the cell components of the roughly segmented cell image, using a fast regional convolutional neural network to separate the different types of cells and the background diathesis;
First, the training samples are cleaned and sorted; the cell components are then detected and segmented with a fast regional convolutional neural network, a deep learning algorithm. Training is end-to-end, the model adopts the VGG16 convolutional neural network structure, the input image size is 515 × 512, and the cell components are finally detected in 5 categories: squamous cells, glandular cells, cervical cells, metaplastic cells and background diathesis. The four categories other than squamous cells are defined herein as non-separable cell clusters. To support the artificial-intelligence-assisted reading and the subsequent grading identification steps, the non-separable cell cluster regions are subdivided into the four categories glandular cells, cervical cells, metaplastic cells and background diathesis; background diathesis is not a cell category and does not take part in the subsequent grading identification.
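How the five detection categories feed the later steps can be summarised as a small routing table. The category names come from the text; the route names and the `route` helper are hypothetical labels introduced only for this sketch.

```python
# Route names are illustrative assumptions, not terms from the text.
SEGMENT_CYTOPLASM = "segment_cytoplasm"   # step S5: active contour + prior template
CELL_CLUSTER = "cell_cluster"             # treated as a non-separable cell cluster
NOT_GRADED = "not_graded"                 # excluded from grading identification

ROUTES = {
    "squamous cells": SEGMENT_CYTOPLASM,
    "glandular cells": CELL_CLUSTER,
    "cervical cells": CELL_CLUSTER,
    "metaplastic cells": CELL_CLUSTER,
    "background diathesis": NOT_GRADED,
}


def route(category):
    """Return the downstream path for a detected cell-component category."""
    return ROUTES[category]
```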
Step S3) detecting and segmenting the cervical cell nuclei;
An improved random forest algorithm is adopted: 5 features of the cell nuclei are extracted to segment the nucleus regions, the number of trees in the forest is 20, logN features are considered each time the optimal feature is selected, and the minimum number of leaf nodes is 3;
To avoid missing nucleus regions, the cell nuclei are also detected with a multi-scale watershed algorithm: 5 different parameters are combined, the image to be segmented is divided into cell images at different scales with high and low degrees of merging, and the detection results of the two are combined as the candidate regions of the cell nuclei.
Step S4) screening the cell nuclei according to their characteristic parameters to obtain the final candidate cell nuclei;
The screening features are calculated from the characteristic parameters of the cell nucleus, including its size, circularity and depth. The size of the nucleus, for example, is obtained by directly summing the pixels within the boundary of the nucleus region, as shown in equation (1):
$$A = \sum_{x}\sum_{y} f(x, y) \qquad (1)$$
where $f(x, y)$ is the pixel value at point $(x, y)$ of the binary image: a value of 1 means the pixel belongs to the target region and a value of 0 means it belongs to the background region, so the area $A$ is the number of pixels for which $f(x, y) = 1$.
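The area computation and a size-based screen can be sketched as follows. The area is the count of pixels equal to 1 in the binary mask, as described above; the minimum and maximum area thresholds are hypothetical parameters, since the text does not give screening limits.

```python
import numpy as np


def nucleus_area(mask):
    """Area of a nucleus: number of pixels with f(x, y) = 1 in the binary mask."""
    return int(np.count_nonzero(mask))


def screen_by_area(masks, min_area, max_area):
    """Keep candidate nucleus masks whose area lies within [min_area, max_area].

    The thresholds are illustrative assumptions, not values from the text.
    """
    return [m for m in masks if min_area <= nucleus_area(m) <= max_area]


# Assumed toy candidates: a 3 x 3 nucleus and a single-pixel speck.
candidate = np.zeros((5, 5), dtype=np.uint8)
candidate[1:4, 1:4] = 1
speck = np.zeros((5, 5), dtype=np.uint8)
speck[2, 2] = 1
kept = screen_by_area([candidate, speck], min_area=4, max_area=20)
```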
Step S5), judging whether the cell type obtained in the step S2) is a cell cluster, if not, using the active contour model and the prior template to segment the cytoplasm region; otherwise, go to step S6);
An improved active contour model is adopted: an energy function and shape prior information are added, and the contour is optimized iteratively to obtain the accurate boundary of the cytoplasm;
The energy function $E(u)$ is:
$$E(u) = \lambda_1 E_s(u) + R(u)$$
where $E_s(u)$ is the shape prior proposed by the invention, $R(u)$ is a regularization term that ensures smoothness of the segmentation boundary, and $\lambda_1$ is a learnable parameter; the shape prior $E_s(u)$ is:
where $H$ is the Hessian matrix.
Step S6) the segmentation results of the cell nucleus and the cytoplasm and the domain knowledge are integrated for post-processing, and the effective segmentation of the whole cervical cell is completed.
The segmentation results of the above steps are post-processed, mainly with morphological operations from medical image processing, to repair the image edges, fill holes and remove fine connections; opening and closing operations are mainly used, with the template size set to [3 3]. The boundary is then smoothed by methods such as filtering and denoising, with the parameter w set to 2 and the variance σ set to [2 0.1], to obtain the final accurate boundary information.
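The opening and closing operations with a 3 × 3 template can be sketched in NumPy as below. Treating everything outside the image as background is an assumption of this sketch, and the function names are hypothetical.

```python
import numpy as np


def _neighbourhood_stack(mask):
    """Stack the nine 3 x 3-shifted copies of a boolean mask (outside = False)."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(3) for dx in range(3)])


def erode(mask):
    """A pixel survives only if its whole 3 x 3 neighbourhood is foreground."""
    return _neighbourhood_stack(mask).all(axis=0)


def dilate(mask):
    """A pixel is set if any pixel in its 3 x 3 neighbourhood is foreground."""
    return _neighbourhood_stack(mask).any(axis=0)


def opening(mask):
    """Erosion then dilation: removes specks and fine connections."""
    return dilate(erode(mask))


def closing(mask):
    """Dilation then erosion: fills small holes."""
    return erode(dilate(mask))
```

Opening removes structures thinner than the template, which corresponds to removing fine connections, while closing fills holes smaller than the template.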
The cell rapid grading identification module is used for identifying each segmented image and classifying it as a single cell or a cell cluster; single cells are graded by a dual-stream convolutional neural network with additional domain knowledge and by a constructed cell knowledge graph; non-separable cell clusters are identified by a dual-stream convolutional neural network model for cell clusters;
As shown in fig. 3, the specific implementation steps are as follows:
Step 1) preprocessing the segmented cervical cell image;
The segmented cervical cell region is scanned, the pixel values within the cell boundary are kept, the pixel values outside the cell boundary are set to 0, and the resulting cell image is then uniformly normalized to 256 × 256 pixels.
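A minimal sketch of this preprocessing: pixels outside the cell mask are zeroed and the result is resampled to 256 × 256. Nearest-neighbour resampling is an assumption made to keep the example dependency-free; the text does not state the interpolation method, and `normalize_cell` is a hypothetical name.

```python
import numpy as np


def normalize_cell(img, mask, size=256):
    """Zero out pixels outside the cell mask and resize to size x size.

    Uses nearest-neighbour sampling (an assumption for this sketch).
    """
    img = np.asarray(img, dtype=np.float64)
    masked = np.where(np.asarray(mask, dtype=bool), img, 0.0)
    rows = np.arange(size) * img.shape[0] // size
    cols = np.arange(size) * img.shape[1] // size
    return masked[np.ix_(rows, cols)]
```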
Step 2) judging whether the preprocessed cell image is a single cell; if so, going to step 3); otherwise the image is a non-separable cell cluster and the process goes to step 6);
This is implemented with a watershed algorithm from image processing: if the number of cell nuclei in the cell image is 1, the cell is judged to be a single cell.
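The decision rule (exactly one nucleus means a single cell) can be illustrated by counting nuclei in a binary nucleus mask. Note that this sketch swaps in plain 4-connected component counting in place of the watershed algorithm named above, so touching nuclei would be counted as one; the function names are hypothetical.

```python
from collections import deque

import numpy as np


def count_components(mask):
    """Count 4-connected foreground regions in a binary mask (breadth-first search)."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask)
    h, w = mask.shape
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y, x] and not seen[y, x]:
                count += 1
                seen[y, x] = True
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
    return count


def is_single_cell(nucleus_mask):
    """A cell image is treated as a single cell when it contains exactly one nucleus."""
    return count_components(nucleus_mask) == 1
```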
Step 3) determining the calculable cell parameters, including the size, depth and shape of the cell nucleus, the size and shape of the cytoplasm, and the nuclear-to-cytoplasmic ratio, and then calculating the cell parameter features;
Taking the size of the cell nucleus as an example, the size parameter is obtained by directly summing the pixels within the boundary of the nucleus region:
$$A = \sum_{x}\sum_{y} f(x, y)$$
where $f(x, y)$ is the pixel value at point $(x, y)$ of the binary image: a value of 1 means the pixel belongs to the target region and a value of 0 means it belongs to the background region, so the area $A$ is the number of pixels for which $f(x, y) = 1$.
Step 4) establishing a cell knowledge map inference judgment model, and inputting cell parameter characteristics into the model to obtain a first grading result of cells;
Step 5) constructing a dual-stream convolutional neural network model with additional domain knowledge, and inputting the cell parameter features and the cell image into the model to obtain a second grading result of the cells;
As shown in fig. 2, one input of the dual-stream convolutional neural network is the cell parameter features obtained in step 3), and the other input is the single cell image, whose size is uniformly normalized to 256 × 256 pixels; the cell image features are implicitly extracted through 5 cascaded convolution-pooling modules. The main convolution operation has a kernel size of 7 × 7, a stride of 1 and 96 feature maps, and the convolution operation is as follows:
$M$ represents the set of selected input feature maps, $w_{ij}$ represents a weight, and $b_j$ is an additional bias added to each output feature map. The extracted 1096-dimensional features are then concatenated with the 20-dimensional features computed from cell domain knowledge and input to the fully connected layer and the classification layer of the dual-stream convolutional neural network. Based on the TBS diagnostic criteria, the graded identification of the different cells is combined, giving 9 classes in total.
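The two operations described here, computing one output feature map from the input maps in M with weights w_ij and bias b_j, and concatenating the image features with the domain-knowledge features, can be sketched in NumPy. The general form assumed is x_j = f(Σ_{i∈M} x_i * w_ij + b_j); the ReLU activation, the absence of padding and the function name are assumptions, since the formula itself is not reproduced in the text.

```python
import numpy as np


def conv_feature_map(inputs, kernels, bias, stride=1):
    """One output feature map j of a convolutional layer.

    inputs  : array (n_in, H, W), the selected input feature maps (the set M)
    kernels : array (n_in, k, k), the weights w_ij for this output map
    bias    : scalar b_j added to the output map
    Uses 'valid' cross-correlation and a ReLU activation (both assumed).
    """
    n_in, h, w = inputs.shape
    k = kernels.shape[-1]
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    out = np.full((out_h, out_w), float(bias))
    for y in range(out_h):
        for x in range(out_w):
            patch = inputs[:, y * stride:y * stride + k, x * stride:x * stride + k]
            out[y, x] += np.sum(patch * kernels)
    return np.maximum(out, 0.0)


# Feature fusion: 1096 image features joined with 20 domain-knowledge features.
image_features = np.zeros(1096)
domain_features = np.zeros(20)
fused = np.concatenate([image_features, domain_features])
```

The fused vector has 1116 entries, which would be the input width of the first fully connected layer under this reading.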
As shown in fig. 3, the cytological features are based entirely on the TBS criteria and are expressed in the terminology of those criteria. Cells can be classified at different levels of abstraction. At the level of whether a lesion is present, cells fall into two main categories, normal cells and abnormal cells: normal cells include columnar cells, intermediate cells and superficial cells; abnormal cells include mild squamous intraepithelial lesion cells, moderate squamous intraepithelial lesion cells, severe squamous intraepithelial lesion cells and squamous cell carcinoma cells.
The interpretation rule base is likewise based entirely on the interpretation process and reasoning of the smear reader. One example is the mapping of cytoplasm color features: the colors that are significant for interpreting cervical lesions are blue, pink and orange, which in cytological terms are usually called basophilic, eosinophilic and orangeophilic; that is, basophilic cytoplasm is blue, eosinophilic cytoplasm is pink and orangeophilic cytoplasm is orange.
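One rule of this kind can be represented as a simple lookup from observed cytoplasm color to cytological term. The dictionary below is a sketch; the term "orangeophilic" for orange-staining cytoplasm and the helper name are assumptions, and a real rule base would hold many more features.

```python
# Colors with interpretive significance for cervical lesions, per the text.
CYTOPLASM_COLOR_TERMS = {
    "blue": "basophilic",
    "pink": "eosinophilic",
    "orange": "orangeophilic",  # assumed English term for orange-staining cytoplasm
}


def cytoplasm_term(color):
    """Map a cytoplasm color to its cytological term, or None if not significant."""
    return CYTOPLASM_COLOR_TERMS.get(color.lower())
```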
Step 6) constructing a dual-stream convolutional neural network model for cell clusters, and using the model to grade the non-separable cell clusters;
One input of the dual-stream convolutional neural network for cell clusters is the cell parameters, for example whether the cell nuclei are regularly arranged; the other input is the cervical cell cluster image corresponding to the cell parameters. The invention uniformly normalizes the input size of the cervical cell cluster images to 512 × 512 pixels and implicitly extracts the cell image features through 8 cascaded convolution-pooling modules. The main convolution operation has a kernel size of 5 × 5, a stride of 2 and 108 feature maps.
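For orientation, the spatial size of the feature maps after the main convolution can be worked out from the kernel size and stride given above. Zero padding is an assumption, as the text does not state the padding used, and `conv_output_size` is a hypothetical helper.

```python
def conv_output_size(n, kernel, stride, padding=0):
    """Output side length of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


# Single-cell stream: 256 x 256 input, 7 x 7 kernel, stride 1.
single_cell_side = conv_output_size(256, kernel=7, stride=1)   # 250
# Cell-cluster stream: 512 x 512 input, 5 x 5 kernel, stride 2.
cluster_side = conv_output_size(512, kernel=5, stride=2)       # 254
```

Under this assumption the stride of 2 brings the larger cluster input down to roughly the same spatial size as the single-cell stream after its first convolution.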
The interpretation and post-processing module is used for jointly interpreting the first grading result and the second grading result of a single cell and performing conflict processing to obtain the grading result of the single cell; the knowledge graph and the CAM (Class Activation Mapping) method are used to make the cervical cell identification process readable and its results interpretable.
For example, if the first and second grading results for a cell are both squamous cell carcinoma, the cell is interpreted as squamous cell carcinoma.
Conflict processing addresses the case in which different features point to different interpretation results: the various factors are integrated, the conflict is eliminated, and a clear and reliable interpretation is made. For example, a cell may be graded as squamous cell carcinoma by the first grading (the dual-stream convolutional neural network model) and as a lower-grade squamous intraepithelial lesion by the second grading (the knowledge graph model); this is a conflict of results.
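The combined interpretation can be sketched as a function that returns the agreed grade when both results match and flags a conflict otherwise. The text does not state how conflicts are resolved, so the policy used here (keep the more severe grade) is purely a placeholder assumption, as are the severity ordering and the function name.

```python
# Severity ordering is an assumption for illustration; grade names follow the text.
SEVERITY = [
    "normal",
    "mild squamous intraepithelial lesion",
    "moderate squamous intraepithelial lesion",
    "severe squamous intraepithelial lesion",
    "squamous cell carcinoma",
]


def interpret(first, second):
    """Combine two grading results; returns (grade, conflict_detected)."""
    if first == second:
        return first, False
    # Placeholder policy (assumed): keep the more severe of the two grades.
    more_severe = max(first, second, key=SEVERITY.index)
    return more_severe, True
```

Any real conflict-processing rule would replace the placeholder branch; the point of the sketch is only that agreement passes through unchanged and disagreement is detected explicitly.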
The above embodiments describe the objects, technical solutions and advantages of the present invention in further detail. It should be understood that they are merely exemplary embodiments and are not intended to limit the scope of the present invention; any modification, equivalent substitution or improvement made within the spirit and principle of the present invention shall fall within its scope of protection.