CN110765908A - Cascade type cancer cell detection system based on deep learning - Google Patents

Cascade type cancer cell detection system based on deep learning

Info

Publication number
CN110765908A
Authority
CN
China
Prior art keywords
sample
region
image
unit
pathological
Prior art date
Legal status
Pending
Application number
CN201910971767.1A
Other languages
Chinese (zh)
Inventor
彭雅琴
刘正涛
肖璞
曹鹏飞
张琳
Current Assignee
Sanjiang University
Original Assignee
Sanjiang University
Priority date
Filing date
Publication date
Application filed by Sanjiang University
Priority to CN201910971767.1A
Publication of CN110765908A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 - Document-oriented image-based pattern recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2148 - Generating training patterns; Bootstrap methods characterised by the process organisation or structure, e.g. boosting cascade
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 - ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 - ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining for computer-aided diagnosis, e.g. based on medical expert systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03 - Recognition of patterns in medical or anatomical images
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition

Abstract

The invention discloses a cascaded cancer cell detection system based on deep learning, which relates to the field of deep-learning-based detection and solves the technical problem that recognition precision is low when medical records are not incorporated into the pathology recognition process. By constructing a deep learning network and a cascaded pathology recognition system, the system fully simulates how a doctor reads and judges pathological images; at the same time, medical record data are introduced into the pathology recognition process, so that the patient's treatment history is put to effective use and provides auxiliary decision support for pathology recognition.

Description

Cascade type cancer cell detection system based on deep learning
Technical Field
The present disclosure relates to the field of deep-learning-based detection, and in particular to a cascaded cancer cell detection system based on deep learning.
Background
Thyroid lymph node metastatic cancer is a common disease whose final confirmation requires pathological examination. The workload of pathologists is very heavy, and their working state is affected by factors such as time of day and physical condition, so an automatic pathological image assessment method is needed to relieve pathologists' workload and improve the efficiency of recognition work.
Early machine recognition methods for pathological images mainly included SVM, PCA-SVM, LoG filtering and adaptive thresholding. After deep learning algorithms were proposed, convolutional neural networks (CNNs) were introduced into pathology recognition because of their excellent image recognition performance, with most work focusing on cell morphology recognition. However, recognizing individual cell morphology is cumbersome: it requires preprocessing steps such as cell segmentation, and because cells may overlap, vary widely in shape and are hard to segment, additional feature extraction techniques are usually needed, such as hand-crafted feature extraction, fast learning methods and resolution-adaptive methods.
In actual pathological assessment, a pathologist first reads the patient's treatment record and then judges the pathological image. Reading the treatment record is very important: it helps the pathologist section and analyse the specimen more effectively, and in particular it reminds the pathologist not to overlook a possible diagnostic line of thought. However, existing automatic pathology recognition methods focus only on recognizing the pathological image and make no reference to the treatment record.
Disclosure of Invention
The present disclosure provides a cascaded cancer cell detection system based on deep learning, achieving the technical goal of high-precision cancer cell recognition.
The technical purpose of the present disclosure is achieved by the following technical solutions:
a cascading cancer cell detection system based on deep learning comprises a sample establishing module, a data processing module, a model training module and a detection module;
the sample creation module comprises:
the medical record unit is used for converting paper medical records or electronic medical records into text medical record samples to form a medical record database;
the pathological unit is used for converting the pathological image into an electronic pathological sample to form a pathological image database;
the data processing module comprises:
the text processing unit is used for preprocessing the text medical record sample and converting it into a numerical vector;
the image processing unit is used for preprocessing the electronic pathological sample and comprehensively judging the color similarity and the texture similarity so as to extract a pathological image sample;
the model training module comprises:
the text feature extraction unit is used for extracting text feature vectors of the text medical record samples;
an image feature extraction unit which extracts an image feature vector of the pathological image sample;
and a model training unit;
the model training unit includes:
the first training unit is used for inputting the image feature vectors into a classification layer for training to generate a first detection model;
the second training unit is used for inputting the text feature vector and the image feature vector into a classification layer for training to generate a second detection model;
and the detection module is used for inputting the electronic pathological sample into the first detection model for identification to obtain an identification result; if the identification result is positive, the result is output, otherwise the text medical record sample and the electronic pathological sample are input into the second detection model together for identification.
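As an illustration, the cascade decision performed by the detection module could be sketched in Python as follows; the classifier objects, the scikit-learn-style predict() interface and the 1/0 label encoding are assumptions made for this sketch, not part of the disclosure.

```python
import numpy as np

def cascade_detect(image_features: np.ndarray,
                   text_features: np.ndarray,
                   first_model,
                   second_model) -> int:
    """Return 1 (positive) or 0 (negative) following the cascade described above.

    first_model  - classifier trained on image feature vectors only
    second_model - classifier trained on concatenated text + image feature vectors
    Both are assumed to expose a scikit-learn style predict() method.
    """
    # Stage 1: image-only model; a positive result is output directly.
    if int(first_model.predict(image_features.reshape(1, -1))[0]) == 1:
        return 1
    # Stage 2: otherwise the medical-record text features are added and the
    # joint model produces the final result.
    joint = np.concatenate([text_features, image_features]).reshape(1, -1)
    return int(second_model.predict(joint)[0])
```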
Further, the image processing unit includes:
the color similarity judging unit selects an electronic pathological image k, normalizes each color channel, and obtains the histogram of each of the m color channels of a region i ∈ k, {c_i^1, ..., c_i^m}; the total color amount of region i is W_i = Σ_{n=1}^{m} c_i^n, and the pixel size of i is x × y; if the stained-area learning sample image is z with pixel size x × y, the reference color similarity of region i is W = |W_i - W_z|; W is computed over the electronic pathological image k in a sliding-window manner with sliding step l, l ∈ [x/8, x/2], and the region with the smallest W value is the first region of interest;
the texture similarity judging unit normalizes the first region of interest and obtains the Gaussian derivatives of each color channel of a region j in r different directions, T_j = {T_j^1, ..., T_j^r}; the total texture amount of region j is V_j = Σ_{s=1}^{r} T_j^s; region j belongs to the first region of interest, and the pixel size of region j is x′ × y′; if the cell-region learning image is z′ with pixel size x′ × y′, the reference texture similarity of region j is V = |V_j - V_{z′}|; V is computed over the first region of interest in a sliding-window manner with sliding step l′, l′ ∈ [x′/8, x′/2], and the region with the smallest V value is the second region of interest;
an image sample extraction unit that extracts a pathology image sample from the second region of interest;
wherein m, n, r and s are positive integers.
Further, the first region of interest consists of at most 5 regions of pixel size x × y with the smallest W values (sorted in ascending order); the second region of interest consists of at most 5 regions of pixel size x′ × y′ with the smallest V values (sorted in ascending order).
Further, the text feature extraction unit comprises at least two network layers and outputs a text feature vector; the image feature extraction unit comprises 9 convolutional layers, 9 pooling layers and 1 fully connected layer and outputs an image feature vector.
Further, the model training unit is trained using a BP algorithm.
Further, the text medical record sample comprises a number, a name, a gender, an age, a doctor-visit date and a doctor-visit record.
Further, the electronic pathology sample includes a number, a name, a pathology image ID, a pathology image, and a pathology report.
The beneficial effects of the present disclosure are as follows: the sample establishing module builds a medical record database and a pathological image database; the feature extraction units extract the features of the text medical record samples and the pathological image samples, which are input into the classification layer for training to generate the detection models; the first detection model identifies the electronic pathological sample first, and if the identification result is positive the result is output directly, otherwise the text medical record sample and the electronic pathological sample are input into the second detection model together for identification. By constructing a deep learning network and a cascaded pathology recognition system, the system fully simulates how a doctor reads and judges pathological images, and medical record data are introduced into the pathology recognition process, so that the patient's treatment history is put to effective use and provides auxiliary decision support for pathology recognition.
Drawings
FIG. 1 is a schematic view of the disclosed system;
FIG. 2 is a flow chart of the identification process of the present disclosure.
Detailed Description
The technical scheme of the disclosure will be described in detail with reference to the accompanying drawings. In the description of the present disclosure, it is to be understood that the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated, but merely as distinguishing between different components.
FIG. 1 is a schematic diagram of the system of the present disclosure, which includes a sample establishing module, a data processing module, a model training module and a detection module. The sample establishing module comprises a medical record unit and a pathology unit; the data processing module comprises a text processing unit and an image processing unit; the model training module comprises a text feature extraction unit, an image feature extraction unit and a model training unit, and the model training unit comprises a first training unit and a second training unit.
The image processing unit comprises a color similarity judging unit, a texture similarity judging unit and an image sample extraction unit. The color similarity judging unit normalizes each color channel of the region i using an L1-norm method (sum of absolute values), and the texture similarity judging unit normalizes the region j in the same way.
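A minimal sketch of L1-norm normalisation (division by the sum of absolute values), assuming a channel is held as a NumPy array:

```python
import numpy as np

def l1_normalize(channel: np.ndarray) -> np.ndarray:
    """Divide a colour channel (or histogram) by the sum of its absolute values."""
    denom = np.abs(channel).sum()
    return channel / denom if denom > 0 else channel
```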
The working principle of the color similarity judging unit is as follows: select an electronic pathological image k and normalize each color channel to obtain the histogram of each of the m color channels of a region i ∈ k, {c_i^1, ..., c_i^m}; the total color amount of region i is W_i = Σ_{n=1}^{m} c_i^n, and the pixel size of region i is x × y. If the stained-area learning sample image is z with pixel size x × y, the reference color similarity of region i is W = |W_i - W_z|. W is computed over the electronic pathological image k in a sliding-window manner with sliding step l, l ∈ [x/8, x/2], and the region with the smallest W value is the first region of interest.
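A sketch of the sliding-window colour search under stated assumptions: pixel values are scaled to [0, 1], and W is taken here as the L1 distance between the L1-normalised per-channel histograms of a candidate window and of the stained-area learning sample; the function names and the number of histogram bins are illustrative, not the patented implementation.

```python
import numpy as np

def channel_histograms(region: np.ndarray, bins: int = 32) -> np.ndarray:
    """Concatenated, L1-normalised per-channel histograms of an (h, w, c) region."""
    hists = []
    for c in range(region.shape[2]):
        h, _ = np.histogram(region[..., c], bins=bins, range=(0.0, 1.0))
        hists.append(h / max(h.sum(), 1))
    return np.concatenate(hists)

def first_region_of_interest(image: np.ndarray, template: np.ndarray, stride: int):
    """Slide a template-sized window over the electronic pathological image and
    return the window whose colour statistics are closest to the stained-area
    learning sample (smallest W)."""
    template_hist = channel_histograms(template)
    ty, tx = template.shape[:2]
    best, best_w = None, np.inf
    for y in range(0, image.shape[0] - ty + 1, stride):      # stride plays the role of l
        for x in range(0, image.shape[1] - tx + 1, stride):
            window_hist = channel_histograms(image[y:y + ty, x:x + tx])
            w = float(np.abs(window_hist - template_hist).sum())
            if w < best_w:
                best, best_w = (y, x, ty, tx), w
    return best, best_w
```

Following the description, the sliding step l would be chosen within [x/8, x/2] of the window width, for example stride = tx // 4.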
The working principle of the texture similarity judging unit is as follows: normalize the first region of interest and obtain the Gaussian derivatives of each color channel of a region j in r different directions, T_j = {T_j^1, ..., T_j^r}; the total texture amount of region j is V_j = Σ_{s=1}^{r} T_j^s. Region j belongs to the first region of interest, and the pixel size of region j is x′ × y′. If the cell-region learning image is z′ with pixel size x′ × y′, the reference texture similarity of region j is V = |V_j - V_{z′}|. V is computed over the first region of interest in a sliding-window manner with sliding step l′, l′ ∈ [x′/8, x′/2], and the region with the smallest V value is the second region of interest. Here m, n, r and s are positive integers.
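A corresponding sketch of the texture score, assuming the Gaussian derivatives are taken along the horizontal and vertical axes of each L1-normalised colour channel using SciPy's gaussian_filter; the choice of directions and of sigma is an assumption.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def texture_total(region: np.ndarray, sigma: float = 1.0) -> float:
    """V_j: summed magnitude of Gaussian-derivative responses over all colour
    channels and the two axis directions of an (h, w, c) region."""
    total = 0.0
    for c in range(region.shape[2]):
        chan = region[..., c].astype(float)
        denom = np.abs(chan).sum()
        chan = chan / denom if denom > 0 else chan             # L1 normalisation
        dy = gaussian_filter(chan, sigma=sigma, order=(1, 0))  # vertical derivative
        dx = gaussian_filter(chan, sigma=sigma, order=(0, 1))  # horizontal derivative
        total += float(np.abs(dx).sum() + np.abs(dy).sum())
    return total

# The reference texture similarity of a window j is then V = |V_j - V_z'| with
# z' the cell-region learning image, and the window with the smallest V inside
# the first region of interest becomes the second region of interest.
```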
Finally, the image sample extraction unit extracts a pathology image sample from the second region of interest.
The W and V values in the color similarity judging unit and the texture similarity judging unit are both computed in a sliding-window manner: regions of fixed pixel size are selected according to the sliding step, and the color similarity and texture similarity of each region are calculated; the smaller the W and V values, the closer the color and texture.
As a specific embodiment, in order to identify the pathological image more accurately, at most 5 regions of pixel size x × y with the smallest W values may be selected as the first region of interest, and the texture similarity of each of these regions is then computed to obtain its second region of interest. Likewise, at most 5 regions of pixel size x′ × y′ with the smallest V values are selected as the second region of interest, and the image sample extraction unit extracts pathological image samples from these regions, so that the final pathological image identification is more accurate.
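A brief sketch of this top-5 refinement; the (score, window) tuple layout is an assumption carried over from the earlier sketches.

```python
def top5(scored_windows):
    """Keep at most the five windows with the smallest similarity score.

    scored_windows: iterable of (score, (y, x, h, w)) pairs, where the score is
    W for the colour stage or V for the texture stage.
    """
    ranked = sorted(scored_windows, key=lambda item: item[0])
    return [window for _, window in ranked[:5]]
```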
FIG. 2 is a flow chart of the identification process of the present disclosure. The text medical record samples in the medical record database are preprocessed and their text feature vectors are extracted; the region of interest of each pathological image is obtained to produce a pathological image sample, and the image feature vector of the pathological image sample is extracted. The image feature vectors are put into the classification layer for training to generate the first detection model, and the text feature vectors together with the image feature vectors are put into the classification layer for training to generate the second detection model. After training, the electronic pathological sample is put into the first detection model for identification; if the identification result is positive, it is output. Otherwise, the text medical record sample and the electronic pathological sample are input into the second detection model together, and the final identification result is output.
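The two training stages might be assembled as below, with scikit-learn MLP classifiers standing in for the BP-trained classification layer; the feature matrices, hidden-layer sizes and hyper-parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_cascade(X_img: np.ndarray, X_txt: np.ndarray, y: np.ndarray):
    """X_img: image feature vectors, X_txt: text feature vectors (one row per
    sample, aligned by patient), y: positive/negative labels."""
    # First detection model: image feature vectors only.
    first_model = MLPClassifier(hidden_layer_sizes=(128, 64, 32), max_iter=500)
    first_model.fit(X_img, y)

    # Second detection model: text and image feature vectors together.
    X_joint = np.concatenate([X_txt, X_img], axis=1)
    second_model = MLPClassifier(hidden_layer_sizes=(128, 64, 32), max_iter=500)
    second_model.fit(X_joint, y)
    return first_model, second_model
```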
As a specific embodiment, the text feature extraction unit comprises at least two network layers, the image feature extraction unit comprises 9 convolutional layers, 9 pooling layers and 1 fully connected layer, and the network weights are initialized randomly.
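A sketch of such an image feature extractor in PyTorch: only the counts of convolutional, pooling and fully connected layers come from the text, while channel widths, kernel sizes and the output dimension are assumptions; PyTorch initialises the weights randomly by default.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageFeatureExtractor(nn.Module):
    """9 convolutional layers, 9 pooling layers and 1 fully connected layer."""
    def __init__(self, out_dim: int = 256):
        super().__init__()
        layers, in_ch = [], 3
        for i in range(9):                       # 9 conv + 9 pooling layers
            out_ch = min(16 * 2 ** (i // 2), 128)
            layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True),
                       nn.MaxPool2d(kernel_size=2, ceil_mode=True)]
            in_ch = out_ch
        self.features = nn.Sequential(*layers)
        self.fc = nn.Linear(in_ch, out_dim)      # 1 fully connected layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = torch.flatten(F.adaptive_avg_pool2d(x, 1), 1)  # collapse spatial dims
        return self.fc(x)
```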
As a specific embodiment, the classification layer consists of 5 network layers: the first layer is the input layer, the middle three layers are hidden layers in a fully connected configuration, and the last layer is the output layer; the training algorithm is the BP (error back-propagation) algorithm.
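In the same framework, the five-layer classification layer (input layer, three fully connected hidden layers, output layer) might be sketched as follows; the layer widths and activation function are assumptions, and training it with a cross-entropy loss and loss.backward() corresponds to the error back-propagation (BP) procedure named above.

```python
import torch
import torch.nn as nn

class ClassificationLayer(nn.Module):
    """Input layer, three fully connected hidden layers, output layer."""
    def __init__(self, in_dim: int, hidden: int = 128, n_classes: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),   # input -> hidden 1
            nn.Linear(hidden, hidden), nn.ReLU(),   # hidden 2
            nn.Linear(hidden, hidden), nn.ReLU(),   # hidden 3
            nn.Linear(hidden, n_classes),           # output layer
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)
```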
As a specific embodiment, the text medical record sample includes a number, a name, a gender, an age, a doctor-visit date and a doctor-visit record; the electronic pathology sample includes a number, a name, a pathology image ID, a pathology image and a pathology report. The text medical record samples and the electronic pathology samples are linked by number and name.
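Linking the two databases by the shared number and name fields could then be done as below; the column names are illustrative translations of the field list above.

```python
import pandas as pd

medical_records = pd.DataFrame(columns=["number", "name", "gender", "age",
                                        "visit_date", "visit_record"])
pathology_samples = pd.DataFrame(columns=["number", "name", "pathology_image_id",
                                          "pathology_image", "pathology_report"])

# One text medical record sample is paired with the electronic pathology
# sample(s) of the same patient via the shared number and name.
linked = medical_records.merge(pathology_samples, on=["number", "name"], how="inner")
```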
The foregoing is an exemplary embodiment of the present disclosure, and the scope of the present disclosure is defined by the claims and their equivalents.

Claims (7)

1. A cascading type cancer cell detection system based on deep learning is characterized by comprising a sample establishing module, a data processing module, a model training module and a detection module;
the sample creation module comprises:
the medical record unit is used for converting paper medical records or electronic medical records into text medical record samples to form a medical record database;
the pathological unit is used for converting the pathological image into an electronic pathological sample to form a pathological image database;
the data processing module comprises:
the text processing unit is used for preprocessing the text medical record sample and converting it into a numerical vector;
the image processing unit is used for preprocessing the electronic pathological sample and comprehensively judging the color similarity and the texture similarity so as to extract a pathological image sample;
the model training module comprises:
the text feature extraction unit is used for extracting text feature vectors of the text medical record samples;
an image feature extraction unit which extracts an image feature vector of the pathological image sample;
and a model training unit;
the model training unit includes:
the first training unit is used for inputting the image feature vectors into a classification layer for training to generate a first detection model;
the second training unit is used for inputting the text feature vector and the image feature vector into a classification layer for training to generate a second detection model;
and the detection module is used for inputting the electronic pathological sample into the first detection model for identification to obtain an identification result; if the identification result is positive, the result is output, otherwise the text medical record sample and the electronic pathological sample are input into the second detection model together for identification.
2. The deep learning based cascaded cancer cell detection system of claim 1, wherein the image processing unit comprises:
the color similarity judging unit selects an electronic pathological image k, normalizes each color channel, and obtains the histogram of each of the m color channels of a region i ∈ k, {c_i^1, ..., c_i^m}; the total color amount of region i is W_i = Σ_{n=1}^{m} c_i^n, and the pixel size of i is x × y; if the stained-area learning sample image is z with pixel size x × y, the reference color similarity of region i is W = |W_i - W_z|; W is computed over the electronic pathological image k in a sliding-window manner with sliding step l, l ∈ [x/8, x/2], and the region with the smallest W value is the first region of interest;
the texture similarity judging unit normalizes the first region of interest and obtains the Gaussian derivatives of each color channel of a region j in r different directions, T_j = {T_j^1, ..., T_j^r}; the total texture amount of region j is V_j = Σ_{s=1}^{r} T_j^s; region j belongs to the first region of interest, and the pixel size of region j is x′ × y′; if the cell-region learning image is z′ with pixel size x′ × y′, the reference texture similarity of region j is V = |V_j - V_{z′}|; V is computed over the first region of interest in a sliding-window manner with sliding step l′, l′ ∈ [x′/8, x′/2], and the region with the smallest V value is the second region of interest;
an image sample extraction unit that extracts a pathology image sample from the second region of interest;
wherein m, n, r and s are positive integers.
3. The deep learning based cascading cancer cell detection system of claim 2, wherein the first region of interest consists of at most 5 regions of pixel size x × y with the smallest W values (sorted in ascending order), and the second region of interest consists of at most 5 regions of pixel size x′ × y′ with the smallest V values (sorted in ascending order).
4. The deep learning based cascading type cancer cell detection system of claim 3, wherein the text feature extraction unit comprises at least two network layers and outputs a text feature vector; the image feature extraction unit comprises 9 convolutional layers, 9 pooling layers and 1 fully connected layer and outputs an image feature vector.
5. The deep learning based cascading cancer cell detection system of claim 4, wherein the model training unit is trained using a BP algorithm.
6. The deep learning based cascading cancer cell detection system of any one of claims 1-5, wherein the text medical record sample comprises a number, a name, a gender, an age, a doctor-visit date and a doctor-visit record.
7. The deep learning based cascading cancer cell detection system of any one of claims 1-5, wherein the electronic pathology sample includes a number, a name, a pathology image ID, a pathology image, and a pathology report.
CN201910971767.1A 2019-10-14 2019-10-14 Cascade type cancer cell detection system based on deep learning Pending CN110765908A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910971767.1A CN110765908A (en) 2019-10-14 2019-10-14 Cascade type cancer cell detection system based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910971767.1A CN110765908A (en) 2019-10-14 2019-10-14 Cascade type cancer cell detection system based on deep learning

Publications (1)

Publication Number Publication Date
CN110765908A true CN110765908A (en) 2020-02-07

Family

ID=69331910

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910971767.1A Pending CN110765908A (en) 2019-10-14 2019-10-14 Cascade type cancer cell detection system based on deep learning

Country Status (1)

Country Link
CN (1) CN110765908A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106127255A (en) * 2016-06-29 2016-11-16 深圳先进技术研究院 The sorting technique of a kind of cancer numeral pathological cells image and system
CN106202968A (en) * 2016-07-28 2016-12-07 北京博源兴康科技有限公司 The data analysing method of cancer and device
CN109949271A (en) * 2019-02-14 2019-06-28 腾讯科技(深圳)有限公司 A kind of detection method based on medical image, the method and device of model training
CN109919928A (en) * 2019-03-06 2019-06-21 腾讯科技(深圳)有限公司 Detection method, device and the storage medium of medical image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李明攀 (Li Mingpan): "Research on Object Detection Algorithms Based on Deep Learning", China Excellent Master's Theses Full-text Database (Information Science and Technology) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113261945A (en) * 2020-02-14 2021-08-17 台北医学大学 Monitoring system and device for monitoring pressure sores and computer implementation method
CN111797786A (en) * 2020-07-09 2020-10-20 郑州中普医疗器械有限公司 Detection method for in vitro biological samples, four-classification, computer device and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200207