CN110852396A - Sample data processing method for cervical image - Google Patents
- Publication number: CN110852396A
- Application number: CN201911125170.1A
- Authority
- CN
- China
- Prior art keywords
- data
- sample
- image
- data set
- cervical
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F18/24143—Distances to neighbourhood prototypes, e.g. restricted Coulomb energy networks [RCEN]
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
- G06T5/20—Image enhancement or restoration using local operators
- G06T5/70—Denoising; Smoothing
- G06T7/0012—Biomedical image inspection
- G06T7/11—Region-based segmentation
- G06T7/136—Segmentation; Edge detection involving thresholding
- G06T7/194—Segmentation involving foreground-background segmentation
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
- G06T2207/20032—Median filtering
- G06T2207/20081—Training; Learning
- G06T2207/20084—Artificial neural networks [ANN]
- G06T2207/30096—Tumor; Lesion
- G06V2201/03—Recognition of patterns in medical or anatomical images
Abstract
The invention discloses a sample data processing method for a cervical image, comprising the following steps. Establishing classification. Data preprocessing. Segmentation. Data enhancement: classifying the target image data, identifying the differences between the classes of target image data, and applying enhancement processing targeted at those differences. Equalization processing: for the difference in total volume between the classes of target image data, supplementing minority-class samples by data fitting so that the class totals are balanced. Data set construction: for each class of equalized target image data, randomly dividing the data in proportion into a training data set, a verification data set, and a test data set. Model construction: based on the training data set and/or the verification data set and/or the test data set, mapping the data set onto the contrast data set to obtain the corresponding classification of the sample data. The method alleviates the class-imbalance problem in cervical image data classification, improves the precision and efficiency of image classification, and improves the effect and quality of auxiliary diagnosis.
Description
Technical Field
The invention belongs to computer-aided application methods in the medical field, and particularly relates to a sample data processing method for cervical images.
Background
Cervical lesions have a well-defined etiology, so clinical prevention is feasible and the high mortality of cervical cancer can be reduced to a large extent. However, China is still a developing country with a high population density, and HPV vaccines are difficult to popularize comprehensively; screening for early cervical lesions therefore remains the principal measure for preventing and treating cervix-related diseases. At present, the main methods used in hospitals for screening cervical lesions are the Pap smear (Pap test), liquid-based cytology (TCT), HPV-DNA detection, electronic colposcopy, and histopathological examination. Each mainstream precancerous-lesion screening method still has its own shortcomings, so in some cases several diagnostic methods must be combined to confirm a diagnosis, and the accuracy of precancerous screening still needs improvement. In addition, the final determination of the disease requires a physician to observe and analyze the lesion area or lesion image carefully and draw a conclusion, which places high demands on the physician's expertise. Where diagnosis must be made from medical images, long sessions of reading and observing images easily fatigue the physician and in turn reduce diagnostic accuracy. Given the current situation in traditional medicine, the high incidence and lethality of cervical cancer, and the rapid development of artificial intelligence and machine learning, a new auxiliary diagnosis system for cervical lesion screening is particularly necessary.
At present, computer-aided diagnosis (CAD) systems that work with medical endoscopes and dermoscopes, such as those for gastric cancer, skin cancer, digestive-tract cancer, and intestinal cancer, have developed vigorously. Building such a system requires a large number of color images acquired by medical equipment. After preprocessing operations such as image filtering, image enhancement, and image segmentation, valuable image features are selected through feature extraction and feature screening, the selected features are fed into a machine-learning classification model for training, and the model parameters are tuned until a well-performing CAD system is obtained. However, auxiliary diagnosis systems for cervical lesions remain relatively rare, traditional cancer classification is mostly binary, and the performance of traditional machine-learning algorithms on pathological classification has reached a bottleneck.
Disclosure of Invention
Cervical cancer, the only gynecological malignancy with a definite etiology, has high clinical morbidity and mortality. A clear direction for clinical diagnosis and treatment can therefore greatly improve the cure rate of patients and reduce mortality. The invention aims to design a deep-learning-based auxiliary diagnosis system for cervical lesion screening that relieves physicians' diagnostic workload and substantially improves the accuracy of disease diagnosis. The invention first uses an electronic colposcope to collect color images of the cervical region from different patients and obtains usable patient image data through data cleaning; it then applies a series of preprocessing operations to the acquired images, such as image filtering, image segmentation (ROI extraction), and image enhancement; to address the class imbalance and small quantity of the lesion data to be classified, it performs equalization by means of the SMOTE algorithm, data augmentation, and similar techniques; finally, the processed image data are fed into a deep learning model for learning and training, yielding a six-way classification into normal, inflammation, cervical intraepithelial neoplasia I (CIN I), cervical intraepithelial neoplasia II (CIN II), cervical intraepithelial neoplasia III (CIN III), and cancer. The method provides good assistance in the definitive diagnosis of cervical lesions.
In order to achieve the above object, the present invention discloses a method for processing sample data of a cervical image, comprising the following steps:
Establishing classification: establishing a contrast data set and, on its basis, obtaining a classification standard for cervical image features. The position of this operation in the text does not represent its actual place in the process; it may be performed after any step, or even concurrently with other steps, without affecting the scheme of the invention.
Data preprocessing: acquiring sample data of a cervical image (the sample data may be the raw image data or data already screened in preprocessing; when the raw data contains content that interferes with subsequent processing, it is screened first, and this screening at least includes directly deleting images that are over-bright, over-dark, or blurred, or whose field of view contains medical instruments or other foreign objects);
Segmentation: segmenting the preprocessed image data to obtain target image data;
Data enhancement: classifying the target image data, identifying the differences between the classes of target image data, and applying enhancement processing targeted at those differences;
Equalization processing: for the difference in total volume between the classes of target image data, supplementing minority-class samples by data fitting to balance the class totals;
Data set construction: for each class of equalized target image data, randomly dividing the data in proportion into a training data set, a verification data set, and a test data set;
Model construction: based on the training data set and/or the verification data set and/or the test data set, mapping the data set onto the contrast data set to obtain the corresponding classification of the sample data.
In an improvement of the disclosed method, the segmentation operation divides the image data with the Otsu algorithm:
assume the image data comprises foreground pixel data and background pixel data, and compute a threshold that distinguishes the two;
divide the image data into foreground pixel data and background pixel data by this threshold.
In a further improvement, the segmentation operation also applies morphological operations to the separated foreground pixel data and/or background pixel data to obtain the target image data.
In a further improvement, the morphological operations include at least one of addition, filling, deletion, and segmentation.
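As an illustration of the "filling" operation named above, a morphological closing over a binary foreground mask can be sketched in plain NumPy. This is an illustrative sketch only; the patent does not specify an implementation, and the 3 × 3 cross structuring element is an assumption:

```python
import numpy as np

def dilate(mask, it=1):
    """Binary dilation with a 3x3 cross structuring element."""
    m = mask.astype(bool)
    for _ in range(it):
        grown = m.copy()
        grown[1:, :] |= m[:-1, :]   # neighbour above
        grown[:-1, :] |= m[1:, :]   # neighbour below
        grown[:, 1:] |= m[:, :-1]   # neighbour to the left
        grown[:, :-1] |= m[:, 1:]   # neighbour to the right
        m = grown
    return m

def erode(mask, it=1):
    """Binary erosion expressed as the complement of dilating the complement."""
    return ~dilate(~mask.astype(bool), it)

def close_holes(mask, it=1):
    """Morphological closing (dilate then erode): fills small gaps in the
    foreground mask, one way to realize the 'filling' operation."""
    return erode(dilate(mask, it), it)
```

For example, a foreground mask with a one-pixel hole is made solid by `close_holes`, while the mask boundary is left unchanged.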
In a further improvement, the equalization operation processes the minority-class sample data in the target data with the SMOTE algorithm:
analyzing the minority-class sample data;
synthesizing new samples from the minority-class samples and adding them to the original minority-class samples to form a new minority-class sample set, until the class totals of the target image data are balanced.
In a further improvement, the SMOTE algorithm used to process the minority-class sample data in the target data is as follows:
Step 1: for each sample x in the minority class, compute the Euclidean distance from x to every sample in the minority-class sample set S_min, and take the k samples at the smallest distances as the k nearest neighbors of x;
Step 2: set a sampling ratio according to the class-imbalance ratio to determine a sampling multiplier N, and for each minority-class sample x randomly select several samples from its k nearest neighbors; denote a selected neighbor by x_n;
Step 3: for each randomly selected neighbor x_n, construct a new sample from it and the original sample according to the formula:
x_new = x + rand(0,1) · (x_n − x), where rand(0,1) is a random value in the range 0 to 1.
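The three steps above can be sketched in NumPy as follows. This is an illustrative sketch, not the patent's implementation; the function name and parameters are assumptions, and the sampling multiplier N is expressed here simply as the number of new samples requested:

```python
import numpy as np

def smote(minority, n_new, k=5, rng=None):
    """Synthesize n_new samples from a minority-class array of shape
    (n_samples, n_features) by interpolating between each sample x and one
    of its k nearest neighbours x_n: x_new = x + rand(0,1) * (x_n - x)."""
    rng = np.random.default_rng(rng)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    # Step 1: pairwise Euclidean distances within the minority set
    d = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]  # k nearest neighbours per sample
    new = []
    for _ in range(n_new):
        i = rng.integers(n)                    # Step 2: pick a minority sample x
        j = neighbours[i, rng.integers(min(k, n - 1))]  # ...and one neighbour x_n
        gap = rng.random()                     # rand(0,1)
        # Step 3: interpolate between x and x_n
        new.append(minority[i] + gap * (minority[j] - minority[i]))
    return np.vstack(new)
```

Because each new sample is a convex combination of two existing minority samples, it always lies on the segment between them, which is what keeps the synthetic data plausible.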
In a further improvement, the data preprocessing operation at least includes deleting over-bright, over-dark, or blurred image data, and/or image data containing foreign objects, from the sample data.
In a further improvement, the deletion operation is either complete deletion of an image, or partial deletion of a target area after the image is segmented.
In a further improvement, in the data set construction operation, each class of target image data is divided into training, verification, and test data sets in the proportions 80%, 15%, and 5%.
In a further improvement, the training, verification, and test data sets of each class of target image data are constructed by random division in proportion.
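The 80%/15%/5% random division described above can be sketched with the standard library alone. The function name, ratios default, and seed handling are illustrative assumptions:

```python
import random

def split_dataset(samples, ratios=(0.80, 0.15, 0.05), seed=0):
    """Randomly split one class's samples into training/verification/test
    subsets in the given proportions (80%/15%/5% as in the patent)."""
    items = list(samples)
    random.Random(seed).shuffle(items)          # random division, reproducible
    n = len(items)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    return (items[:n_train],                     # training data set
            items[n_train:n_train + n_val],      # verification data set
            items[n_train + n_val:])             # test data set (remainder)
```

Giving the test set the remainder (rather than rounding it separately) guarantees the three subsets partition the class exactly, with no sample dropped or duplicated.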
In general, the method first cleans the cervical images acquired by the electronic colposcope, deleting images that are blurred, too dark, or contain foreign matter in the key field of view; then, under the guidance of specialist physicians and relevant experts, classifies (labels) the image data in combination with the electronic medical records; completes preprocessing operations on the image data such as noise reduction, segmentation, and enhancement to obtain region-of-interest (ROI) images; uses the SMOTE algorithm to artificially synthesize minority-class samples where the class sizes differ greatly; applies data augmentation to further expand the number of samples; and finally feeds the image data into a deep learning model for training, tuning the model's classification performance through parameter adjustment and transfer learning to achieve the effect of auxiliary diagnosis.
The existing auxiliary diagnosis systems mainly have the following problems. First, the amount of training data, and in particular of effectively labeled samples, is small. The training and test samples selected here are therefore carefully screened and confirmed by specialist physicians and relevant experts, and every acquired image is labeled with the corresponding disease label, which ensures the accuracy of model training and diagnosis. To address the small sample size, the invention synthesizes minority-class samples with the SMOTE algorithm and then expands the sample size with data augmentation techniques such as center cropping, vertical flipping, horizontal flipping, and brightness adjustment, until the final sample size reaches a satisfactory level. Second, the auxiliary diagnosis systems currently applied to cervical lesion detection in electronic colposcopic images either use a traditional machine-learning classification model, which requires tedious, time-consuming feature extraction and feature screening and whose performance has reached a bottleneck that cannot be improved well, or train a new deep learning model directly on images that have only undergone noise reduction and enhancement, which does not let the computer focus its learning on the features of the key lesion region. The present method improves the learning effect in this respect to a certain extent.
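The augmentation techniques named above (center cropping, vertical and horizontal flipping, brightness adjustment) can be sketched in NumPy. This is an illustrative sketch; the 80% crop size and the 0.8 to 1.2 brightness range are assumptions not stated in the patent:

```python
import numpy as np

def augment(img, rng=None):
    """Return the augmented variants named in the text: centre crop,
    vertical flip, horizontal flip, and a brightness-scaled copy."""
    rng = np.random.default_rng(rng)
    h, w = img.shape[:2]
    ch, cw = int(h * 0.8), int(w * 0.8)       # centre crop to 80% (assumed size)
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    crop = img[y0:y0 + ch, x0:x0 + cw]
    flip_v = img[::-1]                         # up-down flip
    flip_h = img[:, ::-1]                      # left-right flip
    factor = rng.uniform(0.8, 1.2)             # brightness adjustment (assumed range)
    bright = np.clip(img.astype(float) * factor, 0, 255).astype(np.uint8)
    return [crop, flip_v, flip_h, bright]
```

Each input image thus yields four additional samples, which multiplies the data set size on top of the SMOTE synthesis.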
Drawings
To illustrate the embodiments of the present application or the prior art more clearly, the drawings needed in their description are briefly introduced below. The drawings described below are obviously only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an embodiment of a method for processing sample data of a cervical image according to the present invention;
fig. 2 is a schematic diagram of a segmentation operation of an embodiment of the cervical image sample data processing method of the present invention;
fig. 3 is a schematic diagram of model construction of an embodiment of the method for processing sample data of a cervical image according to the present invention.
Detailed Description
The present invention will be described in detail below with reference to the embodiments shown in the drawings. The embodiments do not limit the invention; structural, methodological, or functional changes made by those skilled in the art according to these embodiments are all included within the scope of the invention.
The system implementation flow of the present invention is shown in fig. 1. It mainly comprises the following parts:
and an image preprocessing part which comprises image data cleaning, image noise reduction, image segmentation (ROI extraction) and image enhancement.
And a sample amplification part which comprises a SMOTE algorithm to artificially synthesize a new minority of samples and data enhancement.
And (4) constructing a classification model part, sending the image data into a CNN model for training, and introducing. And the training precision is further improved by the transfer learning.
The detailed process of the invention is explained in detail as follows:
the method comprises the following steps of firstly, carrying out data cleaning on a cervical data set collected by an electronic colposcope and stored in a workstation, and directly deleting images of over-bright images, over-dark images, blurred images and images with medical instruments and sundries in image visual fields.
Step 2: divide the data set into six classes, normal, inflammation, CIN I, CIN II, CIN III, and cancer, according to the opinion of specialist physicians and the electronic medical records.
Step 3: images collected by the electronic colposcope are not only limited by the hardware during shooting, compression, transmission, and storage, but are also affected by various objective factors in the external environment, so they often contain considerable noise. Noise in the image not only affects the physician's visual perception but also interferes with the CAD system's extraction of image features, degrading the system's recognition and diagnosis performance. In medical image processing, invalid signals in an image that can affect image feature extraction are referred to as noise. After experimental comparison, median filtering was selected to filter the noise in the colposcopic images.
Median filtering: the median filtering method is a nonlinear smoothing technique that sets the gray value of each pixel to the median of the gray values of all pixels within a neighborhood window around that point.
Median filtering is a nonlinear signal-processing technique, based on order statistics, that effectively suppresses noise. Its basic principle is to replace the value of a point in a digital image or sequence with the median of all point values in a neighborhood of that point, so that the surrounding pixel values approach the true values and isolated noise points are eliminated. Concretely, a two-dimensional sliding template of a given structure sorts the pixels it covers by gray value, producing a monotonically ascending (or descending) two-dimensional data sequence. The two-dimensional median filter output is g(x, y) = med{ f(x − k, y − l), (k, l) ∈ W }, where f(x, y) and g(x, y) are the original and processed images respectively. W is a two-dimensional template, typically a 3 × 3 or 5 × 5 region, though other shapes such as lines, circles, crosses, and rings are also possible.
For example, the filtering is realized as follows:
1: take an odd number of data points from a sampling window in the image and sort them;
2: replace the data point being processed with the sorted median value.
Step 4: in the cervical disease diagnosis process, the physician only needs to observe and analyze the region of interest, so image segmentation is necessary to obtain the image information of that region. The invention adopts the Otsu algorithm: assume the image contains two classes of pixels (foreground pixels and background pixels) and its histogram is bimodal, then compute the optimal separating threshold so that the intra-class variance is minimal or, equivalently, the inter-class variance is maximal. This distinguishes the ROI from the irrelevant background area, and with additional morphological operations such as filling, deletion, and segmentation, the image of the cervical region of interest is acquired. The image segmentation results are shown in fig. 2.
Otsu algorithm:
for an image I(x, y), denote the segmentation threshold between foreground (i.e., target) and background as T, the proportion of foreground pixels in the whole image as ω0, and their average gray level as μ0; the proportion of background pixels is ω1, with average gray level μ1. The overall mean gray level of the image is denoted μ and the inter-class variance is denoted g.
Assuming that the background of the image is dark and the size of the image is M × N, the number of pixels in the image with the gray scale value smaller than the threshold T is denoted as N0, and the number of pixels with the gray scale value larger than the threshold T is denoted as N1, there are:
(1)ω0=N0/(M×N)
(2)ω1=N1/(M×N)
(3)N0+N1=M×N
(4)ω0+ω1=1
(5)μ=ω0*μ0+ω1*μ1
(6)g=ω0*(μ0-μ)²+ω1*(μ1-μ)²
Substituting formula (5) into formula (6) yields the equivalent formula:
(7)g=ω0*ω1*(μ0-μ1)²
The threshold T that maximizes the inter-class variance g is obtained by traversal. The part of the image with gray value smaller than T is then the foreground, and the part with gray value larger than T is the background.
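The traversal can be sketched as follows (an illustrative pure-Python version operating on a flat list of 8-bit gray values; the patent provides no code, and the example data are hypothetical):

```python
def otsu_threshold(pixels):
    """Exhaustively traverse thresholds T and return the one maximizing the
    inter-class variance g = w0*w1*(mu0 - mu1)^2 (formula (7) above)."""
    n = len(pixels)
    best_t, best_g = 0, -1.0
    for t in range(1, 256):
        fg = [p for p in pixels if p < t]   # foreground: gray value < T
        bg = [p for p in pixels if p >= t]  # background: gray value >= T
        if not fg or not bg:
            continue
        w0, w1 = len(fg) / n, len(bg) / n
        mu0, mu1 = sum(fg) / len(fg), sum(bg) / len(bg)
        g = w0 * w1 * (mu0 - mu1) ** 2
        if g > best_g:
            best_g, best_t = g, t
    return best_t

# Two well-separated gray-level populations; the first threshold that
# separates them wins because g is identical for every t in (20, 200]:
print(otsu_threshold([20] * 50 + [200] * 50))  # -> 21
```

A production version would work on the 256-bin histogram rather than the raw pixel list, which reduces the cost of each candidate threshold to O(1) with running sums.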
Fifthly, image enhancement in the CAD system highlights the effective information in the image: that information is suitably amplified, and the differences between the features of different classes of pictures are amplified with it, so that the CAD system can identify the differences between images more accurately. The key to identifying a cervical image is the transformation zone near the cervical os. In a cervical image taken by colposcope this area is rich in detailed texture and usually lies in a higher gray-scale region, so enhancement of the cervical image should highlight the high-gray part near the cervical os and compress the "highlight" parts far from it. The invention adopts gamma correction to enhance the images, making the feature differences between cervical images more obvious; the numerical differences are amplified, and classification and identification become easier.
Gamma correction is mainly used to correct images whose gray levels are too high (over-exposure) or too low (under-exposure), thereby enhancing contrast. The transformation applies a power-law operation to each pixel value r of the original image:
s = c·r^γ
When γ < 1, regions of lower gray level in the image are stretched while the higher-gray parts are compressed; when γ > 1, regions of higher gray level are stretched while the lower-gray parts are compressed. Adjusting γ therefore enhances detail in either the low-gray or the high-gray range. The gamma transformation has an obvious enhancement effect on colposcopic images with low contrast and high overall brightness.
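A minimal sketch of s = c·r^γ on 8-bit pixel values (illustrative; the normalization of r to [0, 1] and the default c = 1 are conventional assumptions, not stated in the text):

```python
def gamma_correct(pixel, c=1.0, gamma=1.0):
    """Apply s = c * r**gamma to one 8-bit pixel, with r normalized to [0, 1]."""
    r = pixel / 255.0
    return round(c * (r ** gamma) * 255)

# gamma < 1 stretches the low-gray range, gamma > 1 compresses it:
print(gamma_correct(64, gamma=0.5))  # -> 128 (dark pixel brightened)
print(gamma_correct(64, gamma=2.0))  # -> 16  (dark pixel darkened)
```

Applying the same function with γ > 1 to every pixel of a bright, low-contrast colposcopic image compresses the uninteresting highlights while spreading the mid-gray detail, which is the behavior the paragraph above describes.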
Sixthly, to address the sample differences among the cervical-lesion classes and prevent the precision loss caused by class imbalance, the method introduces the SMOTE algorithm: the minority-class samples are analyzed, new samples are artificially synthesized from them, and the synthesized samples are added to the data set, balancing the six categories. In the data set acquired by the method the inflammation images are the most numerous and the remaining classes are fewer; after SMOTE processing, the sample counts of the remaining categories match that of the inflammation class.
SMOTE stands for Synthetic Minority Oversampling Technique, an improved scheme based on the random oversampling algorithm. Because random oversampling simply copies samples to enlarge the minority classes, it easily causes model overfitting: the information learned by the model becomes too specific to generalize. The basic idea of SMOTE is instead to analyze the minority-class samples and synthesize new samples from them to add to the data set. The specific algorithm flow is as follows:
Step 1: for each sample x in the minority class, calculate the distance from x to every other sample in the minority-class set S_min using the Euclidean distance, and obtain the k nearest neighbors of x.
Step 2: set a sampling ratio according to the sample imbalance ratio to determine the sampling rate N. For each minority-class sample x, randomly select several samples from its k nearest neighbors; denote a selected neighbor by x_n.
Step 3: for each randomly selected neighbor x_n, construct a new sample together with the original sample according to the following formula:
x_new = x_n + rand(0,1)·|x − x_n|
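The three steps can be sketched in pure Python. One hedge: the sketch uses the common SMOTE interpolation x_new = x + rand(0,1)·(x_n − x), which places each synthetic point on the segment between x and its neighbor; the formula as printed in the patent (x_new = x_n + rand(0,1)·|x − x_n|) differs slightly. The sample data are hypothetical.

```python
import math
import random

def smote(minority, n_per_sample=1, k=2, seed=0):
    """Synthesize new minority-class samples (each a list of floats).

    Step 1: find the k nearest neighbors of each sample (Euclidean distance).
    Step 2: randomly pick a neighbor x_n for each synthetic point.
    Step 3: interpolate x_new = x + rand(0,1) * (x_n - x) -- the common
            SMOTE formula (the patent's printed formula differs slightly).
    """
    rng = random.Random(seed)
    synthetic = []
    for x in minority:
        dists = sorted((math.dist(x, y), y) for y in minority if y is not x)
        neighbors = [y for _, y in dists[:k]]
        for _ in range(n_per_sample):
            xn = rng.choice(neighbors)
            gap = rng.random()
            synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, xn)])
    return synthetic

minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new = smote(minority, n_per_sample=2, k=2)
print(len(new))  # -> 6, each point lying between two original samples
```

For image data, x would be a flattened feature vector rather than raw pixels; libraries such as imbalanced-learn provide a tested implementation of the same idea.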
Seventhly, for the balanced data set, each class of image is randomly divided into a training set, a verification set, and a test set in proportions of 80%, 15%, and 5%, respectively. Because the sample data size is small and may affect the generalization performance of the classification diagnosis model, data enhancement is performed on the training set and on the verification and test sets respectively. The invention adopts various data enhancement operations such as center cropping, vertical flipping, horizontal flipping, and brightness and chromaticity changes.
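The 80/15/5 random split can be sketched as follows (illustrative only; the random seed and the decision to give rounding remainders to the test set are assumptions the text does not make):

```python
import random

def split_dataset(items, ratios=(0.80, 0.15, 0.05), seed=42):
    """Randomly split one class's samples into train/verification/test sets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * ratios[0])
    n_val = round(len(items) * ratios[1])
    return (items[:n_train],                 # 80% training
            items[n_train:n_train + n_val],  # 15% verification
            items[n_train + n_val:])         # remaining ~5% test

train, val, test = split_dataset(range(200))
print(len(train), len(val), len(test))  # -> 160 30 10
```

Splitting each class separately, as the patent describes, keeps the class proportions identical across the three sets (a stratified split).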
Eighthly, a CNN classification model is constructed. The invention selects the VGG19 network structure and further modifies it: all original fully connected layers are deleted and three new layers are added at the end of the network. The first two are fully connected layers, which reassemble the local features extracted by the convolutional layers into a complete representation through weight matrices; the third is a softmax activation layer, the final output layer of the model, realizing six-class differentiation of cervical lesions. On this basis transfer learning is introduced, and the final cervical-lesion screening auxiliary diagnosis system is realized by freezing and unfreezing some of the convolutional layers and fine-tuning the network parameters.
The CNN model consists of an input layer, an output layer, hidden layers, and the weights (parameters) connecting them. Each layer contains multiple neurons; the neurons of one layer are mapped to those of the next by an activation function, each connection carries a weight, and the output is the classification category. The CNN model has the following advantages. First, parameter sharing: in a convolutional neural network the parameters are the kernel values, which are the same for all regions, so the number of parameters is small and overfitting is effectively prevented. Second, sparse connectivity: each "cell" of a convolutional layer's output depends only on the corresponding part of the input image and is unrelated to the rest, so the computation is small. Finally, hierarchical feature extraction: successively higher-level features are extracted through repeated convolution and pooling.
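The parameter-sharing advantage can be made concrete by counting parameters (layer sizes chosen for illustration only; they are not the patent's configuration):

```python
def conv_params(in_ch, out_ch, k):
    """A convolutional layer shares one k x k kernel per (input, output)
    channel pair, so its parameter count is independent of image size."""
    return in_ch * out_ch * k * k + out_ch          # weights + biases

def dense_params(in_units, out_units):
    """A fully connected layer needs one weight per input/output pair."""
    return in_units * out_units + out_units

# 224 x 224 RGB input mapped to 64 feature maps vs. 64 dense units:
print(conv_params(3, 64, 3))            # -> 1792
print(dense_params(224 * 224 * 3, 64))  # -> 9633856
```

The same 3 × 3 kernels slide over every position of the 224 × 224 image, which is why the convolutional layer needs roughly 5000× fewer parameters than a dense layer over the same input.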
Of course, the VGG19 deep neural network is selected in the present embodiment; in practice the deep model may adopt other CNN networks, such as VGG16, InceptionNet, ResNet, etc.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although the present description refers to embodiments, not every embodiment may contain only a single embodiment, and such description is for clarity only, and those skilled in the art should integrate the description, and the embodiments may be combined as appropriate to form other embodiments understood by those skilled in the art.
Claims (10)
1. A sample data processing method of a cervical image comprises the following steps:
establishing classification: establishing a contrast data set, and acquiring a classification standard for cervical image characteristics on the basis of the contrast data set;
data preprocessing: acquiring and denoising sample data of a cervical image;
and (3) dividing: in the preprocessed data, segmenting the image data to obtain target image data;
data enhancement: classifying the target image data, confirming the difference between various target image data, and implementing enhancement processing aiming at the difference;
and (3) equalization processing: aiming at the total amount difference between various target image data, supplementing a few types of samples by adopting data fitting to realize the total amount balance between various target image data;
and (3) data set construction: aiming at various target image data after equalization processing, respectively and randomly dividing the target image data into a training data set, a verification data set and a test data set in proportion;
constructing a model: based on the training data set and/or the verification data set and/or the test data set, the data set is mapped to the comparison data set to obtain the corresponding classification of the sample data.
2. The method for processing sample data of a cervical image according to claim 1, wherein in the segmentation operation, the segmentation of the image data is performed by using the Otsu algorithm:
setting image data comprising foreground pixel data and background pixel data, and calculating to obtain a threshold value for distinguishing the foreground pixel data from the background pixel data;
the image data is divided into foreground pixel data and background pixel data by the threshold.
3. The method for processing the sample data of the cervical image according to claim 2, wherein the segmentation operation further includes performing a morphological operation on each of the foreground pixel data and/or the background pixel data obtained by the division, so as to obtain the target image data.
4. The method of claim 3, wherein said morphological operation includes at least any of adding, padding, deleting and segmenting.
5. The method for processing the sample data of the cervical image according to claim 1, wherein in the equalizing operation, the minority-class sample data in the target data are processed by using the SMOTE algorithm:
analyzing a few types of sample data;
and synthesizing a new sample according to the minority sample, and adding the synthesized new sample into the original minority sample to form a new minority sample set until the minority sample set realizes total balance among various types of target image data.
6. The method for processing the sample data of the cervical image according to claim 5, wherein in the equalizing operation, the SMOTE algorithm processes the minority-class sample data in the target data as follows:
step 1: for each sample x in the minority-class samples, calculating the distance from x to every sample in the minority-class sample set S_min using the Euclidean distance as the standard, and obtaining the k nearest neighbors of x;
step 2: setting a sampling ratio according to the sample imbalance ratio to determine a sampling multiplying factor N, and randomly selecting a plurality of samples from the k neighbors of each minority-class sample x, the selected neighbor being denoted x_n;
step 3: for each randomly selected neighbor x_n, constructing a new sample together with the original sample according to the following formula:
x_new = x_n + rand(0,1)·|x − x_n|, where rand(0,1) refers to a random value in the range of 0 to 1.
7. The method for processing the sample data of the cervical image according to claim 1, wherein the data preprocessing operation at least includes a deleting operation performed on over-bright and/or over-dark and/or blurred and/or impurity-containing image data in the sample data.
8. The method for processing sample data of a cervical image according to claim 7, wherein the deletion operation includes a complete deletion operation performed on each image or a partial deletion operation performed on a target region after segmenting each image.
9. The method for processing the sample data of the cervical image according to claim 1, wherein in the data set constructing operation, the division ratios of the training data set, the verification data set and the testing data set of each class of target image data are 80%, 15% and 5%, respectively.
10. The method for processing the sample data of the cervical image according to claim 9, wherein the training data set, the verification data set and the testing data set of each class of target image data are constructed by random division according to the stated proportions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911125170.1A CN110852396A (en) | 2019-11-15 | 2019-11-15 | Sample data processing method for cervical image |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110852396A true CN110852396A (en) | 2020-02-28 |
Family
ID=69600588
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111640097A (en) * | 2020-05-26 | 2020-09-08 | 上海鹰瞳医疗科技有限公司 | Skin mirror image identification method and equipment |
CN111666872A (en) * | 2020-06-04 | 2020-09-15 | 电子科技大学 | Efficient behavior identification method under data imbalance |
CN111723856A (en) * | 2020-06-11 | 2020-09-29 | 广东浪潮大数据研究有限公司 | Image data processing method, device and equipment and readable storage medium |
CN111784593A (en) * | 2020-06-04 | 2020-10-16 | 广东省智能制造研究所 | Lung nodule CT image data enhancement method and system for deep learning |
CN111863118A (en) * | 2020-07-20 | 2020-10-30 | 湖南莱博赛医用机器人有限公司 | Method for carrying out TCT and DNA ploidy analysis based on TCT film-making |
CN112241715A (en) * | 2020-10-23 | 2021-01-19 | 北京百度网讯科技有限公司 | Model training method, expression recognition method, device, equipment and storage medium |
CN112861734A (en) * | 2021-02-10 | 2021-05-28 | 北京农业信息技术研究中心 | Trough food residue monitoring method and system |
CN113052865A (en) * | 2021-04-16 | 2021-06-29 | 南通大学 | Power transmission line small sample temperature image amplification method based on image similarity |
CN113139944A (en) * | 2021-04-25 | 2021-07-20 | 山东大学齐鲁医院 | Deep learning-based colposcopic image classification computer-aided diagnosis system and method |
CN113268623A (en) * | 2021-06-01 | 2021-08-17 | 上海市第一人民医院 | Artificial intelligence gastroscope image recognition processing system |
WO2022121032A1 (en) * | 2020-12-10 | 2022-06-16 | 广州广电运通金融电子股份有限公司 | Data set division method and system in federated learning scene |
TWI779284B (en) * | 2020-05-06 | 2022-10-01 | 商之器科技股份有限公司 | Device for marking image data |
CN117541482A (en) * | 2024-01-10 | 2024-02-09 | 中国人民解放军空军军医大学 | Cervical image enhancement system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102495901A (en) * | 2011-12-16 | 2012-06-13 | 山东师范大学 | Method for keeping balance of implementation class data through local mean |
CN105574859A (en) * | 2015-12-14 | 2016-05-11 | 中国科学院深圳先进技术研究院 | Liver tumor segmentation method and device based on CT (Computed Tomography) image |
CN109410196A (en) * | 2018-10-24 | 2019-03-01 | 东北大学 | Cervical cancer tissues pathological image diagnostic method based on Poisson annular condition random field |
CN109961838A (en) * | 2019-03-04 | 2019-07-02 | 浙江工业大学 | A kind of ultrasonic image chronic kidney disease auxiliary screening method based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20200228 |
|