CN110852396A - Sample data processing method for cervical image - Google Patents

Publication number
CN110852396A
Authority
CN
China
Prior art keywords
data
sample
image
data set
cervical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911125170.1A
Other languages
Chinese (zh)
Inventor
李凌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Zhongkehuaying Health Technology Co Ltd
Original Assignee
Suzhou Zhongkehuaying Health Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Zhongkehuaying Health Technology Co Ltd filed Critical Suzhou Zhongkehuaying Health Technology Co Ltd
Priority to CN201911125170.1A priority Critical patent/CN110852396A/en
Publication of CN110852396A publication Critical patent/CN110852396A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • G06F18/24143Distances to neighbourhood prototypes, e.g. restricted Coulomb energy networks [RCEN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/20Image enhancement or restoration using local operators
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/70Denoising; Smoothing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20024Filtering details
    • G06T2207/20032Median filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30096Tumor; Lesion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03Recognition of patterns in medical or anatomical images

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Medical Informatics (AREA)
  • Quality & Reliability (AREA)
  • Radiology & Medical Imaging (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for processing sample data of cervical images, comprising the following steps: classification establishment; data preprocessing; segmentation; data enhancement: classifying the target image data, identifying the differences among the classes of target image data, and applying enhancement processing directed at those differences; equalization: to address the difference in total sample count among the classes of target image data, supplementing the minority-class samples by data fitting so that the classes are balanced in total count; data set construction: randomly dividing each class of equalized target image data, in proportion, into a training data set, a validation data set and a test data set; model construction: based on the training data set and/or the validation data set and/or the test data set, mapping the data set to the comparison data set to obtain the classification of the sample data. The method mitigates the data imbalance problem in cervical image classification, improves the accuracy and efficiency of image classification, and improves the effect and quality of auxiliary diagnosis.

Description

Sample data processing method for cervical image
Technical Field
The invention relates to computer-aided methods in the medical field, and in particular to a method for processing sample data of cervical images.
Background
Cervical lesions have a well-defined cause, so clinical prevention is possible and the high mortality of cervical cancer can be reduced to a large extent. However, because China is still a developing country with a high population density, HPV vaccines remain difficult to make universally available. Screening for early cervical lesions therefore remains the main measure for preventing and treating cervical diseases. At present, the main screening methods used in hospitals are the Pap smear (Pap test), liquid-based cytology (TCT), HPV-DNA testing, electronic colposcopy and histopathological examination. Each of these mainstream methods for screening precancerous lesions has its own shortcomings, so in some cases a diagnosis can be confirmed only by combining several methods, and the accuracy of precancerous screening still needs improvement. In addition, the final determination requires the physician to carefully observe and analyze the lesion area or lesion image, which places high demands on the physician's expertise. When a diagnosis depends on medical images, long periods of reading and observing images readily cause fatigue, which further affects diagnostic accuracy. Given the current state of traditional medical practice, the high incidence and lethality of cervical cancer, and the rapid development of artificial intelligence and machine learning, a new auxiliary diagnosis system for cervical lesion screening is particularly necessary.
At present, computer-aided diagnosis (CAD) systems for gastric cancer, skin cancer, digestive tract cancer, intestinal cancer and similar diseases, which rely on medical endoscopes and dermatoscopes, are developing vigorously. Such CAD systems need a large number of color images acquired by medical equipment. After preprocessing operations such as image filtering, image enhancement and image segmentation, valuable image features are selected by feature extraction and feature screening, the selected features are fed into a machine learning classification model for training, and a CAD system with good performance is finally obtained by adjusting the model parameters. However, existing auxiliary diagnosis systems for cervical lesions are relatively few, and traditional tumor classification is mostly binary. In addition, the performance of traditional machine learning algorithms on pathological classification has reached a bottleneck.
Disclosure of Invention
As the only gynecological malignancy with a definite etiology, cervical cancer is characterized by high clinical morbidity and mortality. A clear direction for clinical diagnosis and treatment can therefore greatly improve the cure rate of patients and reduce the death rate. The invention aims to design a deep-learning-based auxiliary diagnosis system for cervical lesion screening that relieves the diagnostic burden on physicians and greatly improves diagnostic accuracy. The invention first uses an electronic colposcope to collect color images of the cervical region from different patients and obtains usable patient image data through data cleaning. It then applies a series of preprocessing operations to the acquired images, such as image filtering, image segmentation (ROI extraction) and image enhancement. To address the imbalance and small quantity of data for the lesions to be classified, it balances the data by means of the SMOTE algorithm, data enhancement and similar techniques. Finally, the processed image data are fed into a deep learning model for training, yielding a six-class result: normal, inflammation, cervical intraepithelial neoplasia I (CIN I), cervical intraepithelial neoplasia II (CIN II), cervical intraepithelial neoplasia III (CIN III) and canceration. The method serves as a good aid in the definite diagnosis of cervical lesions.
In order to achieve the above object, the present invention discloses a method for processing sample data of a cervical image, comprising the following steps:
Classification establishment: establishing a comparison data set, and obtaining from it a classification standard for cervical image features. The position of this operation in the text does not represent its actual place in the process; it may be carried out after any step, or even in parallel with other steps, without affecting the scheme of the invention.
Data preprocessing: acquiring sample data of cervical images. The sample data may be the original image data or data that have undergone preprocessing screening. When the original data contain content that interferes with subsequent processing, preprocessing screening is performed on them; the screening at least comprises directly deleting images that are too bright, too dark or blurred, or that have medical instruments or debris in the field of view.
Segmentation: segmenting the preprocessed image data to obtain target image data.
Data enhancement: classifying the target image data, identifying the differences among the classes of target image data, and applying enhancement processing directed at those differences.
Equalization: to address the difference in total sample count among the classes of target image data, supplementing the minority-class samples by data fitting so that the classes are balanced in total count.
Data set construction: randomly dividing each class of equalized target image data, in proportion, into a training data set, a validation data set and a test data set.
Model construction: based on the training data set and/or the validation data set and/or the test data set, mapping the data set to the comparison data set to obtain the classification of the sample data.
The invention discloses an improvement of the sample data processing method for cervical images, wherein in the segmentation operation the image data are segmented using the Otsu algorithm:
setting image data comprising foreground pixel data and background pixel data, and calculating to obtain a threshold value for distinguishing the foreground pixel data from the background pixel data;
the image data is divided into foreground pixel data and background pixel data by the threshold.
The invention discloses an improvement of the sample data processing method of the cervical image, wherein the segmentation operation further comprises the step of performing morphological operation on the divided and obtained foreground pixel data and/or background pixel data respectively so as to obtain target image data.
The invention discloses an improvement of the sample data processing method of cervical images, wherein the morphological operation at least comprises any one of addition, filling, deletion and segmentation.
The invention discloses an improvement of the sample data processing method for cervical images, wherein in the equalization operation the minority-class sample data in the target data are processed using the SMOTE algorithm:
analyzing the minority-class sample data;
synthesizing new samples from the minority-class samples and adding them to the original minority-class samples to form a new minority-class sample set, until the classes of target image data are balanced in total count.
The invention discloses an improvement of the sample data processing method for cervical images, wherein in the equalization operation the SMOTE algorithm used to process the minority-class sample data in the target data is as follows:
Step 1: for each sample $x$ in the minority class, compute the Euclidean distance from $x$ to every sample in the minority-class sample set $S_{min}$, and obtain the $k$ nearest neighbors of $x$;
Step 2: set a sampling ratio according to the sample imbalance ratio to determine the sampling rate $N$; for each minority-class sample $x$, randomly select several samples from its $k$ nearest neighbors, and denote a selected neighbor by $x_n$;
Step 3: for each randomly selected neighbor $x_n$, construct a new sample from it and the original sample according to the following formula:
$x_{new} = x_n + \mathrm{rand}(0,1) \cdot |x - x_n|$, where $\mathrm{rand}(0,1)$ denotes a random value in the range 0 to 1.
The invention discloses an improvement of a sample data processing method of cervical images, wherein the data preprocessing operation at least comprises the step of deleting over-bright image data and/or over-dark image data and/or blurred image data and/or image data containing sundries in the sample data.
The invention discloses an improvement of the sample data processing method of cervical images, and the deleting operation comprises a complete deleting operation executed on each image or a partial deleting operation executed on a target area after each image is segmented.
The invention discloses an improvement of the sample data processing method for cervical images, wherein in the data set construction operation the training data set, validation data set and test data set of each class of target image data are divided in the proportions 80%, 15% and 5%.
The invention discloses an improvement of the sample data processing method for cervical images, wherein the training data set, validation data set and test data set of each class of target image data are constructed by random division in proportion.
In general, the method first cleans the cervical images acquired with an electronic colposcope, deleting images that are blurred, too dark, or have foreign objects in the key field of view. Then, under the guidance of specialist physicians and relevant experts, the image data are classified (labeled) with reference to the electronic medical records. Preprocessing operations such as image noise reduction, image segmentation and image enhancement are applied to obtain region-of-interest (ROI) images. To address the large differences in sample count between classes, the SMOTE algorithm is used to synthesize minority-class samples, and data enhancement is applied to further expand the number of samples. The final image data are fed into a deep learning model for training; the classification performance of the model is tuned by adjusting parameters and introducing transfer learning, ultimately achieving the effect of auxiliary diagnosis.
Existing auxiliary diagnosis systems mainly have the following problems. First, the amount of training data is small and effectively labeled samples are few. The training and test samples used in the invention are therefore carefully screened and confirmed by specialist physicians and relevant experts, and each acquired image is labeled with its corresponding disease label, which ensures the accuracy of model training and diagnosis. To address the small sample size, the invention, in addition to synthesizing minority-class samples with the SMOTE algorithm, expands the sample size with data enhancement techniques such as center cropping, vertical flipping, horizontal flipping and brightness adjustment, so that the final sample size reaches a satisfactory level. Second, the auxiliary diagnosis systems currently applied to cervical lesion detection in electronic colposcope images fall into two groups. Some use a traditional machine learning algorithm as the classification model, which requires tedious and time-consuming feature extraction and feature screening, and whose performance has reached a bottleneck so that accuracy cannot be improved much further. Others train a newer deep learning model directly on images that have only been denoised and enhanced; this does not let the computer focus its learning on the features of the key lesion region, and improves the learning effect only to a limited extent.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart of an embodiment of the method for processing sample data of a cervical image according to the present invention;
Fig. 2 is a schematic diagram of the segmentation operation of an embodiment of the method for processing sample data of a cervical image according to the present invention;
Fig. 3 is a schematic diagram of the model construction of an embodiment of the method for processing sample data of a cervical image according to the present invention.
Detailed Description
The present invention will be described in detail below with reference to embodiments shown in the drawings. The embodiments are not intended to limit the present invention, and structural, methodological, or functional changes made by those skilled in the art according to the embodiments are included in the scope of the present invention.
The implementation flow of the system is shown in Fig. 1. It mainly comprises the following parts:
An image preprocessing part, which comprises image data cleaning, image noise reduction, image segmentation (ROI extraction) and image enhancement.
A sample amplification part, which comprises synthesizing new minority-class samples with the SMOTE algorithm, and data enhancement.
A classification model construction part, in which the image data are fed into a CNN model for training and transfer learning is introduced to further improve the training accuracy.
The detailed process of the invention is explained in detail as follows:
First, data cleaning is performed on the cervical data set collected by the electronic colposcope and stored in the workstation: images that are too bright, too dark or blurred, and images with medical instruments or debris in the field of view, are deleted directly.
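The source does not say whether this cleaning is done by hand or automatically. As a minimal sketch only, assuming grayscale images stored as lists of rows with values 0 to 255 and at least two columns, a simple automatic pre-filter could look like the following. The function names, the brightness thresholds `dark` and `bright`, and the `min_sharpness` measure are all hypothetical and are not taken from the patent.

```python
def mean_brightness(image):
    """Mean gray level of a 2D image given as a list of rows (values 0-255)."""
    total = sum(sum(row) for row in image)
    count = sum(len(row) for row in image)
    return total / count


def sharpness(image):
    """Mean absolute difference between horizontally adjacent pixels.

    Low values suggest a blurred (or featureless) image. This is a crude,
    illustrative measure, not the criterion used in the patent.
    """
    diffs = [abs(row[i + 1] - row[i]) for row in image for i in range(len(row) - 1)]
    return sum(diffs) / len(diffs)


def keep_image(image, dark=40, bright=215, min_sharpness=2.0):
    """Return True if the image is neither too dark, too bright, nor too blurred.

    All three thresholds are hypothetical defaults chosen for illustration.
    """
    m = mean_brightness(image)
    return dark <= m <= bright and sharpness(image) >= min_sharpness
```

Detecting instruments or debris in the field of view is not covered by this sketch and would in practice still need manual review or a dedicated detector.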
Second, the data set is divided into six classes, namely normal, inflammation, CIN I, CIN II, CIN III and canceration, according to the opinion of specialist physicians and the electronic medical records.
Third, the images collected by the electronic colposcope are limited by the hardware during capture, compression, transmission and storage, and are also affected by various objective factors of the external environment, so they often contain considerable noise. Noise in the image affects not only the physician's visual impression but also the extraction of image features by the CAD system, and thus the system's recognition and diagnostic performance. In medical image processing, any invalid signal in an image that can affect image feature extraction may be called noise. After experimental comparison, median filtering was selected to filter the noise in the colposcope images.
Median filtering: median filtering is a nonlinear smoothing technique that sets the gray value of each pixel to the median of the gray values of all pixels in a neighborhood window around that pixel.
Median filtering is a nonlinear signal processing technique, based on order statistics, that can effectively suppress noise. Its basic principle is to replace the value of a point in a digital image or digital sequence with the median of all point values in a neighborhood of that point, so that the surrounding pixel values approach the true values and isolated noise points are eliminated. In practice, a two-dimensional sliding template of a given structure is used, and the pixels inside the template are sorted by pixel value to produce a monotonically ascending (or descending) data sequence. The output of the two-dimensional median filter is $g(x, y) = \mathrm{med}\{f(x-k, y-l),\ (k, l) \in W\}$, where $f(x, y)$ and $g(x, y)$ are the original image and the processed image, respectively. $W$ is a two-dimensional template, typically a 3 × 3 or 5 × 5 region, and may also have other shapes, such as a line, a circle, a cross or a ring.
For example, the filtering is carried out as follows:
1: take an odd number of data values from a sampling window in the image and sort them;
2: replace the value to be processed with the median of the sorted values.
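The two steps above can be sketched in plain Python as follows. This is an illustrative implementation, not the patent's own code: the function name `median_filter` and the border handling (using only the part of the window that lies inside the image) are assumptions.

```python
def median_filter(image, size=3):
    """Apply a size x size median filter to a 2D image (list of rows).

    Border handling is an assumption: near the edges only the part of the
    window inside the image is used, so the window may hold an even number
    of values, in which case the upper of the two middle values is taken.
    """
    h, w = len(image), len(image[0])
    r = size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Step 1: collect the values in the window and sort them.
            window = sorted(
                image[j][i]
                for j in range(max(0, y - r), min(h, y + r + 1))
                for i in range(max(0, x - r), min(w, x + r + 1))
            )
            # Step 2: replace the pixel with the median of the sorted values.
            out[y][x] = window[len(window) // 2]
    return out
```

For instance, a single bright outlier in an otherwise uniform 3 × 3 patch is replaced by the value of its neighbors, which is the isolated-noise suppression described in the text.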
Fourth, in diagnosing cervical disease the physician only needs to observe and analyze the region of interest, so image segmentation is necessary to obtain the image information of that region. The invention adopts the Otsu algorithm, which assumes that an image contains two classes of pixels (foreground and background) and that its histogram is bimodal, and then computes the optimal threshold separating the two classes, namely the one that minimizes the intra-class variance or, equivalently, maximizes the inter-class variance. This distinguishes the ROI from the irrelevant background; morphological operations such as filling, deletion and segmentation are then applied to obtain the image of the cervical region of interest. The image segmentation results are shown in Fig. 2.
Otsu algorithm:
For an image $I(x, y)$, the threshold that separates the foreground (that is, the target) from the background is denoted $T$. The proportion of foreground pixels in the whole image is denoted $\omega_0$ and their average gray level $\mu_0$; the proportion of background pixels in the whole image is denoted $\omega_1$ and their average gray level $\mu_1$. The overall mean gray level of the image is denoted $\mu$, and the inter-class variance is denoted $g$.
Assuming that the background of the image is dark and the size of the image is M × N, the number of pixels in the image with the gray scale value smaller than the threshold T is denoted as N0, and the number of pixels with the gray scale value larger than the threshold T is denoted as N1, there are:
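(1) $\omega_0 = N_0 / (M \times N)$
(2) $\omega_1 = N_1 / (M \times N)$
(3) $N_0 + N_1 = M \times N$
(4) $\omega_0 + \omega_1 = 1$
(5) $\mu = \omega_0 \mu_0 + \omega_1 \mu_1$
(6) $g = \omega_0 (\mu_0 - \mu)^2 + \omega_1 (\mu_1 - \mu)^2$
Substituting formula (5) into formula (6) yields the equivalent formula:
(7) $g = \omega_0 \omega_1 (\mu_0 - \mu_1)^2$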
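The substitution is not spelled out in the source; the intermediate steps, using only formulas (4) to (6), can be written as:

$$
\begin{aligned}
\mu_0 - \mu &= \mu_0 - \omega_0 \mu_0 - \omega_1 \mu_1 = \omega_1 (\mu_0 - \mu_1), \\
\mu_1 - \mu &= \mu_1 - \omega_0 \mu_0 - \omega_1 \mu_1 = \omega_0 (\mu_1 - \mu_0), \\
g &= \omega_0 \omega_1^2 (\mu_0 - \mu_1)^2 + \omega_1 \omega_0^2 (\mu_0 - \mu_1)^2 \\
  &= \omega_0 \omega_1 (\omega_0 + \omega_1) (\mu_0 - \mu_1)^2 = \omega_0 \omega_1 (\mu_0 - \mu_1)^2 .
\end{aligned}
$$

The first two lines use $1 - \omega_0 = \omega_1$ and $1 - \omega_1 = \omega_0$ from formula (4), and the last step uses $\omega_0 + \omega_1 = 1$.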
A traversal method is then used to find the threshold $T$ that maximizes the inter-class variance $g$. Pixels whose gray value is smaller than $T$ are then taken as the foreground, and pixels whose gray value is larger than $T$ as the background.
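The traversal can be sketched in plain Python as follows, using formula (7) directly. The function name `otsu_threshold` is hypothetical, and the sketch assumes integer gray levels in the range 0 to `levels - 1`. It calls the two sides of the threshold "class 0" (gray value below $T$) and "class 1" (gray value at or above $T$); assigning pixels equal to $T$ to class 1 is an assumption, since the text only speaks of values smaller and larger than $T$.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold T that maximizes g = w0 * w1 * (mu0 - mu1)**2.

    pixels: flat iterable of integer gray values in [0, levels - 1].
    Class 0 holds values < T, class 1 holds values >= T (assumed convention).
    """
    pixels = list(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_g = 0, -1.0
    n0 = 0      # number of pixels in class 0 (N0)
    sum0 = 0    # sum of gray values in class 0
    for t in range(1, levels):
        n0 += hist[t - 1]
        sum0 += (t - 1) * hist[t - 1]
        n1 = total - n0
        if n0 == 0 or n1 == 0:
            continue  # one class is empty, variance is undefined
        w0, w1 = n0 / total, n1 / total
        mu0, mu1 = sum0 / n0, (sum_all - sum0) / n1
        g = w0 * w1 * (mu0 - mu1) ** 2
        if g > best_g:
            best_g, best_t = g, t
    return best_t
```

On a clearly bimodal input the returned threshold falls between the two modes, so comparing each pixel with it separates the two classes.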
Fifth, image enhancement in a CAD system highlights the effective information in the image by amplifying it appropriately, which also amplifies the differences between the features of different classes of images, so that the CAD system can identify those differences more accurately. The key to identifying a cervical image is the transformation zone near the cervical os. In a cervical image taken by colposcope this area should be rich in detailed texture, and it usually lies in the higher gray-level range of the image, so enhancement of the cervical image should emphasize the high-gray-level part around the cervical os and compress the "highlight" parts far from it. The invention uses gamma correction to enhance the images, which makes the feature differences between cervical images more pronounced and amplifies the numerical differences, so that the classes are easier to distinguish during classification and recognition.
Gamma correction is mainly used to correct images whose gray levels are too high (overexposure) or too low (underexposure), thereby enhancing contrast. The transformation applies a power operation to each pixel value of the original image:
$s = c \cdot r^{\gamma}$
when the gamma value is less than 1, stretching the area with lower gray level in the image and compressing the part with higher gray level; when the gamma value is greater than 1, a region of the image having a higher gray level is stretched while a portion having a lower gray level is compressed. Therefore, the effect of enhancing details of low gray or high gray parts can be achieved by adjusting different gamma values. The gamma transformation has obvious enhancement effect on the colposcopic image with low image contrast and high overall brightness value.
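A minimal sketch of the transformation $s = c \cdot r^{\gamma}$ in plain Python is given below. It assumes 8-bit gray values that are normalized to the range 0 to 1 before the power is applied, which is a common convention but is not stated in the source; the function name `gamma_correct` and the clipping of the result are likewise assumptions.

```python
def gamma_correct(image, gamma, c=1.0, max_val=255):
    """Apply s = c * r**gamma to every pixel of a 2D image (list of rows).

    Pixel values are normalized to [0, 1] first (an assumed convention),
    then the result is clipped and scaled back to [0, max_val].
    """
    out = []
    for row in image:
        new_row = []
        for v in row:
            r = v / max_val                 # normalized input gray level
            s = c * r ** gamma              # power-law transformation
            s = min(max(s, 0.0), 1.0)       # clip to the valid range
            new_row.append(round(s * max_val))
        out.append(new_row)
    return out
```

With this convention, `gamma < 1` raises mid-range values (stretching the dark end) and `gamma > 1` lowers them (stretching the bright end), while 0 and `max_val` stay fixed.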
Sixth, to address the differences in sample count among the different classes of cervical lesions and to prevent the loss of accuracy caused by unbalanced samples, the method introduces the SMOTE algorithm: it analyzes the minority-class samples, synthesizes new samples from them and adds these to the data set, thereby resolving the imbalance among the six classes. In the data set acquired for the invention, inflammation images are the most numerous and the other classes have fewer images, so after SMOTE processing the number of samples in each of the other classes matches that of the inflammation class.
SMOTE stands for Synthetic Minority Oversampling Technique. It is an improvement on random oversampling. Because random oversampling adds minority-class samples simply by copying existing ones, it easily leads to model overfitting, that is, the information learned by the model is too specific and not general enough. The basic idea of SMOTE is to analyze the minority-class samples and synthesize new samples from them to add to the data set. The algorithm proceeds as follows:
Step 1: For each sample x in the minority class, compute the Euclidean distance from x to every sample in the minority-class sample set $S_{min}$, and obtain the k nearest neighbors of x.
Step 2: Set a sampling ratio according to the sample imbalance ratio to determine the sampling rate N. For each minority-class sample x, randomly select several samples from its k nearest neighbors; denote a selected neighbor by $x_n$.
Step 3: For each randomly selected neighbor $x_n$, construct a new sample from it and the original sample x according to the following formula:
$$x_{new} = x_n + \mathrm{rand}(0,1) \cdot |x - x_n|$$
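Steps 1 to 3 can be sketched in a few lines of NumPy. This is a minimal illustration rather than the patent's implementation: the function name `smote`, its parameters, and the use of flattened feature vectors as samples are assumptions. One deliberate deviation is labeled in the code: the sketch uses the signed difference (x - x_n) instead of the absolute value in the formula, so that each synthetic point lies on the line segment between x and x_n, which is how SMOTE is commonly formulated.

```python
import numpy as np

def smote(minority, n_new_per_sample, k=5, seed=0):
    """Synthesize n_new_per_sample points for every minority sample (needs >= 2 samples)."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=np.float64)
    k = min(k, len(minority) - 1)
    # Step 1: pairwise Euclidean distances inside S_min, then the k nearest neighbours.
    diff = minority[:, None, :] - minority[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)                  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    synthetic = []
    for i, x in enumerate(minority):
        # Step 2: draw N neighbours at random (with replacement) from the k nearest.
        for j in rng.choice(neighbours[i], size=n_new_per_sample):
            x_n = minority[j]
            # Step 3 (assumption: signed difference, not |x - x_n|): interpolate.
            synthetic.append(x_n + rng.random() * (x - x_n))
    return np.array(synthetic)

# Hypothetical 2-D minority samples at the corners of the unit square.
s_min = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(s_min, n_new_per_sample=3, k=2)
print(new_samples.shape)
```

Because every synthetic point is a convex combination of two existing minority samples, the new samples stay inside the region already occupied by the minority class.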
Seventh, for the balanced data set, the images of each class are randomly divided into a training set, a verification set and a test set in proportions of 80%, 15% and 5%, respectively. Because the sample size is small and may affect the generalization performance of the classification and diagnosis model, data enhancement is performed separately on the training set, the verification set and the test set. The invention uses several data enhancement operations, including center cropping, vertical flipping, horizontal flipping, and changes in brightness and chromaticity.
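The per-class random split can be sketched as follows. The function name `split_by_class`, the dictionary layout, the class labels and the file names are hypothetical and serve only to show one way of applying the 80% / 15% / 5% proportions to each class independently.

```python
import random

def split_by_class(items_by_class, ratios=(0.80, 0.15, 0.05), seed=0):
    """Shuffle each class separately and cut it into train / verification / test parts."""
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for label, items in items_by_class.items():
        items = list(items)
        rng.shuffle(items)
        n_train = int(round(ratios[0] * len(items)))
        n_val = int(round(ratios[1] * len(items)))
        splits["train"] += [(p, label) for p in items[:n_train]]
        splits["val"] += [(p, label) for p in items[n_train:n_train + n_val]]
        splits["test"] += [(p, label) for p in items[n_train + n_val:]]  # remainder
    return splits

# Hypothetical balanced data set: 100 image file names for each of two classes.
data = {
    "inflammation": [f"inflammation_{i}.png" for i in range(100)],
    "other": [f"other_{i}.png" for i in range(100)],
}
splits = split_by_class(data)
print({name: len(part) for name, part in splits.items()})
```

Splitting class by class keeps the class balance obtained in the previous step in all three subsets.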
Eighth, a CNN classification model is constructed. The invention selects the VGG19 network structure and modifies it: all of the original fully connected layers are deleted and three new layers are added at the end of the network. The first two are fully connected layers, which reassemble the local features extracted by the convolutional layers into a complete image through a weight matrix; the third is a softmax activation layer, which is also the final output layer of the model and distinguishes the six categories of cervical lesions. On this basis, transfer learning is introduced, and the final cervical lesion screening auxiliary diagnosis system is obtained by freezing and unfreezing some of the convolutional layers and fine-tuning the network parameters.
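The shape of the replacement head can be illustrated with a toy forward pass in NumPy. This is only a sketch under stated assumptions: the layer sizes, the ReLU activations, the random weights, and the projection to six logits inside the softmax layer are all hypothetical, and the convolutional part of VGG19 is replaced by random stand-in feature vectors.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def classification_head(features, params):
    """Two fully connected layers (ReLU assumed) followed by a six-way softmax output."""
    h1 = np.maximum(0.0, features @ params["W1"] + params["b1"])
    h2 = np.maximum(0.0, h1 @ params["W2"] + params["b2"])
    return softmax(h2 @ params["W3"] + params["b3"])

rng = np.random.default_rng(0)
feature_dim, hidden_dim, num_classes = 512, 128, 6   # hypothetical sizes; six lesion classes
params = {
    "W1": rng.normal(0, 0.01, (feature_dim, hidden_dim)), "b1": np.zeros(hidden_dim),
    "W2": rng.normal(0, 0.01, (hidden_dim, hidden_dim)), "b2": np.zeros(hidden_dim),
    "W3": rng.normal(0, 0.01, (hidden_dim, num_classes)), "b3": np.zeros(num_classes),
}
features = rng.normal(size=(4, feature_dim))   # stand-in for conv features of 4 images
probs = classification_head(features, params)
predicted = probs.argmax(axis=1)               # index of the most probable class per image
print(probs.shape, predicted)
```

In a transfer-learning setting, only weights such as `W1`, `W2` and `W3` (and any unfrozen convolutional layers) would be updated during fine-tuning, while the frozen layers keep their pretrained values.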
The CNN model consists of an input layer, an output layer, hidden layers, and the weights (parameters) connecting the layers. Each layer has multiple neurons; the neurons of one layer are mapped to the neurons of the next layer through an activation function, each neuron has a corresponding weight, and the output is the classification category. The CNN model has the following advantages. First, it has a parameter sharing mechanism: in a convolutional neural network the parameters are the kernel values, which are the same for all regions, so the number of parameters is small and overfitting is effectively prevented. Second, it has a sparse connection mechanism: each output "cell" of a convolutional layer depends only on its corresponding part of the input image and not on the other parts, so the amount of computation is small. Finally, it extracts high-level features: repeated convolution and pooling continually extract higher-level features.
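The effect of parameter sharing can be made concrete with a small calculation. The sketch below compares the number of weights in one 3x3 convolutional layer with a fully connected layer that would map the same input to an output of the same size. The 224x224x3 input and 64 output channels are assumed example values (they match the first layer of a standard VGG network, but the patent does not state the input size).

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Weights plus one bias per output channel; independent of the image size."""
    return (kernel_h * kernel_w * in_channels + 1) * out_channels

def dense_params(in_units, out_units):
    """Weights plus one bias per output unit; grows with the image size."""
    return (in_units + 1) * out_units

height, width = 224, 224                      # assumed input resolution
conv = conv_params(3, 3, 3, 64)               # 1792 parameters
dense = dense_params(height * width * 3, height * width * 64)
print(conv, dense, dense // conv)
```

The convolutional layer reuses the same kernel values at every image position, so its parameter count does not depend on the image size at all, whereas the dense alternative grows with the product of input and output sizes.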
Of course, although the VGG19 deep neural network is selected in the present embodiment, other CNN networks, such as VGG16, InceptionNet or ResNet, may also be adopted as the deep model.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although this description is organized by embodiments, not every embodiment contains only a single independent technical solution. The description is written this way for clarity only; those skilled in the art should treat the description as a whole, and the technical solutions of the embodiments may be combined as appropriate to form other embodiments understood by those skilled in the art.

Claims (10)

1. A sample data processing method for a cervical image, comprising the following steps:
establishing classification: establishing a contrast data set, and acquiring a classification standard for cervical image characteristics on the basis of the contrast data set;
data preprocessing: acquiring and denoising sample data of a cervical image;
segmentation: in the preprocessed data, segmenting the image data to obtain target image data;
data enhancement: classifying the target image data, confirming the differences between the various types of target image data, and implementing enhancement processing aimed at those differences;
equalization processing: for the differences in total amount between the various types of target image data, supplementing the minority-class samples by data fitting to balance the total amounts of the various types of target image data;
data set construction: for each type of target image data after equalization processing, randomly dividing the target image data in proportion into a training data set, a verification data set and a test data set;
constructing a model: based on the training data set and/or the verification data set and/or the test data set, mapping the data set to the contrast data set to obtain the corresponding classification of the sample data.
2. The method for processing sample data of a cervical image according to claim 1, wherein in the segmentation operation, the image data is segmented using the Otsu algorithm:
assuming that the image data comprises foreground pixel data and background pixel data, and calculating a threshold value for distinguishing the foreground pixel data from the background pixel data;
dividing the image data into foreground pixel data and background pixel data by the threshold.
3. The method for processing the sample data of the cervical image according to claim 2, wherein the segmentation operation further includes performing a morphological operation on each of the foreground pixel data and/or the background pixel data obtained by the division, so as to obtain the target image data.
4. The method of claim 3, wherein said morphological operation includes at least any of adding, padding, deleting and segmenting.
5. The method for processing the sample data of the cervical image according to claim 1, wherein in the equalization operation, the minority-class sample data in the target data is processed using the SMOTE algorithm:
analyzing the minority-class sample data;
synthesizing new samples from the minority-class samples, and adding the synthesized samples to the original minority-class samples to form a new minority-class sample set, until the total amounts of the various types of target image data are balanced.
6. The method for processing the sample data of the cervical image according to claim 5, wherein in the equalization operation, the SMOTE algorithm processes the minority-class sample data in the target data as follows:
Step 1: for each sample x in the minority class, computing the Euclidean distance from x to every sample in the minority-class sample set $S_{min}$, and obtaining the k nearest neighbors of x;
Step 2: setting a sampling ratio according to the sample imbalance ratio to determine a sampling rate N, and, for each minority-class sample x, randomly selecting several samples from its k nearest neighbors, a selected neighbor being denoted $x_n$;
Step 3: for each randomly selected neighbor $x_n$, constructing a new sample from it and the original sample x according to the following formula:
$$x_{new} = x_n + \mathrm{rand}(0,1) \cdot |x - x_n|$$
where rand(0,1) refers to a random value in the range of 0 to 1.
7. The method for processing the sample data of the cervical image according to claim 1, wherein the data preprocessing operation at least includes a deletion operation performed on image data in the sample data that is too bright and/or too dark and/or blurred and/or contains impurities.
8. The method for processing sample data of a cervical image according to claim 7, wherein the deletion operation includes a complete deletion operation performed on each image or a partial deletion operation performed on a target region after segmenting each image.
9. The method for processing the sample data of the cervical image according to claim 1, wherein in the data set construction operation, the division ratio of the training data set, the verification data set and the test data set for each type of target image data is 80%, 15% and 5%, respectively.
10. The method for processing the sample data of the cervical image according to claim 9, wherein the training data set, the verification data set and the test data set of each type of target image data are constructed by random division in proportion.
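For illustration only, the threshold calculation named in claim 2 can be sketched as follows. This is a generic implementation of Otsu's method (choosing the threshold that maximizes the between-class variance of the gray-level histogram), not code from the patent. The function name `otsu_threshold`, the 8-bit grayscale input, and the convention that pixels above the threshold are foreground are assumptions.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the gray level t that maximizes between-class variance (classes: <= t, > t)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)
    w0 = np.cumsum(prob)              # weight of the class at or below each level
    mu = np.cumsum(prob * levels)     # cumulative mean up to each level
    mu_total = mu[-1]
    w1 = 1.0 - w0                     # weight of the class above each level
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_total * w0 - mu) ** 2 / (w0 * w1)
    # Levels where one class is empty give 0/0; treat their variance as zero.
    between = np.nan_to_num(between, nan=0.0, posinf=0.0, neginf=0.0)
    return int(np.argmax(between))

# Hypothetical two-level image: left half dark (50), right half bright (200).
gray = np.full((10, 10), 50, dtype=np.uint8)
gray[:, 5:] = 200
t = otsu_threshold(gray)
foreground = gray > t                 # assumed convention: brighter pixels are foreground
print(t, int(foreground.sum()))
```

The resulting boolean mask is the kind of foreground/background division on which the morphological operations of claims 3 and 4 would then act.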
CN201911125170.1A 2019-11-15 2019-11-15 Sample data processing method for cervical image Pending CN110852396A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911125170.1A CN110852396A (en) 2019-11-15 2019-11-15 Sample data processing method for cervical image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911125170.1A CN110852396A (en) 2019-11-15 2019-11-15 Sample data processing method for cervical image

Publications (1)

Publication Number Publication Date
CN110852396A true CN110852396A (en) 2020-02-28

Family

ID=69600588

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911125170.1A Pending CN110852396A (en) 2019-11-15 2019-11-15 Sample data processing method for cervical image

Country Status (1)

Country Link
CN (1) CN110852396A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102495901A (en) * 2011-12-16 2012-06-13 山东师范大学 Method for keeping balance of implementation class data through local mean
CN105574859A (en) * 2015-12-14 2016-05-11 中国科学院深圳先进技术研究院 Liver tumor segmentation method and device based on CT (Computed Tomography) image
CN109410196A (en) * 2018-10-24 2019-03-01 东北大学 Cervical cancer tissues pathological image diagnostic method based on Poisson annular condition random field
CN109961838A (en) * 2019-03-04 2019-07-02 浙江工业大学 A kind of ultrasonic image chronic kidney disease auxiliary screening method based on deep learning

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI779284B (en) * 2020-05-06 2022-10-01 商之器科技股份有限公司 Device for marking image data
CN111640097A (en) * 2020-05-26 2020-09-08 上海鹰瞳医疗科技有限公司 Skin mirror image identification method and equipment
CN111640097B (en) * 2020-05-26 2023-10-17 上海鹰瞳医疗科技有限公司 Dermatological image recognition method and dermatological image recognition equipment
CN111666872B (en) * 2020-06-04 2022-08-05 电子科技大学 Efficient behavior identification method under data imbalance
CN111666872A (en) * 2020-06-04 2020-09-15 电子科技大学 Efficient behavior identification method under data imbalance
CN111784593A (en) * 2020-06-04 2020-10-16 广东省智能制造研究所 Lung nodule CT image data enhancement method and system for deep learning
CN111723856A (en) * 2020-06-11 2020-09-29 广东浪潮大数据研究有限公司 Image data processing method, device and equipment and readable storage medium
CN111723856B (en) * 2020-06-11 2023-06-09 广东浪潮大数据研究有限公司 Image data processing method, device, equipment and readable storage medium
CN111863118A (en) * 2020-07-20 2020-10-30 湖南莱博赛医用机器人有限公司 Method for carrying out TCT and DNA ploidy analysis based on TCT film-making
CN111863118B (en) * 2020-07-20 2023-09-05 湖南莱博赛医用机器人有限公司 TCT and DNA ploidy analysis method based on TCT flaking
CN112241715A (en) * 2020-10-23 2021-01-19 北京百度网讯科技有限公司 Model training method, expression recognition method, device, equipment and storage medium
WO2022121032A1 (en) * 2020-12-10 2022-06-16 广州广电运通金融电子股份有限公司 Data set division method and system in federated learning scene
CN112861734A (en) * 2021-02-10 2021-05-28 北京农业信息技术研究中心 Trough food residue monitoring method and system
CN113052865A (en) * 2021-04-16 2021-06-29 南通大学 Power transmission line small sample temperature image amplification method based on image similarity
CN113052865B (en) * 2021-04-16 2023-12-19 南通大学 Power transmission line small sample temperature image amplification method based on image similarity
CN113139944A (en) * 2021-04-25 2021-07-20 山东大学齐鲁医院 Deep learning-based colposcopic image classification computer-aided diagnosis system and method
CN113268623B (en) * 2021-06-01 2022-07-19 上海市第一人民医院 Artificial intelligence gastroscope image identification processing system
CN113268623A (en) * 2021-06-01 2021-08-17 上海市第一人民医院 Artificial intelligence gastroscope image recognition processing system
CN117541482A (en) * 2024-01-10 2024-02-09 中国人民解放军空军军医大学 Cervical image enhancement system
CN117541482B (en) * 2024-01-10 2024-03-26 中国人民解放军空军军医大学 Cervical image enhancement system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200228
