CN108229569A - The digital pathological image data set sample extending method adjusted based on staining components - Google Patents
- Publication number
- CN108229569A (application CN201810020438.4A)
- Authority
- CN
- China
- Prior art keywords
- image
- digital pathological
- pathological image
- digital
- adjusted
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/03—Recognition of patterns in medical or anatomical images
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
The invention provides a sample expansion method for digital pathological image data sets based on staining component adjustment. Before each round of training of a machine learning algorithm, stain separation is first performed on every digital pathological image in the training set, the proportion of each staining component image is adjusted at random, and the components are merged again, simulating digital pathological images produced under different stain ratios and thereby expanding the samples. The invention is a dynamic data expansion method: because the expanded samples are generated from random numbers, the samples used in each training round of the machine learning algorithm differ, which expands the data set. The invention can improve the accuracy of computer-aided pathological image diagnosis methods developed with machine learning, and has broad market prospects and application value.
Description
Technical Field
The invention relates to the field of digital image processing, and in particular to a pathological image data set expansion method based on a staining component adjustment algorithm.
Background
A digital pathological image is a high-resolution digital image obtained by scanning pathological sections with a fully automatic microscope or an optical magnification system, and it is widely used in clinical pathological diagnosis. Pathological image analysis algorithms based on machine learning, especially deep learning, have developed rapidly in recent years and have become the mainstream approach to pathological image analysis. The machine learning method lets a computer repeatedly learn from pathological images that have been clearly annotated by pathology experts, so that it can analyze pathological images in a way that imitates a pathologist. Unlike natural scene image analysis, pathological image analysis with machine learning faces the following difficulties: 1) The content of pathological images is complex, and only experienced pathology experts can give accurate labels, so labeling is expensive; the number of training samples is therefore limited, which affects the performance of the final algorithm. 2) Digital pathological images are generally affected by the quality and ratio of the stains, so that images containing the same lesion type may be distributed differently in color space, while images containing different lesion types may be very similar in color space. This phenomenon interferes with how a machine learning algorithm interprets pathological images and degrades algorithm performance.
In the prior art, when faced with insufficient training samples, image analysis methods based on machine learning, especially deep learning, expand the samples in the training set to simulate additional samples; the expansion methods include rotation, translation, flipping, scaling, noise addition and the like. These expansion methods can relieve the shortage of digital pathological image training samples to a certain extent, but they are not designed for digital pathological images and can hardly relieve the problems caused by staining differences.
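As a rough illustration of why such conventional expansions do not address staining differences, the sketch below applies a few of them with NumPy. The function name, the noise level and the wrap-around translation are assumptions made here for illustration; they are not taken from the patent.

```python
import numpy as np


def conventional_augmentations(image: np.ndarray,
                               rng: np.random.Generator) -> dict:
    """Return a few geometric/noise variants of an (H, W, 3) image in [0, 1]."""
    noise = rng.normal(0.0, 0.02, size=image.shape)  # assumed noise level
    return {
        "rotated": np.rot90(image, k=1, axes=(0, 1)),    # 90-degree rotation
        "flipped": np.flip(image, axis=1),               # horizontal flip
        "translated": np.roll(image, shift=2, axis=1),   # wrap-around shift
        "noisy": np.clip(image + noise, 0.0, 1.0),       # additive noise
    }


rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, size=(4, 6, 3))
variants = conventional_augmentations(image, rng)

# The geometric variants only move pixels around, so the per-channel
# colour statistics stay exactly as they were.
original_mean = image.mean(axis=(0, 1))
flipped_mean = variants["flipped"].mean(axis=(0, 1))
```

Rotation, flipping and translation rearrange pixels without changing their colors, so the color distribution of the image, which is what staining variation affects, is left untouched.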
Disclosure of Invention
The technical problem to be solved by the invention is to provide a pathological image data set expansion method that alleviates the problems caused by insufficient sample quantity and large staining differences when existing machine learning methods are applied to pathological image analysis, and that improves the accuracy of pathological image analysis algorithms. To solve this technical problem, the technical scheme of the invention is a digital pathological image data set sample expansion method based on staining component adjustment, characterized by comprising the following steps:
Step 1: collecting pathological images into a computer to form digital pathological images, storing each digital pathological image in the computer in RGB channels, denoting each digital pathological image as $I(x, y)$, forming a machine learning model training sample set from a plurality of digital pathological images, and denoting the training sample set by X;
Step 2: setting a dynamic adjustment parameter θ;
Step 3: before each round of training of the machine learning model, dynamically adjusting the staining proportion of each digital pathological image in the training sample set X to obtain an adjusted training sample set;
Step 4: performing one round of training of the machine learning model using the adjusted training sample set;
Step 5: repeating step 3 and step 4 so that a different adjusted training sample set is obtained for each round of model training, thereby expanding the data samples.
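Steps 3 to 5 amount to rebuilding the adjusted training set before every round. A minimal sketch of that control flow follows; the function and parameter names (`train_with_dynamic_expansion`, `augment`, `train_one_round`) are hypothetical, and the toy "images" are plain floats so that the example stays self-contained.

```python
import random
from typing import Callable, List, Sequence, TypeVar

Sample = TypeVar("Sample")


def train_with_dynamic_expansion(
    samples: Sequence[Sample],
    augment: Callable[[Sample, random.Random], Sample],
    train_one_round: Callable[[List[Sample]], None],
    rounds: int,
    seed: int = 0,
) -> None:
    """Rebuild the adjusted training set before every training round."""
    rng = random.Random(seed)
    for _ in range(rounds):
        # Step 3: adjust every sample using fresh random numbers.
        adjusted = [augment(sample, rng) for sample in samples]
        # Step 4: one round of training on the adjusted set.
        train_one_round(adjusted)
        # Step 5: the loop repeats steps 3 and 4.


# Toy usage: each "image" is a float and the adjustment rescales it.
seen_sets: List[List[float]] = []
train_with_dynamic_expansion(
    samples=[0.2, 0.5, 0.8],
    augment=lambda value, rng: value * rng.uniform(0.9, 1.1),
    train_one_round=seen_sets.append,
    rounds=3,
)
```

Because the adjusted set is never stored between rounds, the memory cost stays that of the original set while the model still sees different samples in each round.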
Further, the method for dynamically adjusting the staining proportion of the digital pathological images in step 3 comprises the following steps:
Step a: denoting the k-th digital pathological image in the machine learning model training sample set X as $I_k(x, y)$, and performing stain separation on $I_k(x, y)$ to obtain its n independent staining component images, where n is the number of staining components contained in the pathological image;
Step b: generating n random numbers using the set dynamic adjustment parameter θ;
Step c: using the random numbers from step b to perform image processing on the independent staining component images, obtaining adjusted independent staining component images;
Step d: merging the adjusted (stretched) independent staining component images, and converting the merged image back to RGB channels to obtain an adjusted digital pathological image;
Step e: repeating steps a to d for each digital pathological image in the training sample set X to obtain the adjusted training sample set.
Further, the random numbers in step b are formulated as: [formula missing in source]
Further, the relationship between the adjusted independent staining component images in step c and the independent staining component images before adjustment is formulated as: [formula missing in source]
Further, the dynamic adjustment parameter satisfies θ ∈ (0, 1).
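The formula that maps θ to the n random numbers is not available in this text, so the following is only a sketch under a stated assumption: each staining component receives one factor drawn uniformly from [1 - θ, 1 + θ]. The function name `sample_stain_factors` and the stain labels are hypothetical.

```python
import random
from typing import Dict, Optional, Sequence


def sample_stain_factors(
    theta: float,
    stains: Sequence[str] = ("H", "E", "DAB"),
    rng: Optional[random.Random] = None,
) -> Dict[str, float]:
    """Draw one adjustment factor per staining component.

    ASSUMPTION: factors are uniform on [1 - theta, 1 + theta]; the
    patent's own formula is not reproduced in the source text.
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie in the open interval (0, 1)")
    rng = rng or random.Random()
    return {stain: rng.uniform(1.0 - theta, 1.0 + theta) for stain in stains}


factors = sample_stain_factors(theta=0.2, rng=random.Random(42))
```

Under this assumption, restricting θ to (0, 1) keeps every factor strictly positive, so a staining component can be weakened or strengthened but never removed or inverted.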
Before each round of training of the machine learning model, stain separation is first performed on every digital pathological image in the training set, the proportion of each staining component image is randomly adjusted, and the components are then fused, simulating digital pathological images under different stain ratios and thereby expanding the samples. Because the method is a dynamic data expansion method in which the form of each sample is determined by random numbers, the samples used in each training round of the machine learning algorithm differ, which expands the data set and in turn improves the accuracy of computer-aided pathological image diagnosis methods developed with machine learning.
Drawings
The following describes embodiments of the present invention in further detail with reference to the accompanying drawings.
FIG. 1 is a flow chart of a machine learning method for digital image sample augmentation according to the present invention.
Detailed Description
The invention is further described with reference to the following drawings and detailed description.
Fig. 1 shows a method for expanding a sample of a digital pathology image dataset based on dye component adjustment, comprising the following steps:
Step 1: collecting pathological images into a computer to form digital pathological images, representing each digital pathological image in the computer in RGB channels, denoting each digital pathological image as $I(x, y)$, forming a machine learning model training sample set from a plurality of digital pathological images, and denoting the training sample set by X;
Step 2: setting a dynamic adjustment parameter θ, where θ ∈ (0, 1);
Step 3: before each round of training of the machine learning model, dynamically adjusting the staining proportion of each digital pathological image in the training sample set X to obtain an adjusted training sample set;
Step 4: performing one round of training of the machine learning model using the adjusted training sample set;
Step 5: repeating step 3 and step 4 so that a different adjusted training sample set is obtained for each round of model training, thereby expanding the data samples.
The method for dynamically adjusting the staining proportion of the digital pathological images in step 3 comprises the following steps:
Step a: denoting the k-th digital pathological image in the machine learning model training sample set X as $I_k(x, y)$, and performing stain separation on $I_k(x, y)$ to obtain its n independent staining component images, where n is the number of staining components contained in the pathological image;
Step b: generating n random numbers using the dynamic adjustment parameter θ, the random numbers being formulated as: [formula missing in source]
Step c: using the random numbers from step b to perform image processing on the independent staining component images, obtaining adjusted independent staining component images. Stretching, rotation, translation, flipping, scaling or noise addition may be employed in particular. The relationship between the component images before and after adjustment is formulated as: [formula missing in source]
Step d: merging the adjusted (stretched) independent staining component images, and converting the merged image back to RGB channels to obtain an adjusted digital pathological image;
Step e: repeating steps a to d for each digital pathological image in the training sample set X to obtain the adjusted training sample set.
The following is an example of the method of the present invention to augment a sample set of H-E-DAB stained digital pathology images.
Taking pathological images stained with H-E-DAB as an example, the method specifically comprises the following steps:
Step one: collecting pathological images into a computer to form digital pathological images represented in RGB channels, each image being denoted $I(x, y) = [I_R(x, y), I_G(x, y), I_B(x, y)]$, where $I_R(x, y)$, $I_G(x, y)$ and $I_B(x, y)$ are the values of the red, green and blue channels at point $(x, y)$ of the image, and $I_c(x, y) \in [0, 1]$ for $c = R, G, B$. A machine learning model training sample set is composed of a plurality of digital pathological images $I(x, y)$ and is represented by the set $X = \{I_1, I_2, \ldots, I_K\}$, where $K$ is the number of digital pathological images contained in the data set.
Step two: to limit the adjustment range of the staining components during sample expansion, a dynamic parameter θ is introduced, with its value restricted to θ ∈ (0, 1).
Step three: before each round of training of the machine learning model, dynamically adjusting the staining proportion of each digital pathological image in the training set X. The specific process is as follows:
1. Let the k-th ($k = 1, 2, \ldots, K$) digital pathological image sample in the data set X be $I_k(x, y)$, and perform stain separation on $I_k(x, y)$ to obtain its H, E and DAB independent staining component images. In this embodiment the staining components are separated with the Color Deconvolution algorithm, although stain separation is not limited to this algorithm. The specific steps are as follows:
a) Calculate the optical density of each RGB channel ($c = R, G, B$):
$$O_c(x, y) = -\log\left(\frac{I_c(x, y)}{I_{max}}\right)$$
where $I_{max}$ is the single-channel maximum; in this method $I_c(x, y) \in [0, 1]$, therefore $I_{max} = 1$.
b) Calculate the staining intensity $A_s(x, y)$ ($s = H, E, DAB$) of each independent stain. The conversion relationship is as follows:
$$A_s(x, y) = \sum_{c \in \{R, G, B\}} d_{s,c}\, O_c(x, y)$$
where $d_{s,c}$, representing the absorbance of stain $s$ on channel $c$, is a constant for stain $s$ and channel $c$, and the matrix $D = [d_{s,c}]$ is referred to as the deconvolution matrix. For H-E-DAB stained digital pathological images, the deconvolution matrix for the three stains H, E and DAB is: [matrix missing in source]
Let $A_k = [A_H, A_E, A_{DAB}]^T$ and $O_k = [O_R, O_G, O_B]^T$. Equation (6), the conversion relationship above, can then be abbreviated as:
$$A_k = D\,O_k$$
which is the decomposed staining intensity.
c) Calculate the image of each independent staining component, using the formula: [formula missing in source]
where $A_{max}$ is the maximum staining intensity of a stain; corresponding to the value range of the RGB channels ([0, 255]), $A_{max} = 255$ is taken in this method.
2. Generate random numbers using the dynamic adjustment parameter θ. The random numbers are formulated as: [formula missing in source]
3. Use the random numbers above to stretch the independent staining component images, obtaining adjusted independent staining component images, expressed by the formula: [formula missing in source]
4. Merge the stretched independent staining component images and convert them back to RGB channels to obtain the adjusted digital pathological image. The specific steps are:
a) calculate the adjusted staining intensity;
b) calculate the adjusted optical density;
c) calculate the adjusted digital pathological image.
The formulas used are: [formulas missing in source]
5. Apply the process of this step to all samples in the training sample set X to obtain the adjusted training sample set, completing one adjustment.
Step four: perform one round of training of the machine learning model using the adjusted training sample set. After the round is finished, execute step three on the training sample set X again to obtain another adjusted training sample set, and train the machine learning model for one more round. Repeat this process until training of the machine learning model is complete. Because each adjustment of the training sample set is driven by random numbers, the adjusted set generated each time is different, so the machine learning algorithm faces different samples in every round of training, which achieves the purpose of expanding the training samples.
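The embodiment above can be sketched end to end with NumPy. Several parts are assumptions rather than the patent's method: the stain vectors are the Ruifrok-Johnston H-E-DAB values used by scikit-image (the patent's own matrix is missing from this text), the natural logarithm is used for optical density, the staining intensities are scaled directly instead of going through intermediate component images, and each factor is drawn uniformly from [1 - θ, 1 + θ]. All function names are hypothetical.

```python
import numpy as np

# Rows are the optical-density vectors of H, E and DAB over (R, G, B).
# ASSUMPTION: Ruifrok-Johnston values as used by scikit-image.
STAIN_VECTORS = np.array([
    [0.65, 0.70, 0.29],  # hematoxylin (H)
    [0.07, 0.99, 0.11],  # eosin (E)
    [0.27, 0.57, 0.78],  # DAB
])
I_MAX = 1.0  # single-channel maximum, as stated in the text
EPS = 1e-6   # ASSUMPTION: floor that avoids log(0) on black pixels


def separate_stains(image: np.ndarray) -> np.ndarray:
    """RGB image in [0, 1] -> per-pixel staining intensities (H, E, DAB)."""
    density = -np.log(np.clip(image, EPS, I_MAX) / I_MAX)
    # Row-vector form of A = D O, with D the inverse of the stain matrix.
    return density @ np.linalg.inv(STAIN_VECTORS)


def merge_stains(strength: np.ndarray) -> np.ndarray:
    """Staining intensities (H, E, DAB) -> RGB image in [0, 1]."""
    density = strength @ STAIN_VECTORS
    return np.clip(I_MAX * np.exp(-density), 0.0, I_MAX)


def adjust_staining(image: np.ndarray, theta: float,
                    rng: np.random.Generator) -> np.ndarray:
    """Separate, rescale each stain by a random factor, and merge."""
    # ASSUMPTION: one uniform factor in [1 - theta, 1 + theta] per stain.
    factors = rng.uniform(1.0 - theta, 1.0 + theta, size=3)
    return merge_stains(separate_stains(image) * factors)


def adjust_training_set(images, theta, rng):
    """Apply the adjustment to every image (one pass of step three)."""
    return [adjust_staining(image, theta, rng) for image in images]


rng = np.random.default_rng(0)
training_set = [rng.uniform(0.2, 0.9, size=(4, 4, 3)) for _ in range(2)]
round_one = adjust_training_set(training_set, theta=0.2, rng=rng)
round_two = adjust_training_set(training_set, theta=0.2, rng=rng)
```

Calling `adjust_training_set` once per round with the same generator yields a different adjusted set each time, while `merge_stains(separate_stains(image))` with no rescaling returns the original image, which is a convenient sanity check on the separation and merging steps.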
The method of the invention was verified on a cervical cancer pathological image data set. The digital pathological image samples in the data set are labeled as "contains a cancerous region" or "contains no cancerous region", and the data set is classified with the machine learning model ResNeXt. Without the proposed sample expansion method, the classification accuracy on the test set is 82.86%; with it, the classification accuracy rises to 88.61%, indicating that the method can effectively improve the accuracy with which a machine learning algorithm analyzes digital pathological images.
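To put the two reported figures in perspective, the short calculation below derives the absolute gain and the relative reduction in classification error. Only the two accuracy values come from the text; the variable names and the choice of error reduction as a summary are illustrative.

```python
baseline_accuracy = 82.86  # % on the test set without the expansion method
expanded_accuracy = 88.61  # % on the test set with the expansion method

# Absolute improvement in percentage points.
absolute_gain = expanded_accuracy - baseline_accuracy

# Share of the baseline's errors that the expansion method removes.
baseline_error = 100.0 - baseline_accuracy
relative_error_reduction = absolute_gain / baseline_error

print(f"absolute gain: {absolute_gain:.2f} percentage points")
print(f"relative error reduction: {relative_error_reduction:.1%}")
```

The gain is 5.75 percentage points, which corresponds to removing roughly one third of the errors made by the baseline model.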
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (5)
1. A digital pathological image data set sample expansion method based on staining component adjustment, characterized by comprising the following steps:
Step 1: collecting pathological images into a computer to form digital pathological images, storing each digital pathological image in the computer in RGB channels, denoting each digital pathological image as $I(x, y)$, forming a machine learning model training sample set from a plurality of digital pathological images, and denoting the training sample set by X;
Step 2: setting a dynamic adjustment parameter θ;
Step 3: before each round of training of the machine learning model, dynamically adjusting the staining proportion of each digital pathological image in the training sample set X to obtain an adjusted training sample set;
Step 4: performing one round of training of the machine learning model using the adjusted training sample set;
Step 5: repeating step 3 and step 4 so that a different adjusted training sample set is obtained for each round of model training, thereby expanding the data samples.
2. The digital pathological image data set sample expansion method based on staining component adjustment according to claim 1, characterized in that the method for dynamically adjusting the staining proportion of the digital pathological images in step 3 comprises the following steps:
Step a: denoting the k-th digital pathological image in the machine learning model training sample set X as $I_k(x, y)$, and performing stain separation on $I_k(x, y)$ to obtain its n independent staining component images, where n is the number of staining components contained in the pathological image;
Step b: generating n random numbers using the set dynamic adjustment parameter θ;
Step c: using the random numbers from step b to perform image processing on the independent staining component images, obtaining adjusted independent staining component images;
Step d: merging the adjusted (stretched) independent staining component images, and converting the merged image back to RGB channels to obtain an adjusted digital pathological image;
Step e: repeating steps a to d for each digital pathological image in the training sample set X to obtain the adjusted training sample set.
3. The digital pathological image data set sample expansion method based on staining component adjustment according to claim 2, characterized in that the random numbers in step b are formulated as: [formula missing in source]
4. The digital pathological image data set sample expansion method based on staining component adjustment according to claim 3, characterized in that the relationship between the adjusted independent staining component images in step c and the independent staining component images before adjustment is formulated as: [formula missing in source]
5. The digital pathological image data set sample expansion method based on staining component adjustment according to claim 1, characterized in that the dynamic adjustment parameter in step 2 satisfies θ ∈ (0, 1).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810020438.4A CN108229569A (en) | 2018-01-10 | 2018-01-10 | The digital pathological image data set sample extending method adjusted based on staining components |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810020438.4A CN108229569A (en) | 2018-01-10 | 2018-01-10 | The digital pathological image data set sample extending method adjusted based on staining components |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108229569A true CN108229569A (en) | 2018-06-29 |
Family
ID=62640621
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810020438.4A Pending CN108229569A (en) | 2018-01-10 | 2018-01-10 | The digital pathological image data set sample extending method adjusted based on staining components |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108229569A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109544529A (en) * | 2018-11-19 | 2019-03-29 | 南京信息工程大学 | Pathological image data enhancement methods towards deep learning model training and study |
CN110619312A (en) * | 2019-09-20 | 2019-12-27 | 百度在线网络技术(北京)有限公司 | Method, device and equipment for enhancing positioning element data and storage medium |
CN111291833A (en) * | 2020-03-20 | 2020-06-16 | 京东方科技集团股份有限公司 | Data enhancement method and data enhancement device applied to supervised learning system training |
CN112132843A (en) * | 2020-09-30 | 2020-12-25 | 福建师范大学 | Hematoxylin-eosin staining pathological image segmentation method based on unsupervised deep learning |
CN112488234A (en) * | 2020-12-10 | 2021-03-12 | 武汉大学 | End-to-end histopathology image classification method based on attention pooling |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150055844A1 (en) * | 2013-08-21 | 2015-02-26 | Sectra Ab | Methods, systems and circuits for generating magnification-dependent images suitable for whole slide images |
CN104408717A (en) * | 2014-11-24 | 2015-03-11 | 北京航空航天大学 | Pathological image color quality comprehensive evaluation method based on color separation |
-
2018
- 2018-01-10 CN CN201810020438.4A patent/CN108229569A/en active Pending
Non-Patent Citations (3)
Title |
---|
JAKUB M. TOMCZAK ET.AL: "Deep Learning with Permutation-invariant Operator for Multi-instance Histopathology Classification", 《ARXIV:1712.00310V1 [CS.LG]》 * |
LE HOU ET.AL: "Efficient Multiple Instance Convolutional Neural Networks for Gigapixel Resolution Image Classification", 《ARXIV:1504.07947V3 [CS.CV]》 * |
LE HOU ET.AL: "Patch-based Convolutional Neural Network for Whole Slide Tissue Image Classification", 《2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109544529A (en) * | 2018-11-19 | 2019-03-29 | 南京信息工程大学 | Pathological image data enhancement methods towards deep learning model training and study |
CN110619312A (en) * | 2019-09-20 | 2019-12-27 | 百度在线网络技术(北京)有限公司 | Method, device and equipment for enhancing positioning element data and storage medium |
CN111291833A (en) * | 2020-03-20 | 2020-06-16 | 京东方科技集团股份有限公司 | Data enhancement method and data enhancement device applied to supervised learning system training |
CN112132843A (en) * | 2020-09-30 | 2020-12-25 | 福建师范大学 | Hematoxylin-eosin staining pathological image segmentation method based on unsupervised deep learning |
CN112132843B (en) * | 2020-09-30 | 2023-05-19 | 福建师范大学 | Hematoxylin-eosin staining pathological image segmentation method based on unsupervised deep learning |
CN112488234A (en) * | 2020-12-10 | 2021-03-12 | 武汉大学 | End-to-end histopathology image classification method based on attention pooling |
CN112488234B (en) * | 2020-12-10 | 2022-04-29 | 武汉大学 | End-to-end histopathology image classification method based on attention pooling |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108229569A (en) | The digital pathological image data set sample extending method adjusted based on staining components | |
CN108830912B (en) | Interactive gray image coloring method for depth feature-based antagonistic learning | |
Xu et al. | GAN-based virtual re-staining: a promising solution for whole slide image analysis | |
Zhang et al. | Palette-based image recoloring using color decomposition optimization | |
CN105741266B (en) | A kind of pathological image nucleus method for rapidly positioning | |
CN104463843B (en) | Interactive image segmentation method of Android system | |
CN111161272B (en) | Embryo tissue segmentation method based on generation of confrontation network | |
CN110517268A (en) | Pathological image processing method, device, image analysis system and storage medium | |
CN110910347A (en) | Image segmentation-based tone mapping image no-reference quality evaluation method | |
CN110163855B (en) | Color image quality evaluation method based on multi-path deep convolutional neural network | |
CN109102510B (en) | Breast cancer pathological tissue image segmentation method based on semi-supervised k-means algorithm | |
CN117437457A (en) | Self-adaptive color space selection model training method for histopathological image classification | |
CN115018729B (en) | Content-oriented white box image enhancement method | |
CN113610863B (en) | Multi-exposure image fusion quality assessment method | |
Jimenez-Arredondo et al. | Multilevel color transfer on images for providing an artistic sight of the world | |
Pinchaud et al. | Camelyon17 grand challenge | |
CN114627010B (en) | Dyeing space migration method based on dyeing density map | |
CN114187380A (en) | Color transfer method based on visual saliency and channel attention mechanism | |
CN106056544A (en) | Video image raindrop removing method and video image raindrop removing system | |
Khan et al. | Fast color transfer from multiple images | |
Vibashan et al. | Target and task specific source-free domain adaptive image segmentation | |
Biswas et al. | Feature Fusion GAN Based Virtual Staining on Plant Microscopy Images | |
TWI781027B (en) | Neural network system for staining images and image staining conversion method | |
Fang et al. | A domain-invariant feature learning framework for histopathology images | |
CN113744279B (en) | Image segmentation method based on FAF-Net network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20180629 |