CN108229569A - The digital pathological image data set sample extending method adjusted based on staining components - Google Patents

Info

Publication number
CN108229569A
Authority
CN
China
Legal status
Pending
Application number
CN201810020438.4A
Other languages
Chinese (zh)
Inventors
姜志国 (Jiang Zhiguo)
郑钰山 (Zheng Yushan)
Current assignee
Mike Audi (xiamen) Medical Diagnosis System Co Ltd
Original assignee
Mike Audi (xiamen) Medical Diagnosis System Co Ltd
Priority date
2018-01-10
Filing date
2018-01-10
Publication date
2018-06-29

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03 Recognition of patterns in medical or anatomical images

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention provides a sample expansion method for digital pathological image data sets based on staining component adjustment. Before each training round of a machine learning algorithm, stain separation is first performed on every digital pathological image in the training set; the proportion of each staining component image is adjusted at random, and the components are then merged again. This simulates digital pathological images produced under different stain ratios and thereby expands the samples. The invention is a dynamic data expansion method: because the expanded samples are generated from random numbers, the samples used in each training round of the machine learning algorithm all differ, which expands the data set. The invention can improve the accuracy of computer-aided pathological diagnosis methods developed with machine learning and has broad market prospects and application value.

Description

Digital pathological image data set sample expansion method based on staining component adjustment
Technical Field
The invention relates to the field of digital image processing, and in particular to a pathological image data set expansion method based on a staining component adjustment algorithm.
Background
A digital pathological image is a high-resolution digital image obtained by scanning pathological sections with a fully automatic microscope or an optical magnification system, and such images are widely used in clinical pathological diagnosis. Pathological image analysis algorithms based on machine learning, especially deep learning, have developed rapidly in recent years and have become the mainstream approach to pathological image analysis. Machine learning methods let a computer learn repeatedly from pathological images that pathology experts have clearly annotated, so that it can analyze pathological images in the way a pathologist does. Unlike natural-scene image analysis, analyzing pathological images with machine learning faces the following difficulties: 1) pathological image content is complex and only experienced pathology experts can provide accurate labels, so labeling is expensive, the number of training samples is limited, and the performance of the final algorithm suffers; 2) digital pathological images are generally affected by the quality and proportions of the staining agents, so images containing the same lesion type can be distributed differently in color space, while images containing different lesion types can be very similar in color space. This phenomenon interferes with how a machine learning algorithm interprets pathological images and degrades its performance.
In the prior art, when training samples are insufficient, image analysis methods based on machine learning, especially deep learning, expand the training set by simulating and generating additional samples; common expansion methods include rotation, translation, flipping, scaling and noise addition. These methods can alleviate the shortage of training samples for digital pathological images to some extent, but they are not designed for digital pathological images and do little to address the problems caused by staining variation.
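The geometric operations listed above only rearrange pixels, so they leave an image's color distribution unchanged. The minimal Python/NumPy sketch below illustrates this for rotation and flipping. The helper names (`rotate90`, `hflip`, `channel_means`) are illustrative, not from the patent, and the random patch is a stand-in rather than real pathology data.

```python
import numpy as np

def rotate90(img: np.ndarray) -> np.ndarray:
    """Rotate an H x W x 3 image by 90 degrees in the image plane."""
    return np.rot90(img, k=1, axes=(0, 1))

def hflip(img: np.ndarray) -> np.ndarray:
    """Flip an H x W x 3 image left to right."""
    return img[:, ::-1, :]

def channel_means(img: np.ndarray) -> np.ndarray:
    """Mean of each RGB channel: a crude summary of the color distribution."""
    return img.reshape(-1, 3).mean(axis=0)

rng = np.random.default_rng(0)
patch = rng.random((64, 48, 3))  # stand-in RGB patch with values in [0, 1]

for op in (rotate90, hflip):
    out = op(patch)
    # Pixels are only rearranged, so per-channel statistics do not change.
    print(op.__name__, np.allclose(channel_means(out), channel_means(patch)))
```

Rotation and flipping therefore add no new stain shades to the training set. Only the staining variation already present in the original slides reaches the model.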
Disclosure of Invention
The technical problem to be solved by the invention is to provide a pathological image data set expansion method that alleviates the problems of insufficient sample quantity and large staining variation when existing machine learning methods are applied to pathological image analysis, and thereby improves the accuracy of pathological image analysis algorithms. To solve this problem, the technical solution of the invention is a digital pathological image data set sample expansion method based on staining component adjustment, characterized by comprising the following steps:
Step 1: Acquire pathological images into a computer to form digital pathological images; store each digital pathological image in the computer as RGB channels and denote it I(x, y); form a machine learning model training sample set from a plurality of digital pathological images, and denote the training sample set X;
Step 2: Set a dynamic adjustment parameter θ;
Step 3: Before each round of training of the machine learning model, dynamically adjust the staining proportions of each digital pathological image in the training sample set X to obtain an adjusted training sample set;
Step 4: Use the adjusted training sample set to perform one round of training of the machine learning model;
Step 5: Repeat step 3 and step 4 to obtain different adjusted training sample sets for training the machine learning model, thereby expanding the data samples.
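Steps 2 to 5 amount to regenerating the training set before every round. The Python sketch below shows that outer loop. `adjust_image` and `train_one_round` are hypothetical callables standing in for the per-image staining adjustment (step 3) and one round of model training (step 4). The `toy_adjust` placeholder in the usage lines is deliberately not the stain-based method.

```python
import numpy as np
from typing import Callable, List

Image = np.ndarray  # H x W x 3 float RGB array with values in [0, 1]

def train_with_dynamic_expansion(
    X: List[Image],
    theta: float,
    adjust_image: Callable[[Image, float, np.random.Generator], Image],
    train_one_round: Callable[[List[Image]], None],
    num_rounds: int,
    seed: int = 0,
) -> None:
    """Steps 2-5: build a freshly adjusted copy of X before every training round."""
    if not 0.0 < theta < 1.0:  # step 2: theta is restricted to (0, 1)
        raise ValueError("theta must lie in (0, 1)")
    rng = np.random.default_rng(seed)
    for _ in range(num_rounds):  # step 5: repeat steps 3 and 4
        # Step 3: new random numbers give a different adjusted set each round.
        X_adjusted = [adjust_image(img, theta, rng) for img in X]
        # Step 4: one round of training on the adjusted set; X itself is untouched.
        train_one_round(X_adjusted)

# Usage with hypothetical placeholders, for illustration only.
def toy_adjust(img: Image, theta: float, rng: np.random.Generator) -> Image:
    # NOT the stain-based adjustment: just a global intensity scale.
    return np.clip(img * rng.uniform(1 - theta, 1 + theta), 0.0, 1.0)

seen: List[List[Image]] = []
X = [np.full((4, 4, 3), 0.5)]
train_with_dynamic_expansion(X, 0.2, toy_adjust, seen.append, num_rounds=3)
```

The original set X is never overwritten. Each round works from the unmodified images, so adjustments do not compound across rounds.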
Further, the method for dynamically adjusting the staining proportions of a digital pathological image in step 3 comprises the following steps:
Step a: Denote the k-th digital pathological image in the machine learning model training sample set X as I_k(x, y), and separate the staining components of I_k(x, y) to obtain its independent staining component images, where n is the number of staining components contained in the pathological image;
Step b: Generate n random numbers using the set dynamic adjustment parameter θ;
Step c: Process the independent staining component images using the random numbers from step b to obtain the adjusted independent staining component images;
Step d: Merge the stretched independent staining component images and convert the merged image back to RGB channels to obtain the adjusted digital pathological image;
Step e: Repeat steps a to d for every digital pathological image in the training sample set X to obtain the adjusted training sample set.
Further, the random numbers in step b are expressed by the following formula:
Further, the relationship between the adjusted independent staining component images in step c and the independent staining component images before adjustment is expressed by the following formula:
Further, the dynamic adjustment parameter θ ∈ (0, 1).
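The random-number formula of step b is not reproduced in this text. As an assumption only, the sketch below draws each of the n factors uniformly from [1 - θ, 1 + θ). This is one plausible reading of θ ∈ (0, 1) as a bound on the adjustment range, and it also keeps every factor strictly positive. `stain_factors` is a hypothetical helper name.

```python
import numpy as np

def stain_factors(theta: float, n: int, rng: np.random.Generator) -> np.ndarray:
    """Draw n random scaling factors, one per staining component.

    ASSUMPTION: factors are uniform on [1 - theta, 1 + theta). The patent's own
    formula is not given here; this is only one choice consistent with
    theta in (0, 1), and it keeps every factor strictly positive.
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie in (0, 1)")
    return rng.uniform(1.0 - theta, 1.0 + theta, size=n)

rng = np.random.default_rng(42)
alpha = stain_factors(theta=0.3, n=3, rng=rng)  # e.g. one factor each for H, E, DAB
print(alpha)
```

Because every factor stays positive, a component is never flipped in sign or removed entirely. Larger θ widens the range of simulated stain ratios.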
Before each training round of the machine learning model, stain separation is first performed on every digital pathological image in the training set; the proportion of each staining component image is randomly adjusted and the components are then merged again. This simulates digital pathological images produced under different stain ratios and thereby expands the samples. Because the method is a dynamic data expansion method in which the appearance of each sample is determined by random numbers, the samples used in each training round of the machine learning algorithm differ. This expands the data set and in turn improves the accuracy of computer-aided pathological diagnosis methods developed with machine learning.
Drawings
The following describes embodiments of the present invention in further detail with reference to the accompanying drawings.
FIG. 1 is a flow chart of a machine learning method for digital image sample augmentation according to the present invention.
Detailed Description
The invention is further described with reference to the following drawings and detailed description.
As shown in Fig. 1, the digital pathological image data set sample expansion method based on staining component adjustment comprises the following steps:
Step 1: Acquire pathological images into a computer to form digital pathological images; represent each digital pathological image in the computer using RGB channels and denote it I(x, y); form a machine learning model training sample set from a plurality of digital pathological images, and denote the training sample set X;
Step 2: Set a dynamic adjustment parameter θ, where θ ∈ (0, 1);
Step 3: Before each round of training of the machine learning model, dynamically adjust the staining proportions of each digital pathological image in the training sample set X to obtain an adjusted training sample set;
Step 4: Use the adjusted training sample set to perform one round of training of the machine learning model;
Step 5: Repeat step 3 and step 4 to obtain different adjusted training sample sets for training the machine learning model, thereby expanding the data samples.
The method for dynamically adjusting the staining proportions of a digital pathological image in step 3 comprises the following steps:
Step a: Denote the k-th digital pathological image in the machine learning model training sample set X as I_k(x, y), and separate the staining components of I_k(x, y) to obtain its independent staining component images, where n is the number of staining components contained in the pathological image;
Step b: Generate n random numbers using the dynamic adjustment parameter θ; the random numbers are expressed by the following formula:
Step c: Process the independent staining component images using the random numbers from step b to obtain the adjusted independent staining component images; specifically, stretching, rotation, translation, flipping, scaling or noise addition may be used. The relationship between the adjusted and the original independent staining component images is expressed by the following formula:
Step d: Merge the stretched independent staining component images and convert the merged image back to RGB channels to obtain the adjusted digital pathological image;
Step e: Repeat steps a to d for every digital pathological image in the training sample set X to obtain the adjusted training sample set.
The following example applies the method of the invention to expand a sample set of H-E-DAB stained digital pathological images.
Taking an H-E-DAB stained pathological image as an example, the method specifically comprises the following steps:
Step one: Acquire the pathological images into a computer to form digital pathological images represented by RGB channels. Each image is denoted $I(x,y) = [I_R(x,y), I_G(x,y), I_B(x,y)]$, where $I_R(x,y)$, $I_G(x,y)$ and $I_B(x,y)$ are the values of the red, green and blue channels at point $(x,y)$, with $I_c(x,y) \in [0,1]$ for $c \in \{R, G, B\}$. A machine learning model training sample set composed of a plurality of digital pathological images $I(x,y)$ is represented by the set $X = \{I_1, I_2, \ldots, I_K\}$, where $K$ is the number of digital pathological images contained in the data set.
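In code, step one maps naturally onto floating-point arrays. The sketch below assumes 8-bit RGB input, a common scanner output format that the patent does not state, and represents X as a plain Python list. `to_unit_rgb` and `build_training_set` are hypothetical names.

```python
import numpy as np
from typing import List

def to_unit_rgb(img_uint8: np.ndarray) -> np.ndarray:
    """Map an 8-bit H x W x 3 RGB image to I(x, y) with each I_c in [0, 1]."""
    if img_uint8.dtype != np.uint8 or img_uint8.ndim != 3 or img_uint8.shape[2] != 3:
        raise ValueError("expected an H x W x 3 uint8 array")
    return img_uint8.astype(np.float64) / 255.0

def build_training_set(images: List[np.ndarray]) -> List[np.ndarray]:
    """X = {I_1, ..., I_K}: here simply a list of normalized images."""
    return [to_unit_rgb(img) for img in images]

rng = np.random.default_rng(0)
raw = [rng.integers(0, 256, size=(16, 16, 3), dtype=np.uint8) for _ in range(2)]
X = build_training_set(raw)  # K = 2 toy images
```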
Step two: To limit the adjustment range of the staining components during sample expansion, a dynamic parameter θ is introduced, and its value range is restricted to θ ∈ (0, 1).
Step three: Before each round of training of the machine learning model, dynamically adjust the staining proportions of each digital pathological image in training set X, as follows:
1. Let the k-th ($k = 1, 2, \ldots, K$) digital pathological image sample in data set X be $I_k(x,y)$. Separate the staining components of $I_k(x,y)$ to obtain its independent H, E and DAB staining component images. In this embodiment, stain separation uses the Color Deconvolution algorithm; stain separation may use, but is not limited to, Color Deconvolution. The specific steps are as follows:
a) Calculate the optical density $O_c(x,y)$ of each RGB channel ($c = r, g, b$):
where $I_{max}$ is the maximum value of a single channel; in this method $I_c(x,y) \in [0,1]$, so $I_{max} = 1$.
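The optical-density formula itself is not reproduced above. The sketch below assumes the base-10 Beer-Lambert form commonly used in color deconvolution, O_c = -log10(I_c / I_max), with I_max = 1 as stated. The small floor `EPS`, which avoids log(0) on black pixels, is also an assumption. `optical_density` is a hypothetical name.

```python
import numpy as np

I_MAX = 1.0  # single-channel maximum, since I_c(x, y) lies in [0, 1]
EPS = 1e-6   # ASSUMPTION: floor that avoids log(0) on pure-black pixels

def optical_density(img: np.ndarray) -> np.ndarray:
    """Per-channel optical density O_c(x, y) = -log10(I_c(x, y) / I_max).

    ASSUMPTION: base-10 Beer-Lambert form; the patent's exact expression
    is not given in this text.
    """
    return -np.log10(np.clip(img, EPS, I_MAX) / I_MAX)

white = np.ones((2, 2, 3))
print(optical_density(white))  # white (unstained) pixels have zero optical density
```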
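b) Calculate the staining intensity $A_s(x,y)$ ($s$ = H, E, DAB) of each individual stain from the optical densities using the following conversion relationship:

$$\begin{bmatrix} A_H(x,y) \\ A_E(x,y) \\ A_{DAB}(x,y) \end{bmatrix} = D \begin{bmatrix} O_r(x,y) \\ O_g(x,y) \\ O_b(x,y) \end{bmatrix} \qquad (6)$$

where the matrix $D$ is called the deconvolution matrix; each of its entries is a constant for a stain $s$ and a channel $c$, determined by the absorbance of stain $s$ on channel $c$. For H-E-DAB stained digital pathological images, $D$ is the deconvolution matrix for the three stains H, E and DAB over the channels $c = r, g, b$.

Let $A_k = [A_H(x,y), A_E(x,y), A_{DAB}(x,y)]^T$ and $O_k = [O_r(x,y), O_g(x,y), O_b(x,y)]^T$. Equation (6) can then be abbreviated as:

$$A_k = D O_k$$

where $A_k$ is the decomposed staining intensity.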
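Equation (6) is a per-pixel matrix-vector product, which NumPy can apply to a whole image at once. The stain vectors below are an assumption: the patent's own matrix values are not given in this text. The sketch therefore uses the H, E and DAB values that scikit-image ships as `skimage.color.rgb_from_hed` and builds D as the inverse of their transpose. `stain_intensities` is a hypothetical name.

```python
import numpy as np

# ASSUMPTION: stain optical-density vectors (rows: H, E, DAB; columns: R, G, B),
# the same values scikit-image uses for rgb_from_hed.
STAIN_OD = np.array([
    [0.65, 0.70, 0.29],  # hematoxylin
    [0.07, 0.99, 0.11],  # eosin
    [0.27, 0.57, 0.78],  # DAB
])
# Pixel model O = STAIN_OD.T @ A, so the deconvolution matrix is D = inv(STAIN_OD.T).
D = np.linalg.inv(STAIN_OD.T)

def stain_intensities(od: np.ndarray) -> np.ndarray:
    """Apply A_k = D O_k to every pixel: (H, W, 3) OD -> (H, W, 3) [A_H, A_E, A_DAB]."""
    return od @ D.T  # row-vector form of D @ o for each pixel

od_pixel = 0.5 * STAIN_OD[0]  # OD of a pixel stained with hematoxylin only
A = stain_intensities(od_pixel.reshape(1, 1, 3))
print(A[0, 0])  # approximately [0.5, 0, 0]
```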
c) Calculate the image of each individual stain component using the following formula:
where $A_{max}$ is the maximum staining intensity of a stain; in this method it corresponds to the value range [0, 255] of the RGB channels, so $A_{max} = 255$.
2. Generate random numbers using the dynamic adjustment parameter θ. The random numbers are expressed by the following formula:
3. Stretch the independent staining component images with these random numbers to obtain the adjusted independent staining component images, expressed by the following formula:
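The stretching formula of item 3 is not reproduced here. Assuming that stretching means multiplying each staining component by its own random factor, Ã_s = α_s A_s, the operation reduces to a single broadcast multiply. `stretch_components` is a hypothetical name.

```python
import numpy as np

def stretch_components(A: np.ndarray, alpha) -> np.ndarray:
    """Scale each stain channel: A~_s(x, y) = alpha_s * A_s(x, y).

    ASSUMPTION: 'stretching' is modelled as a per-stain multiplicative factor.
    A has shape (H, W, n); alpha has shape (n,).
    """
    alpha = np.asarray(alpha, dtype=float)
    if A.shape[-1] != alpha.shape[0]:
        raise ValueError("need one factor per staining component")
    return A * alpha  # broadcasting applies alpha_s to channel s at every pixel

A = np.ones((2, 2, 3))
out = stretch_components(A, [1.0, 0.5, 2.0])  # keep H, halve E, double DAB
```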
4. Merge the stretched independent staining component images and convert the result back to RGB channels to obtain the adjusted digital pathological image. The specific steps are:
a) Calculate the adjusted staining intensity $\tilde{A}_k$;
b) Calculate the adjusted optical density $\tilde{O}_k$ by inverting Equation (6), i.e. $\tilde{O}_k = D^{-1}\tilde{A}_k$;
c) Calculate the adjusted digital pathological image.
The following formulas are used:
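Item 4 reverses the separation. The sketch below assumes the H-E-DAB stain vectors that scikit-image ships as `rgb_from_hed` and a base-10 optical-density convention. Adjusted intensities return to optical density through D^{-1}, and then to RGB through I_c = I_max · 10^(-O_c). `recombine` is a hypothetical name. With unmodified intensities, it reproduces the input image.

```python
import numpy as np

# ASSUMPTION: stain OD vectors (rows: H, E, DAB) as in skimage.color.rgb_from_hed.
STAIN_OD = np.array([[0.65, 0.70, 0.29],
                     [0.07, 0.99, 0.11],
                     [0.27, 0.57, 0.78]])
D = np.linalg.inv(STAIN_OD.T)  # deconvolution matrix: A = D O per pixel
I_MAX = 1.0

def recombine(A_adj: np.ndarray) -> np.ndarray:
    """Adjusted stain intensities (H, W, 3) -> adjusted RGB image in [0, 1].

    b) O~ = D^{-1} A~ per pixel (inverse of A = D O);
    c) I~_c = I_max * 10**(-O~_c)  (ASSUMPTION: base-10 optical density).
    """
    od_adj = A_adj @ np.linalg.inv(D).T  # row-vector form of D^{-1} @ a~
    return np.clip(I_MAX * np.power(10.0, -od_adj), 0.0, I_MAX)
```

Clipping guards against values outside [0, 1], which can appear when a stretched real-image pixel no longer fits the three-stain model exactly.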
5. Apply the above process to all samples in the training sample set X to obtain the adjusted training sample set, completing one adjustment.
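Chaining items 1 to 5 gives one adjustment of the whole training set. The sketch below combines four assumptions: a base-10 optical density, scikit-image's default H-E-DAB stain vectors, uniform factors in [1 - θ, 1 + θ), and multiplicative stretching. It illustrates the pipeline's shape, not the patented formulas. It also skips the component-image scaling by A_max from item 1c, because that formula is not given here, and instead works directly on the stain intensities. `adjust_image` and `adjust_dataset` are hypothetical names.

```python
import numpy as np
from typing import List

# ASSUMED H, E, DAB stain OD vectors (rows) over R, G, B (columns),
# matching scikit-image's skimage.color.rgb_from_hed.
STAIN_OD = np.array([[0.65, 0.70, 0.29],
                     [0.07, 0.99, 0.11],
                     [0.27, 0.57, 0.78]])
D = np.linalg.inv(STAIN_OD.T)  # deconvolution matrix: A = D O per pixel
EPS = 1e-6                     # ASSUMED floor to avoid log(0)

def adjust_image(img: np.ndarray, theta: float, rng: np.random.Generator) -> np.ndarray:
    """One staining adjustment of an RGB image with values in [0, 1]."""
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie in (0, 1)")
    od = -np.log10(np.clip(img, EPS, 1.0))            # optical density (ASSUMED base 10)
    A = od @ D.T                                      # H, E, DAB intensities per pixel
    alpha = rng.uniform(1.0 - theta, 1.0 + theta, 3)  # ASSUMED random-number formula
    od_adj = (A * alpha) @ STAIN_OD                   # stretch, then O~ = D^{-1} A~
    return np.clip(10.0 ** (-od_adj), 0.0, 1.0)       # back to RGB in [0, 1]

def adjust_dataset(X: List[np.ndarray], theta: float,
                   rng: np.random.Generator) -> List[np.ndarray]:
    """Apply the adjustment to every sample in X, giving one adjusted training set."""
    return [adjust_image(img, theta, rng) for img in X]

rng = np.random.default_rng(0)
X = [0.1 + 0.8 * rng.random((8, 8, 3)) for _ in range(2)]  # toy stand-in data
X_round1 = adjust_dataset(X, theta=0.3, rng=rng)
X_round2 = adjust_dataset(X, theta=0.3, rng=rng)  # fresh random numbers
```

Drawing new factors per image and per call is what makes `X_round1` and `X_round2` differ. This is the property step four relies on.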
Step four: Use the adjusted training sample set to perform one round of training of the machine learning model. After that round is complete, perform step three on the training sample set again to obtain another adjusted training sample set, and train the machine learning model for another round. Repeat until training of the machine learning model is complete. Because each adjustment of the training sample set is driven by random numbers, every generated adjusted set differs from the others. The machine learning algorithm therefore sees different samples in each training round, which achieves the goal of expanding the training samples.
The method of the invention was verified on a cervical cancer pathological image data set, in which the digital pathological image samples are labeled as "containing cancerous regions" or "containing no cancerous regions" and are classified with the ResNeXt machine learning model. Without the proposed sample expansion method, the classification accuracy on the test set was 82.86%; with it, the accuracy increased to 88.61%. This shows that the method can effectively improve the accuracy of machine learning algorithms in analyzing digital pathological images.
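The two reported accuracies can also be restated as an absolute gain and as a relative reduction in test error. This is plain arithmetic on the figures above. It says nothing about run-to-run variance, which is not reported.

$$
\begin{aligned}
\Delta_{\text{acc}} &= 88.61\% - 82.86\% = 5.75 \text{ percentage points} \\
\text{error}_{\text{before}} &= 100\% - 82.86\% = 17.14\%, \qquad \text{error}_{\text{after}} = 100\% - 88.61\% = 11.39\% \\
\text{relative error reduction} &= \frac{17.14 - 11.39}{17.14} = \frac{5.75}{17.14} \approx 33.5\%
\end{aligned}
$$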
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (5)

1. A digital pathological image data set sample expansion method based on staining component adjustment, characterized by comprising the following steps:
Step 1: Acquire pathological images into a computer to form digital pathological images; store each digital pathological image in the computer as RGB channels and denote it I(x, y); form a machine learning model training sample set from a plurality of digital pathological images, and denote the training sample set X;
Step 2: Set a dynamic adjustment parameter θ;
Step 3: Before each round of training of the machine learning model, dynamically adjust the staining proportions of each digital pathological image in the training sample set X to obtain an adjusted training sample set;
Step 4: Use the adjusted training sample set to perform one round of training of the machine learning model;
Step 5: Repeat step 3 and step 4 to obtain different adjusted training sample sets for training the machine learning model, thereby expanding the data samples.
2. The digital pathological image data set sample expansion method based on staining component adjustment according to claim 1, wherein the method for dynamically adjusting the staining proportions of a digital pathological image in step 3 comprises the following steps:
Step a: Denote the k-th digital pathological image in the machine learning model training sample set X as I_k(x, y), and separate the staining components of I_k(x, y) to obtain its independent staining component images, where n is the number of staining components contained in the pathological image;
Step b: Generate n random numbers using the set dynamic adjustment parameter θ;
Step c: Process the independent staining component images using the random numbers from step b to obtain the adjusted independent staining component images;
Step d: Merge the stretched independent staining component images and convert the merged image back to RGB channels to obtain the adjusted digital pathological image;
Step e: Repeat steps a to d for every digital pathological image in the training sample set X to obtain the adjusted training sample set.
3. The digital pathological image data set sample expansion method based on staining component adjustment according to claim 2, wherein the random numbers in step b are expressed by the following formula:
4. The digital pathological image data set sample expansion method based on staining component adjustment according to claim 3, wherein the relationship between the adjusted independent staining component images in step c and the independent staining component images before adjustment is expressed by the following formula:
5. The digital pathological image data set sample expansion method based on staining component adjustment according to claim 1, characterized in that the dynamic adjustment parameter θ in step 2 satisfies θ ∈ (0, 1).

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810020438.4A CN108229569A (en) 2018-01-10 2018-01-10 The digital pathological image data set sample extending method adjusted based on staining components

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810020438.4A CN108229569A (en) 2018-01-10 2018-01-10 The digital pathological image data set sample extending method adjusted based on staining components

Publications (1)

Publication Number Publication Date
CN108229569A true CN108229569A (en) 2018-06-29

Family

ID=62640621

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810020438.4A Pending CN108229569A (en) 2018-01-10 2018-01-10 The digital pathological image data set sample extending method adjusted based on staining components

Country Status (1)

Country Link
CN (1) CN108229569A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150055844A1 (en) * 2013-08-21 2015-02-26 Sectra Ab Methods, systems and circuits for generating magnification-dependent images suitable for whole slide images
CN104408717A (en) * 2014-11-24 2015-03-11 北京航空航天大学 Pathological image color quality comprehensive evaluation method based on color separation

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Jakub M. Tomczak et al.: "Deep Learning with Permutation-invariant Operator for Multi-instance Histopathology Classification", arXiv:1712.00310v1 [cs.LG] *
Le Hou et al.: "Efficient Multiple Instance Convolutional Neural Networks for Gigapixel Resolution Image Classification", arXiv:1504.07947v3 [cs.CV] *
Le Hou et al.: "Patch-based Convolutional Neural Network for Whole Slide Tissue Image Classification", 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109544529A (en) * 2018-11-19 2019-03-29 Nanjing University of Information Science and Technology Pathological image data enhancement methods towards deep learning model training and study
CN110619312A (en) * 2019-09-20 2019-12-27 Baidu Online Network Technology (Beijing) Co., Ltd. Method, device and equipment for enhancing positioning element data and storage medium
CN111291833A (en) * 2020-03-20 2020-06-16 BOE Technology Group Co., Ltd. Data enhancement method and data enhancement device applied to supervised learning system training
CN112132843A (en) * 2020-09-30 2020-12-25 Fujian Normal University Hematoxylin-eosin staining pathological image segmentation method based on unsupervised deep learning
CN112132843B (en) * 2020-09-30 2023-05-19 Fujian Normal University Hematoxylin-eosin staining pathological image segmentation method based on unsupervised deep learning
CN112488234A (en) * 2020-12-10 2021-03-12 Wuhan University End-to-end histopathology image classification method based on attention pooling
CN112488234B (en) * 2020-12-10 2022-04-29 Wuhan University End-to-end histopathology image classification method based on attention pooling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180629