WO2023287235A1 - Pathology image analysis method and system - Google Patents
Pathology image analysis method and system
- Publication number
- WO2023287235A1 (PCT/KR2022/010321)
- Authority
- WO
- WIPO (PCT)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/69—Microscopic objects, e.g. biological cells or cellular parts
- G06V20/698—Matching; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/69—Microscopic objects, e.g. biological cells or cellular parts
- G06V20/695—Preprocessing, e.g. image segmentation
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
- G01N33/53—Immunoassay; Biospecific binding assay; Materials therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
- G06V10/235—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition based on user input or interaction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/98—Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns
- G06V10/993—Evaluation of the quality of the acquired pattern
Definitions
- the present disclosure relates to a pathology image analysis method and system, and more specifically, to a method and system for analyzing various types of pathology images using a machine learning model.
- immunohistochemistry is a staining method in which an antibody that reacts to a specific antigen (target antigen) is bound to an enzyme or a fluorescent dye via a secondary antibody, and the specific tissue is then stained.
- the antibody binds to cells expressing the target antigen, and this binding reaction activates the secondary antibody to cause a staining reaction.
- a pathologist may check the stained cells under a microscope and perform an evaluation of the cells. For example, a pathologist can derive meaningful information within a tissue by evaluating and quantifying the amount of staining expression.
- biomarkers associated with many new drugs are being developed.
- biomarkers already used in clinical practice (e.g., PD-L1 IHC, HER2 IHC, etc.) have a large amount of accumulated clinical data, so it is relatively easy to create the training data needed to train AI algorithms; for newly developed biomarkers, which lack such accumulated data, it is difficult.
- the artificial intelligence model may not be trained properly or may be biased toward a specific training data set.
- the present disclosure provides a pathology image analysis method capable of accurately analyzing various types of pathology images, a computer program stored in a recording medium, and an apparatus (system).
- the present disclosure may be implemented in a variety of ways, including a method, an apparatus (system) and/or a computer program stored in a computer readable storage medium, and a computer readable storage medium in which the computer program is stored.
- a pathology image analysis method performed by at least one processor includes acquiring a pathology image, inputting the acquired pathology image into a machine learning model, obtaining an analysis result of the pathology image from the machine learning model, and outputting the obtained analysis result, wherein the machine learning model may be a model trained using a training data set generated based on a first pathology data set associated with a first domain and a second pathology data set associated with a second domain different from the first domain.
- the pathology image analysis method may include, prior to acquiring the pathology image, acquiring a first pathology data set including pathology images of a first type and a second pathology data set including pathology images of a second type, generating a training data set based on the first pathology data set and the second pathology data set, and training a machine learning model using the generated training data set.
- generating the training data set may include associating an item associated with a pathology image of the first type with an item associated with a pathology image of the second type based on at least one of a staining expression level or a region of interest, and generating a training data set that includes the associated items.
- the step of associating the items may include extracting a first item associated with a tumor tissue region included in the first type of pathology image and a second item associated with a non-tumor tissue region included in the first type of pathology image, extracting a third item associated with a tumor tissue region included in the second type of pathology image and a fourth item associated with a non-tumor tissue region included in the second type of pathology image, associating the extracted first item with the extracted third item, and associating the extracted second item with the extracted fourth item.
- the step of associating the items may include extracting, from among items representing the staining expression intensity of each pixel included in the first type of pathology image, a fifth item associated with a first expression range and a sixth item associated with a second expression range; identifying, from among items representing the staining expression intensity of each pixel included in the second pathology data set, a seventh item associated with the first expression range and an eighth item associated with the second expression range; associating the fifth item with the seventh item; and associating the sixth item with the eighth item.
- the step of associating the items may include associating at least one object class representing the type of cells included in the first type of pathology image with at least one object class representing the type of cells included in the second type of pathology image, or associating at least one object class representing the staining expression intensity of cells included in the first type of pathology image with at least one object class representing the staining expression intensity of cells included in the second type of pathology image.
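The item-association steps described above can be sketched in code. The following is a minimal illustration under assumed conventions — the item identifiers, the field names (`region`, `expression_range`), and the function `associate_items` are all hypothetical, since the disclosure does not fix a data format:

```python
# Hypothetical sketch: pairing annotation items across two heterogeneous
# pathology data sets when their region type and expression range match.
# All item IDs and field names here are illustrative assumptions.

def associate_items(first_items, second_items,
                    key_fields=("region", "expression_range")):
    """Pair items whose region type and expression range agree
    (e.g., tumor <-> tumor, positive <-> positive)."""
    pairs = []
    for a in first_items:
        for b in second_items:
            if all(a[k] == b[k] for k in key_fields):
                pairs.append((a["id"], b["id"]))
    return pairs

# First data set: e.g., annotations from PD-L1-stained images.
pdl1_items = [
    {"id": "pdl1_tumor_pos", "region": "tumor", "expression_range": "positive"},
    {"id": "pdl1_stroma_neg", "region": "non_tumor", "expression_range": "negative"},
]
# Second data set: e.g., annotations from HER2-stained images.
her2_items = [
    {"id": "her2_tumor_pos", "region": "tumor", "expression_range": "positive"},
    {"id": "her2_stroma_neg", "region": "non_tumor", "expression_range": "negative"},
]

# Tumor items pair with tumor items, non-tumor with non-tumor.
pairs = associate_items(pdl1_items, her2_items)
```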
- generating the training data set based on the first pathology data set and the second pathology data set may include extracting patches from the first pathology data set and the second pathology data set, and generating the training data set to include the extracted patches.
- the step of training the machine learning model using the generated training data set may include fetching a number of first-type image patches corresponding to a first sampling number from among labeled patches extracted from the first pathology data set, fetching a number of second-type image patches corresponding to a second sampling number from among labeled patches extracted from the second pathology data set, generating a batch including the first-type image patches and the second-type image patches, and training the machine learning model using the batch.
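The batch-generation step above can be sketched as follows — a minimal example assuming in-memory lists of patch identifiers; `make_batch` and the specific sampling numbers are illustrative, not the disclosed implementation:

```python
import random

def make_batch(first_patches, second_patches, n_first, n_second, seed=0):
    """Fetch n_first labeled patches from the first data set and
    n_second from the second, so every batch mixes both domains
    in a fixed ratio regardless of the data set sizes."""
    rng = random.Random(seed)
    batch = rng.sample(first_patches, n_first) + rng.sample(second_patches, n_second)
    rng.shuffle(batch)  # avoid ordering the batch by domain
    return batch

first = [f"pdl1_patch_{i}" for i in range(100)]  # large data set
second = [f"her2_patch_{i}" for i in range(20)]  # small data set

# The batch is balanced (8 + 8) even though the source sets are not.
batch = make_batch(first, second, n_first=8, n_second=8)
```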
- generating the training data set based on the first pathology data set and the second pathology data set may include extracting first-type image patches from the first pathology data set, extracting second-type image patches from the second pathology data set, and copying a predetermined number of the first-type image patches and including the copies in the training data set.
- the step of training the machine learning model may include adjusting the size of at least one of the first type of pathology image or the second type of pathology image, and training the machine learning model using training data including the at least one resized pathology image.
- the step of training the machine learning model may include removing pixels corresponding to a predetermined range from among pixels included in at least one of the pathology image of the first type and the pathology image of the second type.
- the step of training the machine learning model may include flipping at least one of the first type of pathology image or the second type of pathology image horizontally or vertically, and training the machine learning model using training data including the flipped pathology image.
- the step of training the machine learning model may include removing or modifying pixels in a predetermined range from among pixels included in at least one of the first type of pathology image or the second type of pathology image, and training the machine learning model using training data including the pathology images from which the pixels in the predetermined range have been removed or modified.
- the step of training the machine learning model may include converting the color of pixels included in at least one of the first type of pathology image or the second type of pathology image, and training the machine learning model using training data including the at least one pathology image in which the color of the pixels has been converted.
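The augmentations listed above (flipping, removing or modifying pixels in a range, color conversion) can be sketched with plain nested lists standing in for image arrays; the function names are illustrative:

```python
def flip_horizontal(img):
    """Flip an image from side to side."""
    return [row[::-1] for row in img]

def flip_vertical(img):
    """Flip an image up and down."""
    return img[::-1]

def clamp_pixels(img, lo, hi):
    """Modify pixels outside a predetermined range by clamping them."""
    return [[min(max(p, lo), hi) for p in row] for row in img]

# A tiny 2x2 grayscale "image" for demonstration.
img = [[0, 50],
       [200, 255]]
augmented = [flip_horizontal(img), flip_vertical(img), clamp_pixels(img, 10, 250)]
```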
- the step of training the machine learning model may include determining target training data from among the training data set, inputting the target training data into the machine learning model and obtaining an output value from the machine learning model, obtaining a reference value for the target training data using annotation information included in at least one of the first pathology data set or the second pathology data set, and feeding back a loss value between the output value and the obtained reference value to the machine learning model.
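The loss-feedback loop above can be sketched with a toy one-layer model; the linear model, learning rate, and squared-error loss are assumptions for illustration only:

```python
def train_step(weights, features, reference, lr=0.1):
    """One update: forward pass, loss against the annotation-derived
    reference value, and feedback of that loss to the model weights."""
    output = sum(w * x for w, x in zip(weights, features))
    loss = (output - reference) ** 2
    grad = 2.0 * (output - reference)
    new_weights = [w - lr * grad * x for w, x in zip(weights, features)]
    return new_weights, loss

weights = [0.0, 0.0]
target_features, reference = [1.0, 2.0], 1.0  # target data and its reference value
for _ in range(10):
    weights, loss = train_step(weights, target_features, reference)
# loss shrinks toward 0 as the feedback is applied repeatedly
```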
- the machine learning model includes a plurality of analysis models that output different types of analysis results, and obtaining the analysis result may include identifying a staining color and a location where staining is expressed from the acquired pathology image, determining one of the plurality of analysis models as a target analysis model based on the identified staining color and the expressed location, and inputting the pathology image into the determined target analysis model to obtain an analysis of the staining intensity at the expressed location from the target analysis model.
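The routing step above — determining a target analysis model from the identified staining color and stained location — can be sketched as a lookup; the model names and keys here are hypothetical:

```python
def select_target_model(stain_color, stain_location, models):
    """Determine one of several analysis models as the target model,
    keyed on the identified staining color and where it is expressed."""
    return models.get((stain_color, stain_location), models["default"])

# Hypothetical registry mapping (color, location) to an analysis model.
models = {
    ("brown", "cell_membrane"): "membrane_intensity_model",  # e.g., membrane stains
    ("brown", "nucleus"): "nuclear_intensity_model",         # e.g., nuclear stains
    "default": "generic_model",
}

target = select_target_model("brown", "nucleus", models)
```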
- the machine learning model includes a plurality of analysis models that output different types of analysis results, and obtaining the analysis result may include determining one of the plurality of analysis models as a target analysis model based on user input information, inputting the pathology image into the target analysis model, and obtaining an analysis result of the pathology image from the target analysis model.
- the machine learning model outputs an analysis result including at least one of a cell type or an evaluation index of the cell, and the evaluation index of the cell may include at least one of a positive or negative result value for the cell, a staining expression level for the cell, a value representing the degree of staining expression in the cell, or statistical information on staining expression in cells.
- a computer-readable non-transitory recording medium on which instructions for executing the pathology image analysis method described above on a computer may be recorded.
- An information processing system includes a memory and at least one processor connected to the memory and configured to execute at least one computer-readable program included in the memory, the at least one program including instructions for acquiring a pathology image, inputting the acquired pathology image into a machine learning model, obtaining an analysis result of the pathology image from the machine learning model, and outputting the obtained analysis result, wherein the machine learning model may be a model trained using a training data set generated based on a first pathology data set associated with a first domain and a second pathology data set associated with a second domain different from the first domain.
- a machine learning model is trained on a training data set composed of heterogeneous domains, and thus the machine learning model can accurately analyze even various types of pathology images not used for training.
- sampling is performed on the heterogeneous pathology data sets so that the machine learning model can be trained in a balanced way without being biased toward a specific type of pathology data set.
- items included in the heterogeneous pathology data sets may be associated with each other, and a training data set may be generated based on the heterogeneous pathology data sets in which the items are associated.
- when a machine learning model is trained using such a training data set, the machine learning model can accurately analyze a pathology image of a new carcinoma, or one containing cells expressed according to a new IHC staining method, without separate additional training.
- training data in which pathology images are intentionally modified is input to the machine learning model during training, thereby building a machine learning model that is robust even in unintended situations such as image distortion or change.
- analysis results including various types of output values may be output through a machine learning model. Accordingly, the user may proceed with a follow-up procedure, such as medical treatment, by using a desired type of output value among output values included in the analysis result.
- FIGS. 1 and 2 are diagrams illustrating different types of pathology images.
- FIG. 3 is a diagram illustrating a pathology image including an object class.
- FIG. 4 is a diagram illustrating a pathology image in which a tumor region and a precancerous region are segmented.
- FIG. 5 is a diagram illustrating an environment to which a system for analyzing a pathology image according to an embodiment of the present disclosure is applied.
- FIG. 6 is a schematic diagram illustrating that a pathology image analysis model is learned, according to an embodiment of the present disclosure.
- FIG. 7 is a flowchart illustrating a method of learning a pathology image analysis model according to an embodiment of the present disclosure.
- FIG. 8 is a flowchart illustrating a method of generating a training data set by pre-processing a heterogeneous pathology data set, according to an embodiment of the present disclosure.
- FIG. 9 is a diagram illustrating an example in which patches are sampled to create a batch.
- FIG. 10 is a diagram illustrating another example in which patches are sampled to create a batch.
- FIG. 11 is a diagram illustrating output of an analysis result of a pathology image through a pathology image analysis model according to an embodiment of the present disclosure.
- FIG. 12 is a diagram illustrating an artificial neural network model included in a pathology image analysis model according to an embodiment of the present disclosure.
- FIG. 13 is a flowchart illustrating a method of outputting an analysis result of a pathology image using a pathology image analysis model, according to an embodiment of the present disclosure.
- FIG. 14 is a diagram illustrating a pathology image analysis model according to another embodiment of the present disclosure.
- FIG. 15 is a diagram illustrating output of an analysis result of a pathology image through an analysis model called based on characteristics of the pathology image, according to another embodiment of the present disclosure.
- FIG. 16 is a diagram illustrating output of an analysis result of a pathology image through an analysis model called based on user input information, according to another embodiment of the present disclosure.
- FIGS. 17 to 20 are diagrams illustrating various types of analysis results output from a pathology image analysis model.
- FIG. 21 is an exemplary configuration diagram of a system for analyzing a pathology image according to an embodiment of the present disclosure.
- a 'system' may include at least one of a server device and a cloud device, but is not limited thereto.
- a system may consist of one or more server devices.
- a system may consist of one or more cloud devices.
- the system may be operated by configuring a server device and a cloud device together.
- 'comprises' and/or 'comprising' means that the presence or addition of one or more components, steps, operations, and/or elements other than those mentioned is not excluded.
- 'IHC (immunohistochemistry) staining' may refer to a staining method based on the principle of reacting an antibody of interest on a tissue in order to observe, with an optical microscope, the presence or absence of a protein (or antigen) present in the nucleus, cytoplasm, or cell membrane of a tissue or cell sample. Since the antigen-antibody reaction itself cannot be observed under a microscope, a method of attaching a marker and then developing the color of the marker is used. Various chromogens, such as 3,3'-diaminobenzidine (DAB), may be used.
- a 'pathology image' may refer to an image obtained by taking a pathology slide that has been fixed and stained through a series of chemical treatment processes in order to observe a tissue removed from a human body under a microscope.
- the pathology image may refer to a whole slide image (WSI) including a high-resolution image of a pathology slide or a part of a high-resolution whole slide image.
- a portion of the image of the entire slide may refer to a region divided in units of a predetermined size from the entire image of the pathology slide.
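Dividing a whole slide image into regions of a predetermined size can be sketched as follows; the coordinate scheme is an assumption (edge regions are simply truncated):

```python
def tile_regions(width, height, size):
    """Divide a width x height slide image into regions of a
    predetermined size; regions at the right and bottom edges
    may be smaller than the requested size."""
    regions = []
    for y in range(0, height, size):
        for x in range(0, width, size):
            regions.append((x, y, min(size, width - x), min(size, height - y)))
    return regions

# A 1024 x 768 image split into 512-pixel regions -> 2 x 2 grid.
regions = tile_regions(1024, 768, 512)
```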
- the pathology image may refer to a digital image obtained by scanning a pathology slide with a digital scanner, and may include information about cells, tissues, and/or structures in the human body.
- the pathology image may include one or more patches, and histological information may be applied to the one or more patches through an annotation (eg, tagging).
- a 'pathology image' may refer to 'at least a partial region included in a pathology image'.
- a 'patch' may refer to a partial region in a pathology image.
- the patch may include a region corresponding to a semantic object extracted by performing segmentation on the pathology image.
- a patch may refer to a combination of pixels associated with histological information generated by analyzing a pathology image.
- the patch may include an object associated with a tumor tissue, an object associated with a precancerous tissue, an object associated with a tissue surrounding a tumor, and an object associated with other tissues.
- 'histological components' may include characteristics or information about cells, tissues, and/or structures in a human body included in a pathology image.
- the characteristics of the cell may include cytologic features such as a nucleus and a cell membrane.
- the histological information may refer to histological information about the patch inferred through a machine learning model or input by a pathologist.
- 'pathology data' may refer to a pathology image including annotation information.
- a set including a plurality of pathology data may be referred to as a 'pathology data set'. Domains of the pathology data may be considered when generating a pathology data set.
- a pathology data set may be configured by collecting only pathology images having matching domains.
- 'annotation information' may be information input by an expert, such as a pathologist, in association with a pathology image.
- Annotation information may include histological information on the pathology image.
- the annotation information may include at least one item related to the pathology image.
- an 'item' associated with the pathology image is data representing detailed information about the pathology image, and may include a first item associated with a region of an object where staining is expressed (e.g., a pixel range included in the region, positions of the pixels, etc.) and a second item associated with a class of the object.
- an object may be associated with a pixel range as a significant cell region (eg, an abnormal region), and an object class may include a cell type and an evaluation index.
- the cell type may be a tumor cell, a lymphocyte, and the like
- the evaluation index is an index related to staining expression intensity and may include positivity, expression level, expression value, expression statistical information, and the like.
- the expression level may be a grade of cells based on staining intensity among a plurality of predetermined grades (e.g., 0, 1+, 2+, and 3+), and the expression value may be an expression value of cells based on staining intensity within a predetermined numerical range (e.g., 0 to 1).
- expression statistical information is statistics on cell expression intensity and can be output when a plurality of pathology images are continuously analyzed. For example, by analyzing 10 pathology images, a ratio of PD-L1 positive tumor cells to all tumor cells in each pathology image is calculated, and a distribution of the calculated ratio value may be included in expression statistical information.
- expression statistical information may include statistical information about specific cells within a single pathology image. For example, one pathology image may be analyzed, and the expression statistical information may include a ratio of cells classified into a specific class out of total cells expressing dye in the pathology image.
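As arithmetic, the per-image statistic described above — the ratio of positively stained tumor cells to all tumor cells, as in a PD-L1 tumor proportion score — can be sketched like this (the cell records and field names are illustrative):

```python
def positive_tumor_ratio(cells):
    """Ratio of positively stained tumor cells to all tumor cells
    detected in one pathology image."""
    tumor = [c for c in cells if c["type"] == "tumor"]
    if not tumor:
        return 0.0
    return sum(1 for c in tumor if c["positive"]) / len(tumor)

cells = [
    {"type": "tumor", "positive": True},
    {"type": "tumor", "positive": False},
    {"type": "tumor", "positive": True},
    {"type": "lymphocyte", "positive": True},  # not counted in the denominator
]
ratio = positive_tumor_ratio(cells)  # 2 of 3 tumor cells are positive
```

Collecting this ratio over many analyzed images would yield the distribution mentioned above.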
- 'heterogeneous' may refer to pathology data or pathology images having different domains.
- matching 'domains' can be understood as meaning that the types of pathology images are the same and the types of items associated with the pathology images match; different 'domains' can be understood as meaning that the types of pathology images are different, or that the types of items associated with the pathology images are different.
- pathology images of the same type share the same staining method. For example, pathology images of lung cancer tissue stained using programmed cell death ligand 1 (PD-L1) IHC staining may constitute a first pathology data set associated with the first domain.
- pathology images of breast cancer tissue stained using human epidermal growth factor receptor 2 (HER2) staining may constitute a second pathology data set associated with the second domain.
- the pathology images included in the first pathology data set and the pathology images included in the second pathology data set may be referred to as heterogeneous. That is, if pathology data having the same domain can be referred to as data of the same type, pathology data having different domains may be referred to as heterogeneous data.
- 'each of a plurality of A' may refer to each of all components included in the plurality of A's, or each of some components included in the plurality of A's.
- an 'instruction' is one or more statements grouped on the basis of function, and may refer to a component of a computer program that is executed by a processor.
- Referring to FIGS. 1 to 4, various examples of pathology images that can be used for learning in the present disclosure will be described.
- a first pathology image 110 is a slide image stained with programmed cell death ligand 1 (PD-L1) in non-small cell lung cancer using 22C3 IHC staining.
- the second pathology image 120 of FIG. 1 is a slide image of breast cancer stained using HER2 (human epidermal growth factor receptor 2) IHC staining.
- a third pathology image 210 shown in FIG. 2 is a slide image in which breast cancer is stained using ER (estrogen receptor) IHC staining
- a fourth pathology image 220 is a slide image of breast cancer stained using PR (progesterone receptor) IHC staining.
- IHC staining patterns are similar in that the nuclei stained by ER IHC staining and PR IHC staining both show the same color (eg, brown).
- a heterogeneous pathology image in which the same or similar color (eg, brown) is expressed may be used to generate training data to be described later.
- a training data set may be generated based on heterogeneous pathology data including various types of pathology images as shown in FIGS. 1 and 2 , and a machine learning model may be learned using the training data set.
- the training data set may be a plurality of training data sets.
- pathological images expressed in various colors (eg, red, pink, blue, etc.) other than a specific color may be used to generate training data.
- Each pathology image may include annotation information input by a pathologist.
- the annotation information may include at least one item of an object (eg, cell, tissue, structure, etc.) on the pathology image.
- the item may include the type of object in which staining is expressed and the class of the object input by the pathologist.
- labeling information may be used interchangeably with annotation information.
- FIG. 3 is a diagram illustrating a pathology image 310 including an object class.
- the pathology image 310 illustrated in FIG. 3 includes an object expressed in a specific color and an object class.
- a region associated with an object may be identified based on pixels expressed in a specific color.
- the object class may be determined based on the degree to which a cell expresses the specific color, and each object's class may be determined according to the saturation of the specific color. Conventionally, this determination was made by a pathologist: after reviewing the pathology image, the pathology specialist inputs each cell's class according to the degree of staining expression, and the cell class and the corresponding cell area (i.e., pixel range) set in this way can be included in the pathology image as annotation information. In FIG. 3, the intensity of staining expression increases from t0 to t3+.
- an object class and an object may be automatically determined using a pre-built image analysis algorithm (eg, a machine learning model for image analysis).
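A minimal sketch of mapping a cell's stain saturation to the classes shown in FIG. 3 follows. The threshold values are hypothetical assumptions; the disclosure does not fix any specific thresholds, and a real image analysis algorithm would typically learn this mapping rather than hard-code it.

```python
def classify_by_saturation(saturation):
    """Map a cell's mean stain saturation (0..1) to a class grade.

    Thresholds below are illustrative assumptions, not values from
    the disclosure.
    """
    if saturation < 0.1:
        return "t0"
    if saturation < 0.4:
        return "t1+"
    if saturation < 0.7:
        return "t2+"
    return "t3+"
```

For example, `classify_by_saturation(0.5)` would grade a moderately saturated cell as `"t2+"`.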
- FIG. 4 is a diagram illustrating a pathology image 410 in which a tumor region and a precancerous region are segmented.
- the tumor area (ca) may be visualized with a first color, and the precancerous area (cis) may be visualized with a second color.
- the classification of these areas could be determined by a pathologist. For example, a pathologist may identify a tumor region (ca) and a precancerous region (cis) based on morphological characteristics of cells and tissues expressed in the pathology image 410 .
- such a segmentation task may be automatically performed using a pre-built image analysis algorithm (eg, a machine learning model for image analysis). For example, through an image analysis algorithm, the degree of staining expression of cells is extracted from the pathology image, and each region is automatically segmented based on the intensity of staining expression, and then visualized in different colors.
- a region around a tumor may be visualized in a third color
- other tissues may be visualized in a fourth color.
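The intensity-based segmentation and per-region coloring described above can be sketched as a thresholding pass over a staining-intensity map. The thresholds and RGB colors are illustrative assumptions only.

```python
import numpy as np

# Illustrative intensity thresholds and overlay colors (assumptions).
REGIONS = [
    ("other",        0.00, (200, 200, 200)),  # fourth color
    ("peritumoral",  0.25, (255, 255, 0)),    # third color
    ("precancerous", 0.50, (0, 0, 255)),      # second color (cis)
    ("tumor",        0.75, (255, 0, 0)),      # first color (ca)
]

def segment_and_colorize(intensity):
    """Segment a staining-intensity map into regions and visualize each
    region with a different color. Later (higher-threshold) regions
    overwrite earlier ones, so each pixel ends up in its highest band."""
    h, w = intensity.shape
    overlay = np.zeros((h, w, 3), dtype=np.uint8)
    for name, lo, color in REGIONS:
        overlay[intensity >= lo] = color
    return overlay

img = np.array([[0.1, 0.3], [0.6, 0.9]])
overlay = segment_and_colorize(img)
```

A trained segmentation model would replace the fixed thresholds, but the visualization step (one color per region) is the same.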
- Annotation information associated with the visualization task may be included in the pathology image. That is, annotation information including information on a first item related to the visualized object region and information on a second item related to the class of the object may be included in the pathology image.
- pathology images may be different, and annotation information items included in the pathology images may be different.
- the reason the pathology images differ can be understood as follows: the staining methods for the pathology images may be different, and the body parts from which the cells are harvested may be different.
- the first pathology image and the second pathology image may be images of different types.
- if the third pathology image is obtained from breast tissue and the fourth pathology image is obtained from lung tissue, the third pathology image and the fourth pathology image may be images of different types.
- different annotation information may be understood as meaning that the types of items included in the annotation information are different.
- if the first pathology image includes a third item related to positivity as an object class, and the second pathology image includes a fourth item representing a grade of any one of t0, t1+, t2+, and t3+ as an object class, the first pathology image and the second pathology image are heterogeneous.
- pathology images that are different from each other among the types of pathology images and items included in the annotation information may be determined to be heterogeneous pathology images.
- Pathology images of the same type may be gathered to form a pathology image set of the same domain.
- Hereinafter, various embodiments of the present disclosure will be described with reference to FIGS. 5 to 21.
- the analysis system 510 may communicate with each of the research information system 520 , the scanner 530 , and the user terminal 540 through a network 550 .
- the network 550 includes a mobile communication network and a wired communication network, and since it corresponds to well-known and used technology in the technical field of the present disclosure, a detailed description thereof will be omitted.
- the analysis system 510 may communicate with an image management system (not shown) including a storage for storing pathology images and a storage for storing analysis results.
- the scanner 530 may acquire a digitized pathology image from a tissue sample slide generated using a patient's tissue sample. For example, the scanner 530 may generate and store a pathology image, which is a digital image obtained by scanning a pathology slide. The scanner 530 may transmit the obtained pathology image to the analysis system 510.
- the user terminal 540 may receive an analysis result of the pathology image from the analysis system 510 .
- the user terminal 540 may be a computing device located in a medical facility such as a hospital and used by a medical staff.
- the user terminal 540 may be a computing device used by a general user such as a patient.
- the research information system 520 may be a computing system including a server and a database used in hospitals, universities, research facilities, and the like.
- the research information system 520 may provide a pathology image set, which is a set of raw data used for learning, to the analysis system 510 .
- the research information system 520 may transmit a heterogeneous pathology data set corresponding to a single domain to the analysis system 510 .
- the research information system 520 may provide heterogeneous pathology data sets to the analysis system 510. That is, the research information system 520 may transmit two or more of the first pathology data set corresponding to the first domain, the second pathology data set corresponding to the second domain, or the third pathology data set corresponding to the third domain to the analysis system 510.
- the analysis system 510 may include a data store (eg, a database) for storing a plurality of pathology data sets used for learning, and may include a machine learning model for analyzing pathology images.
- Analysis system 510 may include at least one processor and memory.
- the analysis system 510 may generate a training data set based on a heterogeneous pathology data set, and may train a machine learning model using the training data set.
- the analysis system 510 may perform analysis on the pathology image not including the annotation information using the machine learning model. That is, the analysis system 510 may perform analysis on the pathology image using a machine learning model without requiring intervention by a pathology expert.
- the analysis system 510 may analyze the pathology image received from the scanner 530 and provide the analyzed result to the client.
- the client may be a doctor/researcher/patient using the user terminal 540 .
- FIG. 6 is a schematic diagram illustrating a pathology image analysis model 630 being trained, according to an embodiment of the present disclosure.
- hereinafter, the machine learning model 630 is also referred to as the pathology image analysis model 630, and the two terms are used interchangeably.
- a plurality of heterogeneous pathology data sets 610_1 to 610_n may be pre-processed to generate a training data set 620 . That is, a plurality of heterogeneous pathology data sets 610_1 to 610_n corresponding to different domains may be preprocessed to generate a training data set 620 including a plurality of training data. While the heterogeneous pathology data sets 610_1 to 610_n are pre-processed, the number of samplings extracted from each pathology data set 610_1 to 610_n may be determined. Data sampling will be described in detail with reference to FIG. 8 .
- an intentionally distorted pathology image may be input to the pathology image analysis model 630 so that analysis can be easily performed on pathology images including artifacts; in this way, a robust machine learning model that can still output analysis results can be built.
- the pathology image including the artifact may be an image in which a partial region is distorted, transformed, or removed.
- various embodiments of generating training data will be described in detail with reference to FIG. 8 .
- an item related to a pathology image included in the pathology data sets 610_1 to 610_n may be associated with an item related to a pathology image included in another pathology data set based on at least one of an object class or a region of interest.
- an item related to a pathology image may mean a criterion for classifying the type or class of a cell, tissue, or structure appearing in the pathology image. For example, a first pathology image of a first type may be included in the first pathology data set 610_1, a second pathology image of a second type may be included in the second pathology data set 610_2, and the first item associated with the first pathology image and the second item associated with the second pathology image may correspond to similar staining expression levels or similar regions of interest.
- the region of interest may be a region associated with cells.
- the region of interest may be a region associated with at least one of tumor cells, inflammatory cells, or other cells.
- the region of interest may be a region associated with at least one of tumor tissue, precancerous tissue, tissue surrounding the tumor, or other tissue.
- a training data set 620 including related items may be created. For example, when a first item associated with a first pathology image and a second item associated with a second pathology image are associated with each other, first learning data may be generated based on the mutually associated first and second items and the first pathology image, and included in the training data set 620. Additionally, second learning data may be generated based on the mutually associated first and second items and the second pathology image, and included in the training data set 620. Accordingly, the training data set 620 may further include, in addition to each pathology image, the item of the heterogeneous pathology image associated with an item included in that pathology image.
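The pairing of each pathology image with the mutually associated items can be sketched as below. The dictionary layout and file names are hypothetical; the disclosure does not prescribe a storage format.

```python
# Hypothetical associated-item pair shared by both domains
# (PD-L1 positive tumor cells and the HER2 grades they map to).
associated = {"first_item": "TC+", "second_item": ["TC1", "TC2", "TC3"]}

def make_training_data(pathology_image, associated_items):
    """Bundle one pathology image with the mutually associated items, so the
    training data carries not only the image's own item but also the related
    heterogeneous item."""
    return {"image": pathology_image, "items": associated_items}

training_data_set = [
    make_training_data("first_pathology_image.png", associated),   # first learning data
    make_training_data("second_pathology_image.png", associated),  # second learning data
]
```

Both entries share the same associated-item pair, which is what lets the model later treat the heterogeneous labels as one group.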
- At least one batch including part or all of the training data set 620 may be generated, and the pathology image analysis model 630 may be trained.
- a loss value between an output value (ie, an analysis result) 640 output from the pathology image analysis model 630 and a reference value 650 may be calculated during a learning process.
- the reference value 650 may be a kind of correct answer value obtained from annotation information of a pathology image.
- the reference value 650 may be obtained from an evaluation index included in annotation information.
- a loss value is fed back to the pathology image analysis model 630 so that a weight of at least one node included in the pathology image analysis model 630 may be adjusted.
- the node may be a node included in the artificial neural network.
- when training data is input to the pathology image analysis model 630, related items included in the training data may be grouped as a similar-item group and act as at least one weighted node in the pathology image analysis model 630.
- since the pathology image analysis model 630 is trained with the related items input together, it can perform analysis on various types of pathology images and output various types of result values.
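The feedback loop of FIG. 6 (output 640 compared with reference 650, loss fed back to adjust node weights) reduces to gradient descent. A minimal sketch follows, with a single scalar weight standing in for the neural network; the learning rate and data are illustrative assumptions.

```python
def train(samples, weight=0.0, lr=0.1, epochs=200):
    """Toy feedback loop: for each (input, reference) pair, compute the
    output, the squared-error loss gradient, and adjust the weight."""
    for _ in range(epochs):
        for x, reference in samples:
            output = weight * x                   # analysis result (cf. 640)
            loss_grad = 2 * (output - reference)  # d(loss)/d(output)
            weight -= lr * loss_grad * x          # feed loss back to the node weight
    return weight

# Reference values follow output = 2 * x, so the weight should converge near 2.
w = train([(1.0, 2.0), (2.0, 4.0)])
```

In a real model the same update is applied to every node weight via backpropagation; the convergence behavior described later (weights approaching optimal values over repeated epochs) is visible even in this scalar case.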
- Hereinafter, a method for training a pathology image analysis model will be described in detail with reference to FIGS. 7 and 8.
- the method shown in FIGS. 7 and 8 is only one embodiment for achieving the object of the present disclosure, and it goes without saying that some steps may be added or deleted as needed.
- the methods shown in FIGS. 7 and 8 may be performed by at least one processor included in the analysis system shown in FIG. 5 .
- each step shown in FIGS. 7 and 8 is performed by a processor included in the analysis system shown in FIG. 5 .
- the heterogeneous pathology data set includes a plurality of heterogeneous pathology data sets of different types.
- the Nth heterogeneous pathology data set (where N is a natural number) and the N+1th heterogeneous pathology data set are referred to as data sets corresponding to different domains.
- the processor may obtain a heterogeneous pathology data set (S710).
- the processor may obtain, from the storage of the analysis system 510 of FIG. 5, a heterogeneous pathology data set received from at least one of the research information system 520 or the scanner 530 and stored therein.
- the processor may pre-process each of the acquired heterogeneous pathology data sets to generate a training data set (S720).
- Pathology images including annotation information may be included in the individual training data.
- the processor may associate an item associated with the first pathology image with an item associated with the second pathology image, and include the associated item in each pathology data set.
- heterogeneous individual pathology data included in the heterogeneous pathology data set may be merged with each other.
- the processor may determine the number of training data so that the size of the training data set corresponds to a predetermined batch size, and generate the training data set to have the corresponding number of data.
- the processor may perform at least one of sampling or data augmentation on a heterogeneous pathology data set. A more detailed description of the data pre-processing will be described later with reference to FIGS. 8 to 10 .
- the processor may determine target learning data from among data included in the training data set (S730). Subsequently, the processor may perform learning on the pathology image analysis model using target learning data (S740). In one embodiment, the processor may determine a reference value from annotation information included in data for target learning. For example, the processor may extract an object class from annotation information included in training data and determine a reference value based on an evaluation index included in the extracted object class. In addition, the processor may determine a reference value based on a region (ie, pixel range) of at least one object segmented from annotation information included in training data and a type (ie, cell type) of each object.
- the processor may input the target learning data into the pathology image analysis model, calculate a loss value between the output value (i.e., analysis result) of the pathology image analysis model and the reference value, and then feed the calculated loss value back to the pathology image analysis model so that at least one weight included in the model is adjusted.
- the output value may include at least one of an evaluation index or an object region and an object type (ie, cell type).
- the loss value may be calculated by arithmetically calculating the difference between the evaluation index and the reference value, or may be calculated using an evaluation function that evaluates the pixel-range matching rate between an object included in the output value and an object included in the reference value.
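The two loss forms just described can be sketched as follows. The Dice coefficient is used here as one common choice of pixel-range matching function; the disclosure does not fix a specific evaluation function, and the example values are hypothetical.

```python
def index_loss(output_index, reference_index):
    """Arithmetic difference between the output evaluation index
    and the reference value."""
    return abs(output_index - reference_index)

def pixel_match_loss(output_pixels, reference_pixels):
    """1 - Dice coefficient as an evaluation function for the pixel-range
    matching rate between an output object and a reference object."""
    inter = len(output_pixels & reference_pixels)
    denom = len(output_pixels) + len(reference_pixels)
    return 1.0 - (2 * inter / denom if denom else 1.0)

# Objects represented as sets of (row, col) pixel coordinates.
out = {(0, 0), (0, 1), (1, 1)}
ref = {(0, 1), (1, 1), (1, 0)}
loss = index_loss(0.8, 0.5) + pixel_match_loss(out, ref)
```

In practice the two terms could be weighted before summing, depending on whether the evaluation index or the segmentation quality matters more for the task.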
- the processor may determine whether all data included in the training data set have been determined as target training data (S750). If there is data in the training data set that has not yet been determined as target learning data, the processor determines one piece of such data as the target learning data and trains the pathology image analysis model using it.
- if all data included in the training data set have been determined as target training data, the processor may end the learning of the current epoch.
- the same training data set may be used again to train the pathology image analysis model again, or a new training data set may be created and the pathology image analysis model may be trained again.
- the pathology image analysis model when the pathology image analysis model is repeatedly trained using the data included in the training data set, the weight of each node included in the pathology image analysis model may converge to an optimal value. Accordingly, the pathology image analysis model can output more accurate analysis results.
- FIG. 8 is a flowchart illustrating a method of generating a training data set by pre-processing a heterogeneous pathology data set, according to an embodiment of the present disclosure.
- the first pathology image and the second pathology image may differ in at least one of cell type, staining intensity, or tissue region required for IHC evaluation.
- most pathology images contain labeling information for tumor cells in common; however, depending on the type of IHC, immune cells such as lymphocytes and macrophages, or stromal cells such as the fibroblasts constituting the matrix around the tumor or adipocytes, may also be stained, and labeling information for these cells may be included in some types of heterogeneous images.
- a training data set including various types of pathology images as shown in FIGS. 1 to 4 may be generated, and a machine learning model may be learned based on the training data set.
- the criterion for evaluating the staining intensity may also differ depending on the cell type (eg, carcinoma) and the type of IHC constituting the pathological image.
- for example, the level of HER2 expression in tumor cells may be classified into four grades: 3+ (strong), 2+ (moderate), 1+ (weak), and 0 (no expression), whereas the level of PD-L1 expression in tumor cells is divided into positive or negative.
- using a separate algorithm (e.g., a machine learning model), a pathology image labeled with a small number of detailed items can be automatically re-labeled with a larger number of detailed items.
- through such an algorithm (e.g., a machine learning model), a first pathology image labeled with four detailed items may be output.
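The coarse-to-fine re-labeling step can be sketched as below. A real re-labeling algorithm would be a trained model; here a hypothetical per-cell staining-intensity value and hand-picked thresholds stand in for it, purely for illustration.

```python
def relabel(cells):
    """Re-label cells annotated only positive/negative into four HER2-style
    grades (0, 1+, 2+, 3+) using each cell's staining intensity (0..1).

    The intensity estimate and thresholds are hypothetical stand-ins for
    the separate re-labeling model described above.
    """
    graded = []
    for label, intensity in cells:
        if label == "negative":
            graded.append("0")
        elif intensity < 0.4:
            graded.append("1+")
        elif intensity < 0.7:
            graded.append("2+")
        else:
            graded.append("3+")
    return graded

grades = relabel([("negative", 0.0), ("positive", 0.3),
                  ("positive", 0.5), ("positive", 0.9)])
```

After this step, the binary-labeled image carries the same four detailed items as the HER2-labeled images, so the two data sets can be merged on equal footing.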
- the processor may merge heterogeneous pathology data sets corresponding to heterogeneous domains by processing different items included in heterogeneous pathology data sets in association with each other (S810).
- merging may mean that heterogeneous pathology data sets are associated with each other as a common item.
- a table of items that can be correlated with each other may be stored in advance in the analysis system, and the processor may refer to the table to extract items that are correlated with each other from the heterogeneous pathology data set and process the extracted items in association with each other.
- Tables 1 and 2 below illustrate mapping tables referred to when associating items. They map items by taking the first pathology data set associated with PD-L1 and the second pathology data set associated with HER2 as an example; that is, they exemplify a mapping table for associating items of a first-type pathology image using PD-L1 IHC staining with items of a second-type pathology image using HER2 IHC staining.
- Table 1 (tissue mapping):
-   BG (Background): PD-L1 (lung) BG maps to HER2 (breast) BG, CIS
-   CA (Cancer Area): PD-L1 (lung) CA maps to HER2 (breast) CA
- the first type of pathology image associated with PD-L1 IHC staining and the second type of pathology image associated with HER2 IHC staining differ in staining method and primary site (lung or breast), but they have one thing in common: both are expressed in the same color. However, although objects associated with carcinoma may be found in common in the first pathology image and the second pathology image, the types of cells expressed, the staining intensity, and the tissue regions required for IHC evaluation may be different. In the present disclosure, among the items related to heterogeneous pathology images, items having common properties may be associated with each other. Table 1 illustrates that items associated with heterogeneous pathology images may be associated with each other based on the tissue of interest.
- the item (CA) representing tumor tissue associated with the first type of pathology image may be associated with the item (CA) representing tumor tissue associated with the second type of pathology image obtained by HER2 IHC staining.
- the item (BG) representing non-tumor tissue associated with the first type of pathology image may be associated with the items representing precancerous tissue (CIS) and background tissue (BG) associated with the second type of pathology image obtained by HER2 IHC staining.
- Table 2 illustrates that items associated with heterogeneous pathology images may be associated with each other based on the object class.
-   tumor cells: PD-L1 TC- maps to HER2 TC0; PD-L1 TC+ maps to HER2 TC1, TC2, TC3
-   lymphocytes (LP+, LP-), macrophages (MP+, MP-), and other cells (OT) of PD-L1 map to HER2 OT (other cells excluding tumors)
-   background: PD-L1 BG maps to HER2 BG and CIS (precancerous tissue)
- among the items representing staining expression intensity in each of the pixels included in the first type of pathology image, the negative (TC-) item related to the first expression range may be associated with the TC0 item related to the first expression range among the items representing staining expression intensity in each of the pixels included in the second type of pathology image.
- likewise, the positive (TC+) item related to the second expression range may be associated with the TC1, TC2, and TC3 items related to the second expression range among the items representing staining expression intensity in each of the pixels included in the second type of pathology image.
- the processor may associate an item associated with the first pathology image with an item associated with the second pathology image, and cause the associated item to be included in each pathology data set. Accordingly, heterogeneous individual pathology data included in the heterogeneous pathology data set may be merged with each other.
- the BG item associated with the first pathology image may be associated with the BG and CIS items associated with the second pathology image.
- the OT item associated with the second pathology image and the LP+, LP-, MP+, MP-, and OT items associated with the first pathology image may be associated with each other.
- the TC- item associated with the first pathology image and the TC0 item associated with the second pathology image may be associated, and the TC+ item associated with the first pathology image and the TC1, TC2, and TC3 items associated with the second pathological image may be associated.
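The mapping table referenced above can be represented directly as a lookup structure. A sketch, using the associations stated in the text (the bidirectional lookup via inversion is an implementation assumption):

```python
# Mapping table from Tables 1 and 2: each PD-L1 item maps to the
# associated HER2 item(s).
PDL1_TO_HER2 = {
    "BG":  ["BG", "CIS"],
    "CA":  ["CA"],
    "TC-": ["TC0"],
    "TC+": ["TC1", "TC2", "TC3"],
    "LP+": ["OT"], "LP-": ["OT"], "MP+": ["OT"], "MP-": ["OT"], "OT": ["OT"],
}

def invert(table):
    """Build the HER2 -> PD-L1 direction of the same associations."""
    out = {}
    for src, dsts in table.items():
        for dst in dsts:
            out.setdefault(dst, []).append(src)
    return out

HER2_TO_PDL1 = invert(PDL1_TO_HER2)

def associate(item, domain):
    """Return the heterogeneous items associated with `item`."""
    table = PDL1_TO_HER2 if domain == "PD-L1" else HER2_TO_PDL1
    return table[item]
```

For example, `associate("TC+", "PD-L1")` returns the three HER2 grades that the PD-L1 positive-tumor-cell item merges with.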
- the related items are included in each of the first pathology data set and the second pathology data set, and accordingly, the first pathology data set associated with the first domain and the second pathology data set associated with the second domain may be merged.
- the processor may extract labeled patches from each pathology data set and store the extracted patches in a patch database (S820).
- the labeled patch may refer to an object whose object class is labeled, and may be part or all of the pathology image.
- the processor may extract the same predetermined number of patches from each pathology data set.
- the processor may extract different numbers of labeled patches from each pathology data set.
- the processor may extract a first number or a first ratio of labeled patches from the first pathology data set, and extract a second number or a second ratio of labeled patches from the second pathology data set.
- the processor may store the labeled patches in a patch database.
- the labeled patch may include an item (eg, object type, class, etc.) and an item of heterogeneous pathology data associated with the item.
- the processor may copy a predetermined number of patches of a specific type and store the copied patches in a patch database.
- the number of copies of patches of a specific type may be determined based on the patch type having the largest number of patches. For example, the number of patches to be copied may be determined based on the difference between the number of patches of the largest patch type stored in the patch database and the number of patches of the specific type.
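The copy-to-balance step described above amounts to oversampling minority patch types up to the count of the most numerous type. A sketch, with patches represented as hypothetical dictionaries:

```python
from collections import Counter

def balance_by_copying(patches):
    """Copy patches of minority types so every type reaches the count of
    the most numerous type stored in the patch database."""
    counts = Counter(p["type"] for p in patches)
    target = max(counts.values())
    balanced = list(patches)
    for ptype, n in counts.items():
        deficit = target - n  # difference from the largest patch type
        originals = [p for p in patches if p["type"] == ptype]
        for i in range(deficit):
            balanced.append(dict(originals[i % len(originals)]))  # copy
    return balanced

patches = [{"type": "first"}] * 3 + [{"type": "second"}] * 1
balanced = balance_by_copying(patches)
```

After balancing, each type contributes equally to the training batches, which keeps the model from overfitting to the dominant domain.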
- the patch type may correspond to the pathology image type.
- for example, if a pathology image is of the first type, patches extracted from that pathology image may also be of the first type. Examples of patches stored in the patch list or patch database will be described with reference to FIGS. 9 and 10.
- the processor may augment the labeled patch by artificially transforming the image included in the patch database, such as distortion, deletion, or contamination (S830).
- the processor may extract at least one patch from patches included in the patch database and adjust the size of the extracted patch. For example, the processor may change the resolution of the patch size to a resolution higher or lower than the original resolution. As another example, the processor may change the size of the patch by removing pixels located outside the patch.
- the processor may extract at least one patch from patches included in the patch database and remove pixels corresponding to a predetermined range from among pixels included in the extracted patch. Also, the processor may enlarge the size of the patch from which pixels are removed to the original size of the patch.
- the processor may extract at least one patch from patches included in the patch database, invert the extracted patch horizontally or vertically, and then generate a horizontally or vertically inverted patch.
- since the pathology image analysis model is trained using pathology images including inverted patches, it can be trained to output meaningful analysis results even for new types of pathology images.
- the processor may enhance the patch by extracting at least one patch from patches included in the patch database and removing pixels in a predetermined range from among pixels included in the extracted patch.
- the pathology image analysis model may output an accurate analysis result even for a pathology image including artifacts.
- the processor may augment the patch by extracting at least one patch from patches included in the patch database and artificially transforming pixels in a predetermined range among pixels included in the extracted patch. For example, the processor may apply a blurring effect to pixels within a range determined by using a median-filter to blur some pixels, thereby transforming the corresponding pixels. As another example, the processor may modify some pixels by adding noise to pixels within a determined range using a Gaussian-filter.
- since the pathology image analysis model is trained using pathology images including the modified patches, a pathology image analysis model that is robust against scanner errors and staining errors may be constructed.
- the processor may extract at least one patch from patches included in the patch database, convert colors of pixels included in the extracted patch, and then generate a patch including the converted color to augment the patch.
- the processor may change at least one of hue, contrast, brightness, or saturation of the patch using a color jittering technique.
- the processor may change the color of the patch using a grayscale technique.
- a detailed setting value for changing the color of the patch may be determined by the user.
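A few of the augmentations listed above (horizontal/vertical flip and pixel-range removal) can be sketched in pure Python on a patch given as a 2-D list; the blur, noise, and color-jitter variants would follow the same pattern with an image library. The transformation choices and erased range are illustrative assumptions.

```python
import random

def augment(patch, rng=random.Random(0)):
    """Apply one randomly chosen transformation to a patch (2-D list of
    pixel values): horizontal flip, vertical flip, or zeroing a pixel
    range (a stand-in for the removal/artifact augmentation)."""
    choice = rng.choice(["hflip", "vflip", "erase"])
    if choice == "hflip":
        return [row[::-1] for row in patch]
    if choice == "vflip":
        return patch[::-1]
    # erase: remove (zero out) a predetermined pixel range
    out = [row[:] for row in patch]
    out[0][0] = 0
    return out

patch = [[1, 2], [3, 4]]
aug = augment(patch)
```

Each augmented patch is stored alongside the originals, so the training set grows without requiring additional annotated slides.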
- the processor may generate a training data set using at least one augmented patch and some or all of the patches included in the patch database (S840).
- the processor may determine the number of patches of each type used to generate the training data set, extract the determined number of patches for each type from the patch database, and create a training data set using the extracted patches and the augmented patches.
- the processor may randomly extract a predetermined number of patches from among patches included in the patch database regardless of type, and generate training data using the extracted patches.
- the processor may generate a training data set using all patches included in the patch database. When only some of the patches included in the patch database are extracted to generate a training data set, it may be referred to as a training data set corresponding to a mini-batch size.
- individual training data included in the training data set may include at least one patch.
- individual training data may include patches of different types. Additionally or alternatively, individual training data may include patches of the same type.
- the processor may generate a training pathology image of a predetermined size and randomly arrange at least one patch on the pathology image.
- the processor may insert a randomly selected background image into a region other than the patch in the pathology image for training where the patch is arranged.
- the background image may be extracted from actually scanned pathology images, and the analysis system may store a plurality of background images in advance. In this case, the processor may randomly select one of the plurality of background images and insert it as a background of the pathology image for learning.
- the reason for inserting the background image is to train the pathology image analysis model to perform a segmentation operation from the pathology image.
- a pathology image for training including both a patch and a background image is input to the first analysis model, and training of the first analysis model may proceed.
- a pathology image including at least one patch and a background image may be generated.
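For example, the composition step described above (arranging at least one patch at a random position on a fixed-size training image and filling the remaining region with a randomly selected background) may be sketched as follows; the data structures and function names are illustrative assumptions, not part of the disclosure, and images are modeled as 2-D label grids rather than pixel arrays:

```python
import random

def compose_training_image(patches, backgrounds, size):
    """Place patches at random positions on a training image of the
    given size; cells not covered by a patch keep a randomly
    selected background."""
    h, w = size
    bg = random.choice(backgrounds)       # randomly selected background image
    image = [[bg] * w for _ in range(h)]
    for patch in patches:
        ph, pw = patch["h"], patch["w"]
        top = random.randint(0, h - ph)   # random placement of the patch
        left = random.randint(0, w - pw)
        for r in range(top, top + ph):
            for c in range(left, left + pw):
                image[r][c] = patch["label"]
    return image

image = compose_training_image(
    [{"h": 2, "w": 2, "label": "#1-1"}], ["bg-A", "bg-B"], (8, 8)
)
```

Because the non-patch region is filled with a single randomly chosen background, a model trained on such images can learn to segment the patch region from the background.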
- the individual training data may include a pathology image for training, and may also include at least one labeled patch.
- individual training data may include heterogeneous items related to each other.
- a first pathology data set 910 associated with a first domain may include pathology images 912 and 914 of a first type, and a second pathology data set 920 associated with a second domain may include pathology images 922 and 924 of a second type.
- Each pathology image may include a labeled patch.
- a patch is represented by a rectangle in which '#' and a number are combined.
- the shape of the patch is exemplified as a rectangle of the same size, but this is only for convenience of explanation, and the shape and size of the patch may be different in each actual pathology image.
- patches #1-1 to #1-5 included in the first pathology data set 910 may be extracted and stored in the patch database 930 .
- patches #2-1 to #2-5 included in the second pathology data set 920 may be extracted and stored in the patch database 930 .
- the patch database 930 may store first-type patches (#1-1 to #1-5) 932 and second-type patches (#2-1 to #2-5) 934.
- the processor may determine the sampling number of the first-type patches 932 included in the patch database 930, determine the sampling number of the second-type patches 934, and then fetch patches of each type corresponding to the determined numbers from the patch database 930.
- the number or rate of sampling extracted from each type may be set in advance by the user. For example, the number of samplings for the first type of patch may be 100, and the number of samplings for the second type of patch may be 50.
- the processor may generate a batch 940 having a predetermined size using patches extracted from the patch database 930 .
- the batch 940 thus created may constitute part or all of the training data set.
- the processor may augment the patches in the patch database 930 to generate a batch 940 including the augmented patches #3-1 to #3-6.
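For example, the per-type sampling described above (e.g., 100 samplings of the first type and 50 of the second type) may be sketched as follows; the function name and the dict-based patch representation are illustrative assumptions, not part of the disclosure:

```python
import random

def build_batch(patch_db, counts):
    """Fetch the determined number of patches of each type from the
    patch database and shuffle them into a batch constituting part or
    all of the training data set."""
    batch = []
    for patch_type, n in counts.items():
        pool = [p for p in patch_db if p["type"] == patch_type]
        # sample with replacement only if the pool is smaller than n
        batch.extend(random.choices(pool, k=n) if len(pool) < n
                     else random.sample(pool, n))
    random.shuffle(batch)
    return batch

# e.g., 100 samplings of the first type and 50 of the second type
patch_db = ([{"type": "type1", "id": i} for i in range(120)]
            + [{"type": "type2", "id": i} for i in range(60)])
batch = build_batch(patch_db, {"type1": 100, "type2": 50})
```

The per-type counts (or a sampling rate) may be set in advance by the user, as described above.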
- a first pathology data set 1010 associated with a first domain may include pathology images 1012 and 1014 of a first type, and a second pathology data set 1020 associated with a second domain may include pathology images 1022 and 1024 of a second type.
- patches (#1-1 to #1-5) included in the first pathology data set 1010 may be extracted and stored in the patch database 1030.
- the patches (#2-1 to #2-3) included in the second pathology data set 1020 may be extracted and stored in the patch database 1030.
- patch copying may be performed for at least one of the patches (#2-1 to #2-3) extracted from the second pathology data set 1020.
- patch copying may be performed on the first type of patch or the second type of patch so that the same number of patches of the first type and the number of patches of the second type are stored in the patch database 1030 .
- patch copying may be performed on the first type of patch or the second type of patch so that the number of the first type of patches and the number of the second type of patches are at a predetermined ratio.
- in FIG. 10, it is exemplified that patch copying is performed for patches #2-2 and #2-3. The copied patches #2-2 and #2-3 may be included in the patch database 1030. As such patch copying is performed, the number of patches of each type may be balanced when stored in the patch database 1030.
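For example, the patch copying described above (oversampling the under-represented type so that the patch database stores a balanced number of patches per type) may be sketched as follows; equal counts are used here for illustration, though a predetermined ratio could be applied instead:

```python
import random

def balance_by_copying(patches):
    """Copy patches of the under-represented type so that each type
    ends up stored the same number of times in the patch database."""
    by_type = {}
    for p in patches:
        by_type.setdefault(p["type"], []).append(p)
    target = max(len(group) for group in by_type.values())
    balanced = []
    for group in by_type.values():
        balanced.extend(group)
        deficit = target - len(group)
        if deficit > 0:
            # copied patches, sampled with replacement from the same type
            balanced.extend(random.choices(group, k=deficit))
    return balanced

# five first-type patches and three second-type patches, as in FIG. 10
patch_db = balance_by_copying(
    [{"id": "#1-1", "type": 1}, {"id": "#1-2", "type": 1},
     {"id": "#1-3", "type": 1}, {"id": "#1-4", "type": 1},
     {"id": "#1-5", "type": 1},
     {"id": "#2-1", "type": 2}, {"id": "#2-2", "type": 2},
     {"id": "#2-3", "type": 2}]
)
```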
- the processor may randomly fetch a predetermined number of patches 1032 and 1034 stored in the patch database 1030 and use the fetched patches to generate a batch 1040 constituting part or all of the training data set.
- the processor may determine the sampling number of the first-type patches 1032 included in the patch database 1030, determine the sampling number of the second-type patches 1034, fetch patches of each type corresponding to the determined numbers from the patch database 1030, and create a batch 1040 using the fetched patches.
- the processor may augment patches in patch database 1030 to create a batch 1040 containing augmented patches (#3-1 through #3-6).
- a training data set may be generated using heterogeneous pathology images including annotation information without performing an operation of extracting a patch from a pathology image including annotation information.
- the processor of the analysis system may generate a training data set based on a plurality of first-type pathology images extracted from the first pathology data set and a plurality of second-type pathology images extracted from the second pathology data set.
- the processor of the analysis system may generate a plurality of pieces of training data based on each of the extracted first-type pathology images, and generate a plurality of pieces of training data based on each of the second-type pathology images included in the second pathology data set.
- the processor may extract a plurality of first-type pathology images from the first pathology data set to correspond to a first sampling number, and extract a plurality of second-type pathology images from the second pathology data set to correspond to a second sampling number.
- the processor may augment at least one of the first pathology image and the second pathology image to generate a training data set including the augmented image.
- the image augmentation method described above for patches may be used.
- a data set for additional training is input to the pathology image analysis model, and the pathology image analysis model is additionally trained to improve performance.
- the specific staining method may be an existing staining method (eg, H&E staining) or a newly developed staining method.
- a data set for additional training including a plurality of pathology images stained with the specific staining method is prepared, and the pathology image analysis model may be additionally trained using the data set for additional training.
- the weights of nodes included in the pathological image analysis model may be adjusted so as to respond more sensitively to a specific staining method.
- FIG. 11 is a diagram illustrating output of an analysis result of a pathology image through a pathology image analysis model according to an embodiment of the present disclosure.
- various types of pathology images 1110_1 to 1110_3 may be input to the pathology image analysis model 1120 .
- the pathology images 1110_1 to 1110_3 may be of the same type as the type of pathology image used for learning or may be a pathology image obtained through a new biomarker. That is, domains associated with the pathology images 1110_1 to 1110_3 may be the same domain as or different from the domain learned from the pathology image analysis model 1120 .
- the pathology image analysis model 1120 may output an analysis result 1130 for the pathology images 1110_1 to 1110_3.
- the analysis result 1130 may include a class for each object extracted from the pathology images 1110_1 to 1110_3.
- the object class includes a cell type and/or an evaluation index, and the evaluation index may include at least one of positivity, expression level, expression value, or expression statistical information.
- the analysis result 1130 may be a segmentation result of the pathology images 1110_1 to 1110_3. That is, the analysis result 1130 may include at least one tissue and tissue type identified from the pathology images 1110_1 to 1110_3.
- the artificial neural network model 1200 is a statistical learning algorithm implemented based on the structure of a biological neural network or a structure that executes the algorithm in machine learning technology and cognitive science.
- in the artificial neural network model 1200, as in a biological neural network, artificial neurons (nodes) form a network through synaptic connections, and the synaptic weights are repeatedly adjusted through learning so as to reduce the error between the correct output corresponding to a specific input and the inferred output, thereby representing a machine learning model having problem-solving ability.
- the artificial neural network model 1200 may include an arbitrary probability model, a neural network model, and the like used in artificial intelligence learning methods such as machine learning and deep learning.
- the above-described pathology image analysis model may be implemented in the form of an artificial neural network model 1200 .
- the artificial neural network model 1200 may receive one or more pathology images including annotation information, and may be trained to detect an object expressed as staining in the received one or more pathology images.
- the artificial neural network model 1200 may be trained to perform a classification function (ie, a classifier function) of determining whether each region in one or more pathology images corresponds to a normal region or an abnormal region.
- the artificial neural network model 1200 may be trained to perform a segmentation function of labeling pixels included in abnormal regions in one or more pathological images. In this case, the artificial neural network model 1200 may determine an evaluation index for an object associated with the abnormal region and label the object.
- the artificial neural network model 1200 may be implemented as a multilayer perceptron (MLP) composed of multilayer nodes and connections between them.
- the artificial neural network model 1200 may be implemented using one of various artificial neural network model structures including MLP.
- the artificial neural network model 1200 consists of an input layer that receives input signals or data from the outside, an output layer that outputs output signals or data corresponding to the input data, and n hidden layers (where n is a positive integer) that are located between the input layer and the output layer, extract features from the signals received from the input layer, and deliver them to the output layer.
- a plurality of input variables and a plurality of corresponding output variables are matched to the input layer and the output layer of the artificial neural network model 1200, respectively, and the synaptic values between the nodes included in the input layer, the hidden layers, and the output layer are adjusted so that a correct output corresponding to a specific input can be extracted.
- as the artificial neural network model 1200 is repeatedly trained on the data included in the training data set, the synaptic values (or weights) between the nodes of the artificial neural network model 1200 may be adjusted to converge to optimal values so that the error between the output variable calculated from the input variable and the target output is reduced.
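For example, the weight adjustment described above (repeatedly reducing the error between the inferred output and the target output) may be sketched with a minimal gradient-descent loop; the toy data, learning rate, and single-weight model are purely illustrative, since the disclosure does not specify a particular architecture:

```python
# toy training data: learn the target relation y = 2x
xs = [i / 10 - 1 for i in range(21)]   # input variables in [-1, 1]
ys = [2.0 * x for x in xs]             # target outputs

w = 0.0  # synaptic weight to be adjusted
for _ in range(200):
    # gradient of the mean squared error between inferred and target output
    grad = sum(x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)
    w -= 0.5 * grad  # adjust the weight so that the error is reduced
```

After repeated updates, the weight converges toward the optimal value (here, 2.0).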
- an analysis result corresponding to the level of a pathology expert can be output through the pathology image analysis model.
- FIG. 13 is a flowchart illustrating a method 1300 of outputting an analysis result of a pathology image using a pathology image analysis model, according to an embodiment of the present disclosure.
- the method shown in FIG. 13 is only one embodiment for achieving the object of the present disclosure, and it goes without saying that some steps may be added or deleted as needed.
- the method shown in FIG. 13 may be performed by at least one processor included in the analysis system shown in FIG. 5 .
- each step shown in FIG. 13 is performed by a processor included in the analysis system shown in FIG. 5 .
- the processor may obtain a pathology image (S1310).
- the processor may acquire a pathology image transmitted from a scanner or acquire a pathology image from an external storage, server, or image management system.
- the processor may input the pathology image to the pathology image analysis model and obtain an analysis result of the pathology image output from the pathology image analysis model (S1320).
- the analysis result may include an object identified from the pathology image (ie, a pixel range included in a region corresponding to the object) and an object class.
- the object class may include cell or tissue type and/or evaluation index, and the evaluation index may include at least one of positivity, expression level, expression value, or expression statistical information.
- the processor may output the obtained analysis result (S1330).
- the processor may output the analysis result to a display device such as a monitor.
- the processor may transmit the analysis result to the client's terminal and output it through the client's terminal.
- the processor may output the obtained analysis result in the form of a report.
- the pathology image analysis model may include a plurality of analysis models outputting different types of analysis results.
- the pathology image analysis model 1400 includes a plurality of analysis models 1410 to 1440 that are previously trained to analyze different types of pathology images and output different types of analysis results.
- the pathology image analysis model 1400 may include a first analysis model 1410 that outputs a segmentation result for an input pathology image, a second analysis model 1420 that analyzes the staining intensity of cell membranes included in the pathology image and outputs the analysis result, a third analysis model 1430 that analyzes the staining intensity of cell nuclei and outputs the analysis result, and a fourth analysis model 1440 that analyzes the morphological characteristics of cell nuclei and/or cell membranes and outputs the analysis result.
- each of the analysis models 1410 to 1440 may be learned based on a training data set including pathology images having different characteristics.
- the characteristic may include at least one of a staining color, a staining object type, or a staining method.
- the first analysis model 1410 may receive target training data and learn to segment an abnormal region (ie, an object related to a patch) from a pathology image included in the target training data. That is, the first analysis model 1410 may be trained to extract a location region (ie, an object) where staining is expressed in the pathology image.
- a pathology image may be input to the first analysis model 1410 and at least one patch may be output from the first analysis model 1410 .
- a loss value between the region corresponding to the patch output from the first analysis model 1410 and the abnormal region included in the annotation information is calculated, and the calculated loss value is fed back to the first analysis model 1410 so that the first analysis model 1410 can be trained.
- the second analysis model 1420 may receive target training data including a pathology image in which a cell membrane is stained brown, and may be trained to analyze the staining intensity of a patch included in the pathology image.
- a pathology image in which the brown-stained cell membrane is set as a patch may be input to the second analysis model 1420, and an analysis result of staining intensity for the cell membrane may be output from the second analysis model 1420.
- the calculated loss value is fed back to the second analysis model 1420 so that the second analysis model 1420 may be trained.
- the third analysis model 1430 may receive target training data including a pathology image in which cell nuclei are stained blue, and may be trained to analyze the staining intensity of a patch included in the pathology image.
- a pathology image in which blue-stained cell nuclei are set as patches may be input to the third analysis model 1430, and an analysis result of staining intensity for cell nuclei may be output from the third analysis model 1430.
- a loss value is calculated and fed back to the third analysis model 1430 so that the third analysis model 1430 may be trained.
- the fourth analysis model 1440 may receive target training data including a pathology image in which cell nuclei and cell membranes are stained pink, and may be trained to analyze their morphological characteristics and/or color distribution.
- a pathology image in which each of the pink-stained cell nuclei and cell membranes is set as a patch may be input to the fourth analysis model 1440, and an analysis result including morphological characteristics and/or color distribution of the cell nuclei and/or cell membranes may be output from the fourth analysis model 1440.
- morphological characteristics and/or color distributions of cell nuclei and/or cell membranes are obtained as reference values from the patch labeling information included in the target training data, and a loss value between the reference values and the morphological characteristics and/or color distributions included in the analysis result can be calculated. The calculated loss value is fed back to the fourth analysis model 1440 so that the fourth analysis model 1440 can be trained.
- one or more of the plurality of analysis models 1410 to 1440 included in the pathology image analysis model 1400 may be called.
- FIG. 15 is a diagram illustrating output of an analysis result 1520 of a pathology image through an analysis model called based on characteristics of the pathology image 1510 according to another embodiment of the present disclosure.
- a feature extraction model 1500 for extracting features of a pathology image 1510 may be combined with a pathology image analysis model 1400 .
- the feature extraction model 1500 may be included in the pathology image analysis model 1400 .
- the feature extraction model 1500 may extract a staining color included in a pathology image and a color expression position as a feature.
- the expression site of the staining color may be at least one of cell membrane, cell nucleus, and cytoplasm.
- the feature extraction model 1500 may extract organs, carcinomas, staining methods, and the like as features of the pathology image.
- the feature extraction model 1500 may store at least one of a pre-stored organ pattern, carcinoma pattern, or staining pattern, and may extract characteristics from a pathology image by comparing a pattern appearing in the pathology image with the organ pattern/carcinoma pattern/staining pattern.
- the feature extraction model 1500 is implemented as a machine learning model and can be trained to extract from the pathology image at least one of an organ related to the pathology image, a carcinoma included in the pathology image, or a staining method of the pathology image.
- a pathology image 1510 may be acquired, and the pathology image 1510 may be input to a feature extraction model 1500 and a pathology image analysis model 1400 , respectively.
- the pathology image 1510 may be an unlabeled pathology image.
- the pathology image may be a pathology image associated with a new drug or a new staining method.
- the feature extraction model 1500 may extract features of the pathology image and provide the extracted features of the pathology image to the pathology image analysis model 1400 . According to an embodiment, the feature extraction model 1500 may extract at least one of staining color, organ, carcinoma, or staining method as a feature of the pathology image 1510 .
- the pathology image analysis model 1400 may call the first analysis model 1410 and then input the pathology image 1510 to the first analysis model 1410 to perform segmentation of at least one object related to an abnormal region included in the pathology image.
- the pathology image analysis model 1400 may call one of the plurality of analysis models 1420 to 1440 outputting different types of analysis results, based on the characteristics of the pathology image provided from the feature extraction model 1500, and obtain an analysis result 1520 of the pathology image from the called analysis model 1420, 1430, or 1440.
- the pathology image analysis model 1400 may input the segmented pathology image through the first analysis model 1410 to the called analysis model 1420 , 1430 , or 1440 .
- Characteristics of the pathological image may include staining color and/or expression location (eg, cell membrane/cytoplasm/nucleus) of the staining color.
- the pathology image analysis model 1400 may determine and call one of the plurality of analysis models 1420 to 1440 as a target analysis model based on characteristics of the pathology image. For example, when a first feature having a cell membrane as an expression site and a brown staining color is provided from the feature extraction model 1500, the pathology image analysis model 1400 determines the second analysis model 1420 as a target analysis model, and the segmented pathology image may be input to the second analysis model 1420 . In this case, the second analysis model 1420 may analyze the staining intensity in which the cell membrane is expressed as brown in the segmented region (ie, the object) in the pathology image, and output the analysis result 1520 .
- when a second feature having a cell nucleus as an expression site and a blue staining color is provided from the feature extraction model 1500, the pathology image analysis model 1400 may determine and call the third analysis model 1430 as the target analysis model, and the segmented pathology image may be input to the third analysis model 1430.
- the third analysis model 1430 may analyze staining intensity in which cell nuclei are expressed in blue in the segmented region in the pathology image, and output the analysis result 1520 .
- when a third feature having cell nuclei and cell membranes as expression sites and a pink staining color is provided from the feature extraction model 1500, the pathology image analysis model 1400 may determine and call the fourth analysis model 1440 as the target analysis model, and the segmented pathology image may be input to the fourth analysis model 1440.
- the fourth analysis model 1440 may analyze and output distribution and/or morphological characteristics in which each of the cell nucleus and cell membrane is expressed in pink in each segmented region in the pathology image.
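For example, the model selection described above (choosing a target analysis model from the staining color and its expression site) may be sketched as a simple dispatch table; the keys, model names, and fallback behavior are illustrative assumptions, not part of the disclosure:

```python
# illustrative dispatch table: (expression site, staining color) -> target model
ROUTING = {
    ("cell membrane", "brown"): "second_analysis_model",   # membrane staining intensity
    ("cell nucleus", "blue"): "third_analysis_model",      # nucleus staining intensity
    ("nucleus and membrane", "pink"): "fourth_analysis_model",  # morphology / color distribution
}

def select_target_model(features):
    """Pick the target analysis model from the extracted image features,
    falling back to segmentation only when no rule matches."""
    key = (features["site"], features["color"])
    return ROUTING.get(key, "first_analysis_model")
```

The same dispatch could equally be driven by user input information instead of extracted features, as described below.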
- the morphological characteristics may mean association with a specific disease.
- characteristics of the pathology image may be received from the user. That is, the analysis system may receive user input information including characteristics of the pathology image. In this case, an analysis model to be called may be determined based on characteristics of the pathology image input by the user.
- FIG. 16 is a diagram illustrating output of an analysis result 1630 of a pathology image 1610 through an analysis model called based on user input information, according to another embodiment of the present disclosure.
- the pathology image analysis model 1400 may obtain user input information 1620 including characteristics of the pathology image 1610 .
- the pathology image analysis model 1400 calls the first analysis model 1410, inputs the pathology image 1610 to the first analysis model 1410, and performs segmentation of an object related to an abnormal region included in the pathology image. can be done
- the pathology image analysis model 1400 may determine a target analysis model to be called from among a plurality of analysis models based on characteristics of the pathology image included in the user's input information 1620 .
- the user's input information 1620 may include a dye color and/or a location where the dye color is expressed (eg, cell membrane/cytoplasm/cell nucleus). Additionally or alternatively, the user input information 1620 may include at least one of an organ, carcinoma, or staining method.
- the pathology image analysis model 1400 may determine and call one of the plurality of analysis models 1420 to 1440 as a target analysis model based on characteristics included in the user's input information 1620. For example, when the user's input information 1620 includes a first staining method, the pathology image analysis model 1400 determines and calls the second analysis model 1420 as the target analysis model, and the segmented pathology image may be input to the second analysis model 1420. In this case, the second analysis model 1420 may analyze the staining intensity expressed by the first staining method in the segmented region in the pathology image and output the analysis result 1630.
- when the user's input information 1620 includes a second staining method, the pathology image analysis model 1400 determines and calls the third analysis model 1430 as the target analysis model, and the segmented pathology image may be input to the third analysis model 1430. In this case, the third analysis model 1430 may analyze the staining intensity expressed by the second staining method in the segmented region in the pathology image and output the analysis result 1630.
- when the user's input information 1620 includes a third staining method, the pathology image analysis model 1400 determines and calls the fourth analysis model 1440 as the target analysis model, and the segmented pathology image may be input to the fourth analysis model 1440. In this case, the fourth analysis model 1440 may output an analysis result 1630 including distribution and/or morphological characteristics of the color expressed by the third staining method in the segmented region in the pathology image.
- the pathology image analysis model 1400 can output appropriate analysis results for various cells stained according to various staining methods. Accordingly, the pathology image analysis model 1400 according to the present disclosure can be universally applied and used in various environments.
- FIGS. 17 to 20 are diagrams illustrating various types of analysis results output from the pathology image analysis model 1400 .
- at least one object (eg, a cell, tissue, or structure) included in a pathology image is illustrated as an ellipse.
- the pathology image analysis model 1400 may receive a plurality of pathology images 1710, determine whether staining is expressed in an object included in each pathology image 1710 as positive or negative, and output the determined results 1720 and 1730.
- positive means that a protein that is a target for staining is present on the object
- negative means that a protein that is a target for staining is not present on the object.
- FIG. 17 illustrates that a pathology image 1720 determined to be positive and a pathology image 1730 determined to be negative are output separately.
- the pathology image analysis model 1400 may receive a plurality of pathology images 1810, determine a staining expression level for at least one object included in each pathology image 1810, and output analysis results 1820 to 1850 including the determined expression levels.
- FIG. 18 illustrates that class 3+ is the most strongly expressed object and class 0 is the most weakly expressed object. Class 0 may mean that the protein that is the target of staining does not exist on the object.
- the pathology image analysis model 1400 may receive a plurality of pathology images 1910, quantify the expression value of an object included in each pathology image 1910 as a number within a predetermined range (eg, 0 to 1), and then output analysis results 1920 to 1950 including the expression value of each object. In FIG. 19, an expression value closer to 1 is exemplified as indicating a higher degree of staining expression.
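For example, quantifying an expression value into the predetermined range of 0 to 1 may use min-max normalization; this particular mapping is an illustrative assumption, as the disclosure does not specify how the staining intensity is converted to an expression value:

```python
def to_expression_value(intensity, lo, hi):
    """Map a raw staining intensity into the range [0, 1] via min-max
    normalization (illustrative; the mapping is not specified in the
    disclosure), clamping values outside the calibration range."""
    if hi == lo:
        return 0.0
    v = (intensity - lo) / (hi - lo)
    return max(0.0, min(1.0, v))
```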
- the pathology image analysis model 1400 may receive a plurality of pathology images 2010 and output an analysis result 2020 including expression statistical information of an object included in each pathology image. FIG. 20 exemplifies an analysis result 2020 including statistical information on the distribution of cell-nucleus positivity/grade/expression values and statistical information on the distribution of cell-membrane positivity/grade/expression values. In addition, statistical information on various cells, tissues, or structures may be output through the pathology image analysis model 1400.
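For example, aggregating per-object results into expression statistical information may be sketched as follows; the field names (`positivity`, `expression_value`) and the chosen summary statistics are illustrative assumptions, not part of the disclosure:

```python
from statistics import mean

def expression_statistics(cells):
    """Summarize per-cell analysis results (positivity and expression
    value) into expression statistical information."""
    values = [c["expression_value"] for c in cells]
    positive = sum(1 for c in cells if c["positivity"])
    return {
        "positive_rate": positive / len(cells),
        "mean_expression": mean(values),
    }

stats = expression_statistics([
    {"positivity": True, "expression_value": 0.9},
    {"positivity": True, "expression_value": 0.7},
    {"positivity": False, "expression_value": 0.2},
    {"positivity": False, "expression_value": 0.2},
])
```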
- the information processing system 2100 of FIG. 21 may be an example of the analysis system 510 shown in FIG. 5 .
- the information processing system 2100 may include one or more processors 2120, a bus 2110, a communication interface 2130, and a memory 2140 that loads a computer program 2150 executed by the processor 2120.
- only components related to the embodiment of the present disclosure are shown in FIG. 21. Accordingly, those skilled in the art to which the present disclosure pertains will appreciate that other general-purpose components may be further included in addition to the components shown in FIG. 21.
- the processor 2120 controls the overall operation of each component of the information processing system 2100.
- the processor 2120 of the present disclosure may be composed of a plurality of processors.
- the processor 2120 may be configured to include at least one processor of any type well known in the art of the present disclosure, such as a Central Processing Unit (CPU), a Micro Processor Unit (MPU), a Micro Controller Unit (MCU), a Graphics Processing Unit (GPU), or a Field Programmable Gate Array (FPGA). Also, the processor 2120 may perform an operation for at least one application or program for executing a method according to embodiments of the present disclosure.
- the memory 2140 may store various data, commands and/or information. Memory 2140 may load one or more computer programs 2150 to execute methods/operations according to various embodiments of the present disclosure.
- the memory 2140 may be implemented as a volatile memory such as RAM, but the technical scope of the present disclosure is not limited thereto.
- the memory 2140 may be configured to include a non-volatile memory such as read-only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory, a hard disk, a removable disk, or any type of computer-readable recording medium well known in the art to which the present disclosure pertains.
- the bus 2110 may provide a communication function between components of the information processing system.
- the bus 2110 may be implemented as various types of buses such as an address bus, a data bus, and a control bus.
- the communication interface 2130 may support wired/wireless Internet communication of the information processing system. Also, the communication interface 2130 may support various communication methods other than Internet communication. To this end, the communication interface 2130 may include a communication module well known in the art of the present disclosure.
- Computer program 2150 may include one or more instructions that cause processor 2120 to perform an operation/method in accordance with various embodiments of the present disclosure. That is, the processor 2120 may perform operations/methods according to various embodiments of the present disclosure by executing one or more instructions.
- the computer program 2150 may include one or more instructions to perform operations of acquiring a pathology image, inputting the acquired pathology image to a machine learning model, obtaining an analysis result of the pathology image from the machine learning model, outputting the obtained analysis result, and the like.
- the machine learning model is a model learned using a training data set generated based on a first pathology data set associated with a first domain and a second pathology data set associated with a second domain different from the first domain.
- a system for analyzing a pathology image may be implemented through the information processing system 2100 according to some embodiments of the present disclosure.
- although example implementations may utilize aspects of the presently disclosed subject matter in the context of one or more stand-alone computer systems, the subject matter is not so limited and may instead be implemented in conjunction with any computing environment, such as a network or distributed computing environment. Further, aspects of the presently disclosed subject matter may be implemented in or across a plurality of processing chips or devices, and storage may similarly be distributed across a plurality of devices. Such devices may include PCs, network servers, and handheld devices.
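For illustration only, the two-domain training described above can be sketched as pooling labeled samples from two staining domains into one training data set. All names below are hypothetical; the disclosure does not specify an implementation:

```python
# Sketch: pooling two pathology data sets (e.g., a PD-L1 domain and a HER2
# domain) into a single training data set. Illustrative only.

def build_training_set(first_domain_samples, second_domain_samples):
    """Tag each sample with its source domain and pool both domains."""
    training_set = []
    for sample in first_domain_samples:
        training_set.append({"domain": "first", "sample": sample})
    for sample in second_domain_samples:
        training_set.append({"domain": "second", "sample": sample})
    return training_set

train = build_training_set(["pdl1_patch_a", "pdl1_patch_b"], ["her2_patch_a"])
```

Keeping the source-domain tag on each sample allows later stages (sampling, duplication, loss weighting) to distinguish the two domains without separate storage.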
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Biomedical Technology (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Quality & Reliability (AREA)
- Immunology (AREA)
- Urology & Nephrology (AREA)
- Chemical & Material Sciences (AREA)
- Hematology (AREA)
- Evolutionary Computation (AREA)
- Biochemistry (AREA)
- Microbiology (AREA)
- Food Science & Technology (AREA)
- Medicinal Chemistry (AREA)
- Analytical Chemistry (AREA)
- Cell Biology (AREA)
- Pathology (AREA)
- Biotechnology (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
Description
Tissue mapping
 | BG (Background) | CA (Cancer Area) |
---|---|---|
PD-L1 lung | BG | CA |
HER2 breast | BG, CIS | CA |
Cell mapping | |||
Other Cell | TC- | TC+ | |
PD-L1 lung | LP+, LP-, MP+, MP-, OT | TC- | TC+ |
HER2 breast | OT | TC0 | TC1, TC2, TC3 |
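Read as cross-assay label associations, the tables above pair each HER2 breast class with its PD-L1 lung counterpart. A minimal rendering of that mapping as data (the dictionary structure is an assumption; only the class names come from the tables):

```python
# Class associations implied by the tissue and cell mapping tables above.
# Structure is illustrative, not part of the disclosure.
tissue_mapping = {
    "PD-L1 lung":  {"BG": ["BG"],        "CA": ["CA"]},
    "HER2 breast": {"BG": ["BG", "CIS"], "CA": ["CA"]},
}

cell_mapping = {
    "PD-L1 lung":  {"Other Cell": ["LP+", "LP-", "MP+", "MP-", "OT"],
                    "TC-": ["TC-"], "TC+": ["TC+"]},
    "HER2 breast": {"Other Cell": ["OT"],
                    "TC-": ["TC0"], "TC+": ["TC1", "TC2", "TC3"]},
}
```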
Claims (19)
- A pathology image analysis method performed by at least one processor, the method comprising: acquiring a pathology image; inputting the acquired pathology image to a machine learning model to obtain an analysis result of the pathology image from the machine learning model; and outputting the obtained analysis result, wherein the machine learning model is a model trained using a training data set generated based on a first pathology data set associated with a first domain and a second pathology data set associated with a second domain different from the first domain.
- The method of claim 1, further comprising, before acquiring the pathology image: acquiring the first pathology data set including pathology images of a first type and the second pathology data set including pathology images of a second type; generating the training data set based on the first pathology data set and the second pathology data set; and training the machine learning model using the generated training data set.
- The method of claim 2, wherein generating the training data set comprises: associating an item associated with the pathology image of the first type with an item associated with the pathology image of the second type based on at least one of a staining expression grade or a region of interest; and generating the training data set including the associated items.
- The method of claim 3, wherein associating the items comprises: extracting a first item associated with a tumor tissue region included in the pathology image of the first type and a second item associated with a non-tumor tissue region included in the pathology image of the first type; extracting a third item associated with a tumor tissue region included in the pathology image of the second type and a fourth item associated with a non-tumor tissue region included in the pathology image of the second type; and associating the extracted first item with the extracted third item, and associating the extracted second item with the extracted fourth item.
- The method of claim 3, wherein associating the items comprises: extracting, from among items indicating staining expression intensities of respective pixels included in the pathology image of the first type, a fifth item associated with a first expression range and a sixth item associated with a second expression range; identifying, from among items indicating staining expression intensities of respective pixels included in the second pathology data set, a seventh item associated with the first expression range and an eighth item associated with the second expression range; and associating the fifth item with the seventh item, and associating the sixth item with the eighth item.
- The method of claim 3, wherein associating the items comprises: associating at least one object class indicating a type of a cell included in the pathology image of the first type with at least one object class indicating a type of a cell included in the pathology image of the second type; or associating at least one object class indicating a staining expression intensity of a cell included in the pathology image of the first type with at least one object class indicating a staining expression intensity of a cell included in the pathology image of the second type.
- The method of claim 2, wherein generating the training data set based on the first pathology data set and the second pathology data set comprises: extracting patches from the first pathology data set and the second pathology data set; and generating the training data set including the patches, and wherein training the machine learning model using the generated training data set comprises: fetching, from among labeled patches extracted from the first pathology data set, a number of image patches of the first type corresponding to a first sampling count; fetching, from among labeled patches extracted from the second pathology data set, a number of image patches of the second type corresponding to a second sampling count; generating a batch based on the image patches of the first type and the image patches of the second type; and training the machine learning model using the batch.
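Claim 7's batch construction (a first sampling count of first-type patches plus a second sampling count of second-type patches) might be sketched as follows. This is illustrative code, not the claimed implementation; the function and parameter names are assumptions:

```python
import random

def make_batch(first_patches, second_patches, first_count, second_count, seed=0):
    """Draw first_count patches from the first data set and second_count
    from the second, then shuffle them into a single training batch."""
    rng = random.Random(seed)
    batch = (rng.sample(first_patches, first_count)
             + rng.sample(second_patches, second_count))
    rng.shuffle(batch)  # mix the two domains within the batch
    return batch

# e.g., 6 first-type and 2 second-type patches per batch
batch = make_batch(list(range(100)), list(range(100, 200)),
                   first_count=6, second_count=2)
```

Fixing the per-domain counts per batch is one simple way to keep a minority domain represented in every training step.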
- The method of claim 2, wherein generating the training data set based on the first pathology data set and the second pathology data set comprises: extracting image patches of the first type from the first pathology data set; extracting image patches of the second type from the second pathology data set; and copying the image patches of the first type a predetermined number of times and including the copies in the training data set.
- The method of claim 2, wherein training the machine learning model comprises: resizing at least one of the pathology image of the first type or the pathology image of the second type; and training the machine learning model using training data including the at least one resized pathology image.
- The method of claim 2, wherein training the machine learning model comprises removing, from among pixels included in at least one of the pathology image of the first type or the pathology image of the second type, pixels falling within a predetermined range.
- The method of claim 2, wherein training the machine learning model comprises: flipping at least one of the pathology image of the first type or the pathology image of the second type horizontally or vertically; and training the machine learning model using training data including the flipped pathology image.
- The method of claim 2, wherein training the machine learning model comprises: removing or transforming, from among pixels included in at least one of the pathology image of the first type or the pathology image of the second type, pixels of a predetermined range; and training the machine learning model using training data including the pathology image in which the pixels of the predetermined range have been removed or transformed.
- The method of claim 2, wherein training the machine learning model comprises: converting colors of pixels included in at least one of the pathology image of the first type or the pathology image of the second type; and training the machine learning model using training data including the at least one pathology image with the converted pixel colors.
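The augmentations of claims 10 to 13 (pixel removal, flipping, color conversion) can be sketched on a toy image represented as a list of pixel rows. The functions are hypothetical minimal stand-ins, not the claimed operations:

```python
# Minimal sketches of the augmentations in claims 10-13 on a tiny "image"
# (list of pixel rows). Illustrative only.

def flip_horizontal(image):
    """Claim 11: left-right flip."""
    return [list(reversed(row)) for row in image]

def flip_vertical(image):
    """Claim 11: top-bottom flip."""
    return list(reversed(image))

def erase_rows(image, start, stop):
    """Claims 10 and 12: remove (here, zero out) pixels in a given row range."""
    return [[0] * len(row) if start <= i < stop else list(row)
            for i, row in enumerate(image)]

def shift_colors(image, delta):
    """Claim 13: a trivial color conversion - shift every pixel value."""
    return [[pixel + delta for pixel in row] for row in image]

img = [[1, 2], [3, 4]]
```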
- The method of claim 2, wherein training the machine learning model comprises: determining target training data from the training data set; inputting the target training data to the machine learning model and obtaining an output value from the machine learning model; obtaining a reference value for the target training data using annotation information included in at least one of the first pathology data set or the second pathology data set; and feeding back a loss value between the output value and the obtained reference value to the machine learning model.
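Claim 14's training step (output value, reference value from annotation, loss fed back to the model) follows the standard supervised-learning loop. A deliberately trivial one-parameter sketch, with assumed names and a squared-error loss:

```python
# Sketch of claim 14's loop with a 1-parameter linear model. The model,
# loss, and update rule are illustrative, not part of the disclosure.

def train_step(weight, target_input, reference_value, learning_rate=0.1):
    """One gradient step under squared-error loss."""
    output_value = weight * target_input            # model output
    loss = (output_value - reference_value) ** 2    # loss vs. reference value
    gradient = 2 * (output_value - reference_value) * target_input
    new_weight = weight - learning_rate * gradient  # loss fed back to model
    return new_weight, loss

w, loss0 = train_step(0.0, target_input=1.0, reference_value=2.0)
_, loss1 = train_step(w, target_input=1.0, reference_value=2.0)
```

Repeating the step drives the output value toward the annotation-derived reference value, i.e., the loss decreases across iterations.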
- The method of claim 1, wherein the machine learning model includes a plurality of analysis models that output different types of analysis results, and obtaining the analysis result comprises: identifying a staining color and a location where staining is expressed from the acquired pathology image; determining one of the plurality of analysis models as a target analysis model based on the identified staining color and expressed location; and inputting the pathology image to the determined target analysis model to obtain, from the target analysis model, an analysis result of a staining intensity at the expressed location.
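Claim 15's routing of an image to one of several analysis models, keyed by the identified staining color and expression location, can be sketched as a lookup. The keys and model identifiers below are hypothetical:

```python
# Sketch: choosing a target analysis model from the identified staining
# color and expression location. Keys and model names are illustrative.

def select_target_model(staining_color, expressed_location, models):
    """Return the analysis model registered for this (color, location) pair."""
    return models[(staining_color, expressed_location)]

models = {
    ("brown", "membrane"):  "membrane_intensity_model",
    ("brown", "cytoplasm"): "cytoplasm_intensity_model",
}
target = select_target_model("brown", "membrane", models)
```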
- The method of claim 1, wherein the machine learning model includes a plurality of analysis models that output different types of analysis results, and obtaining the analysis result comprises: determining one of the plurality of analysis models as a target analysis model based on input information of a user; and inputting the pathology image to the target analysis model to obtain an analysis result of the pathology image from the target analysis model.
- The method of claim 1, wherein the machine learning model outputs an analysis result including at least one of a type of a cell or an evaluation metric of the cell, and the evaluation metric of the cell includes at least one of a positive or negative result for the cell, a staining expression grade of the cell, a value indicating a degree of staining expression of the cell, or staining expression statistics of the cell.
- A non-transitory computer-readable recording medium storing instructions for executing the method of claim 1 on a computer.
- An information processing system comprising: a memory; and at least one processor connected to the memory and configured to execute at least one computer-readable program included in the memory, wherein the at least one program includes instructions for: acquiring a pathology image; inputting the acquired pathology image to a machine learning model to obtain an analysis result of the pathology image from the machine learning model; and outputting the obtained analysis result, and wherein the machine learning model is a model trained using a training data set generated based on a first pathology data set associated with a first domain and a second pathology data set associated with a second domain different from the first domain.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP22842497.4A EP4372379A1 (en) | 2021-07-14 | 2022-07-14 | Pathology image analysis method and system |
US18/491,314 US20240046670A1 (en) | 2021-07-14 | 2023-10-20 | Method and system for analysing pathology image |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR20210092181 | 2021-07-14 | ||
KR10-2021-0092181 | 2021-07-14 | ||
KR1020220087202A KR20230011895A (ko) | 2021-07-14 | 2022-07-14 | 병리 이미지 분석 방법 및 시스템 |
KR10-2022-0087202 | 2022-07-14 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US18/491,314 Continuation US20240046670A1 (en) | 2021-07-14 | 2023-10-20 | Method and system for analysing pathology image |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023287235A1 (ko) | 2023-01-19 |
Family
ID=84920299
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2022/010321 WO2023287235A1 (ko) | 2021-07-14 | 2022-07-14 | 병리 이미지 분석 방법 및 시스템 |
Country Status (2)
Country | Link |
---|---|
US (1) | US20240046670A1 (ko) |
WO (1) | WO2023287235A1 (ko) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101889725B1 (ko) * | 2018-07-04 | 2018-08-20 | 주식회사 루닛 | 악성 종양 진단 방법 및 장치 |
KR102039138B1 (ko) * | 2019-04-02 | 2019-10-31 | 주식회사 루닛 | 적대적 학습에 기반한 도메인 어댑테이션 방법 및 그 장치 |
WO2020182710A1 (en) * | 2019-03-12 | 2020-09-17 | F. Hoffmann-La Roche Ag | Multiple instance learner for prognostic tissue pattern identification |
KR102246319B1 (ko) * | 2021-01-07 | 2021-05-03 | 주식회사 딥바이오 | 병리 검체에 대한 판단 결과를 제공하는 인공 뉴럴 네트워크의 학습 방법, 및 이를 수행하는 컴퓨팅 시스템 |
2022
- 2022-07-14 WO PCT/KR2022/010321 patent/WO2023287235A1/ko active Application Filing
2023
- 2023-10-20 US US18/491,314 patent/US20240046670A1/en active Pending
Non-Patent Citations (1)
Title |
---|
HEATHER D. COUTURE, WILLIAMS LINDSAY A., GERADTS JOSEPH, NYANTE SARAH J., BUTLER EBONEE N., MARRON J. S., PEROU CHARLES M., TROEST: "Image analysis with deep learning to predict breast cancer grade, ER status, histologic subtype, and intrinsic subtype", NPJ BREAST CANCER, vol. 4, no. 1, 1 December 2018 (2018-12-01), pages 1 - 8, XP055621884, DOI: 10.1038/s41523-018-0079-1 * |
Also Published As
Publication number | Publication date |
---|---|
US20240046670A1 (en) | 2024-02-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021060899A1 (ko) | 인공지능 모델을 사용 기관에 특화시키는 학습 방법, 이를 수행하는 장치 | |
WO2021054518A1 (ko) | 인공지능 기반 기술의 의료영상분석을 이용한 자궁경부암 진단방법, 장치 및 소프트웨어 프로그램 | |
WO2021049729A1 (ko) | 인공지능 모델을 이용한 폐암 발병 가능성 예측 방법 및 분석 장치 | |
WO2019132587A1 (ko) | 영상 분석 장치 및 방법 | |
WO2020045848A1 (ko) | 세그멘테이션을 수행하는 뉴럴 네트워크를 이용한 질병 진단 시스템 및 방법 | |
WO2019083227A1 (en) | MEDICAL IMAGE PROCESSING METHOD, AND MEDICAL IMAGE PROCESSING APPARATUS IMPLEMENTING THE METHOD | |
WO2021006522A1 (ko) | 딥 러닝 모델을 활용한 영상 진단 장치 및 그 방법 | |
WO2021153858A1 (ko) | 비정형 피부질환 영상데이터를 활용한 판독보조장치 | |
US20140306992A1 (en) | Image processing apparatus, image processing system and image processing method | |
WO2021137454A1 (ko) | 인공지능 기반의 사용자 의료정보 분석 방법 및 시스템 | |
WO2019009664A1 (en) | APPARATUS FOR OPTIMIZING THE INSPECTION OF THE OUTSIDE OF A TARGET OBJECT AND ASSOCIATED METHOD | |
WO2021006482A1 (en) | Apparatus and method for generating image | |
WO2020045702A1 (ko) | 비색표를 이용한 소변 검사를 제공하는 컴퓨터 프로그램 및 단말기 | |
Chen et al. | Microscope 2.0: an augmented reality microscope with real-time artificial intelligence integration | |
WO2020032561A2 (ko) | 다중 색 모델 및 뉴럴 네트워크를 이용한 질병 진단 시스템 및 방법 | |
WO2023167448A1 (ko) | 병리 슬라이드 이미지를 분석하는 방법 및 장치 | |
WO2022092993A1 (ko) | 대상 이미지에 대한 추론 작업을 수행하는 방법 및 시스템 | |
WO2020091337A1 (ko) | 영상 분석 장치 및 방법 | |
WO2023287235A1 (ko) | 병리 이미지 분석 방법 및 시스템 | |
WO2023234730A1 (ko) | 패치 레벨 중증도 판단 방법, 슬라이드 레벨 중증도 판단 방법 및 이를 수행하는 컴퓨팅 시스템 | |
WO2023128284A1 (ko) | 자궁 경부암의 진단에 대한 정보 제공 방법 및 이를 이용한 자궁 경부암의 진단에 대한 정보 제공용 디바이스 | |
WO2020032560A2 (ko) | 진단 결과 생성 시스템 및 방법 | |
WO2021177532A1 (ko) | 인공지능을 이용하여 정렬된 염색체 이미지의 분석을 통한 염색체 이상 판단 방법, 장치 및 컴퓨터프로그램 | |
Dobrolyubova et al. | Automatic image analysis algorithm for quantitative assessment of breast cancer estrogen receptor status in immunocytochemistry | |
WO2022250190A1 (ko) | 딥러닝 모델을 이용한 영상검사 대상체의 결함 판정시스템 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22842497 Country of ref document: EP Kind code of ref document: A1 |
ENP | Entry into the national phase |
Ref document number: 2024501611 Country of ref document: JP Kind code of ref document: A |
WWE | Wipo information: entry into national phase |
Ref document number: 2022842497 Country of ref document: EP |
NENP | Non-entry into the national phase |
Ref country code: DE |
ENP | Entry into the national phase |
Ref document number: 2022842497 Country of ref document: EP Effective date: 20240214 |