WO2023191503A1 - Method for recommending candidate target of cell cluster in cancer microenvironment through single-cell transcriptome analysis, and apparatus and program therefor - Google Patents

Method for recommending candidate target of cell cluster in cancer microenvironment through single-cell transcriptome analysis, and apparatus and program therefor

Info

Publication number
WO2023191503A1
Authority
WO
WIPO (PCT)
Prior art keywords
cell
target
cancer
candidate
tissue
Application number
PCT/KR2023/004192
Other languages
French (fr)
Korean (ko)
Inventor
임형준
이성준
박정빈
이대승
Original Assignee
주식회사 포트래이
서울대학교산학협력단
Application filed by 주식회사 포트래이, 서울대학교산학협력단
Priority claimed from KR1020230041265A (KR20230140439A)
Publication of WO2023191503A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00: ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10: Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/30: Unsupervised data analysis

Definitions

  • the present invention relates to a method, device, and program for recommending target candidates for cell clusters in a cancer microenvironment through single cell transcriptome analysis.
  • the present invention seeks to provide a method for proposing a target candidate using single cell transcriptome information, a device for suggesting a candidate for a target, and a program.
  • The method of proposing a target candidate through single-cell transcriptome information, performed by a computing device, includes: collecting single-cell transcriptome information of abnormal tissue and normal tissue; matching clustering results for the single-cell transcriptome information with corresponding cell groups; selecting a cell group of interest from among the cell groups; deriving molecular markers with a higher expression rate in abnormal tissue than in normal tissue in the cell group of interest; and deriving, from among the molecular markers, molecular markers capable of expressing surface proteins.
  • the abnormal tissue may be a malignant tumor.
  • the step of collecting single cell transcriptome information of the abnormal tissue and normal tissue is,
  • the step of matching the clustering result for the single cell transcriptome information to the corresponding cell group may be characterized by clustering using unsupervised learning.
  • the step of matching the clustering result for the single cell transcriptome information to the corresponding cell group may be characterized by grouping based on whether the similarity of the entire transcriptome expression profile exceeds a reference value.
  • The step of selecting a cell group of interest from among the cell groups may be characterized in that the selection is made by calculating the expression rate of a specific biomarker.
  • It may further include deriving specific markers by verifying molecular markers capable of expressing surface proteins from tissue transcript information.
  • It may further include the step of deriving specific marker candidates by verifying molecular markers capable of expressing surface proteins from tissue transcript information.
  • a step of confirming the spatial transcriptome distribution for reference cells of the target carcinoma may be further included.
  • a step of deriving a correlation between the spatial transcript distribution for the cancer-specific marker candidate and the spatial transcript distribution for the reference cells may be further included.
  • a step of secondary confirmation of the cell distribution of cells having a positive correlation and cancer-specific marker candidates may be further included.
  • a step of determining the genetic correlation of the cancer-specific marker candidate may be further included.
  • the step of determining the genetic association of the cancer-specific marker candidate may be a step of quantitatively determining the genetic association using Gene Ontology (GO) analysis or pathway analysis.
  • GO: Gene Ontology
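  • As a minimal sketch of how a GO or pathway association could be quantified, the block below computes an over-representation p-value with the hypergeometric distribution, a common choice for enrichment analysis. The source does not state which statistic is used, so the function name `hypergeom_pvalue` and the toy numbers are assumptions for illustration only.

```python
from math import comb

def hypergeom_pvalue(k, K, n, N):
    """Probability of seeing at least k annotated genes in a list of n genes,
    when K of the N background genes carry the annotation (e.g., a GO term)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Toy numbers (assumed): 20 background genes, 5 annotated with the term,
# 4 genes correlated with the marker candidate, 3 of which are annotated.
p_value = hypergeom_pvalue(k=3, K=5, n=4, N=20)
print(round(p_value, 4))
```

  • A small p-value suggests that the term is over-represented among the correlated genes; a real analysis would also correct for testing many terms at once.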
  • It may further include determining the gene correlation of the cancer-specific marker candidate and determining whether the quantitatively determined highly correlated gene is a cancer expression-related gene that has an influence greater than the reference value on cancer expression.
  • a step of deriving final cancer-specific marker candidates may be further included.
  • the solution includes a computer program stored in a recording medium to execute the above-described method using a computing device.
  • the target candidate proposal device includes a processor
  • The processor collects single-cell transcriptome information of abnormal tissue and normal tissue, matches the clustering results for the single-cell transcriptome information with the corresponding cell groups, selects a cell group of interest from among the cell groups, derives molecular markers with a higher expression rate in abnormal tissue than in normal tissue in the cell group of interest, and derives, from among those molecular markers, molecular markers capable of expressing surface proteins.
  • The method, device, and program for recommending target candidates for cell clusters in a cancer microenvironment through single-cell transcriptome analysis include i) selecting the type of cancer to be studied, ii) collecting single-cell sequencing data for the corresponding tumor tissue and adjacent normal tissue from public databases, and iii) performing data cleaning and preprocessing.
  • unsupervised clustering can be used to group individual cells with similar overall transcript expression profiles.
  • the cell type represented by each cluster can be determined by calculating the expression rate of a representative cell type-specific biomarker.
  • differential analysis can be performed by selecting clusters of cell types of interest, such as fibroblasts or infiltrating immune cells, and corresponding normal tissues.
  • The method, device, and program for recommending target candidates for cell clusters in a cancer microenvironment through single-cell transcriptome analysis identify molecular markers that can characterize the corresponding cell groups. By combining a previously established target protein database with the results of analyzing single-cell transcript expression information, molecular marker candidates that can be expressed as surface proteins can be selected.
  • imaging markers that can bind to surface proteins can be used for diagnosis and severity assessment of specific diseases, and the development of therapeutic agents that can bind to surface proteins can be used to treat specific diseases.
  • Using RNA sequencing data, a bioinformatic analysis pipeline was built to identify stromal cell surface markers expressed in PDAC (pancreatic ductal adenocarcinoma), a pancreatic tumor.
  • GEO: Gene Expression Omnibus
  • stroma cell clusters were selected and differential analysis was performed.
  • stroma cell surface markers expressed at high levels in PDAC were identified and verified in several independent datasets.
  • FIG. 1 is a flowchart showing a method for proposing a target candidate according to the present invention.
  • Figure 2 is a flowchart showing a method for comparing cell population distribution.
  • Figure 3 is a flowchart showing a method for deriving molecular markers according to one embodiment.
  • Figure 4 is a flowchart showing a method for deriving molecular markers according to another example.
  • Figure 5 is a diagram analyzing single cell transcriptome information for severe patients, mild patients, and control groups for exemplary diseases.
  • Figure 6 is a graph showing the distribution of immune cell groups according to patient type.
  • Figure 7 is a diagram showing the intersection of a cluster with a high expression rate with the surface protein DB and the intersection with the DB having a target function.
  • Figure 8 shows the 10 most expressed molecular markers for each cluster, and the genome that satisfies the intersection condition with the surface protein DB among the molecular markers is marked with *.
  • Figure 9 is a flow chart showing a method for recommending target candidates for cell clusters in a cancer microenvironment through single cell transcriptome analysis according to the present invention.
  • Figure 10 is a flowchart of a method for recommending target candidates for cell clusters in a cancer microenvironment through single cell transcriptome analysis according to another embodiment.
  • Figure 11 is a diagram showing a single cell transcriptome profile showing the cluster distribution after data clustering and the relative distribution of two data groups.
  • Figure 12 is a diagram comparing single cell transcriptome profiles of normal cells and cancer cells.
  • Figure 13 is a diagram showing the expression genes of the single cell transcriptome profile of normal cells according to Figure 12.
  • Figure 14 is a diagram showing a dot plot showing the cell type of each cluster.
  • Figure 15 is a diagram classifying the single cell transcriptome into three main cell types.
  • Figure 16 is a diagram illustrating the steps of determining a candidate group of stromal cell surface targets in pancreatic cancer.
  • Figure 17 is a diagram showing the top 10 candidate molecules identified according to the process of Figure 16.
  • FIG. 18 is a diagram showing that the expression rate of the candidate substance according to FIG. 17 is significantly different from that of adjacent normal tissue.
  • FIG. 19 is a diagram illustrating the p-value between normal cells and cancer cells of the candidate substance according to FIG. 17.
  • Figure 20 is a diagram showing that the expression rate of the candidate substance according to Figure 17 is higher in cancer cells than in normal cells at all clinical stages.
  • Figure 21 is a diagram showing an example of extracting cell types from TNBC type and Luminal spatial transcriptome data among breast cancers to describe the distribution and target suitability in the breast cancer spatial transcriptome.
  • Figure 22 is a diagram showing the distribution of eight types of cells in a tumor.
  • Figure 23 is a diagram showing the correlation between target cells and cell types.
  • Figure 24 is a diagram illustrating the expression of cells positively correlated with the target to confirm target suitability.
  • Figure 25 is a graph comparing cell types that are correlated with targets.
  • Figure 26 is a diagram showing spatially related gene extraction and functional terms with positive correlation.
  • Figure 27 is a diagram showing the results of pathway analysis targeting genes with positive correlation.
  • Figure 28 is a diagram showing the extraction of spatially related genes and functional terms with negative correlation.
  • Figure 29 is a diagram showing spatial expression patterns.
  • Figure 30 is a graph comparing target expression values by cancer type to confirm target suitability in The Cancer Genome Atlas (TCGA) database.
  • In Figure 30, the difference in the expression value of ANTXR1 between normal cells and tumor tissue in breast cancer (BRCA) can be confirmed.
  • Figure 31 is a diagram comparing the expression of the target between tumor and normal cells, showing a higher expression rate in tumors for GBM, ESCA, STAD, HNSC, KIRC, CHOL, COAD, and KIRP, and a higher expression rate in normal cells for LUSC, LUAD, PRAD, THCA, BLCA, UCEC, CESC, PCPG, and KICH.
  • Figure 32 is a diagram showing the distribution of targets in cell lines for each cancer type.
  • Figure 33 is a diagram showing the distribution of targets in a normal cell population.
  • Figure 34 is a diagram showing the distribution of targets by breast cancer subtype, confirming that the ANTXR1 expression value of the TNBC subtype is relatively low.
  • Figure 35 is a diagram showing target distribution by breast cancer subtype, including normal cell data.
  • Figure 36 is a diagram analyzing the correlation between clinical variables and targets.
  • Figure 37 is a diagram analyzing the relationship between target and tumor microenvironment, performed on the basis of epithelial cells.
  • Figure 38 is a diagram related to marker analysis and interpretation related to combined use of immunotherapy agents.
  • Figure 39 is a block diagram of a computing device that performs a method of recommending target candidates for cell clusters in a cancer microenvironment through single cell transcriptome analysis according to an exemplary embodiment of the present invention.
  • In describing the components, terms such as first, second, A, B, (a), and (b) may be used. These terms are used only to distinguish one component from another, and the nature, sequence, or order of the component is not limited by the term.
  • When a component is described as being “connected,” “coupled,” or “joined” to another component, that component may be directly connected or joined to the other component, but it will be understood that a further component may also be “connected,” “coupled,” or “joined” between them.
  • “Comprises” and/or “comprising” does not exclude the presence or addition of one or more other components, steps, operations, and/or elements in addition to the referenced component, step, operation, and/or element.
  • The method of proposing a target candidate through single-cell transcriptome information, performed by a computing device, may include: collecting single-cell transcriptome information of a control group and of patients according to the severity of symptoms (S101); clustering the collected single-cell transcriptome information (S102); matching cell groups to the clusters (S103); comparing the distribution of cell groups corresponding to the control clusters with the distribution of cell groups corresponding to the patient clusters (S104); primarily deriving molecular markers capable of expressing surface proteins for cell clusters expressed only in patients (S105); and secondarily deriving, from among the primarily derived molecular markers, molecular markers with a target function (S106).
  • the step of collecting single cell transcriptome information of a patient may be a step of collecting single cell transcriptome information of a patient with a specific disease directly or by crawling the single cell transcriptome information data from an external source.
  • Single cell transcriptional genome analysis is a technology that analyzes the genomic characteristics of a cell by isolating a cell, amplifying and sequencing DNA or RNA from a very small amount of material.
  • the method of directly performing single cell transcriptome analysis is as follows.
  • First, a cell suspension is prepared from tissue or blood through a cell separation process, and single cells are then obtained by sorting them by cell size.
  • In single-cell genome analysis, the amount of DNA that can be obtained from a single cell is only at the picogram level, so an amplification process is required to raise it to the nanogram level at which sequencing is possible.
  • For this purpose, a PCR process or an MDA (multiple displacement amplification) process is mainly used, and a sequencing library can be produced from the amplified material.
  • For RNA, cDNA is obtained through reverse transcription, the cDNA is then amplified, and a sequencing library is created.
  • Barcodes can be attached to the sequencing library produced from each cell, enabling tens to tens of thousands of samples to be sequenced together, and data for each cell can be separated after sequencing.
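  • The barcode-based separation described above can be sketched as follows. This is a simplified illustration that assumes the cell barcode is the first few bases of each read; the function name `demultiplex`, the barcode length, and the toy reads are hypothetical and not taken from the source.

```python
from collections import defaultdict

def demultiplex(reads, barcode_length):
    """Group read sequences by their leading cell barcode.

    Assumes (for illustration) that the first `barcode_length` bases of
    each read are the barcode and the remainder is the cDNA insert.
    """
    per_cell = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        per_cell[barcode].append(insert)
    return dict(per_cell)

# Toy pooled reads from two cells that were sequenced together.
reads = ["AAACGGTTCA", "TTTGACCGTA", "AAACTTGACC"]
cells = demultiplex(reads, barcode_length=4)
print(cells)
```

  • Real protocols also have to handle sequencing errors in the barcode, which this sketch ignores.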
  • The procedure for analyzing transcriptome data obtained through single-cell RNA sequencing uses mapping tools such as TopHat and GSNAP for alignment to a reference sequence (hg18, hg19, etc. in humans), and then measures gene expression from the aligned data using methods such as HTSeq.
  • the quality of the experimental data can be identified and if the quality is poor, a procedure can be performed to exclude it from the analysis.
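  • A minimal sketch of such a quality filter is shown below. The source does not specify the quality criteria, so the two thresholds used here (minimum number of detected genes and minimum total count per cell) and the name `filter_cells` are assumptions chosen only to illustrate the idea.

```python
def filter_cells(counts, min_genes=2, min_total=5):
    """Keep cells that pass simple quality thresholds.

    counts: dict mapping cell id -> dict of gene -> read count.
    min_genes and min_total are illustrative values, not from the source.
    """
    kept = {}
    for cell, genes in counts.items():
        detected = sum(1 for c in genes.values() if c > 0)
        total = sum(genes.values())
        if detected >= min_genes and total >= min_total:
            kept[cell] = genes
    return kept

counts = {
    "cell_1": {"GENE_A": 4, "GENE_B": 3, "GENE_C": 0},
    "cell_2": {"GENE_A": 1, "GENE_B": 0, "GENE_C": 0},  # too few genes and reads
}
good = filter_cells(counts)
print(list(good))
```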
  • a clustering process can be performed to confirm the cellular characteristics of the transcriptome data obtained through the above-mentioned preprocessing work. Through this, similarities between cells can be confirmed. Statistical methods such as edgeR and DESeq may be used to select specifically expressed genes between cells or between populations. Comparison between various groups may be possible using the group information obtained in this way. First, it may be possible to identify the stochasticity and variability of transcription through analysis within the same cell type. Additionally, regulatory network inference and allelic expression pattern analysis may be possible. Second, through analysis between cell types, it may be possible to identify biomarkers that show differences between cell types.
  • Single cell transcriptome information obtained in this way can be obtained through direct tissue examination, but can also be collected externally from public data conducted at other research institutes and hospitals.
  • This step (S101) may be a step of collecting information on patients with inflammatory diseases, and may further be a step of separately collecting information on mild patients and severe patients.
  • the step of clustering the collected single cell transcriptome information (S102) may be characterized by clustering using unsupervised learning. Referring to Figure 5, clustering results for single cell transcriptome information in severe patients, mild patients, and controls are shown.
  • the step of matching the cell group corresponding to the cluster (S103) is a step of identifying and matching the cell group to which the clustered group corresponds.
  • the step of contrasting the cell population distribution corresponding to the control cluster and the cell population distribution corresponding to the patient cluster is a step for selecting a genome specifically expressed in the patient cluster. Since the single transcript clusters of the patient group and the control group are clearly distinct, it can be easy to use this to select genomes that specifically occur in patients. Details will be described later in Figure 2.
  • The step (S105) of primarily deriving molecular markers capable of expressing surface proteins for cell clusters expressed only in patients selects the immune cell groups specifically expressed in patients and primarily selects molecular markers capable of binding to surface proteins. This step establishes a candidate pool of target substances.
  • The step (S106) of secondarily deriving molecular markers derives, from among the primarily derived molecular markers, a molecular marker having a target function, for example an imaging or binding function.
  • The secondary molecular marker can be derived by selecting the molecular markers that lie in the intersection between the primary molecular markers and a known imaging DB or binding DB.
  • Figure 2 is a flowchart showing a method for comparing cell population distribution.
  • The step of comparing the distribution of cell populations (S104) includes a step (S104-1) of comparing the distribution ratios of immune cells in the control cluster and the patient cluster, and may include a step (S104-2) of selecting immune cells expressed above a predetermined reference value.
  • this step (S104) may further include a step (S104-3) of normalizing the genome expression count of immune cells and listing the differences from the control group based on P-value.
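  • The comparison in step S104 can be sketched as below, under stated assumptions: each cell carries a cell-group label, the "distribution ratio" is the share of cells per group, and a group is selected when its share in patients is at least a chosen multiple of its share in controls. The function names and the default threshold of 3.0 are illustrative, and the P-value ranking of S104-3 is not implemented here.

```python
from collections import Counter

def proportions(labels):
    """Fraction of cells assigned to each cell group."""
    n = len(labels)
    return {group: count / n for group, count in Counter(labels).items()}

def enriched_groups(control_labels, patient_labels, ratio_threshold=3.0):
    """Cell groups whose share in patients is at least `ratio_threshold`
    times their share in controls. Groups absent from controls are selected."""
    ctrl = proportions(control_labels)
    pat = proportions(patient_labels)
    selected = []
    for group, share in pat.items():
        base = ctrl.get(group, 0.0)
        if base == 0.0 or share / base >= ratio_threshold:
            selected.append(group)
    return sorted(selected)

# Toy labels (assumed): one label per cell.
control = ["M01"] * 1 + ["M02"] * 5 + ["M04"] * 4
patient = ["M01"] * 4 + ["M02"] * 4 + ["M03"] * 2
print(enriched_groups(control, patient))
```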
  • Figure 3 is a flowchart showing a method for deriving molecular markers according to one embodiment.
  • The step (105) of primarily deriving molecular markers capable of expressing surface proteins may include a step (105-1) of selecting the cluster with the largest quantitative difference from the control group, and a step (105-2) of deriving surface-protein primary molecular markers corresponding to the intersection of the cluster genome information and a surface protein genome DB.
  • The step of secondarily deriving a molecular marker with a target function may include a step of deriving secondary potential candidates with a target function by checking the intersection between the surface-protein primary molecular markers and an imaging DB.
  • FIG. 4 is a flowchart showing a method for deriving molecular markers according to another example.
  • In this example, the step may include a step (105'-1) of selecting a predetermined number of clusters, in descending order of their quantitative difference from the control group, from among the clusters significantly expressed above a predetermined threshold in severe patients, and a step (105'-2) of checking the intersection of the genome information of the selected clusters with the surface protein genome database to identify primary surface-protein potential candidates (Potential Targets) that can react with all of the selected clusters.
  • the method for proposing a target candidate according to the present invention shown in FIGS. 1 to 4 may be performed by a device or implemented by a computer program stored in a recording medium for execution.
  • Figure 5 is a diagram analyzing single cell transcriptome information for severe patients, mild patients, and control groups for exemplary diseases.
  • the single cell transcriptome information for the virus control group (HC), mild patient (M), and severe patient (S) is clustered through unsupervised learning, and the result of matching with the immune cell group is shown.
  • Figure 6 is a graph showing the distribution of immune cell groups according to patient type. Referring to Figure 6, the results of a quantitative comparison of the degree of activation of the immune cell groups shown in Figure 5 are shown. Referring to Figure 5, it can be seen that the expression rates of M01 immune cells and M03 immune cells in seriously ill patients (S) and controls (HC) differ by more than a predetermined standard value, for example, more than 300%.
  • Figure 7 is a diagram showing the intersection of a cluster with a high expression rate with the surface protein DB and the intersection with the DB having a target function.
  • the immune cell group expressed above a predetermined reference value in a patient with a specific disease is identified, and the genome of the immune cell group is defined as a candidate pool of target candidate material.
  • Primary molecular markers are derived based on the intersection with surface protein databases such as Surfaceome. To derive secondary molecular markers that have a target function such as imaging, target candidate molecular markers are then derived through the intersection of the primary molecular markers with an imaging DB, and the results are shown.
  • It can be seen that the final target molecular marker candidates, which satisfy all intersection requirements among the genomes of M01 and M03 (whose expression rate in severe patients exceeds that of the control group by a predetermined reference value, e.g., 300%), the surface protein DB (Surfaceome), and the imaging DB (PETdb), are SLC43A2, SLC2A3, and FOLR2.
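  • The two-stage intersection can be sketched with plain set operations. The database contents below are toy stand-ins rather than the actual Surfaceome or PETdb entries, and `GENE_X` and `GENE_Y` are hypothetical placeholders; only the three final gene names come from the description.

```python
def target_candidates(cluster_genes, surface_db, imaging_db):
    """Return (primary, secondary) marker lists.

    primary:   cluster genes that also appear in the surface protein DB.
    secondary: primary markers that also appear in the imaging DB.
    """
    primary = set(cluster_genes) & set(surface_db)
    secondary = primary & set(imaging_db)
    return sorted(primary), sorted(secondary)

# Toy inputs (assumed for illustration).
cluster_genes = {"SLC43A2", "SLC2A3", "FOLR2", "GENE_X", "GENE_Y"}
surface_db = {"SLC43A2", "SLC2A3", "FOLR2", "GENE_Y"}  # stand-in for a surface protein DB
imaging_db = {"SLC43A2", "SLC2A3", "FOLR2"}            # stand-in for an imaging DB

primary, secondary = target_candidates(cluster_genes, surface_db, imaging_db)
print(primary)
print(secondary)
```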
  • Figure 8 shows the 10 most expressed molecular markers for each cluster, and the genome that satisfies the intersection condition with the surface protein DB among the molecular markers is marked with *.
  • The 10 most highly expressed genes for each immune cell group (M01, M02, M03, M04) are selected and listed, and among them the primary molecular marker candidates that satisfy the intersection condition with the surface protein DB are marked with *, as an example.
  • For example, the 10 genes CD300E, CCR1, EMP1, TNFSF13B, LILRA5, IL1R2, FPR1, LILRB2, LILRB1, and IFNGR2 have the highest expression rates, and among them the primary molecular marker candidates that satisfy the intersection condition with the surface protein DB are CCR1 and FPR1.
  • Figure 9 is a flow chart showing a method for recommending target candidates for cell clusters in a cancer microenvironment through single cell transcriptome analysis according to the present invention.
  • Step S201 is a step of collecting single cell transcriptome information of cancer tissue and normal tissue.
  • This step (S201) may be a step of collecting single cell transcriptome information of a patient with a specific disease and transcriptome information of normal cells as a comparison group directly or by crawling and collecting single cell transcriptome information data from an external source.
  • Single cell transcriptional genome analysis is a technology that separates cells, amplifies DNA or RNA from a very small amount of material, and analyzes the genomic characteristics of the cell by sequencing. Details are described above with reference to Figure 1, so redundant description is omitted.
  • the target tissue is specified as a cancer tissue, but it is not limited to this, and of course, any abnormal tissue can be the target.
  • Step S202 is a step of performing clustering on single cell transcriptome information using unsupervised learning. This step may group individual cells with similar overall transcript expression profiles using unsupervised clustering. Details are described above in step S102 according to FIG. 1, and redundant information is omitted. The clustering criterion is whether the similarity of the entire transcript expression profile exceeds a reference value: cells are grouped together when it does.
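  • One simple way to realize a "similarity exceeds a reference value" criterion is to link every pair of cells whose profile similarity is above the threshold and treat the connected groups as clusters. The sketch below does this with cosine similarity and a union-find structure. It is an assumption-laden illustration: the source does not name the similarity measure or the algorithm, and practical pipelines often use graph-based community detection instead.

```python
import math

def cosine(u, v):
    """Cosine similarity between two expression vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def threshold_clusters(profiles, reference=0.9):
    """Assign a cluster id to each cell by linking cells whose similarity
    exceeds `reference` (an illustrative value) and taking connected groups."""
    n = len(profiles)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine(profiles[i], profiles[j]) > reference:
                parent[find(j)] = find(i)

    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]

# Toy expression profiles (rows = cells, columns = genes).
profiles = [[10, 1, 0], [9, 2, 0], [0, 1, 8], [1, 0, 9]]
labels = threshold_clusters(profiles)
print(labels)
```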
  • Step S203 is a step of matching the clustering result for single cell transcript information with the corresponding cell group.
  • the cell type represented by each cluster can be determined by calculating the expression rate of representative cell type-specific biomarkers. Details are described above in step S103 according to FIG. 1, and redundant information is omitted.
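  • A minimal sketch of this annotation step follows. It assumes that the "expression rate" of a biomarker is the fraction of cells in the cluster with a non-zero count, and that a cluster is labelled with the cell type whose markers have the highest mean rate. The marker table is an illustrative example and is not taken from the source.

```python
def expression_rate(cells, gene):
    """Fraction of cells in which `gene` has a non-zero count."""
    return sum(1 for cell in cells if cell.get(gene, 0) > 0) / len(cells)

def annotate_cluster(cells, marker_table):
    """Return (best_cell_type, scores), scoring each cell type by the mean
    expression rate of its marker genes within the cluster."""
    scores = {
        cell_type: sum(expression_rate(cells, g) for g in genes) / len(genes)
        for cell_type, genes in marker_table.items()
    }
    return max(scores, key=scores.get), scores

# Illustrative marker table (assumed).
marker_table = {"fibroblast": ["COL1A1", "DCN"], "T cell": ["CD3D", "CD3E"]}

# Toy cluster: each dict holds the gene counts of one cell.
cluster_cells = [
    {"COL1A1": 5, "DCN": 2},
    {"COL1A1": 3},
    {"CD3D": 1, "COL1A1": 2},
]
label, scores = annotate_cluster(cluster_cells, marker_table)
print(label)
```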
  • Step S204 is a step of selecting a cell group of interest from among the cell groups.
  • Cell population of interest may refer to a cluster of cell types of interest, such as fibroblasts or infiltrating immune cells.
  • the cell group of interest can be selected by calculating the expression rate of a specific biomarker.
  • Step S205 is a step of deriving molecular markers with a higher expression rate in cancer tissues compared to normal tissues in the cell group of interest. This may refer to the step of finding molecular targets that cause significant differences in gene expression.
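  • As a rough sketch of step S205, the block below ranks genes by the log2 fold change of mean expression between cancer and normal cells of the selected cell group. The pseudocount, the cut-off `min_lfc`, and the function names are assumptions. A real analysis would add a statistical test, for example with tools such as edgeR or DESeq mentioned earlier in this description.

```python
import math

def log2_fold_change(tumor_values, normal_values, pseudocount=1.0):
    """log2 ratio of mean expression (tumor over normal); the pseudocount
    avoids division by zero for genes not detected in normal cells."""
    mean_t = sum(tumor_values) / len(tumor_values)
    mean_n = sum(normal_values) / len(normal_values)
    return math.log2((mean_t + pseudocount) / (mean_n + pseudocount))

def upregulated(tumor, normal, min_lfc=1.0):
    """Genes whose log2 fold change is at least `min_lfc` (illustrative)."""
    result = {}
    for gene in tumor:
        lfc = log2_fold_change(tumor[gene], normal[gene])
        if lfc >= min_lfc:
            result[gene] = lfc
    return result

# Toy per-cell expression values for two hypothetical genes.
tumor = {"GENE_A": [7, 7, 7], "GENE_B": [1, 1, 1]}
normal = {"GENE_A": [1, 1, 1], "GENE_B": [1, 1, 1]}
up = upregulated(tumor, normal)
print(up)
```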
  • Step S206 is a step of deriving a molecular marker capable of expressing a surface protein among the molecular markers. For example, by intersecting the list of genes with differences with the list of cell surface proteins in the Surfaceome database, molecular targets (molecular markers) expressed on the outer surface of cells can be obtained.
  • Step S207 is a step of collecting tissue transcriptome information of normal people and patients.
  • Step S208 is a step of deriving specific markers by verifying molecular markers capable of expressing surface proteins from tissue transcript information.
  • Figure 10 is a flowchart of a method for recommending target candidates for cell clusters in a cancer microenvironment through single cell transcriptome analysis according to another embodiment. Each step is described with reference to FIG. 10 as follows.
  • Step S301 is a step of collecting single cell transcriptome information of cancer tissue and normal tissue.
  • This step (S301) may be a step of directly collecting single cell transcriptome information of a patient with a specific disease and normal cell transcriptome information as a comparison group, or may be a step of collecting single cell transcriptome information data by crawling from an external source.
  • Single cell transcriptional genome analysis is a technology that isolates a single cell, amplifies DNA or RNA from a very small amount of material, and analyzes the genomic characteristics of that cell by sequencing. Details are described above with reference to Figure 1, and redundant content is omitted.
  • the target tissue is specified as a cancer tissue, but it is not limited to this, and of course, any abnormal tissue can be the target.
  • Step S302 is a step of performing clustering on single cell transcriptome information using unsupervised learning. This step may be to group individual cells with similar overall transcript expression profiles using unsupervised clustering. Details are described above in step S102 according to FIG. 1, and redundant information is omitted.
  • Step S303 is a step of matching the clustering result for the single cell transcriptome information with the corresponding cell groups.
  • The cell type represented by each cluster can be determined by calculating the expression rate of representative cell type-specific biomarkers. Details are described above in step S103 according to FIG. 1, and redundant information is omitted.
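As a hedged illustration of step S303, the sketch below scores one cluster against a marker panel. It assumes that the "expression rate" of a marker is the fraction of cells in the cluster with a non-zero count, and that a cluster takes the cell type with the highest mean rate; the panel genes and the function names are hypothetical choices for the example and are not taken from the source.

```python
def expression_rate(cells, gene):
    """Fraction of cells in which `gene` has a non-zero count."""
    if not cells:
        return 0.0
    return sum(1 for counts in cells if counts.get(gene, 0) > 0) / len(cells)


def annotate_cluster(cells, marker_panel):
    """Return (best cell type, score per cell type) for one cluster.

    cells: list of dicts mapping gene -> count, one dict per cell.
    marker_panel: dict mapping cell type -> non-empty list of marker genes.
    """
    scores = {
        cell_type: sum(expression_rate(cells, g) for g in genes) / len(genes)
        for cell_type, genes in marker_panel.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical marker panel and cluster, for illustration only.
panel = {
    "fibroblast": ["COL1A1", "DCN"],
    "immune": ["PTPRC"],
}
cluster = [
    {"COL1A1": 5, "DCN": 2},
    {"COL1A1": 3},
    {"DCN": 1, "PTPRC": 1},
    {"COL1A1": 7, "DCN": 4},
]
best_type, type_scores = annotate_cluster(cluster, panel)
```

Here the fibroblast panel scores 0.75 and the immune panel 0.25, so the cluster would be labeled as fibroblast.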
  • Step S304 is a step of selecting a cell group of interest from among the cell groups.
  • The cell group of interest may refer to a cluster of cell types of interest, such as fibroblasts or infiltrating immune cells.
  • Step S305 is a step of deriving molecular markers with a higher expression rate in cancer tissues compared to normal tissues in the cell group of interest. This may refer to the step of finding molecular targets that cause significant differences in gene expression.
  • Step S306 is a step of deriving a molecular marker capable of expressing a surface protein among the molecular markers. For example, by intersecting the list of genes with differences with the list of cell surface proteins in the Surfaceome database, molecular targets (molecular markers) expressed on the outer surface of cells can be obtained.
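The intersection described in step S306 amounts to a set operation. The following is a minimal sketch, assuming both lists are plain gene symbols; the function name `surface_candidates` and the input lists are hypothetical, and a real run would load the surface protein list from a database export.

```python
def surface_candidates(differential_genes, surfaceome_genes):
    """Keep differential genes that are also listed as cell surface proteins.

    The order of `differential_genes` is preserved, so an upstream ranking
    (for example by fold change) survives the filter. Symbols are compared
    case-insensitively and duplicates are dropped.
    """
    surface = {g.upper() for g in surfaceome_genes}
    seen = set()
    kept = []
    for gene in differential_genes:
        key = gene.upper()
        if key in surface and key not in seen:
            seen.add(key)
            kept.append(key)
    return kept


# Hypothetical inputs for illustration.
de_genes = ["ANTXR1", "ACTB", "THY1", "antxr1", "GAPDH"]
surfaceome = ["THY1", "ANTXR1", "SDC1"]
candidates = surface_candidates(de_genes, surfaceome)
```

With these inputs `candidates` is `["ANTXR1", "THY1"]`: the genes absent from the surface list are removed and the duplicate symbol is collapsed.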
  • Step S307 is a step of deriving cancer-specific marker candidates by verifying molecular markers capable of expressing surface proteins from tissue transcriptome information.
  • Step S308 is a step of confirming the spatial transcriptome distribution for reference cells of the target carcinoma. As shown in Figure 21, based on the spatial transcriptome data of the target carcinoma, the spatial transcriptome according to cell type can be confirmed and used as the basis for verification data in the step described later.
  • Step S309 is a step of deriving the correlation between the spatial transcriptome distribution for the cancer-specific marker candidate and the spatial transcriptome distribution for the reference cells.
  • The score distribution obtained by confirming the spatial transcriptome distribution of carcinoma cells using an algorithm or the like is set as reference data, and the correlation can be derived by comparing it with the spatial transcriptome distribution of the target found through single cell RNA-seq.
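Step S309 can be sketched as a per-spot correlation. The example assumes that the marker expression and each reference cell-type score are available as equal-length lists indexed by spatial spot, and it uses the Pearson coefficient; the source does not name the correlation measure, and the names and values below are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant map carries no spatial signal
    return cov / sqrt(var_x * var_y)


def positively_correlated(marker_by_spot, scores_by_type, min_r=0.0):
    """Return {cell type: r} for reference score maps with r > min_r."""
    result = {}
    for cell_type, scores in scores_by_type.items():
        r = pearson(marker_by_spot, scores)
        if r > min_r:
            result[cell_type] = r
    return result


# Hypothetical values for five spatial spots.
marker = [0.1, 0.5, 0.9, 1.4, 2.0]
reference = {
    "fibroblast": [0.2, 0.4, 1.0, 1.3, 2.1],
    "epithelial": [2.0, 1.5, 1.1, 0.4, 0.1],
}
hits = positively_correlated(marker, reference)
```

Only the cell types whose score map rises and falls with the marker are kept, which corresponds to the positively correlated cells examined again in step S310.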
  • Step S310 is a step of secondarily confirming the cell distribution of the cells with a positive correlation and of the cancer-specific marker candidates. Among the specific marker candidates, it can be confirmed, as shown in Figures 23 and 24, that a target called ANTXR1 has a positive correlation with chondrocytes, fibroblasts, iDCs, and the StromaScore. Confirming a specific marker candidate that correlates highly with the tissue of the reference cells, together with its distribution, can serve as a basis for confirming consistency with the target cell.
  • Step S311 is a step of determining the genetic relationship of the cancer-specific marker candidate.
  • Gene Ontology analysis and pathway analysis can be performed on the candidate genes with positive correlation to derive the final gene correlation and the final target information for the specific marker candidates.
  • This is a step of determining effective marker candidates again from the specific marker candidates whose consistency with the target cell was confirmed in the previous step S310, and may be a step of analyzing the correlation between genes rather than their consistency with the target cell. Broadly speaking, this can be a step of quantitatively determining gene correlation using Gene Ontology (GO) analysis or pathway analysis.
  • The specific marker candidate has a high correlation not only with the target cell but also with certain genes, and which genes it affects can be determined from the quantitatively determined gene correlation. For example, for a carcinoma (abnormal tissue) targeted by the cell cluster target candidate recommendation method of the present invention, the genes related to the expression of that carcinoma can be determined through the gene correlation determination step (S311).
  • Accordingly, specific marker candidates are 1) first sorted by the cell group of interest and by whether they can be expressed as surface proteins (step S306), 2) second sorted based on their correlation with the target cells (steps S309 and S310), and 3) third sorted by determining whether the number of cancer-expression-related genes that affect the expression of the target carcinoma exceeds the reference value (step S311). The final cancer-specific marker (final abnormal tissue-specific marker) can then be selected (step S312).
  • By using spatial transcriptome information, specific marker candidates numbering from thousands to tens of thousands can be narrowed to at least the 10% level in the first sorting and again to the 10% level in the second sorting. Effective final specific marker candidates can therefore be selected easily, and experimental efficiency can be improved by at least 10 to 100 times compared to the conventional method of selecting specific marker candidates through clinical trials.
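The three sorting stages can be sketched as successive filters followed by a ranking. The record layout, the default reference value of 3, the use of "at or above" for that value, and the placeholder gene names are all assumptions made for illustration; the source does not fix any of them.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    gene: str
    is_surface: bool           # outcome of the surface protein check (S306)
    spatial_r: float           # correlation with the target cells (S309, S310)
    cancer_related_genes: int  # associated cancer-related gene count (S311)


def select_final_markers(candidates, min_related=3):
    """Apply the three sorting stages in order and rank the survivors."""
    first = [c for c in candidates if c.is_surface]
    second = [c for c in first if c.spatial_r > 0]
    third = [c for c in second if c.cancer_related_genes >= min_related]
    # Rank by the number of associated cancer-related genes, highest first.
    return sorted(third, key=lambda c: c.cancer_related_genes, reverse=True)


# Hypothetical candidate pool with placeholder gene names.
pool = [
    Candidate("GENE_A", True, 0.62, 7),
    Candidate("GENE_B", False, 0.80, 9),
    Candidate("GENE_C", True, -0.10, 5),
    Candidate("GENE_D", True, 0.35, 2),
    Candidate("GENE_E", True, 0.41, 12),
]
final = select_final_markers(pool)
```

In this toy pool, `GENE_B` fails the first stage, `GENE_C` the second, and `GENE_D` the third, leaving `GENE_E` ranked ahead of `GENE_A`.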
  • Figure 11 is a diagram showing a single cell transcriptome profile showing the cluster distribution after data clustering and the relative distribution of two data groups.
  • Figure 12 is a diagram comparing single cell transcriptome profiles of normal cells and cancer cells.
  • Figure 13 is a diagram showing the expression genes of the single cell transcriptome profile of normal cells according to Figure 12.
  • Figure 14 is a diagram showing a dot plot showing the cell type of each cluster.
  • Figure 15 is a diagram classifying the single cell transcriptome into three main cell types.
  • Figure 16 is a diagram illustrating the steps of determining a candidate group of stromal cell surface targets in pancreatic cancer.
  • From normal tissue and abnormal tissue (malignant tumor tissue), a marker group for surface proteins is selected.
  • Candidate materials for 333 target-specific markers, obtained through intersection with the database (Surfaceome DB, 2608), can be confirmed. Through this procedure, the 333 target substances can be easily identified and then sorted in descending order of their average log2(FC) value, which quantifies the gene expression level.
  • FC stands for fold change and quantitatively defines the level of gene expression: FC = treatment / control, where treatment is the comparison condition.
  • The expression level can thus be determined by log2(FC), and Figure 17 shows the 10 gene candidates with the highest expression levels among the 333 candidates in the pancreatic cancer example.
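The ranking by log2(FC) can be made concrete with a short sketch. It assumes that FC is the ratio of mean expression in the comparison condition (tumor) to mean expression in the control (normal), with a pseudocount of 1 added to both means to avoid division by zero; the pseudocount, function names, and toy values are assumptions for illustration rather than the exact computation behind Figure 17.

```python
from math import log2


def mean(values):
    return sum(values) / len(values)


def log2_fold_change(treatment, control, pseudocount=1.0):
    """log2 of mean(treatment) / mean(control), guarded by a pseudocount."""
    return log2((mean(treatment) + pseudocount) / (mean(control) + pseudocount))


def rank_by_log2fc(expression, top_n=10):
    """expression: dict gene -> (tumor values, normal values).

    Returns the top_n (gene, log2FC) pairs in descending order of log2FC.
    """
    scored = {g: log2_fold_change(t, n) for g, (t, n) in expression.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical expression values with placeholder gene names.
toy = {
    "GENE_A": ([7.0, 7.0], [1.0, 1.0]),  # (7 + 1) / (1 + 1) = 4, log2 = 2
    "GENE_B": ([3.0, 3.0], [3.0, 3.0]),  # ratio 1, log2 = 0
    "GENE_C": ([0.0, 0.0], [3.0, 3.0]),  # (0 + 1) / (3 + 1) = 0.25, log2 = -2
}
ranking = rank_by_log2fc(toy, top_n=2)
```

A positive log2(FC) means higher expression in the comparison condition, so sorting in descending order places the most tumor-enriched candidates first.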
  • Figure 17 is a diagram showing the top 10 candidate molecules identified according to the process of Figure 16.
  • Using transcriptome data (GSE15471) from the GEO database for pancreatic cancer tissue and adjacent normal tissue, it was confirmed that, of the 10 candidate molecules, all except DPEP1 were expressed significantly higher in tumors than in normal tissue. DPEP1 was, as an exception, expressed more in normal tissue. Accordingly, when selecting a final candidate substance according to the present invention, DPEP1 may be excluded if a target substance with a high expression level in abnormal tissue is to be selected.
  • FIG. 18 is a diagram showing that the expression rate of the candidate substance according to FIG. 17 is significantly different from that of adjacent normal tissue.
  • FIG. 19 is a diagram illustrating the p-value between normal cells and cancer cells of the candidate substance according to FIG. 17.
  • Figure 20 is a diagram showing that the expression rate of the candidate substance according to Figure 17 is higher in cancer cells than in normal cells at all clinical stages.
  • Using transcript expression data of normal pancreatic tissue from the GTEx database and PAAD (pancreatic adenocarcinoma) transcriptome data including clinical stage information from TCGA, it can be seen that the expression of the candidate molecules other than DPEP1 is significantly higher in tumors than in normal tissue at all clinical stages.
  • This may be an example of the step of deriving a cancer-specific marker candidate through verification from the tissue transcriptome information of the cancer-specific marker candidate according to step S307 described above.
  • Figure 21 is a diagram showing an example of extracting cell types from TNBC type and Luminal spatial transcriptome data among breast cancers to describe the distribution and target suitability in the breast cancer spatial transcriptome.
  • The spatial transcriptome distribution of tumor epidermal cells is confirmed. This step corresponds to step S308.
  • Figure 22 is a diagram showing the distribution of eight types of cells in a tumor.
  • Figure 23 is a diagram showing the correlation between target cells and cell types. This corresponds to step S309, which derives the correlation between the spatial transcriptome distribution for the cancer-specific marker candidate and the spatial transcriptome distribution for the reference cells.
  • The score distribution, which confirms the distribution of cells using an algorithm, is set as reference data, and the correlation was derived by comparing it with the spatial transcriptome distribution of the target found through single cell RNA-seq.
  • A target called ANTXR1 has a positive correlation with chondrocytes, fibroblasts, iDCs, and the StromaScore.
  • Figure 24 is a diagram illustrating the expression of cells positively correlated with the target to confirm target suitability. This corresponds to the step of secondarily confirming the cell distribution of cells with positive correlation and cancer-specific marker candidates according to step S310.
  • Figure 25 is a graph comparing cell types that are correlated with targets. Figure 25 shows the results of reviewing the distribution for target suitability within the tissue.
  • Figure 26 is a diagram showing spatially related gene extraction and functional terms with positive correlation.
  • Gene Ontology analysis (GO analysis) studies gene function through a structured model that describes individual genes by the biological process, molecular function, and cellular component with which each gene is associated. Functional annotation can be obtained through it: to analyze the function of a gene, the gene is annotated against the Gene Ontology DB, and meaningful results can be obtained through statistical methods.
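One common statistical method for this kind of annotation is an over-representation test based on the hypergeometric distribution. The sketch below assumes that approach, which the source does not name, and uses hypothetical term names and gene sets; no multiple-testing correction is applied, although a real analysis would normally add one.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K carry the term."""
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def enrich(candidate_genes, background_genes, term_to_genes):
    """Return [(term, overlap, p_value)] sorted by ascending p-value."""
    background = set(background_genes)
    selected = set(candidate_genes) & background
    rows = []
    for term, genes in term_to_genes.items():
        annotated = set(genes) & background
        overlap = len(selected & annotated)
        p = hypergeom_upper_tail(
            overlap, len(background), len(annotated), len(selected)
        )
        rows.append((term, overlap, p))
    return sorted(rows, key=lambda row: row[2])


# Hypothetical background of ten genes and two functional terms.
background = [f"g{i}" for i in range(10)]
terms = {
    "term_x": ["g0", "g1", "g2"],
    "term_y": ["g3", "g4", "g5", "g6", "g7", "g8"],
}
rows = enrich(["g0", "g1", "g2"], background, terms)
```

In the toy case all three selected genes carry `term_x`, giving a p-value of 1/120, while `term_y` has no overlap and a p-value of 1.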
  • Figure 27 is a diagram showing the results of pathway analysis targeting genes with positive correlation.
  • FIGS. 26 and 27 correspond to the judgment step in which it is determined that the specific marker candidate has a high correlation not only with the target cell but also with certain genes, and in which the genes it affects are identified from the quantitatively determined gene correlation. For example, for a carcinoma (abnormal tissue) targeted by the cell cluster target candidate recommendation method of the present invention, this can be determined through the gene correlation determination step (S311), and Figure 27 is an example showing the relationships among cancer-expression-related genes for specific marker candidate genes that affect the expression of breast cancer.
  • The final specific marker candidate, which has a positive correlation with such genes, is effective because it increases the probability of direct delivery to the genes that affect the expression of carcinoma in the target cell during subsequent drug delivery. As mentioned above, such candidates can be selected easily, and through this the experimental efficiency can be improved by at least 10 to 100 times compared to the conventional method of selecting specific marker candidates through clinical experiments.
  • Figure 30 is a graph comparing target expression values by cancer type to confirm target suitability in The Cancer Genome Atlas (TCGA) database.
  • It can be confirmed that the expression value of ANTXR1 in breast cancer (BRCA) differs between normal cells and tumor tissue.
  • Figure 31 is a drawing comparing the expression of the target between tumor and normal cells, confirming a higher expression rate in tumors for cancer types such as GBM, ESCA, STAD, HNSC, KIRC, CHOL, COAD, and KIRP, and a higher expression rate in normal cells for cancer types such as LUSC, LUAD, PRAD, THCA, BLCA, UCEC, CESC, PCPG, and KICH.
  • Pancreatic ductal adenocarcinoma (PDAC) and normal pancreatic cells cluster into duct/tumor, stromal, and immune cell clusters, respectively, based on the expression of cell type-specific marker genes.
  • The top 10 PDAC stroma-specific cell surface marker genes were identified as MXRA8, ANTXR1, LY6E, GJB2, THY1, PLXDC2, GPNMB, SDC1, CD55, and DPEP1. To verify the results, the expression levels of the top 10 genes were compared between pancreatic cancer tissue and adjacent normal pancreatic tissue. As a result, the expression levels of these genes were found to be significantly higher in cancer tissue than in normal pancreatic tissue.
  • Pancreatic cancer stroma-specific cell surface markers can thus be identified using single cell RNA sequencing data and SurfaceomeDB. This analysis pipeline can also be applied to other types of tumors and for other purposes.
  • The target candidate material proposal device 1000 may include a memory device 1200, a processor 1100, a storage 1300, a communication module (not shown), an input/output interface (I/O device) 1400, and a power supply 1500.
  • The memory device 1200 is a computer-readable recording medium and may include random access memory (RAM), read only memory (ROM), and a permanent mass storage device such as a disk drive. Additionally, program code for controlling the target candidate material proposal method and a pre-trained deep learning network may be temporarily or permanently stored in the memory.
  • The processor 1100 can collect single cell transcriptome information of abnormal tissue and normal tissue, match the clustering result of the single cell transcriptome information with the corresponding cell groups, select the cell group of interest from among the cell groups, derive from the cell group of interest molecular markers with a higher expression rate in abnormal tissue than in normal tissue, and derive, from among those molecular markers, molecular markers that can be expressed as surface proteins.
  • The communication module may provide functions for communicating with an external server through a network. For example, a request generated by the processor of the target candidate substance proposal device according to program code stored in a recording device such as a memory may be transmitted to an external server through the network under the control of the communication module. Conversely, control signals, commands, content, files, and the like provided under the control of the external server's processor may be received by the target candidate substance proposal device through the communication module via the network.
  • The communication method is not limited, and may include not only a communication method utilizing a communication network that the network may include (for example, a mobile communication network, wired Internet, wireless Internet, and a broadcasting network), but also short-range wireless communication between devices.
  • For example, the network may include any one or more of a personal area network (PAN), local area network (LAN), campus area network (CAN), metropolitan area network (MAN), wide area network (WAN), broadband network (BBN), the Internet, and the like.
  • The network may include, but is not limited to, any one or more of network topologies including a bus network, star network, ring network, mesh network, star-bus network, and tree or hierarchical network.
  • The communication module can communicate with an external server through a network.
  • The communication method is not limited, but the network may be a local area wireless communication network.
  • For example, the network may be a Bluetooth, Bluetooth Low Energy (BLE), or Wi-Fi communication network.
  • The input/output interface 1400 may be a means for interfacing with an input/output device.
  • For example, an input device may include a device such as a keyboard or mouse, and an output device may include a device such as a display for displaying a communication session of an application.
  • As another example, the input/output interface may be a means of interfacing with a device that integrates input and output functions into one, such as a touch screen.
  • As the processor of the target candidate substance proposal device processes the commands of the computer program loaded in the memory, a service screen or content constructed using data provided by an external server may be displayed on the display through the input/output interface.
  • the power supply 1500 may supply power necessary for the operation of the device 1000.
  • the target candidate substance proposal device may include more components than the components described above, but is not limited thereto.

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Genetics & Genomics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Medical Informatics (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Organic Chemistry (AREA)
  • Evolutionary Biology (AREA)
  • Analytical Chemistry (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Pathology (AREA)
  • Evolutionary Computation (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioethics (AREA)
  • Artificial Intelligence (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

A method for suggesting a candidate target using single-cell transcriptome information, implemented by a computing device, according to the present invention, comprises the steps of: collecting single-cell transcriptome information of an abnormal tissue and a normal tissue; matching the result of clustering the single-cell transcriptome information to corresponding cell groups; selecting a cell group of interest from the cell groups; deriving, from the cell group of interest, molecular markers having a higher expression rate in the abnormal tissue than in the normal tissue; and deriving, from the molecular markers, a molecular marker that can be expressed as a surface protein.

Description

Method, device and program for recommending target candidates for cell clusters in cancer microenvironment through single cell transcriptome analysis
The present invention relates to a method, device, and program for recommending target candidates for cell clusters in a cancer microenvironment through single cell transcriptome analysis.
Molecular targets for cancer-target-based new drugs have conventionally been selected on the basis of various molecular biology research results. With the development of next-generation sequencing (NGS) technology, it became common to find target molecules, usually at the in vitro level, by screening whole transcriptomes and proteomes, to study ligands that can target them, and to use this as a starting point for new drug development.
Recently, as these technologies have become feasible at the single cell level, single cell analysis techniques that can find targetable molecules at the cellular level have come into wide use. In particular, techniques for finding markers for each cell group from the cell transcriptome are being used. However, a process for discovering useful target markers using various databases has not been established.
To solve the problems described above, the present invention seeks to provide a method for proposing target candidates using single cell transcriptome information, a target candidate proposal device, and a program.
The method for proposing a target candidate using single cell transcriptome information, performed by a computing device according to the present invention, includes: collecting single cell transcriptome information of abnormal tissue and normal tissue; matching the clustering result for the single cell transcriptome information to corresponding cell groups; selecting a cell group of interest from among the cell groups; deriving, in the cell group of interest, molecular markers with a higher expression rate in abnormal tissue than in normal tissue; and deriving, from among the molecular markers, molecular markers that can be expressed as surface proteins.
The abnormal tissue may be a malignant tumor.
The step of collecting single cell transcriptome information of the abnormal tissue and normal tissue may include a step of selecting a cancer type.
The step of matching the clustering result for the single cell transcriptome information to the corresponding cell groups may be characterized in that the clustering is performed by unsupervised learning.
The step of matching the clustering result for the single cell transcriptome information to the corresponding cell groups may be characterized by grouping cells based on whether the similarity of their overall transcriptome expression profiles exceeds a reference value.
The step of selecting a cell group of interest from among the cell groups may be characterized by selecting the group by calculating the expression rate of specific biomarkers.
The method may further include: collecting tissue transcriptome information of normal people and patients; and deriving specific markers by verifying the molecular markers capable of surface protein expression against the tissue transcriptome information.
The method may further include deriving specific marker candidates by verifying the molecular markers capable of surface protein expression against the tissue transcriptome information.
The method may further include confirming the spatial transcriptome distribution for reference cells of the target carcinoma.
The method may further include deriving the correlation between the spatial transcriptome distribution for the cancer-specific marker candidate and the spatial transcriptome distribution for the reference cells.
The method may further include secondarily confirming the cell distribution of the cells having a positive correlation and of the cancer-specific marker candidates.
The method may further include determining the genetic association of the cancer-specific marker candidate.
The step of determining the genetic association of the cancer-specific marker candidate may be a step of quantitatively determining the genetic association using Gene Ontology (GO) analysis or pathway analysis.
The method may further include determining whether a highly associated gene, identified quantitatively by determining the genetic association of the cancer-specific marker candidate, is a cancer-expression-related gene whose influence on cancer expression is at or above a reference value.
The method may further include selecting the final cancer-specific markers in descending order of the number of cancer-expression-related genes among the cancer-specific marker candidates.
The method may further include deriving the final cancer-specific marker candidates.
A computer program stored in a recording medium to execute the above-described method using a computing device is also included as a means of solution.
The target candidate proposal device includes a processor, and the processor collects single cell transcriptome information of abnormal tissue and normal tissue, matches the clustering result for the single cell transcriptome information to corresponding cell groups, selects a cell group of interest from among the cell groups, derives, in the cell group of interest, molecular markers with a higher expression rate in abnormal tissue than in normal tissue, and derives, from among the molecular markers, molecular markers that can be expressed as surface proteins.
The method, device, and program for recommending target candidates for cell clusters in a cancer microenvironment through single cell transcriptome analysis according to the present invention can i) select the type of cancer to be studied, ii) collect single cell sequencing data of the corresponding tumor tissue and adjacent normal tissue from public databases, and iii) perform data cleaning and preprocessing.
Next, iv) unsupervised clustering can be used to group individual cells with similar overall transcriptome expression profiles. The cell type represented by each cluster can be determined by calculating the expression rate of representative cell type-specific biomarkers.
v) Differential gene expression analysis is performed. Specifically, the analysis can be performed by selecting a cluster of a cell type of interest, such as fibroblasts or infiltrating immune cells, and the corresponding normal tissue.
vi) By intersecting the list of differentially expressed genes with the list of cell surface proteins in a database (for example, the Surfaceome database), molecular targets expressed on the outer surface of cells can be determined.
The method, device, and program according to the present invention thus identify molecular markers that can characterize the corresponding cell groups and, by using an existing target protein database together with the results analyzed from the single cell transcriptome expression information, can select molecular marker candidates that can be expressed as surface proteins.
As a result, imaging markers that can bind to the surface proteins can be derived and used for the diagnosis and severity assessment of specific diseases, and therapeutic agents that can bind to the surface proteins can be developed and used to treat specific diseases.
A bioinformatic analysis pipeline was built to identify stromal cell surface markers expressed in PDAC, a pancreatic tumor, using single cell RNA sequencing data.
Single cell RNA sequencing data from PDAC tissue (n=3) and normal pancreatic tissue (n=3) were obtained from the Gene Expression Omnibus (GEO) database.
Data from each sample were integrated and the cells were clustered using the Seurat package.
To identify genes with enriched expression in PDAC stroma, the stromal cell clusters were selected and differential analysis was performed.
By intersecting the PDAC stroma-specific genes with the gene set from SurfaceomeDB, stromal cell surface markers expressed at high levels in PDAC were identified.
The expression of these marker genes was validated in 36 pairs of PDAC and normal tissue samples obtained from RNA sequencing data (GSE15471), and in 143 PDAC samples obtained from The Cancer Genome Atlas (TCGA) together with 165 normal pancreatic tissues obtained from the Genotype-Tissue Expression (GTEx) database.
Through this analysis pipeline, stromal cell surface markers expressed at high levels in PDAC were identified and validated in several independent datasets.
Figure 1 is a flowchart showing a method for proposing a target candidate according to the present invention.
Figure 2 is a flowchart showing a method for comparing cell group distributions.
Figure 3 is a flowchart showing a method for deriving molecular markers according to one embodiment.
Figure 4 is a flowchart showing a method for deriving molecular markers according to another embodiment.
Figure 5 is a diagram analyzing single-cell transcriptome information for severe patients, mild patients, and a control group for an exemplary disease.
Figure 6 is a graph showing the distribution of immune cell groups by patient type.
Figure 7 is a diagram showing the intersection of highly expressed clusters with the surface protein DB and with a DB of a target function.
Figure 8 is a graph showing the 10 most highly expressed molecular markers for each cluster, in which the genes that satisfy the intersection condition with the surface protein DB are marked with *.
Figure 9 is a flowchart of a method for recommending target candidates for cell clusters in a cancer microenvironment through single-cell transcriptome analysis according to the present invention.
Figure 10 is a flowchart of a method for recommending target candidates for cell clusters in a cancer microenvironment through single-cell transcriptome analysis according to another embodiment.
Figure 11 is a diagram showing single-cell transcriptome profiles with the cluster distribution after data clustering and the relative distribution of the two data groups.
Figure 12 is a diagram comparing the single-cell transcriptome profiles of normal cells and cancer cells.
Figure 13 is a diagram showing the expressed genes of the single-cell transcriptome profile of the normal cells of Figure 12.
Figure 14 is a dot plot showing the cell type of each cluster.
Figure 15 is a diagram classifying the single-cell transcriptome into three main cell types.
Figure 16 is a diagram showing the steps for determining candidate stromal cell surface targets in pancreatic cancer.
Figure 17 is a diagram showing the top 10 candidate molecules identified by the process of Figure 16.
Figure 18 is a diagram showing that the expression of the candidates of Figure 17 differs significantly from that in adjacent normal tissue.
Figure 19 is a diagram listing the p-values between normal cells and cancer cells for the candidates of Figure 17.
Figure 20 is a diagram showing that the candidates of Figure 17 are expressed at higher levels in cancer cells than in normal cells at all clinical stages.
Figure 21 is a diagram showing an example of extracting cell types from spatial transcriptome data of TNBC-type and Luminal-type breast cancer in order to describe distribution and target suitability in the breast cancer spatial transcriptome.
Figure 22 is a diagram showing the distribution of eight types of cells in a tumor.
Figure 23 is a diagram showing the correlation between target cells and cell types.
Figure 24 is a diagram showing, by way of example, the expression of cells positively correlated with the target in order to confirm target suitability.
Figure 25 is a graph comparing the cell types that are correlated with the target.
Figure 26 is a diagram presenting the extraction of positively correlated, spatially related genes and their functional terms.
Figure 27 is a diagram showing the results of a pathway analysis performed on the positively correlated genes.
Figure 28 is a diagram presenting the extraction of negatively correlated, spatially related genes and their functional terms.
Figure 29 is a diagram showing spatial expression patterns.
Figure 30 is a graph comparing target expression values by cancer type to confirm target suitability in The Cancer Genome Atlas (TCGA) database; the difference in ANTXR1 expression between normal cells and tumor tissue in breast cancer (BRCA) can be seen.
Figure 31 is a diagram comparing target expression between tumor and normal cells, showing high expression in tumors for GBM, ESCA, STAD, HNSC, KIRC, CHOL, COAD, KIRP, and others, and high expression in normal cells for LUSC, LUAD, PRAD, THCA, BLCA, UCEC, CESC, PCPG, KICH, and others.
Figure 32 is a diagram showing the distribution of the target in cell lines by cancer type.
Figure 33 is a diagram showing the distribution of the target in normal cell groups.
Figure 34 is a diagram showing the distribution of the target by breast cancer subtype, in which ANTXR1 expression in the TNBC subtype can be seen to be relatively low.
Figure 35 is a diagram showing the distribution of the target by breast cancer subtype, including normal cell data.
Figure 36 is a diagram analyzing the association between clinical variables and the target.
Figure 37 is a diagram analyzing the association between the target and the tumor microenvironment, performed with epithelial cells as the reference.
Figure 38 is a diagram relating to the analysis and interpretation of markers associated with combined use of immunotherapy agents.
Figure 39 is a block diagram of a computing device that performs a method for recommending target candidates for cell clusters in a cancer microenvironment through single-cell transcriptome analysis according to an exemplary embodiment of the present invention.
The method for proposing a target candidate through single-cell transcriptome information, performed by a computing device according to the present invention, includes: collecting single-cell transcriptome information of abnormal tissue and normal tissue; matching the clustering results for the single-cell transcriptome information to corresponding cell groups; selecting a cell group of interest from among the cell groups; deriving, in the cell group of interest, molecular markers whose expression rate is higher in abnormal tissue than in normal tissue; and deriving, from among those molecular markers, molecular markers capable of surface protein expression.
Hereinafter, preferred embodiments of the present disclosure are described in detail with reference to the accompanying drawings. The advantages and features of the present disclosure, and the methods for achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. However, the technical idea of the present disclosure is not limited to the following embodiments and may be implemented in various different forms. The following embodiments are provided only to complete the technical idea of the present disclosure and to fully inform those of ordinary skill in the art to which the present disclosure pertains of its scope, and the technical idea of the present disclosure is defined only by the scope of the claims.
In adding reference numerals to the components of each drawing, it should be noted that identical components are given the same numerals wherever possible, even when they appear in different drawings. In addition, in describing the present disclosure, a detailed description of a related known configuration or function is omitted where it is judged that it could obscure the gist of the present disclosure.
Unless otherwise defined, all terms (including technical and scientific terms) used in this specification have the meanings commonly understood by those of ordinary skill in the art to which the present disclosure pertains. Terms defined in commonly used dictionaries are not to be interpreted in an idealized or excessive manner unless expressly so defined. The terminology used in this specification is for describing the embodiments and is not intended to limit the present disclosure. In this specification, singular forms also include plural forms unless the wording specifically states otherwise.
In addition, terms such as first, second, A, B, (a), and (b) may be used in describing the components of the present disclosure. These terms serve only to distinguish one component from another, and they do not limit the nature, sequence, or order of the component. When a component is described as being "connected", "coupled", or "joined" to another component, that component may be directly connected or joined to the other component, but it should be understood that yet another component may also be "connected", "coupled", or "joined" between them.
As used in the present disclosure, "comprises" and/or "comprising" means that a stated component, step, operation, and/or element does not exclude the presence or addition of one or more other components, steps, operations, and/or elements.
A component included in one embodiment and a component having a common function may be described using the same name in another embodiment. Unless stated to the contrary, the description given for one embodiment may apply to other embodiments, and a detailed description may be omitted where it would be redundant or where a person of ordinary skill in the art could clearly understand it.
Figure 1 is a flowchart showing a method for proposing a target candidate according to the present invention. Referring to Figure 1, the method for proposing a target candidate through single-cell transcriptome information, performed by a computing device according to the present invention, may include: collecting single-cell transcriptome information of a control group and of patients, grouped by symptom severity (S101); clustering the collected single-cell transcriptome information (S102); matching each cluster to its corresponding cell group (S103); comparing the cell group distribution of the control clusters with the cell group distribution of the patient clusters (S104); first deriving molecular markers capable of surface protein expression for the cell group clusters expressed only in patients (S105); and second deriving, from among the first-derived molecular markers, molecular markers that have a target function (S106).
The step of collecting single-cell transcriptome information of patients (S101) may be a step in which single-cell transcriptome analysis of a patient with a specific disease is performed directly, or in which single-cell transcriptome data are collected by crawling external sources.
Single-cell transcriptome analysis is a technique that isolates cells, amplifies and sequences DNA or RNA from very small amounts of material, and analyzes the genomic characteristics of those cells. It can be performed directly as follows.
First, a cell suspension is prepared by dissociating cells from tissue or blood. Single cells are then obtained by sorting the cells by size. Because the amount of DNA obtainable from a single cell is only at the picogram level, an amplification step is required to raise it to the nanogram level at which sequencing is possible. Amplification mainly uses PCR or MDA, and a sequencing library can be prepared from the amplified material. For RNA, cDNA is obtained by reverse transcription and then amplified, and the sequencing library is prepared from the amplified cDNA.
A barcode can be attached to the sequencing library prepared from each cell, so that tens to tens of thousands of samples can be sequenced together and the data separated by cell after sequencing. To analyze transcriptome data obtained by RNA sequencing of single cells, mapping tools such as TopHat and GSNAP are used for alignment to a reference sequence (hg18, hg19, etc. for human), and gene expression is then quantified from the aligned data with a method such as HTSeq. Next, RNA sequencing quality control is used to assess the quality of the experimental data, and data of poor quality can be excluded from the analysis.
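The per-cell separation that barcoding makes possible can be sketched as follows. This is a minimal illustration only: the read layout (a fixed-length barcode at the start of each read), the function name `demultiplex`, and the example reads are assumptions made for this sketch and are not taken from the specification.

```python
from collections import defaultdict


def demultiplex(reads, barcode_len):
    """Group reads by their leading cell barcode.

    Assumes (hypothetically) that every read starts with a fixed-length
    barcode followed by the sequenced insert.
    """
    per_cell = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        per_cell[barcode].append(insert)
    return dict(per_cell)


# Invented toy reads: two share the barcode "AAAC", one carries "TTTG".
reads = ["AAACGGTTCA", "TTTGACCGTA", "AAACTTGACC"]
cells = demultiplex(reads, barcode_len=4)
print(cells)
```

Real protocols also correct barcode sequencing errors and handle molecular identifiers, which this sketch leaves out.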
A clustering step may be performed to determine the cellular characteristics of the transcriptome data obtained through the preprocessing described above. Clustering makes it possible to determine the similarity between cells. Statistical methods such as edgeR and DESeq may be used to select genes that are specifically expressed between cells or between clusters. The group information obtained in this way may allow comparisons between various groups. First, analysis within the same group (within cell type) may make it possible to characterize the stochasticity and variability of transcription, and may also allow regulatory network inference and allelic expression pattern analysis. Second, analysis between cell groups (between cell types) may make it possible to identify biomarkers that differ between cell groups.
The single-cell transcriptome information described above may be obtained directly through tissue examination, or may be collected externally from public data produced by other research institutes and hospitals.
This step (S101) may be a step of collecting information on patients with an inflammatory disease, and may further be a step of separately collecting information on mild patients and severe patients.
The step of clustering the collected single-cell transcriptome information (S102) may be characterized in that the clustering is performed by unsupervised learning. Figure 5 shows the clustering results for the single-cell transcriptome information of severe patients, mild patients, and the control group.
The step of matching each cluster to its corresponding cell group (S103) is a step of identifying the cell group to which each clustered population corresponds and matching them.
The step of comparing the cell group distribution of the control clusters with the cell group distribution of the patient clusters (S104) is a step for selecting genes that are specifically expressed in the patient clusters. Because the single-cell transcriptome clusters of the patient group and the control group are clearly distinct, they can be used to readily select genes that arise specifically in patients. Details are described below with reference to Figure 2.
The step of first deriving molecular markers capable of surface protein expression for the cell group clusters expressed only in patients (S105) is a step in which the immune cell groups specifically expressed in patients are selected and molecular markers capable of surface protein binding are selected as a first pass, so that a candidate group of target substances can be established.
In the step of second deriving, from among the first-derived molecular markers, molecular markers that have a target function (S106), molecular markers having a target function such as imaging or binding can be derived. The secondary molecular markers can be derived by finding the molecular markers that lie in the intersection of the primary molecular markers and a known imaging DB or binding DB.
Figure 2 is a flowchart showing a method for comparing cell group distributions. Referring to Figure 2, the step of comparing cell group distributions (S104) may include a step of comparing the distribution ratios of immune cells in the control clusters and in the patient clusters (S104-1), and a step of selecting the immune cells expressed at or above a predetermined reference value (S104-2).
Furthermore, this step (S104) may further include a step of normalizing the gene expression counts of the immune cells and listing the differences from the control group in order of p-value (S104-3).
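Steps S104-1 and S104-2 can be sketched as below. This is an illustrative sketch under stated assumptions, not the method as implemented by the applicants: the function names, the interpretation of the reference value as a patient-to-control ratio (`min_ratio`), and the toy cell labels are all invented for this example. The normalization and p-value ranking of S104-3 are not covered.

```python
from collections import Counter


def cell_fractions(labels):
    """Fraction of cells assigned to each cell group."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def enriched_groups(patient_labels, control_labels, min_ratio=3.0):
    """Cell groups whose patient fraction is at least min_ratio times
    the control fraction (min_ratio is an assumed reference value)."""
    patient = cell_fractions(patient_labels)
    control = cell_fractions(control_labels)
    selected = []
    for group, frac in patient.items():
        base = control.get(group, 0.0)
        # A group that is absent from the control is treated as enriched.
        if base == 0.0 or frac / base >= min_ratio:
            selected.append(group)
    return sorted(selected)


# Invented toy labels: one cell-group label per cell.
severe = ["M01"] * 9 + ["M02"] * 1 + ["M03"] * 8 + ["M04"] * 2
control = ["M01"] * 2 + ["M02"] * 8 + ["M03"] * 2 + ["M04"] * 8
selected = enriched_groups(severe, control)
print(selected)
```

Working with fractions rather than raw cell counts keeps the comparison meaningful when the patient and control samples contain different numbers of cells.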
Figure 3 is a flowchart showing a method for deriving molecular markers according to one embodiment. Referring to Figure 3, the step of first deriving molecular markers capable of surface protein expression (S105) may include a step of selecting the cluster whose difference from the control group is quantitatively the largest (S105-1), and a step of deriving primary surface protein molecular markers that lie in the intersection of the gene information of that cluster and the surface protein gene DB (S105-2).
Referring to Figure 3, the step of second deriving molecular markers that have a target function (S106) may include a step of checking the intersection of the primary surface protein molecular markers and an imaging DB to derive secondary potential candidates that have the target function.
Figure 4 is a flowchart showing a method for deriving molecular markers according to another embodiment. Referring to Figure 4, the step of first deriving molecular markers capable of surface protein expression (S105) may include a step (S105'-1) of selecting, from among the clusters significantly expressed at or above a predetermined reference value in severe patients, a predetermined number of clusters in descending order of the magnitude of their quantitative difference from the control group, and a step (S105'-2) of checking the intersection of the gene information of the selected clusters and the surface protein gene DB to identify primary potential surface protein candidates (potential targets) that can react with all of the selected clusters.
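The embodiment of Figure 4 can be sketched as ranking clusters by their difference from the control group, keeping a fixed number of them, and retaining only surface protein genes present in every selected cluster. The function name, the difference scores, and the gene identifiers (`GENE_A` to `GENE_E`) below are hypothetical placeholders chosen for illustration.

```python
def shared_surface_candidates(cluster_scores, cluster_genes, surface_db, top_n):
    """Pick the top_n clusters by difference score and return the surface
    protein genes that occur in all of them."""
    ranked = sorted(cluster_scores, key=cluster_scores.get, reverse=True)[:top_n]
    common = set(surface_db)
    for cluster in ranked:
        common &= set(cluster_genes[cluster])
    return ranked, sorted(common)


# Invented toy inputs: a difference score and a gene list per cluster.
cluster_scores = {"M01": 3.2, "M02": 0.4, "M03": 3.0, "M04": 1.1}
cluster_genes = {
    "M01": ["GENE_A", "GENE_B", "GENE_C"],
    "M02": ["GENE_A", "GENE_E"],
    "M03": ["GENE_B", "GENE_C", "GENE_D"],
    "M04": ["GENE_D", "GENE_E"],
}
surface_db = {"GENE_B", "GENE_D"}  # placeholder surface protein gene set

ranked, candidates = shared_surface_candidates(
    cluster_scores, cluster_genes, surface_db, top_n=2
)
print(ranked, candidates)
```

Requiring membership in every selected cluster is what distinguishes this variant from the single-cluster embodiment of Figure 3, where only the most different cluster is intersected with the surface protein DB.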
The method for proposing a target candidate according to the present invention shown in Figures 1 to 4 may be performed by a device, or may be implemented as a computer program stored on a recording medium for execution.
Figure 5 is a diagram analyzing single-cell transcriptome information for severe patients, mild patients, and a control group for an exemplary disease. Referring to Figure 5, the figure shows the result of clustering the single-cell transcriptome information of the virus control group (HC), mild patients (M), and severe patients (S) by unsupervised learning and then matching the clusters to immune cell groups.
Referring to Figure 5, it can be seen visually that the immune cell group activation pattern of the control group (HC) differs from those of the mild patients (M) and the severe patients (S).
Figure 6 is a graph showing the distribution of immune cell groups by patient type. Referring to Figure 6, the figure shows a quantitative comparison of the degree of immune cell group activation shown in Figure 5. Referring to Figure 5, it can be seen that the expression rates of M01 and M03 immune cells differ between the severe patients (S) and the control group (HC) by a predetermined reference value or more, for example by 300% or more.
Figure 7 is a diagram showing the intersection of highly expressed clusters with the surface protein DB and with a DB of a target function. Referring to Figure 7, the immune cell groups expressed at or above a predetermined reference value in patients with a specific disease are identified from Figures 5 and 6, and the genes of those immune cell groups are defined as the candidate pool for target candidates. Primary molecular markers capable of surface protein expression are then derived from the intersection with a surface protein DB such as Surfaceome, and secondary molecular markers that have a target function, for example imaging, are derived from the intersection of the primary molecular markers with an imaging DB. The figure shows the target candidate molecular markers derived in this way.
Referring to Figure 7, it can be seen that the final target molecular marker candidates that satisfy all of the intersection requirements, among the genes of M01 and M03 (whose expression rate in severe patients is at or above a predetermined reference value relative to the control group, for example 300%), the surface protein DB (Surfaceome), and the imaging DB (PETdb), are derived as SLC43A2, SLC2A3, FOLR2, and others.
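The two successive intersections described for Figure 7 can be sketched with plain set membership. Only the three final candidate names (SLC43A2, SLC2A3, FOLR2) come from the text; the remaining identifiers and the contents of both database extracts are placeholders invented for this sketch and do not reflect the actual contents of Surfaceome or PETdb.

```python
def intersect_in_order(genes, database):
    """Keep the genes that are present in the database, preserving order."""
    return [gene for gene in genes if gene in database]


# Candidate pool: genes of the selected immune cell groups.
# GENE_A, GENE_B, and GENE_C are hypothetical fillers.
candidate_pool = ["SLC43A2", "GENE_A", "SLC2A3", "GENE_B", "FOLR2", "GENE_C"]

# Placeholder database extracts (assumptions for illustration only).
surface_db = {"SLC43A2", "SLC2A3", "FOLR2", "GENE_A"}
imaging_db = {"SLC43A2", "SLC2A3", "FOLR2", "GENE_C"}

# First pass: markers capable of surface protein expression.
primary = intersect_in_order(candidate_pool, surface_db)
# Second pass: primary markers that also have the target function.
secondary = intersect_in_order(primary, imaging_db)
print(primary)
print(secondary)
```

Keeping the two passes separate mirrors the primary and secondary derivation steps, so a different target-function database (for example a binding DB) can be swapped in for the second pass without repeating the first.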
With the method for proposing a target candidate through single-cell transcriptome information according to the present invention, candidates that have the target function and are relevant to the target disease may be derived quickly and precisely.
Figure 8 is a graph showing the 10 most highly expressed molecular markers for each cluster, in which the genes that satisfy the intersection condition with the surface protein DB are marked with *.
Referring to Figure 8, the 10 most highly expressed genes are selected and listed for each immune cell group (M01, M02, M03, M04), and the figure shows, by way of example, the result in which the primary molecular marker candidates that satisfy the intersection condition with the surface protein DB are marked with *.
For example, for M01 immune cells the 10 genes CD300E, CCR1, EMP1, TNFSF13B, LILRA5, IL1R2, FPR1, LILRB2, LILRB1, and IFNGR2 have the highest expression rates, and among these the primary molecular marker candidates that satisfy the intersection condition with the surface protein DB can be seen to be CCR1 and FPR1.
When candidates with a target function, for example an imaging or binding function, are sought among these primary molecular marker candidates, target substances may be derived efficiently by checking whether each candidate satisfies the secondary intersection condition with an imaging DB (PETdb) or a binding DB.
도 9는 본 발명에 따른 단일 세포 전사체 분석을 통한 암 미세 환경 내 세포 클러스터의 표적 후보 추천 방법에 대한 순서도를 도시한 도면이다. Figure 9 is a flow chart showing a method for recommending target candidates for cell clusters in a cancer microenvironment through single cell transcriptome analysis according to the present invention.
단계 S201은, 암 조직과 정상조직의 단일 세포 전사체 정보를 수집하는 단계이다. 본 단계(S201)는 특정 질환을 가지는 환자의 단일 세포 전사체 정보 및 비교군인 정상 세포 전사체 정보 수집을 직접 수행하거나 또는 외부에서 단일 세포 전사체 정보 데이터를 크롤링하여 수집하는 단계일 수 있다. 단일세포 전사 유전체 분석은 세포를 분리하여 극미량의 재료로부터 DNA나 RNA를 증폭하고 시퀀싱하여 해당 세포의 유전체적 특징을 분석하는 기술로 자세한 내용은 도 1에 따른 내용에서 전술한바 중복되는 내용은 생략한다.Step S201 is a step of collecting single cell transcriptome information of cancer tissue and normal tissue. This step (S201) may be a step of collecting single cell transcriptome information of a patient with a specific disease and transcriptome information of normal cells as a comparison group directly or by crawling and collecting single cell transcriptome information data from an external source. Single cell transcriptional genome analysis is a technology that separates cells, amplifies DNA or RNA from a very small amount of material, and analyzes the genomic characteristics of the cell by sequencing. Details are described above in Figure 1, so duplicate information will be omitted. .
In this embodiment the target tissue is specified as cancer tissue, but the invention is not limited to this; any abnormal tissue may of course be the target.
Step S202 clusters the single-cell transcriptome information by unsupervised learning. This step may use unsupervised clustering to group individual cells whose overall transcriptome expression profiles are similar. The details were described above for step S102 of Figure 1, and overlapping description is omitted. As the clustering criterion, whether cells are grouped together may be decided by whether the similarity of their overall transcriptome expression profiles exceeds a reference value.
Step S203 matches the clustering results for the single-cell transcriptome information to the corresponding cell groups. The cell type represented by each cluster may be determined by calculating the expression rates of representative cell-type-specific biomarkers. The details were described above for step S103 of Figure 1, and overlapping description is omitted.
Step S204 selects a cell group of interest from among the cell groups. The cell group of interest may be a cluster of a cell type of interest, such as fibroblasts or infiltrating immune cells. It may be selected by computing the expression rates of specific biomarkers.
Step S205 derives, within the cell group of interest, molecular markers whose expression rate is higher in cancer tissue than in normal tissue. That is, this step may look for molecular targets that show a large difference in gene expression.
Step S206 derives, from among the molecular markers, those that can be expressed as surface proteins. For example, intersecting the list of differentially expressed genes with the list of cell surface proteins in the Surfaceome database yields molecular targets (molecular markers) expressed on the outer surface of the cell.
Step S207 collects tissue transcriptome information of healthy individuals and patients.
Step S208 derives specific markers by validating the surface-protein-expressible molecular markers against the tissue transcriptome information.
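The cell-type matching of step S203 can be sketched as below. This is only one possible reading of "calculating the expression rates of representative cell-type-specific biomarkers": the expression rate is assumed here to be the fraction of cells in a cluster with a non-zero count for a gene, and a cluster is assigned the cell type whose marker panel has the highest mean rate. The marker panel and the toy cluster are illustrative assumptions, not data from the specification.

```python
def expression_rate(cells, gene):
    """Fraction of cells in the cluster with a non-zero count for `gene`."""
    if not cells:
        return 0.0
    return sum(1 for cell in cells if cell.get(gene, 0) > 0) / len(cells)


def assign_cell_type(cells, marker_panel):
    """Return (best cell type, per-type mean marker expression rate)."""
    scores = {
        cell_type: sum(expression_rate(cells, g) for g in genes) / len(genes)
        for cell_type, genes in marker_panel.items()
    }
    return max(scores, key=scores.get), scores


# Illustrative marker panel (an assumption, not taken from the specification).
marker_panel = {
    "fibroblast": ["COL1A1", "DCN"],
    "immune": ["PTPRC"],
}

# Toy cluster: each cell is a dict mapping gene name to read count.
cluster = [
    {"COL1A1": 5, "DCN": 2},
    {"COL1A1": 3},
    {"COL1A1": 1, "PTPRC": 1},
]

label, scores = assign_cell_type(cluster, marker_panel)
print(label)
```

The same per-type score could also be used in step S204 to pick out the clusters that belong to the cell group of interest.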
Figure 10 is a flowchart of a method for recommending target candidates for cell clusters in a cancer microenvironment through single-cell transcriptome analysis according to another embodiment. Each step is described below with reference to Figure 10.
Step S301 collects single-cell transcriptome information of cancer tissue and normal tissue. In this step (S301), the single-cell transcriptome information of a patient with a specific disease and the transcriptome information of normal cells serving as a comparison group may be collected directly, or single-cell transcriptome data may be collected by crawling external sources. Single-cell transcriptome analysis is a technique that isolates a single ("one") cell, amplifies DNA or RNA from a very small amount of material, and sequences it to analyze the genomic characteristics of that cell. The details were described above with reference to Figure 1, and overlapping description is omitted.
In this embodiment the target tissue is specified as cancer tissue, but the invention is not limited to this; any abnormal tissue may of course be the target.
Step S302 clusters the single-cell transcriptome information by unsupervised learning. This step may use unsupervised clustering to group individual cells whose overall transcriptome expression profiles are similar. The details were described above for step S102 of Figure 1, and overlapping description is omitted.
Step S303 matches the clustering results for the single-cell transcriptome information to the corresponding cell groups. The cell type represented by each cluster may be determined by calculating the expression rates of representative cell-type-specific biomarkers. The details were described above for step S103 of Figure 1, and overlapping description is omitted.
Step S304 selects a cell group of interest from among the cell groups. The cell group of interest may be a cluster of a cell type of interest, such as fibroblasts or infiltrating immune cells.
Step S305 derives, within the cell group of interest, molecular markers whose expression rate is higher in cancer tissue than in normal tissue. That is, this step may look for molecular targets that show a large difference in gene expression.
Step S306 derives, from among the molecular markers, those that can be expressed as surface proteins. For example, intersecting the list of differentially expressed genes with the list of cell surface proteins in the Surfaceome database yields molecular targets (molecular markers) expressed on the outer surface of the cell.
Step S307 derives cancer-specific marker candidates by validating the surface-protein-expressible molecular markers against tissue transcriptome information.
Step S308 checks the spatial transcriptome distribution of reference cells of the target cancer type. As shown in Figure 21, the spatial transcriptome for each cell type can be identified from spatial transcriptome data of the target cancer type and used as the basis of the validation data in the steps described below.
Step S309 derives the correlation between the spatial transcriptome distribution of a cancer-specific marker candidate and that of the reference cells. As shown in Figure 22, the score distribution obtained by using an algorithm or the like to identify the spatial transcriptome distribution of the cells of the cancer type is set as reference data, and the spatial transcriptome distribution of the target found through single-cell RNA-seq is compared against it to derive a correlation.
Step S310 secondarily checks the cell distribution of the cancer-specific marker candidate and of the cells positively correlated with it. As described for Figures 23 and 24, among the derived specific marker candidates the target ANTXR1 can be seen to correlate positively with chondrocytes, fibroblasts, iDC, and StromaScore. Here, identifying a specific marker candidate that correlates strongly with the tissue of the reference cells, together with its distribution, can serve as grounds for confirming its consistency with the target cells.
Step S311 determines the gene association of the cancer-specific marker candidate. As shown in Figures 26 and 27, this step may perform Gene Ontology analysis and pathway search on the positively correlated candidate genes to derive the final gene association and the final target information for the specific marker candidates. It re-screens the specific marker candidates whose consistency with the target cells was confirmed in the preceding step S310 for those that remain valid, and it may analyze the association among genes rather than consistency with the target cells. Broadly, the gene association may be determined quantitatively using Gene Ontology (GO) analysis or pathway analysis.
In this case, the quantitatively determined gene association indicates not only the target cells but also which genes the specific marker candidate is strongly associated with and which genes it influences, and the judgment may be made on that basis. For example, when the target candidate recommendation method for cell clusters according to the present invention targets a given cancer type (abnormal tissue), cancer-expression-related genes that influence the expression of that cancer type by at least a reference value can be identified through the gene association determination step (S311) described above.
In this way, the specific marker candidates are 1) first sorted by the cell group of interest and by whether surface protein expression is possible (step S306), 2) sorted a second time on the basis of their correlation with the target cells (steps S309 and S310), and 3) sorted a third time by judging whether many cancer-expression-related genes that influence the expression of the target cancer type by at least a reference value are involved (step S311), so that the final cancer-specific marker (final abnormal-tissue-specific marker) can be selected (step S312). Using spatial transcriptome information, a pool of specific marker candidates numbering from several thousand to tens of thousands can be narrowed to roughly 10% in the first sorting and to roughly 10% again in the second sorting. Finally, the larger the number of cancer-expression-related genes that directly influence the expression of the cancer type as well as the target cells, the higher the probability that a final specific marker candidate positively correlated with those genes will, in subsequent drug delivery, deliver the drug directly to the genes in the target cell that influence the expression of the cancer type. Effective final specific marker candidates can therefore be picked out easily, and experimental efficiency can be improved by a factor of at least 10 and up to 100 compared with conventional methods of selecting specific marker candidates through clinical trials.
Figure 11 shows a single-cell transcriptome profile with the cluster distribution after data clustering and the relative distribution of the two data groups.
Figure 12 compares the single-cell transcriptome profiles of normal cells and cancer cells.
Figure 13 shows the expressed genes of the single-cell transcriptome profile of the normal cells of Figure 12.
Figure 14 is a dot plot showing the cell type of each cluster.
Figure 15 shows the single-cell transcriptome classified into three main cell types.
Figure 16 illustrates the steps of determining the candidate group of stromal cell surface targets in pancreatic cancer. Referring to Figure 16, normal tissue and abnormal tissue (malignant tumor tissue) are analyzed together to make a primary selection (4030) of target candidate genes that show a significant difference in expression rate, and intersecting these with a database of surface protein marker groups (surfaceome DB, 2608) yields 333 candidate target-specific markers. After the 333 target substances have been identified by this simple procedure, the mean log2(FC) value can be computed for each to quantify gene expression, and the candidates can be sorted in descending order.
FC stands for fold change, a quantitative measure of the difference in gene expression, defined as FC = treatment / control. Here, treatment is the value under the comparison condition and control is the value under the reference condition. Thus, if the expression level is the same under both conditions, FC = 1. To treat increases and decreases symmetrically, the expression difference can be evaluated as log2(FC). Figure 17 shows the 10 gene candidates with the highest expression among the 333 candidates in the pancreatic cancer example.
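The fold-change definition above can be illustrated with a short sketch. The formula FC = treatment / control and the log2 transform come from the text; the function names and the expression values for the placeholder genes are hypothetical and serve only to show why log2(FC) is symmetric (a doubling gives +1, a halving gives -1) and how candidates would be sorted in descending order.

```python
from math import log2


def log2_fold_change(treatment, control):
    """log2 of FC = treatment / control; both values must be positive."""
    if treatment <= 0 or control <= 0:
        raise ValueError("expression values must be positive")
    return log2(treatment / control)


def rank_by_log2_fc(expression):
    """Sort genes by log2(FC), highest first.

    `expression` maps gene name -> (tumor value, normal value).
    """
    return sorted(
        expression,
        key=lambda gene: log2_fold_change(*expression[gene]),
        reverse=True,
    )


# Hypothetical mean expression values (tumor, normal) for placeholder genes.
toy_expression = {
    "GENE_A": (16.0, 8.0),   # FC = 2   -> log2(FC) = +1
    "GENE_B": (8.0, 8.0),    # FC = 1   -> log2(FC) =  0
    "GENE_C": (4.0, 8.0),    # FC = 0.5 -> log2(FC) = -1
}

ranking = rank_by_log2_fc(toy_expression)
print(ranking)  # ['GENE_A', 'GENE_B', 'GENE_C']
```

Real count data often contain zeros, in which case a small pseudocount is commonly added to both values before taking the ratio; that choice is outside what the text specifies.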
Figure 17 shows the top 10 candidate molecules identified by the process of Figure 16.
Referring to Figure 17, transcriptome data from the GEO database (GSE15471) for pancreatic cancer tissue and cancer-adjacent normal tissue were used to confirm that all of the 10 candidate molecules except DPEP1 are expressed at significantly higher levels in tumor than in normal tissue. DPEP1 was the exception, being expressed more in normal tissue. Accordingly, if the final candidate substances according to the present invention are to be target substances with high expression in abnormal tissue, DPEP1 may be excluded.
Figure 18 shows that the expression rates of the candidate substances of Figure 17 differ significantly from those in adjacent normal tissue.
Figure 19 lists the p-values between normal cells and cancer cells for the candidate substances of Figure 17.
Figure 20 shows that the expression rates of the candidate substances of Figure 17 are higher in cancer cells than in normal cells at every clinical stage.
Referring to Figure 20, transcriptome expression data of normal pancreatic tissue from the GTEx database were combined with PAAD (pancreatic adenocarcinoma) transcriptome data from TCGA, which include clinical stage information, and it can be seen that the expression of the 10 candidate molecules except DPEP1 is significantly higher than in normal tissue at every clinical stage of the tumor. This may be one example of step S307 described above, in which cancer-specific marker candidates are derived by validation against tissue transcriptome information.
Figure 21 shows an example of extracting cell types from spatial transcriptome data of TNBC-type and Luminal-type breast cancer in order to describe the distribution and target suitability in the breast cancer spatial transcriptome. The image shows the spatial transcriptome distribution of tumor epithelial cells. This corresponds to step S308.
Figure 22 shows the distribution of eight types of cells in a tumor. Figure 23 shows the correlation between the target and the cell types. This corresponds to step S309, which derives the correlation between the spatial transcriptome distribution of a cancer-specific marker candidate and that of the reference cells. Referring to Figures 22 and 23, in the breast cancer case the score distribution obtained by using an algorithm or the like to identify the distribution of the cells was set as reference data, and the spatial transcriptome distribution of the target found through single-cell RNA-seq was compared against it to derive a correlation. Among the specific marker candidates derived in S307, the target ANTXR1 can be seen to correlate positively with chondrocytes, fibroblasts, iDC, and StromaScore.
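The correlation of step S309 can be sketched as follows. The specification does not name the correlation measure, so Pearson correlation across spatial spots is assumed here, and the per-spot marker values and cell-type scores are made-up toy numbers rather than data from Figures 22 and 23.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


# Toy data: expression of one marker candidate in four spatial spots.
marker_by_spot = [1.0, 2.0, 3.0, 4.0]

# Toy reference data: per-spot score of each cell type (hypothetical values).
cell_type_scores = {
    "fibroblast": [2.0, 4.0, 6.0, 8.0],
    "T cell": [4.0, 3.0, 2.0, 1.0],
}

correlations = {
    cell_type: pearson(marker_by_spot, scores)
    for cell_type, scores in cell_type_scores.items()
}

# Cell types whose spatial distribution rises and falls with the marker.
positively_correlated = [ct for ct, r in correlations.items() if r > 0]
print(positively_correlated)
```

The list of positively correlated cell types is what step S310 would then inspect visually against the marker's own spatial distribution.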
Figure 24 shows, by way of example, the expression of cells positively correlated with the target in order to confirm target suitability. This corresponds to step S310, which secondarily checks the cell distribution of the cancer-specific marker candidate and of the cells positively correlated with it.
Figure 25 is a graph comparing the cell types correlated with the target. It shows the result of reviewing the distribution for target suitability within the tissue.
Figure 26 presents the extraction of positively correlated, spatially related genes and their functional terms. Referring to Figure 26, it shows the result of performing Gene Ontology analysis and pathway search for the candidate gene ANTXR1 on the positively correlated candidate genes. Gene Ontology (GO) analysis uses a model that, for the study of gene function, structures each gene according to the biological process it is involved in, its molecular function, and its cellular component (its location inside or outside the cell). Functional annotation can be obtained through this analysis: to analyze gene function, gene annotation is performed against the Gene Ontology DB, and significant results are obtained through statistical methods.
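As a rough illustration of the "statistical methods" mentioned above, GO over-representation is commonly assessed with a hypergeometric test; the specification does not state which test is used, so this choice is an assumption. The sketch below computes the probability of seeing at least `k` genes annotated with a GO term among `n` selected genes, when `K` of the `N` background genes carry that annotation. All numbers are toy values.

```python
from math import comb


def hypergeom_p_value(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N background, K annotated, n selected)."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    upper = min(K, n)
    favorable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1)
    )
    return favorable / comb(N, n)


# Toy example: 10 background genes, 5 annotated with the term,
# 5 genes selected, all 5 of them annotated.
p_value = hypergeom_p_value(k=5, N=10, K=5, n=5)
print(p_value)  # 1/252, about 0.00397
```

In practice one such p-value is computed per GO term and then corrected for multiple testing before terms are reported as enriched.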
Figure 27 shows the result of pathway analysis performed on the positively correlated genes.
Through the analyses of Figures 26 and 27, the cellular association of the cancer-specific marker candidate can be confirmed (step S311). As stated above, Figures 26 and 27 correspond to the step in which the quantitatively determined gene association is used to judge not only the target cells but also which genes the specific marker candidate is strongly associated with and which genes it influences. For example, when the target candidate recommendation method for cell clusters according to the present invention targets a given cancer type (abnormal tissue), cancer-expression-related genes that influence the expression of that cancer type by at least a reference value can be identified through the gene association determination step (S311) described above, and Figure 27 is an example showing the association of cancer-expression-related genes among the specific marker candidate genes that influence the expression of breast cancer. Whether many cancer-expression-related genes influence the expression of the target cancer type by at least the reference value is assessed through GO analysis and pathway analysis, which again allows selection at roughly the 10% level in the secondary sorting. As described above, a final specific marker candidate that is positively correlated with these genes raises the probability that, in subsequent drug delivery, the drug is delivered directly to the genes in the target cell that influence the expression of the cancer type, so that effective final specific marker candidates can be picked out easily and experimental efficiency can be improved by a factor of at least 10 and up to 100 compared with conventional methods of selecting specific marker candidates through clinical trials.
Figure 30 is a graph comparing target expression values by cancer type to confirm target suitability in The Cancer Genome Atlas (TCGA) database; it shows the difference in ANTXR1 expression between normal cells and tumor tissue in breast cancer (BRCA).
Figure 31 compares target expression between tumor and normal cells, showing a high expression rate in tumor for GBM, ESCA, STAD, HNSC, KIRC, CHOL, COAD, KIRP, and others, and a high expression rate in normal cells for LUSC, LUAD, PRAD, THCA, BLCA, UCEC, CESC, PCPG, KICH, and others.
[Pancreatic cancer example]
Cells from pancreatic ductal adenocarcinoma (PDAC) and from normal pancreas are each clustered into ductal/tumor, stromal, and immune cell clusters according to the expression of cell-type-specific marker genes.
Next, genes that are differentially expressed between PDAC and normal cells are identified within the stromal cell cluster. Among the PDAC stroma-specific genes, 333 cell surface markers are obtained using SurfaceomeDB.
The top 10 PDAC stroma-specific cell surface marker genes were identified as MXRA8, ANTXR1, LY6E, GJB2, THY1, PLXDC2, GPNMB, SDC1, CD55, and DPEP1. To validate the result, the expression levels of the top 10 genes were compared between pancreatic cancer tissue and adjacent normal pancreatic tissue. The expression levels of these genes were found to be significantly higher in cancer tissue than in normal pancreatic tissue (with DPEP1 as the exception noted for Figure 17).
In addition, RNA sequencing data from healthy human pancreatic tissue in GTEx and from pancreatic cancer patients in TCGA were merged to confirm that the expression levels of the 10 genes are significantly higher than in normal tissue at every PDAC stage. In summary, pancreatic cancer stroma-specific cell surface markers can be identified using single-cell RNA sequencing data and SurfaceomeDB. This analysis pipeline can also be applied to other types of tumors and to other purposes.
Referring to Figure 39, the target candidate substance proposal device (1000) may include a memory device (1200), a processor (1100), storage (1300), a communication module (not shown), an input/output interface (I/O device) (1400), and a power supply (1500).
The memory device (1200) is a computer-readable recording medium and may include random access memory (RAM), read-only memory (ROM), and a permanent mass storage device such as a disk drive. Program code for controlling the target candidate substance proposal method and a pre-trained deep learning network may also be stored in the memory temporarily or permanently.
The processor (1100) may collect single-cell transcriptome information of abnormal tissue and normal tissue, match the clustering results for the single-cell transcriptome information to the corresponding cell groups, select a cell group of interest from among the cell groups, derive, within the cell group of interest, molecular markers whose expression rate is higher in abnormal tissue than in normal tissue, and derive, from among the molecular markers, those that can be expressed as surface proteins.
The communication module may provide a function for communicating with an external server over a network. For example, a request generated by the processor of the target candidate substance proposal device according to program code stored in a recording device such as the memory may be transmitted to the external server over the network under the control of the communication module. Conversely, control signals, commands, content, files, and the like provided under the control of the processor of the external server may be received by the target candidate substance proposal device through the communication module via the network.
The communication method is not limited and may include not only methods that use a communication network the network may include (for example, a mobile communication network, wired Internet, wireless Internet, or a broadcasting network) but also short-range wireless communication between devices. For example, the network may include any one or more of a personal area network (PAN), local area network (LAN), campus area network (CAN), metropolitan area network (MAN), wide area network (WAN), broadband network (BBN), the Internet, and the like. The network may also include any one or more network topologies, including but not limited to a bus network, star network, ring network, mesh network, star-bus network, and tree or hierarchical network.
The communication module may also communicate with an external server over a network. The communication method is not limited, but the network may be a short-range wireless network. For example, the network may be a Bluetooth, Bluetooth Low Energy (BLE), or Wi-Fi network.
The input/output interface 1400 may be a means for interfacing with an input/output device. For example, the input device may include a device such as a keyboard or a mouse, and the output device may include a device such as a display for displaying a communication session of an application. As another example, the input/output interface may be a means for interfacing with a device in which input and output functions are integrated, such as a touch screen. As a more specific example, when the processor of the target candidate substance proposal device processes instructions of a computer program loaded into the memory, a service screen or content constructed using data provided by the external server may be displayed on the display through the input/output interface.
The power supply 1500 may supply the power required for operating the device 1000.
In addition, in other embodiments the target candidate substance proposal device may include more components than those described above, and is not limited thereto.
As described above, exemplary embodiments have been disclosed in the drawings and the specification. Although the embodiments have been described using specific terms, those terms are used only to explain the technical idea of the present disclosure, not to limit their meaning or to restrict the scope of the present disclosure set forth in the claims. Those of ordinary skill in the art will therefore understand that various modifications and other equivalent embodiments are possible. Accordingly, the true technical scope of protection of the present disclosure should be determined by the technical spirit of the appended claims.

Claims (17)

  1. A method for proposing a target candidate substance using single-cell transcriptome information, performed by a computing device, the method comprising:
    collecting single-cell transcriptome information of abnormal tissue and normal tissue;
    matching clustering results for the single-cell transcriptome information to corresponding cell groups;
    selecting a cell group of interest from among the cell groups;
    deriving, in the cell group of interest, molecular markers having a higher expression rate in the abnormal tissue than in the normal tissue; and
    deriving, from among the molecular markers, molecular markers capable of being expressed as surface proteins.
  2. The method of claim 1,
    wherein the abnormal tissue is a malignant tumor.
  3. The method of claim 1,
    wherein the collecting of the single-cell transcriptome information of the abnormal tissue and the normal tissue
    comprises selecting a cancer type.
  4. The method of claim 1,
    wherein, in the matching of the clustering results for the single-cell transcriptome information to the corresponding cell groups, the clustering is performed by unsupervised learning.
  5. The method of claim 1,
    wherein the matching of the clustering results for the single-cell transcriptome information to the corresponding cell groups comprises grouping cells based on whether the similarity of their whole-transcriptome expression profiles exceeds a reference value.
  6. The method of claim 1,
    wherein the selecting of the cell group of interest from among the cell groups
    comprises selecting the cell group of interest by computing the expression rate of a specific biomarker.
  7. The method of claim 1, further comprising:
    collecting tissue transcriptome information from normal subjects and patients; and
    deriving a specific marker by verifying the molecular markers capable of being expressed as surface proteins against the tissue transcriptome information.
  8. The method of claim 1,
    further comprising deriving specific marker candidates by verifying the molecular markers capable of being expressed as surface proteins against tissue transcriptome information.
  9. The method of claim 8,
    further comprising identifying the spatial transcriptome distribution of reference cells of a target cancer type.
  10. The method of claim 9,
    further comprising deriving a correlation between the spatial transcriptome distribution of a cancer-specific marker candidate and the spatial transcriptome distribution of the reference cells.
  11. The method of claim 10,
    further comprising secondarily confirming the cell distribution of the positively correlated cells and of the cancer-specific marker candidate.
  12. The method of claim 11,
    further comprising determining the gene association of the cancer-specific marker candidate.
  13. The method of claim 12,
    wherein the determining of the gene association of the cancer-specific marker candidate
    comprises quantitatively determining the gene association using Gene Ontology (GO) analysis or pathway analysis.
  14. The method of claim 13,
    further comprising determining whether a highly associated gene, quantitatively identified by determining the gene association of the cancer-specific marker candidate, is a cancer expression-related gene whose influence on cancer expression is at or above a reference value.
  15. The method of claim 14,
    further comprising selecting final cancer-specific markers from among the cancer-specific marker candidates in descending order of the number of cancer expression-related genes.
  16. A computer program stored in a recording medium for executing the method of any one of claims 1 to 15 using a computing device.
  17. A device for proposing a target candidate substance, comprising a processor,
    wherein the processor is configured to: collect single-cell transcriptome information of abnormal tissue and normal tissue; match clustering results for the single-cell transcriptome information to corresponding cell groups; select a cell group of interest from among the cell groups; derive, in the cell group of interest, molecular markers having a higher expression rate in the abnormal tissue than in the normal tissue; and derive, from among the molecular markers, molecular markers capable of being expressed as surface proteins.
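
For illustration, the marker-derivation steps recited in claim 1 and the correlation recited in claim 10 might be sketched in Python as below. This is a minimal sketch under stated assumptions, not the applicant's implementation: "expression rate" is assumed to mean the fraction of cells with a nonzero count for a gene, the threshold `min_gap`, the gene names (`GENE_A` and so on), the surface-protein gene set, and the per-spot values are all hypothetical, and Pearson correlation is one possible choice of correlation measure.

```python
from math import sqrt

def expression_rate(cells, gene):
    """Fraction of cells with a nonzero count for `gene` (assumed definition)."""
    if not cells:
        return 0.0
    return sum(1 for cell in cells if cell.get(gene, 0) > 0) / len(cells)

def derive_markers(abnormal_cells, normal_cells, min_gap=0.2):
    """Genes whose expression rate in abnormal tissue exceeds that in normal
    tissue by at least `min_gap` (hypothetical threshold)."""
    genes = {gene for cell in abnormal_cells for gene in cell}
    return sorted(
        gene for gene in genes
        if expression_rate(abnormal_cells, gene)
        - expression_rate(normal_cells, gene) >= min_gap
    )

def surface_markers(markers, surface_genes):
    """Keep only markers listed in a surface-protein gene set."""
    return [gene for gene in markers if gene in surface_genes]

def pearson(xs, ys):
    """Pearson correlation between two equal-length per-spot value lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0

# Hypothetical cells of one cell group of interest: gene -> read count.
abnormal = [
    {"GENE_A": 5, "GENE_B": 1, "GENE_D": 2},
    {"GENE_A": 3, "GENE_D": 1},
    {"GENE_A": 2, "GENE_C": 4},
    {"GENE_C": 1},
]
normal = [
    {"GENE_B": 2},
    {"GENE_A": 1, "GENE_B": 1},
    {"GENE_C": 2},
    {"GENE_C": 3},
]

markers = derive_markers(abnormal, normal)
surface = surface_markers(markers, surface_genes={"GENE_C", "GENE_D"})

# Hypothetical per-spot values: marker expression vs. reference-cell abundance.
r = pearson([0, 1, 2, 3], [1, 3, 5, 7])
print(markers, surface, round(r, 6))
```

In practice the expression comparison would typically be a statistical differential-expression test on normalized counts, and the surface-protein set would come from a curated annotation resource; both are left as plain inputs here so the sketch stays self-contained.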
PCT/KR2023/004192 2022-03-29 2023-03-29 Method for recommending candidate target of cell cluster in cancer microenvironment through single-cell transcriptome analysis, and apparatus and program therefor WO2023191503A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR10-2022-0038613 2022-03-29
KR20220038613 2022-03-29
KR1020230041265A KR20230140439A (en) 2022-03-29 2023-03-29 A method of recommending target candidates of a cell cluster in cancer microenvironment by Single-cell Transcriptome Analysis, device and program thereof
KR10-2023-0041265 2023-03-29

Publications (1)

Publication Number Publication Date
WO2023191503A1 (en) 2023-10-05

Family

ID=88202815

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2023/004192 WO2023191503A1 (en) 2022-03-29 2023-03-29 Method for recommending candidate target of cell cluster in cancer microenvironment through single-cell transcriptome analysis, and apparatus and program therefor

Country Status (1)

Country Link
WO (1) WO2023191503A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20200038660A (en) * 2018-10-04 2020-04-14 사회복지법인 삼성생명공익재단 Method for selecting biomarker and method for providing information for diagnosis of cancer using thereof
KR20210089094A (en) * 2020-01-07 2021-07-15 한국과학기술원 Method and System for Screening Neoantigens, and Use thereof
KR20210144353A (en) * 2020-05-22 2021-11-30 연세대학교 산학협력단 Method for Predicting Colorectal Cancer Prognosis Based on Single Cell Transcriptome Analysis
US20210388450A1 (en) * 2018-10-29 2021-12-16 Samsung Life Public Welfare Foundation Biomarker panel for determining molecular subtype of lung cancer, and use thereof
CN113674800B (en) * 2021-08-25 2022-02-08 中国农业科学院蔬菜花卉研究所 Cell clustering method based on single cell transcriptome sequencing data

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23781356

Country of ref document: EP

Kind code of ref document: A1