WO2023191503A1

WO2023191503A1 - Method for recommending candidate target of cell cluster in cancer microenvironment through single-cell transcriptome analysis, and apparatus and program therefor

Info

Publication number: WO2023191503A1
Application number: PCT/KR2023/004192
Authority: WO
Inventors: 임형준; 이성준; 박정빈; 이대승
Original assignee: 주식회사 포트래이; 서울대학교산학협력단
Priority date: 2022-03-29
Filing date: 2023-03-29
Publication date: 2023-10-05

Abstract

A method for suggesting a candidate target using single-cell transcriptome information, implemented by a computing device, according to the present invention, comprises the steps of: collecting single-cell transcriptome information of an abnormal tissue and a normal tissue; matching the result of clustering the single-cell transcriptome information to corresponding cell groups; selecting a cell group of interest from the cell groups; deriving, from the cell group of interest, molecular markers having a higher expression rate in the abnormal tissue than in the normal tissue; and deriving, from the molecular markers, a molecular marker that can be expressed as a surface protein.

Description

Method, device and program for recommending target candidates for cell clusters in cancer microenvironment through single cell transcriptome analysis

The present invention relates to a method, device, and program for recommending target candidates for cell clusters in a cancer microenvironment through single cell transcriptome analysis.

Previously, specific molecular targets for cancer target-based new drugs were selected based on the results of various molecular biological studies. With the development of NGS (Next-Generation Sequencing) technology, screening of full-length transcriptomes and proteomes has become common. In many cases, this screening served as the starting point for new drug development: target molecules were found at the in vitro level, and ligands capable of targeting them were then studied.

Recently, as this technology has become possible at the single cell level, single cell analysis has come into active use for finding molecules that can be targeted at the cell level. In particular, the technology is being used to find markers for each cell group from the single cell transcriptome. However, there is no established process for discovering useful target markers using various databases.

In order to solve the problems described above, the present invention seeks to provide a method for proposing a target candidate using single cell transcriptome information, a device for suggesting a candidate for a target, and a program.

The method of proposing a target candidate through single cell transcriptome information performed by a computing device according to the present invention includes: collecting single cell transcriptome information of abnormal tissue and normal tissue; matching the clustering results for the single cell transcriptome information with the corresponding cell groups; selecting a cell group of interest from among the cell groups; deriving, in the cell group of interest, molecular markers with a higher expression rate in abnormal tissue compared to normal tissue; and deriving, among the molecular markers, a molecular marker that can be expressed as a surface protein.

The abnormal tissue may be a malignant tumor.

The step of collecting single cell transcriptome information of the abnormal tissue and normal tissue may include the step of selecting the type of cancer.

The step of matching the clustering result for the single cell transcriptome information to the corresponding cell group may be characterized by clustering using unsupervised learning.

The step of matching the clustering result for the single cell transcriptome information to the corresponding cell group may be characterized by grouping based on whether the similarity of the entire transcriptome expression profile exceeds a reference value.

The step of selecting a cell group of interest among the cell groups may be characterized in that the selection is made by calculating the expression rate of a specific biomarker.

The method may further include collecting tissue transcriptome information from normal individuals and patients; and deriving specific markers by verifying, from the tissue transcriptome information, the molecular markers capable of expressing surface proteins.

It may further include the step of deriving specific marker candidates by verifying molecular markers capable of expressing surface proteins from tissue transcript information.

A step of confirming the spatial transcriptome distribution for reference cells of the target carcinoma may be further included.

A step of deriving a correlation between the spatial transcript distribution for the cancer-specific marker candidate and the spatial transcript distribution for the reference cells may be further included.

A step of secondary confirmation of the cell distribution of cells having a positive correlation and cancer-specific marker candidates may be further included.
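The correlation step described above can be sketched as follows. This is a minimal illustration only: the source does not name the correlation statistic, so the Pearson coefficient is an assumption, and the per-spot values, the function names, and the `min_r` cutoff are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / math.sqrt(var_x * var_y)

def positively_correlated_cell_types(marker_by_spot, abundance_by_type, min_r=0.5):
    """Return {cell_type: r} for reference cell types whose spatial abundance
    correlates with the marker above min_r (hypothetical cutoff)."""
    result = {}
    for cell_type, abundance in abundance_by_type.items():
        r = pearson(marker_by_spot, abundance)
        if r > min_r:
            result[cell_type] = r
    return result

# Hypothetical values for five spatial spots (not data from the source)
marker = [0.1, 0.4, 0.9, 1.5, 2.0]
abundance = {
    "fibroblast": [0.05, 0.2, 0.5, 0.7, 0.9],
    "epithelial": [0.9, 0.7, 0.4, 0.2, 0.1],
}
hits = positively_correlated_cell_types(marker, abundance)
```

In this toy input only the cell type whose abundance rises together with the marker is retained, which corresponds to the cells that would be passed on to the secondary confirmation of cell distribution.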

A step of determining the genetic correlation of the cancer-specific marker candidate may be further included.

The step of determining the genetic association of the cancer-specific marker candidate may be a step of quantitatively determining the genetic association using Gene Ontology (GO) analysis or pathway analysis.

It may further include determining the gene correlation of the cancer-specific marker candidate and determining whether the quantitatively determined highly correlated gene is a cancer expression-related gene that has an influence greater than the reference value on cancer expression.

It may further include selecting the final cancer-specific marker in the order of the highest number of cancer expression-related genes among the cancer-specific marker candidates.

A step of deriving final cancer-specific marker candidates may be further included.
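One way to picture the final selection is to count, for each cancer-specific marker candidate, how many of its highly correlated genes are cancer expression-related genes and then sort by that count. The sketch below assumes both inputs are already available; the function name, the gene names, and the tie-breaking rule are hypothetical and are not taken from the source.

```python
def rank_candidates(correlated_genes, cancer_related_genes):
    """Order marker candidates by how many of their highly correlated genes
    belong to the cancer expression-related gene set."""
    scored = []
    for candidate, genes in correlated_genes.items():
        n_hits = len(set(genes) & cancer_related_genes)
        scored.append((candidate, n_hits))
    # Highest count first; ties broken alphabetically for a stable order
    return sorted(scored, key=lambda item: (-item[1], item[0]))

# Hypothetical inputs: placeholder names, not results from the source
correlated = {
    "MARKER_A": ["G1", "G2", "G3"],
    "MARKER_B": ["G2", "G4"],
    "MARKER_C": ["G5"],
}
cancer_related = {"G1", "G2", "G3", "G4"}
ranking = rank_candidates(correlated, cancer_related)
```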

The solution includes a computer program stored in a recording medium to execute the above-described method using a computing device.

The target candidate proposal device includes a processor. The processor collects single cell transcriptome information of abnormal tissue and normal tissue, matches the clustering result of the single cell transcriptome information with the corresponding cell groups, selects a cell group of interest from among the cell groups, derives, in the cell group of interest, molecular markers with a higher expression rate in abnormal tissue compared to normal tissue, and derives, among the molecular markers, molecular markers capable of expressing surface proteins.

The method, device, and program for recommending target candidates for cell clusters in a cancer microenvironment through single cell transcriptome analysis according to the present invention can i) select the type of cancer to be studied, ii) collect single cell sequencing data of the corresponding tumor tissue and adjacent normal tissue from public databases, and iii) perform data cleaning and preprocessing.

Next, iv) unsupervised clustering can be used to group individual cells with similar overall transcript expression profiles. At this time, the cell type represented by each cluster can be determined by calculating the expression rate of a representative cell type-specific biomarker.

v) Perform gene expression difference analysis. Specifically, differential analysis can be performed by selecting clusters of cell types of interest, such as fibroblasts or infiltrating immune cells, and corresponding normal tissues.

vi) By intersecting the list of differential genes with the list of cell surface proteins in a database (e.g., Surfaceome database), molecular targets expressed on the outer surface of cells can be determined.

The method, device, and program for recommending target candidates for cell clusters in a cancer microenvironment through single cell transcriptome analysis according to the present invention identify molecular markers that can characterize the corresponding cell groups. By combining a previously established target protein database with the results analyzed from single cell transcript expression information, molecular marker candidates that can be expressed as surface proteins can be selected.

As a result, the derivation of imaging markers that can bind to surface proteins can be used for diagnosis and severity assessment of specific diseases, and the development of therapeutic agents that can bind to surface proteins can be used to treat specific diseases.

Using single-cell RNA sequencing data, we built a bioinformatic analysis pipeline to identify stroma cell surface markers expressed in PDAC, a pancreatic tumor.

Single-cell RNA sequencing data were obtained from PDAC tissue (n=3) and normal pancreatic tissue (n=3) from the Gene Expression Omnibus (GEO) database.

Data from each sample were integrated and cells were clustered using the Seurat package.

To identify genes with enriched expression in PDAC stroma, stroma cell clusters were selected and differential analysis was performed.

By intersecting PDAC stroma-specific genes with gene sets from SurfaceomeDB, stroma cell surface markers expressed at high levels in PDAC were identified.

The expression of these marker genes was measured in 36 pairs of PDAC and normal tissue samples obtained from RNA sequencing data (GSE15471), and was verified in 143 PDAC samples obtained from The Cancer Genome Atlas (TCGA) and in 165 normal pancreatic tissue samples obtained from the Genotype-Tissue Expression (GTEx) database.

Through this analysis pipeline, stroma cell surface markers expressed at high levels in PDAC were identified and verified in several independent datasets.

Figure 1 is a flowchart showing a method for proposing a target candidate according to the present invention.

Figure 2 is a flowchart showing a method for comparing cell population distribution.

Figure 3 is a flowchart showing a method for deriving molecular markers according to one embodiment.

Figure 4 is a flowchart showing a method for deriving molecular markers according to another example.

Figure 5 is a diagram analyzing single cell transcriptome information for severe patients, mild patients, and control groups for exemplary diseases.

Figure 6 is a graph showing the distribution of immune cell groups according to patient type.

Figure 7 is a diagram showing the intersection of a cluster with a high expression rate with the surface protein DB and the intersection with the DB having a target function.

Figure 8 shows the 10 most expressed molecular markers for each cluster, and the genome that satisfies the intersection condition with the surface protein DB among the molecular markers is marked with *.

Figure 9 is a flow chart showing a method for recommending target candidates for cell clusters in a cancer microenvironment through single cell transcriptome analysis according to the present invention.

Figure 10 is a flowchart of a method for recommending target candidates for cell clusters in a cancer microenvironment through single cell transcriptome analysis according to another embodiment.

Figure 11 is a diagram showing a single cell transcriptome profile showing the cluster distribution after data clustering and the relative distribution of two data groups.

Figure 12 is a diagram comparing single cell transcriptome profiles of normal cells and cancer cells.

Figure 13 is a diagram showing the expression genes of the single cell transcriptome profile of normal cells according to Figure 12.

Figure 14 is a diagram showing a dot plot showing the cell type of each cluster.

Figure 15 is a diagram classifying the single cell transcriptome into three main cell types.

Figure 16 is a diagram illustrating the steps of determining a candidate group of stromal cell surface targets in pancreatic cancer.

Figure 17 is a diagram showing the top 10 candidate molecules identified according to the process of Figure 16.

FIG. 18 is a diagram showing that the expression rate of the candidate substance according to FIG. 17 is significantly different from that of adjacent normal tissue.

FIG. 19 is a diagram illustrating the p-value between normal cells and cancer cells of the candidate substance according to FIG. 17.

Figure 20 is a diagram showing that the expression rate of the candidate substance according to Figure 17 is higher in cancer cells than in normal cells at all clinical stages.

Figure 21 is a diagram showing an example of extracting cell types from TNBC type and Luminal spatial transcriptome data among breast cancers to describe the distribution and target suitability in the breast cancer spatial transcriptome.

Figure 22 is a diagram showing the distribution of eight types of cells in a tumor.

Figure 23 is a diagram showing the correlation between target cells and cell types.

Figure 24 is a diagram illustrating the expression of cells positively correlated with the target to confirm target suitability.

Figure 25 is a graph comparing cell types that are correlated with targets.

Figure 26 is a diagram showing spatially related gene extraction and functional terms with positive correlation.

Figure 27 is a diagram showing the results of pathway analysis targeting genes with positive correlation.

Figure 28 is a diagram showing the extraction of spatially related genes and functional terms with negative correlation.

Figure 29 is a diagram showing spatial expression patterns.

Figure 30 is a graph comparing target expression values by cancer type to confirm target suitability in The Cancer Genome Atlas (TCGA) database. It can be seen that the expression value of ANTXR1 in breast cancer (BRCA) differs between normal cells and tumor tissues.

Figure 31 is a diagram comparing the expression of the target between tumor and normal cells, confirming a higher expression rate in tumors for GBM, ESCA, STAD, HNSC, KIRC, CHOL, COAD, and KIRP, and a higher expression rate in normal cells for LUSC, LUAD, PRAD, THCA, BLCA, UCEC, CESC, PCPG, and KICH.

Figure 32 is a diagram showing the distribution of targets in cell lines for each cancer type.

Figure 33 is a diagram showing the distribution of targets in a normal cell population.

Figure 34 is a diagram showing the distribution of targets by breast cancer subtype, confirming that the ANTXR1 expression value of the TNBC subtype is relatively low.

Figure 35 is a diagram showing target distribution by breast cancer subtype, including normal cell data.

Figure 36 is a diagram analyzing the correlation between clinical variables and targets.

Figure 37 is a diagram analyzing the relationship between target and tumor microenvironment, performed on the basis of epithelial cells.

Figure 38 is a diagram related to marker analysis and interpretation related to combined use of immunotherapy agents.

Figure 39 is a block diagram of a computing device that performs a method of recommending target candidates for cell clusters in a cancer microenvironment through single cell transcriptome analysis according to an exemplary embodiment of the present invention.

Hereinafter, preferred embodiments of the present disclosure will be described in detail with reference to the attached drawings. The advantages and features of the present disclosure and methods for achieving them will become clear by referring to the embodiments described in detail below along with the accompanying drawings. However, the technical idea of the present disclosure is not limited to the following embodiments and may be implemented in various different forms. The following embodiments are provided merely to complete the technical idea of the present disclosure and to fully inform those skilled in the art to which the present disclosure belongs of the scope of the present disclosure, and the technical idea of the present disclosure is defined only by the scope of the claims.

When adding reference numerals to components in each drawing, it should be noted that identical components are given the same reference numerals as much as possible even if they are shown in different drawings. Additionally, in describing the present disclosure, if it is determined that a detailed description of a related known configuration or function may obscure the gist of the present disclosure, the detailed description will be omitted.

Unless otherwise defined, all terms (including technical and scientific terms) used in this specification may be used with meanings that can be commonly understood by those skilled in the art to which this disclosure pertains. Additionally, terms defined in commonly used dictionaries are not interpreted ideally or excessively unless clearly specifically defined. The terminology used herein is for the purpose of describing embodiments and is not intended to limit the disclosure. As used herein, singular forms also include plural forms, unless specifically stated otherwise in the context.

Additionally, in describing the components of the present disclosure, terms such as first, second, A, B, (a), and (b) may be used. These terms are only used to distinguish the component from other components, and the nature, sequence, or order of the component is not limited by the term. When a component is described as being “connected,” “coupled,” or “joined” to another component, that component may be directly connected or joined to that other component, but it will be understood that another component may also be “connected,” “coupled,” or “joined” between the components.

As used in this disclosure, “comprises” and/or “comprising” does not exclude the presence or addition of one or more other components, steps, operations and/or elements in addition to the referenced component, step, operation and/or element.

Components included in one embodiment and components including common functions may be described using the same name in other embodiments. Unless stated to the contrary, the description given in one embodiment can be applied to other embodiments, and detailed description will be omitted to the extent of overlap or to the extent that it can be clearly understood by a person skilled in the art.

Figure 1 is a flowchart showing a method for proposing a target candidate according to the present invention. Referring to Figure 1, the method of proposing a target candidate through single cell transcriptome information performed by a computing device according to the present invention may include: collecting single cell transcriptome information of a control group and of patients according to the severity of symptoms (S101); clustering the collected single cell transcriptome information (S102); matching cell groups corresponding to the clusters (S103); comparing the distribution of cell groups corresponding to the control cluster with the distribution of cell groups corresponding to the patient cluster (S104); firstly deriving molecular markers capable of expressing surface proteins for cell clusters expressed only in patients (S105); and secondly deriving molecular markers with target functions among the firstly derived molecular markers (S106).

The step of collecting single cell transcriptome information of a patient (S101) may be a step of collecting single cell transcriptome information of a patient with a specific disease directly or by crawling the single cell transcriptome information data from an external source.

Single cell transcriptional genome analysis is a technology that analyzes the genomic characteristics of a cell by isolating a cell, amplifying and sequencing DNA or RNA from a very small amount of material. The method of directly performing single cell transcriptome analysis is as follows.

First, a cell suspension is prepared from tissue or blood through a cell separation process. Single cells are then obtained by sorting them in order of cell size. In single cell transcriptional genome analysis, the amount of DNA that can be obtained from a single cell is only at the picogram level, so an amplification process is required to increase it to the nanogram level where sequencing is possible. For amplification, the PCR process or MDA process is mainly used, and a sequencing library can be produced based on this. In the case of RNA, cDNA is obtained through a reverse transcription process, and then amplified cDNA is obtained and a sequencing library is created.

Barcodes can be attached to the sequencing library produced from each cell, enabling tens to tens of thousands of samples to be sequenced together, and the data for each cell can be separated after sequencing. The procedure for analyzing transcriptome data obtained through single cell RNA sequencing uses mapping tools such as TopHat and GSNAP for alignment to a reference sequence (hg18, hg19, etc. in humans), and then measures gene expression from the aligned data using methods such as HTSeq. Next, through RNA sequencing quality management, the quality of the experimental data can be identified, and data of poor quality can be excluded from the analysis.
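The separation of pooled reads by barcode and the subsequent quality filter can be pictured with the short sketch below. It assumes that alignment has already produced (barcode, gene) pairs; the function names, the toy barcodes, and the `min_genes` cutoff are hypothetical placeholders and do not reproduce the behavior of TopHat, GSNAP, or HTSeq.

```python
from collections import Counter, defaultdict

def count_by_barcode(aligned_reads):
    """Build a per-cell gene count table from (barcode, gene) pairs.
    Reads that did not map to a gene (gene is None) are skipped."""
    counts = defaultdict(Counter)
    for barcode, gene in aligned_reads:
        if gene is not None:
            counts[barcode][gene] += 1
    return counts

def passes_qc(gene_counts, min_genes=2):
    """Keep a cell only if at least min_genes distinct genes were detected.
    The cutoff is a hypothetical placeholder, not a value from the source."""
    return len(gene_counts) >= min_genes

# Toy reads: two cells, one unmapped read
reads = [
    ("AAAC", "CCR1"), ("AAAC", "FPR1"), ("AAAC", "CCR1"),
    ("TTTG", "FPR1"), ("TTTG", None),
]
table = count_by_barcode(reads)
kept = {barcode: c for barcode, c in table.items() if passes_qc(c)}
```

Here the second cell is dropped because only one gene was detected, mirroring the exclusion of poor-quality data from the analysis.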

A clustering process can be performed to confirm the cellular characteristics of the transcriptome data obtained through the above-mentioned preprocessing work. Through this, similarities between cells can be confirmed. Statistical methods such as edgeR and DESeq may be used to select specifically expressed genes between cells or between populations. Comparison between various groups may be possible using the group information obtained in this way. First, it may be possible to identify the stochasticity and variability of transcription through analysis within the same cell type. Additionally, regulatory network inference and allelic expression pattern analysis may be possible. Second, through analysis between cell types, it may be possible to identify biomarkers that show differences between cell types.

Single cell transcriptome information obtained in this way can be obtained through direct tissue examination, but can also be collected externally from public data conducted at other research institutes and hospitals.

This step (S101) may be a step of collecting information on patients with inflammatory diseases, and may further be a step of separately collecting information on mild patients and severe patients.

The step of clustering the collected single cell transcriptome information (S102) may be characterized by clustering using unsupervised learning. Referring to Figure 5, clustering results for single cell transcriptome information in severe patients, mild patients, and controls are shown.

The step of matching the cell group corresponding to the cluster (S103) is a step of identifying and matching the cell group to which the clustered group corresponds.

The step of contrasting the cell population distribution corresponding to the control cluster and the cell population distribution corresponding to the patient cluster (S104) is a step for selecting a genome specifically expressed in the patient cluster. Since the single cell transcriptome clusters of the patient group and the control group are clearly distinct, this difference makes it easy to select genomes that specifically occur in patients. Details will be described later with reference to Figure 2.

In the step (S105) of first deriving molecular markers capable of expressing surface proteins for cell clusters expressed only in patients, immune cell groups specifically expressed in patients are selected, and molecular markers capable of binding to surface proteins are primarily selected. This step makes it possible to set up a candidate group of target materials.

In the step (S106) of secondarily deriving a molecular marker having a target function among the firstly derived molecular markers, a molecular marker having a target function, for example, imaging or binding function, can be derived. Derivation of the secondary molecular marker can be implemented by deriving a molecular marker that satisfies the intersection between the primary molecular marker and a known imaging DB or binding DB.
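The two-stage derivation in steps S105 and S106 amounts to two successive set intersections. The sketch below is a minimal illustration under that reading; the function name, the gene names, and the database contents are hypothetical placeholders and do not reflect the real contents of any surface protein, imaging, or binding DB.

```python
def derive_markers(cluster_genes, surface_db, function_db):
    """Return (primary, secondary) marker lists.
    primary: cluster genes that are listed in the surface protein DB;
    secondary: primary markers also listed in the target-function DB."""
    primary = [g for g in cluster_genes if g in surface_db]
    secondary = [g for g in primary if g in function_db]
    return primary, secondary

# Placeholder inputs (hypothetical, not real database entries)
cluster_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
surface_db = {"GENE_A", "GENE_B", "GENE_E"}
imaging_db = {"GENE_B", "GENE_C"}
primary, secondary = derive_markers(cluster_genes, surface_db, imaging_db)
```

Keeping the primary list separate from the secondary list makes it possible to swap the imaging DB for a binding DB without repeating the surface protein filter.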

Figure 2 is a flowchart showing a method for comparing cell population distribution. Referring to Figure 2, the step of comparing the distribution of cell populations (S104) includes the step of comparing the distribution ratio of immune cells from the control cluster and the patient cluster (S104-1), respectively.

It may include a step (S104-2) of selecting immune cells expressed above a predetermined reference value.

Furthermore, this step (S104) may further include a step (S104-3) of normalizing the genome expression count of immune cells and listing the differences from the control group based on P-value.
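Step S104-3 can be sketched as a normalization followed by a per-gene test and a sort on the resulting p-values. The source does not specify the normalization or the statistical test, so the log2 counts-per-million transform and the exact permutation test used here are assumptions, and all sample values and names are hypothetical.

```python
import math
from itertools import combinations

def log_cpm(counts):
    """Normalize one sample's raw counts to log2(counts-per-million + 1)."""
    total = sum(counts.values())
    return {gene: math.log2(c / total * 1e6 + 1) for gene, c in counts.items()}

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference of group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_a):
        picked = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in picked]
        diff = abs(sum(a) / len(a) - sum(b) / len(b))
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
        total += 1
    return extreme / total

def rank_by_p_value(patient_samples, control_samples):
    """Return [(gene, p_value)] sorted from smallest to largest p-value."""
    norm_p = [log_cpm(s) for s in patient_samples]
    norm_c = [log_cpm(s) for s in control_samples]
    genes = set().union(*norm_p, *norm_c)
    rows = []
    for gene in genes:
        a = [s.get(gene, 0.0) for s in norm_p]
        b = [s.get(gene, 0.0) for s in norm_c]
        rows.append((gene, permutation_p_value(a, b)))
    return sorted(rows, key=lambda row: (row[1], row[0]))

# Hypothetical raw counts; GENE_C stands in for the remaining transcriptome
patients = [
    {"GENE_A": 300, "GENE_B": 100, "GENE_C": 9600},
    {"GENE_A": 320, "GENE_B": 90, "GENE_C": 9590},
    {"GENE_A": 280, "GENE_B": 110, "GENE_C": 9610},
]
controls = [
    {"GENE_A": 50, "GENE_B": 105, "GENE_C": 9845},
    {"GENE_A": 60, "GENE_B": 95, "GENE_C": 9845},
    {"GENE_A": 40, "GENE_B": 100, "GENE_C": 9860},
]
ranked = rank_by_p_value(patients, controls)
```

With three samples per group the smallest attainable two-sided p-value is 2/20, so a sketch of this kind only orders genes; a real analysis would need more samples or a parametric test.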

Figure 3 is a flowchart showing a method for deriving molecular markers according to one embodiment. Referring to FIG. 3, the step (S105) of first deriving molecular markers capable of expressing surface proteins may include a step (105-1) of selecting the cluster with the largest quantitative difference from the control group, and a step (105-2) of deriving a surface protein primary molecular marker corresponding to the intersection of the genome information of the selected cluster and the surface protein genome DB.

Referring to Figure 3, the step of secondarily deriving a molecular marker with a target function (S106) may include a step of deriving a secondary potential candidate with a target function by checking the intersection between the surface protein primary molecular marker and the imaging DB.

Figure 4 is a flowchart showing a method for deriving molecular markers according to another example. Referring to FIG. 4, the step (S105) of first deriving molecular markers capable of expressing surface proteins may include a step (105'-1) of selecting, from among the clusters significantly expressed above a predetermined threshold in severe patients, a predetermined number of clusters in descending order of the quantitative size of their difference from the control group, and a step (105'-2) of checking the intersection of the genome information of the selected clusters with the surface protein genome database to identify primary surface protein potential candidates (Potential Targets) that can react with all of the selected clusters.
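The variant of Figure 4 can be sketched as a threshold filter, a top-N selection, and an intersection across the selected clusters. The sketch assumes each cluster already has a single numeric score for its difference from the control group; the function name, the scores, the threshold, and the gene names are hypothetical.

```python
def common_surface_candidates(cluster_scores, cluster_genes, surface_db,
                              threshold=3.0, top_n=2):
    """Select up to top_n clusters whose difference score is at least
    threshold, in descending order of score, then return the surface DB
    genes shared by all selected clusters (sorted for stable output)."""
    eligible = [c for c, s in cluster_scores.items() if s >= threshold]
    selected = sorted(eligible, key=cluster_scores.get, reverse=True)[:top_n]
    if not selected:
        return [], []
    shared = set.intersection(*(set(cluster_genes[c]) for c in selected))
    return selected, sorted(shared & surface_db)

# Hypothetical scores and gene lists (not values from the source)
scores = {"M01": 4.2, "M02": 1.1, "M03": 3.5}
genes = {
    "M01": ["GENE_A", "GENE_B", "GENE_C"],
    "M02": ["GENE_B", "GENE_E"],
    "M03": ["GENE_B", "GENE_C", "GENE_D"],
}
surface_db = {"GENE_B", "GENE_D"}
selected, candidates = common_surface_candidates(scores, genes, surface_db)
```

Requiring membership in every selected cluster is what distinguishes this variant from the single-cluster selection of Figure 3.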

The method for proposing a target candidate according to the present invention shown in FIGS. 1 to 4 may be performed by a device or implemented by a computer program stored in a recording medium for execution.

Figure 5 is a diagram analyzing single cell transcriptome information for severe patients, mild patients, and control groups for exemplary diseases. Referring to Figure 5, the single cell transcriptome information for the virus control group (HC), mild patient (M), and severe patient (S) is clustered through unsupervised learning, and the result of matching with the immune cell group is shown.

Referring to Figure 5, it can be visually confirmed that the immune cell group activation pattern of the control group (HC) and the immune cell group activation pattern of the mild patient (M) and the severe patient (S) are different.

Figure 6 is a graph showing the distribution of immune cell groups according to patient type. Referring to Figure 6, the results of a quantitative comparison of the degree of activation of the immune cell groups shown in Figure 5 are shown. It can be seen that the expression rates of M01 immune cells and M03 immune cells in severe patients (S) and controls (HC) differ by more than a predetermined reference value, for example, more than 300%.
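The comparison against a reference value can be sketched as a ratio of cell group fractions. The sketch reads the 300% example as a ratio of at least 3.0 between the patient and control fractions, which is an interpretation rather than something the source states; the cell counts and function names are hypothetical.

```python
from collections import Counter

def group_fractions(labels):
    """Fraction of cells assigned to each cell group."""
    counts = Counter(labels)
    return {group: n / len(labels) for group, n in counts.items()}

def enriched_groups(patient_labels, control_labels, min_ratio=3.0):
    """Return {group: ratio} for groups whose share among patient cells is
    at least min_ratio times their share among control cells."""
    patient = group_fractions(patient_labels)
    control = group_fractions(control_labels)
    selected = {}
    for group, frac in patient.items():
        base = control.get(group, 0.0)
        ratio = frac / base if base > 0 else float("inf")
        if ratio >= min_ratio:
            selected[group] = ratio
    return selected

# Hypothetical per-cell group labels (100 cells per condition)
control_cells = ["M01"] * 5 + ["M02"] * 45 + ["M03"] * 5 + ["M04"] * 45
severe_cells = ["M01"] * 30 + ["M02"] * 30 + ["M03"] * 20 + ["M04"] * 20
selected = enriched_groups(severe_cells, control_cells)
```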

Figure 7 is a diagram showing the intersection of a cluster with a high expression rate with the surface protein DB and the intersection with the DB having a target function. Referring to Figure 7, the immune cell groups expressed above a predetermined reference value in a patient with a specific disease are identified through Figures 5 and 6, and the genome of those immune cell groups is defined as a candidate pool of target candidate materials. To derive molecular markers that can express surface proteins, primary molecular markers are derived from the intersection with a surface protein database such as Surfaceome. To derive secondary molecular markers that have a target function such as imaging, the intersection between the primary molecular markers and a target function DB, here the imaging DB, is taken. Figure 7 shows the resulting target candidate molecular markers.

Referring to Figure 7, it can be seen that the final target molecular marker candidates, which satisfy all intersection requirements among the genome of M01 and M03, whose expression rate in severe patients exceeds a predetermined reference value (e.g., 300%) compared to the control group, the surface protein DB (Surfaceome), and the imaging DB (PETdb), are SLC43A2, SLC2A3, and FOLR2.

Through this method of suggesting a target candidate using single cell transcriptome information according to the present invention, it may be possible to quickly and precisely derive a candidate material for a target disease that has a target function.

Referring to Figure 8, the 10 most expressed genomes for each immune cell group (M01, M02, M03, M04) are selected and listed, and, as an example, the primary molecular marker candidates among them that satisfy the intersection condition with the surface protein DB are marked with *.

For example, in the case of M01 immune cells, the 10 genomes CD300E, CCR1, EMP1, TNFSF13B, LILRA5, IL1R2, FPR1, LILRB2, LILRB1, and IFNGR2 have the highest expression rates, and among these it can be confirmed that the primary molecular marker candidates that satisfy the intersection condition with the surface protein DB are CCR1 and FPR1.

To find a candidate with a target function, such as an imaging or binding function, among these primary molecular marker candidates, the target material can be derived efficiently by checking whether each candidate satisfies the secondary intersection condition with the imaging DB (PETdb) or a binding DB.

Step S201 is a step of collecting single cell transcriptome information of cancer tissue and normal tissue. This step (S201) may be a step of directly collecting single cell transcriptome information of a patient with a specific disease and transcriptome information of normal cells as a comparison group, or a step of collecting single cell transcriptome information data by crawling from an external source. Single cell transcriptional genome analysis is a technology that separates cells, amplifies DNA or RNA from a very small amount of material, and analyzes the genomic characteristics of the cell by sequencing. Details are described above with reference to Figure 1, so duplicate information is omitted.

In this embodiment, the target tissue is specified as a cancer tissue, but it is not limited to this, and of course, any abnormal tissue can be the target.

Step S202 is a step of performing clustering on single cell transcriptome information using unsupervised learning. This step may group individual cells with similar overall transcript expression profiles using unsupervised clustering. Details are described above in step S102 according to FIG. 1, and redundant information is omitted. The criterion for clustering is whether the similarity of the entire transcript expression profile exceeds a reference value.
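The grouping criterion can be illustrated with a deliberately simple stand-in: cells are linked when the cosine similarity of their expression profiles exceeds a reference value, and each connected group of linked cells forms a cluster. The similarity measure, the threshold, and the connected-component grouping are assumptions for illustration and are not the clustering algorithm of any particular package.

```python
import math

def cosine(u, v):
    """Cosine similarity between two expression vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def cluster_by_similarity(profiles, threshold=0.9):
    """Return one cluster id per cell. Two cells are linked when their
    similarity exceeds threshold; clusters are connected components."""
    n = len(profiles)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine(profiles[i], profiles[j]) > threshold:
                parent[find(j)] = find(i)

    ids = {}
    # Number clusters 0, 1, 2, ... in order of first appearance
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]

# Hypothetical expression vectors for four cells over three genes
profiles = [[10, 0, 1], [9, 1, 0], [0, 8, 7], [1, 9, 8]]
labels = cluster_by_similarity(profiles)
```

The pairwise comparison is quadratic in the number of cells, so this form is only suitable for explaining the criterion, not for datasets with many thousands of cells.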

Step S203 is a step of matching the clustering result for single cell transcript information with the corresponding cell group. The cell type represented by each cluster can be determined by calculating the expression rate of representative cell type-specific biomarkers. Details are described above in step S103 according to FIG. 1, and redundant information is omitted.
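Matching a cluster to a cell group by expression rate can be sketched as follows, taking the expression rate to be the fraction of cells in the cluster with a non-zero count for a marker gene. That definition, the function names, and the toy counts are assumptions; the marker genes shown are commonly used examples and are not taken from the source.

```python
def expression_rate(cells, gene):
    """Fraction of cells in which the gene has a non-zero count."""
    return sum(1 for c in cells if c.get(gene, 0) > 0) / len(cells)

def annotate_cluster(cells, markers_by_type):
    """Return (best_type, scores), scoring each cell type by the mean
    expression rate of its marker genes within the cluster."""
    scores = {}
    for cell_type, genes in markers_by_type.items():
        rates = [expression_rate(cells, g) for g in genes]
        scores[cell_type] = sum(rates) / len(rates)
    best = max(scores, key=scores.get)
    return best, scores

# Illustrative marker panel (an assumption, not specified in the source)
markers = {
    "fibroblast": ["COL1A1"],
    "immune": ["PTPRC"],
    "epithelial": ["EPCAM"],
}
# Hypothetical per-cell counts for one cluster
cluster_cells = [
    {"COL1A1": 5, "PTPRC": 0},
    {"COL1A1": 3},
    {"COL1A1": 0, "EPCAM": 1},
    {"COL1A1": 7},
]
best, scores = annotate_cluster(cluster_cells, markers)
```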

Step S204 is a step of selecting a cell group of interest from among the cell groups. The cell group of interest may refer to a cluster of a cell type of interest, such as fibroblasts or infiltrating immune cells. The cell group of interest can be selected by calculating the expression rate of a specific biomarker.

Step S205 is a step of deriving molecular markers with a higher expression rate in cancer tissue than in normal tissue within the cell group of interest. This may refer to the step of finding molecular targets that show significant differences in gene expression.
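Step S205 can be sketched as a comparison of expression rates between cancer and normal cells of the same cell group. This is a simplified illustration: the specification speaks of significant differences but does not name a statistical test, so the fixed cutoff `min_diff`, the function names, and the gene names `GENE_A` to `GENE_C` are assumptions.

```python
def expression_rate(cells, gene):
    """Fraction of cells in which `gene` has a non-zero count (assumed definition)."""
    if not cells:
        return 0.0
    return sum(1 for cell in cells if cell.get(gene, 0) > 0) / len(cells)

def upregulated_markers(cancer_cells, normal_cells, genes, min_diff=0.2):
    """Genes whose expression rate in cancer exceeds that in normal by >= `min_diff`.

    Returns (gene, rate difference) pairs sorted by decreasing difference.
    """
    selected = []
    for gene in genes:
        diff = expression_rate(cancer_cells, gene) - expression_rate(normal_cells, gene)
        if diff >= min_diff:
            selected.append((gene, diff))
    return sorted(selected, key=lambda item: item[1], reverse=True)

# Toy cells from the cell group of interest (hypothetical genes and counts).
cancer = [
    {"GENE_A": 4, "GENE_B": 1},
    {"GENE_A": 2},
    {"GENE_A": 1, "GENE_C": 3},
    {"GENE_B": 2},
]
normal = [
    {"GENE_B": 1},
    {"GENE_B": 3, "GENE_C": 1},
    {"GENE_A": 1},
    {"GENE_C": 2},
]

markers = upregulated_markers(cancer, normal, ["GENE_A", "GENE_B", "GENE_C"])
print(markers)
```

A real pipeline would replace the fixed cutoff with a differential expression test and multiple-testing correction.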

Step S206 is a step of deriving, from among the molecular markers, a molecular marker that can be expressed as a surface protein. For example, by intersecting the list of differentially expressed genes with the list of cell surface proteins in the Surfaceome database, molecular targets (molecular markers) expressed on the outer surface of cells can be obtained.
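The intersection in step S206 amounts to a set membership filter. The sketch below assumes the surface protein list has already been loaded as gene symbols; the function name, the toy subset of the database, and the placeholder genes `GENE_X` and `GENE_Y` are hypothetical.

```python
def surface_candidates(differential_genes, surfaceome_genes):
    """Keep differential genes that also appear in the surface protein list.

    The input order (for example, a ranking by fold change) is preserved.
    """
    surface = set(surfaceome_genes)
    return [gene for gene in differential_genes if gene in surface]

# Toy inputs: a ranked list of differential genes and a small stand-in
# for the surface protein list (not the real database contents).
differential = ["ANTXR1", "GENE_X", "THY1", "GENE_Y"]
surfaceome_subset = {"ANTXR1", "THY1", "CD55"}

candidates = surface_candidates(differential, surfaceome_subset)
print(candidates)
```

The same filter could be applied a second time against an imaging or binding database when a candidate with such a function is required.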

Step S207 is a step of collecting tissue transcriptome information of normal people and patients.

Step S208 is a step of deriving specific markers by verifying, against the tissue transcriptome information, the molecular markers that can be expressed as surface proteins.

Figure 10 is a flowchart of a method for recommending target candidates for cell clusters in a cancer microenvironment through single cell transcriptome analysis according to another embodiment. Each step is described with reference to FIG. 10 as follows.

Step S301 is a step of collecting single cell transcriptome information of cancer tissue and normal tissue. This step (S301) may be a step of directly collecting single cell transcriptome information of a patient with a specific disease, together with normal cell transcriptome information as a comparison group, or a step of collecting single cell transcriptome data from an external source by crawling. Single cell transcriptome/genome analysis is a technology that isolates a single cell, amplifies DNA or RNA from a very small amount of material, and analyzes the genomic characteristics of that cell by sequencing. Details are described above with reference to Figure 1, and redundant description is omitted.

Step S302 is a step of performing clustering on single cell transcriptome information using unsupervised learning. This step may be to group individual cells with similar overall transcript expression profiles using unsupervised clustering. Details are described above in step S102 according to FIG. 1, and redundant information is omitted.

Step S303 is a step of matching the clustering result for the single cell transcriptome information with the corresponding cell group. The cell type represented by each cluster can be determined by calculating the expression rate of representative cell type-specific biomarkers. Details are described above for step S103 with reference to FIG. 1, and redundant description is omitted.

Step S304 is a step of selecting a cell group of interest from among the cell groups. The cell group of interest may refer to a cluster of a cell type of interest, such as fibroblasts or infiltrating immune cells.

Step S305 is a step of deriving molecular markers with a higher expression rate in cancer tissue than in normal tissue within the cell group of interest. This may refer to the step of finding molecular targets that show significant differences in gene expression.

Step S306 is a step of deriving, from among the molecular markers, a molecular marker that can be expressed as a surface protein. For example, by intersecting the list of differentially expressed genes with the list of cell surface proteins in the Surfaceome database, molecular targets (molecular markers) expressed on the outer surface of cells can be obtained.

Step S307 is a step of deriving cancer-specific marker candidates by verifying, against tissue transcriptome information, the molecular markers that can be expressed as surface proteins.

Step S308 is a step of confirming the spatial transcriptome distribution for reference cells of the target carcinoma. As shown in Figure 21, based on the spatial transcriptome data of the target carcinoma, the spatial transcriptome according to cell type can be confirmed and used as the basis for verification data in the step described later.

Step S309 is a step of deriving the correlation between the spatial transcriptome distribution of the cancer-specific marker candidate and the spatial transcriptome distribution of the reference cells. As shown in Figure 22, the score distribution obtained by confirming the spatial transcriptome distribution of carcinoma cells using an algorithm or the like is set as reference data, and a correlation can be derived by comparing it with the spatial transcriptome distribution of the target found through single cell RNA-seq.
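The comparison in step S309 can be sketched as a correlation computed across spatial spots. The specification does not state which correlation coefficient is used, so the Pearson coefficient below is an assumption, and the names `marker_expr` and `cell_scores` and all values are hypothetical toy data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant signal carries no correlation information
    return cov / math.sqrt(var_x * var_y)

# Per-spot expression of one marker candidate (toy values, one entry per spot).
marker_expr = [0.1, 0.5, 0.9, 1.4, 2.0]

# Per-spot reference scores for each cell type (toy values, same spot order).
cell_scores = {
    "fibroblast": [0.0, 0.2, 0.5, 0.7, 1.0],
    "epithelial": [1.0, 0.8, 0.4, 0.3, 0.1],
}

correlations = {name: pearson(marker_expr, s) for name, s in cell_scores.items()}
positive = [name for name, r in correlations.items() if r > 0]
print(correlations, positive)
```

Cell types in `positive` would then be carried into step S310, where their spatial distribution is checked against that of the marker candidate.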

Step S310 is a step of secondarily confirming the cell distribution of the positively correlated cells and the cancer-specific marker candidates. As shown in Figures 23 and 24, it can be confirmed that, among the derived specific marker candidates, the target ANTXR1 has a positive correlation with chondrocytes, fibroblasts, iDC, and the stroma score. Confirming a specific marker candidate that is highly correlated with the reference cells in the tissue, together with its distribution, can serve as a basis for confirming consistency with the target cell.

Step S311 is a step of determining the genetic relationship of the cancer-specific marker candidate. In this step, as shown in Figures 26 and 27, Gene Ontology analysis and pathway search are performed on the candidate genes with positive correlation, so that the final gene correlation and final target information for the specific marker candidates can be derived. This step again determines effective marker candidates from among the specific marker candidates whose consistency with the target cell was confirmed in the previous step S310, and may analyze the correlation between genes rather than consistency with the target cell. Broadly speaking, it can be a step of quantitatively determining gene correlation using Gene Ontology (GO) analysis or pathway analysis.

In this case, the specific marker candidate is highly correlated not only with the target cell but also with certain genes, and which genes it affects can be determined by linking it to the quantitatively determined gene correlation. For example, if there is a carcinoma (abnormal tissue) targeted by the cell cluster target candidate recommendation method of the present invention, whether the candidate is related to cancer expression-related genes that affect the expression of that carcinoma can be determined through the gene correlation determination step (S311).

Through this, the specific marker candidates are 1) first sorted by the cell group of interest and by whether they can be expressed as surface proteins (step S306), 2) second sorted based on their correlation with the target cells (steps S309 and S310), and 3) third sorted by determining whether the number of cancer expression-related genes that affect the expression of the target carcinoma is greater than a reference value (step S311). The final cancer-specific marker (final abnormal tissue-specific marker) can then be selected (step S312). By using spatial transcriptome information, specific marker candidates numbering from thousands to tens of thousands can be narrowed to at least the 10% level in the first sorting, and again to the 10% level in the second sorting. Ultimately, the greater the number of cancer expression-related genes that directly affect not only the target cells but also the expression of the carcinoma, the higher the probability that a final specific marker candidate positively correlated with those genes will, during subsequent drug delivery, deliver directly to genes in the target cells that affect the expression of the carcinoma. Effective final specific marker candidates can therefore be selected easily, and experimental efficiency may be improved by at least 10 to 100 times compared to the conventional method of selecting specific marker candidates through clinical trials.
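The third sorting and the final selection (steps S311 and S312) can be sketched as counting, for each candidate, how many of its correlated genes are cancer expression-related, and then ranking the candidates by that count. The function name, the `reference_count` parameter, and the candidate and gene identifiers below are hypothetical; how the cancer expression-related gene list is obtained is not shown.

```python
def select_final_markers(correlated_genes, cancer_related_genes, reference_count=1):
    """Rank candidates by their number of cancer expression-related genes.

    `correlated_genes` maps each marker candidate to the genes it is
    positively correlated with. Candidates whose count does not exceed
    `reference_count` (an assumed reference value) are dropped.
    """
    cancer_related = set(cancer_related_genes)
    counts = {
        candidate: len(set(genes) & cancer_related)
        for candidate, genes in correlated_genes.items()
    }
    kept = [(c, n) for c, n in counts.items() if n > reference_count]
    return sorted(kept, key=lambda item: item[1], reverse=True)

# Toy inputs (hypothetical identifiers).
correlated = {
    "CAND_1": ["G1", "G2", "G3"],
    "CAND_2": ["G1", "G9"],
    "CAND_3": ["G1", "G2", "G3", "G4"],
}
cancer_related = {"G1", "G2", "G3", "G4"}

final_markers = select_final_markers(correlated, cancer_related, reference_count=1)
print(final_markers)
```

Here `CAND_2` is removed because only one of its correlated genes is cancer expression-related, which does not exceed the assumed reference value of 1.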

Figure 16 is a diagram illustrating the steps of determining a candidate group of stromal cell surface targets in pancreatic cancer. Referring to FIG. 16, normal tissue and abnormal tissue (malignant tumor tissue) are comprehensively analyzed to initially select target candidate genes (4030) that show significant differences in expression rate, and 333 target-specific marker candidates are obtained by intersecting them with a database of surface protein markers (surfaceome DB, 2608). Through this procedure, the 333 target substances can be easily identified and then sorted in descending order of the average log₂(FC) value to quantitatively check the gene expression level.

FC stands for fold change and quantitatively defines the level of gene expression: FC = treatment / control. Here, treatment is the value under the comparison condition and control is the value under the reference condition. Therefore, if the expression level is the same under both conditions, FC = 1. To treat increases and decreases symmetrically, the expression level can be expressed as log₂(FC), which is 0 when there is no change. Figure 17 shows the 10 gene candidates with the highest expression levels among the 333 candidates in the pancreatic cancer example.
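The fold change definition above, and the descending sort by log₂(FC), can be written as a short sketch. The function name, the gene names, and the expression values are hypothetical; both values are required to be positive because the logarithm of a ratio is otherwise undefined.

```python
import math

def log2_fold_change(treatment, control):
    """log2(treatment / control); both values must be positive."""
    if treatment <= 0 or control <= 0:
        raise ValueError("expression values must be positive")
    return math.log2(treatment / control)

# Toy mean expression values as (treatment, control) pairs.
expression = {
    "GENE_A": (8.0, 2.0),   # 4-fold higher in treatment
    "GENE_B": (5.0, 5.0),   # unchanged
    "GENE_C": (2.0, 8.0),   # 4-fold lower in treatment
}

# Sort genes in descending order of log2(FC).
ranked = sorted(
    ((gene, log2_fold_change(t, c)) for gene, (t, c) in expression.items()),
    key=lambda item: item[1],
    reverse=True,
)
print(ranked)
```

The symmetry is visible in the toy values: a 4-fold increase gives +2 and a 4-fold decrease gives -2, whereas the raw ratios would be 4 and 0.25.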

Looking at Figure 17, using transcriptome data (GSE15471) from the GEO database for pancreatic cancer tissues and adjacent normal tissues, it was confirmed that all of the 10 candidate molecules except DPEP1 were expressed at significantly higher levels in tumors than in normal tissues. DPEP1 was the exception and was expressed more in normal tissues. Accordingly, when selecting a final candidate substance according to the present invention, DPEP1 may be excluded if a target substance with a high expression level in abnormal tissue is to be selected.

Looking at Figure 20, by combining transcriptome expression data of normal pancreatic tissue from the GTEx database with PAAD (Pancreatic Adenocarcinoma) transcriptome data from TCGA, which include clinical stage information, it can be seen that the expression of the 10 candidate molecules except DPEP1 is significantly higher in tumors at all clinical stages than in normal tissue. This may be an example of step S307 described above, in which cancer-specific marker candidates are derived through verification against tissue transcriptome information.

Figure 21 is a diagram showing an example of extracting cell types from spatial transcriptome data of TNBC-type and luminal-type breast cancers, in order to describe the distribution and target suitability in the breast cancer spatial transcriptome. In the image, the spatial transcriptome distribution of tumor epidermal cells is confirmed. This corresponds to step S308.

Figure 22 is a diagram showing the distribution of eight types of cells in a tumor. Figure 23 is a diagram showing the correlation between target cells and cell types. This corresponds to step S309, which derives the correlation between the spatial transcriptome distribution of the cancer-specific marker candidate and the spatial transcriptome distribution of the reference cells. Looking at Figures 22 and 23, in the case of breast cancer, the score distribution obtained by confirming the distribution of cells using an algorithm is set as reference data, and the correlation is derived by comparing it with the spatial transcriptome distribution of the target found through single cell RNA-seq. It can be confirmed that, among the specific marker candidates derived in S307, the target ANTXR1 has a positive correlation with chondrocytes, fibroblasts, iDC, and the stroma score.

Figure 24 is a diagram illustrating the expression of cells positively correlated with the target to confirm target suitability. This corresponds to the step of secondarily confirming the cell distribution of cells with positive correlation and cancer-specific marker candidates according to step S310.

Figure 25 is a graph comparing cell types that are correlated with targets. Figure 25 shows the results of reviewing the distribution for target suitability within the tissue.

Figure 26 is a diagram showing the extraction of spatially related genes and the functional terms with positive correlation. Referring to FIG. 26, the results of Gene Ontology analysis and pathway search on the candidate genes positively correlated with the candidate gene ANTXR1 are shown. Gene Ontology analysis (GO analysis) uses a structured model that describes individual genes in terms of the biological process, molecular function, and cellular component with which they are associated, in order to study gene function. Functional annotation can be obtained through this model: to analyze the function of genes, they can be annotated against the Gene Ontology DB, and meaningful results can be obtained through statistical methods.
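The specification does not say which statistical method is applied to the GO annotations. One common choice for this kind of analysis is an over-representation test based on the hypergeometric distribution, sketched below as an assumption rather than as the method actually used; the function name and all counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """Upper-tail hypergeometric p-value P(X >= overlap).

    population: number of background genes
    annotated:  background genes annotated with the GO term
    sample:     number of genes in the list being tested
    overlap:    genes in the list that carry the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Toy numbers: 20 background genes, 5 carry the term,
# 4 genes correlate with the candidate, 3 of which carry the term.
p_value = hypergeom_enrichment_p(population=20, annotated=5, sample=4, overlap=3)
print(round(p_value, 4))
```

When many GO terms are tested, the resulting p-values would normally be corrected for multiple testing before any term is called significant.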

Through the analysis according to FIGS. 26 and 27, it is possible to confirm the genetic relationship of the cancer-specific marker candidate (step S311). As mentioned above, Figures 26 and 27 correspond to the judgment stage in which a specific marker candidate that is highly correlated not only with the target cell but also with certain genes is linked to the quantitatively determined gene correlation to determine which genes it affects. For example, if there is a carcinoma (abnormal tissue) targeted by the cell cluster target candidate recommendation method of the present invention, this can be determined through the gene correlation determination step (S311), and Figure 27 is an example showing the relationship among cancer expression-related genes for specific marker candidate genes that affect the expression of breast cancer. By conducting GO analysis and pathway analysis to determine whether the number of cancer expression-related genes that affect the expression of the target carcinoma exceeds the reference value, selection at the 10% level is again possible in the second sorting. A final specific marker candidate that is positively correlated with these genes increases the probability of direct delivery, during subsequent drug delivery, to genes in the target cells that affect the expression of the carcinoma, so effective candidates can be selected easily. As mentioned above, experimental efficiency may thereby be improved by at least 10 to 100 times compared to the conventional method of selecting specific marker candidates through clinical experiments.

[Pancreatic cancer example]

Pancreatic ductal adenocarcinoma (PDAC) and normal pancreatic cells cluster into duct/tumor, stromal, and immune cell clusters, respectively, based on the expression of cell type-specific marker genes.

Next, genes with expression differences between PDAC and normal cells were identified within the stromal cell cluster. Among the PDAC stroma-specific genes, 333 cell surface markers were obtained using SurfaceomeDB.

The top 10 PDAC stroma-specific cell surface marker genes were identified as MXRA8, ANTXR1, LY6E, GJB2, THY1, PLXDC2, GPNMB, SDC1, CD55, and DPEP1. To verify the results, the expression levels of the top 10 genes were compared between pancreatic cancer tissue and adjacent normal pancreatic tissue. As a result, the expression levels of these genes, with the exception of DPEP1 as described for Figure 17, were found to be significantly higher in cancer tissue than in normal pancreatic tissue.

Additionally, by merging RNA sequencing data obtained from healthy human pancreatic tissue in GTEx and from pancreatic cancer patients in TCGA, the expression levels of the candidate genes (excluding DPEP1, as described for Figure 20) were found to be significantly higher than in normal tissue at all PDAC stages. In summary, pancreatic cancer stroma-specific cell surface markers can be identified using single cell RNA sequencing data and SurfaceomeDB. This analysis pipeline can also be applied to other types of tumors and for other purposes.

Referring to FIG. 39, the target candidate material proposal device 1000 may include a memory device 1200, a processor 1100, a storage 1300, a communication module (not shown), an input/output interface (I/O device) 1400, and a power supply 1500.

The memory device 1200 is a computer-readable recording medium and may include random access memory (RAM), read only memory (ROM), and a permanent mass storage device such as a disk drive. Additionally, program code for controlling the target candidate material proposal method and a pre-trained deep learning network may be temporarily or permanently stored in the memory.

The processor 1100 may collect single cell transcriptome information of abnormal tissue and normal tissue, match the clustering result of the single cell transcriptome information with the corresponding cell groups, select a cell group of interest from among the cell groups, derive from the cell group of interest molecular markers with a higher expression rate in abnormal tissue than in normal tissue, and derive from among those molecular markers a molecular marker that can be expressed as a surface protein.

The communication module may provide functions for communicating with an external server through a network. For example, a request generated by the processor of the target candidate material proposal device according to program code stored in a recording device such as a memory may be transmitted to an external server through the network under the control of the communication module. Conversely, control signals, commands, content, files, and the like provided under the control of the external server's processor may be received by the target candidate material proposal device through the communication module via the network.

The communication method is not limited, and may include not only a communication method utilizing a communication network that the network may include (for example, a mobile communication network, wired Internet, wireless Internet, or a broadcasting network), but also short-range wireless communication between devices. For example, the network may include any one or more of a personal area network (PAN), local area network (LAN), campus area network (CAN), metropolitan area network (MAN), wide area network (WAN), broadband network (BBN), the Internet, and the like. Additionally, the network may include, but is not limited to, any one or more network topologies, including a bus network, star network, ring network, mesh network, star-bus network, and tree or hierarchical network.

Additionally, the communication module can communicate with an external server through a network. The communication method is not limited, but the network may be a short-range wireless communication network. For example, the network may be a Bluetooth, Bluetooth Low Energy (BLE), or Wi-Fi communication network.

The input/output interface 1400 may be a means for interfacing with an input/output device. For example, an input device may include a device such as a keyboard or mouse, and an output device may include a device such as a display for displaying a communication session of an application. As another example, the input/output interface may be a means of interfacing with a device that integrates input and output functions into one, such as a touch screen. As a more specific example, when the processor of the target candidate material proposal device processes the commands of the computer program loaded in the memory, a service screen or content constructed using data provided by an external server may be displayed on the display through the input/output interface.

The power supply 1500 may supply power necessary for the operation of the device 1000.

Additionally, in other embodiments, the target candidate substance proposal device may include more components than the components described above, but is not limited thereto.

As above, exemplary embodiments have been disclosed in the drawings and specification. In this specification, embodiments have been described using specific terms, but this is only used for the purpose of explaining the technical idea of the present disclosure and is not used to limit the meaning or scope of the present disclosure described in the claims. Therefore, those skilled in the art will understand that various modifications and other equivalent embodiments are possible therefrom. Therefore, the true technical protection scope of the present disclosure should be determined by the technical spirit of the attached claims.

Claims

In the method of suggesting target candidates through single cell transcriptome information performed by a computing device,

Collecting single cell transcriptome information of abnormal and normal tissues;

Matching the clustering result for single cell transcriptome information to a corresponding cell group;

Selecting a cell group of interest from among the cell groups;

Deriving a molecular marker with a higher expression rate in abnormal tissue compared to normal tissue in the cell group of interest; and

A method for proposing a candidate target material, comprising the step of deriving a molecular marker capable of expressing a surface protein among molecular markers.
According to claim 1,

A method for proposing a target candidate material, wherein the abnormal tissue is a malignant tumor.
According to claim 1,

The step of collecting single cell transcriptome information of the abnormal tissue and normal tissue is,

A method for proposing a target candidate material, comprising the step of selecting a cancer type.
According to claim 1,

A method for proposing a target candidate material, wherein the step of matching the clustering result for the single cell transcriptome information with the corresponding cell group is clustered by unsupervised learning.
According to claim 1,

The step of matching the clustering result for the single cell transcriptome information with the corresponding cell group is a target candidate proposal method, characterized in that the grouping is based on whether the similarity of the entire transcriptome expression profile exceeds a reference value.
According to claim 1,

The step of selecting a cell group of interest among the cell groups is,

A target candidate proposal method characterized by calculating and selecting the expression rate of a specific biomarker.
According to claim 1,

Collecting tissue transcriptome information from normal people and patients; and

A method for suggesting a target candidate material, further comprising the step of deriving a specific marker by verifying a molecular marker capable of expressing a surface protein from tissue transcript information.
According to claim 1,

A method for proposing a candidate target material, further comprising the step of deriving a specific marker candidate by verifying a molecular marker capable of expressing a surface protein from tissue transcript information.
According to claim 8,

A method for proposing a target candidate material, further comprising the step of confirming the spatial transcript distribution of reference cells of the target carcinoma.
According to claim 9,

A method for proposing a target candidate, further comprising the step of deriving a correlation between the spatial transcript distribution for the cancer-specific marker candidate and the spatial transcript distribution for the reference cells.
According to claim 10,

A method for proposing a target candidate material, further comprising the step of secondarily confirming the cell distribution of the positively correlated cells and the cancer-specific marker candidate.
According to claim 11,

A method for proposing a target candidate material, further comprising the step of determining the genetic correlation of the cancer-specific marker candidate.
According to claim 12,

The step of determining the genetic correlation of the cancer-specific marker candidate is,

A method for suggesting target candidates, characterized in that it involves quantitatively determining gene correlation using Gene Ontology (GO) analysis or pathway analysis.
According to claim 13,

A method for proposing a target candidate material, further comprising the step of determining the gene correlation of the cancer-specific marker candidate and determining whether a gene quantitatively determined to be highly correlated is a cancer expression-related gene that influences cancer expression by more than a reference value.
According to claim 14,

A method for proposing a target candidate material, further comprising: selecting a final cancer-specific marker in the order of the highest number of cancer expression-related genes among the cancer-specific marker candidates.
A computer program stored in a recording medium to execute the method of any one of claims 1 to 15 using a computing device.
Including a processor;

A target candidate material proposal device, characterized in that the processor collects single cell transcriptome information of abnormal tissue and normal tissue, matches the clustering result of the single cell transcriptome information with the corresponding cell groups, selects a cell group of interest from among the cell groups, derives from the cell group of interest molecular markers with a higher expression rate in abnormal tissue than in normal tissue, and derives from among the molecular markers a molecular marker that can be expressed as a surface protein.