CN117437974A - Method and system for predicting tumor cell metastasis risk - Google Patents


Info

Publication number
CN117437974A
Authority
CN
China
Prior art keywords
tumor cells
tumor
cells
metastasis
gene expression
Legal status
Pending
Application number
CN202311246385.5A
Other languages
Chinese (zh)
Inventor
王伟楠
张泽民
Current Assignee
Peking University
Original Assignee
Peking University
Application filed by Peking University
Priority to CN202311246385.5A
Publication of CN117437974A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30 Detection of binding sites or motifs
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/30 Drug targeting using structural data; Docking or binding prediction
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Chemical & Material Sciences (AREA)
  • Molecular Biology (AREA)
  • Genetics & Genomics (AREA)
  • Data Mining & Analysis (AREA)
  • Analytical Chemistry (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Artificial Intelligence (AREA)
  • Bioethics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Physiology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention belongs to the field of bioinformatics and particularly relates to a method and a system for predicting tumor cell metastasis risk. Based on the gene expression information of the tumor cells to be tested, the method predicts their S_EMT, S_Travel and S_comm, where S_EMT represents the epithelial-mesenchymal transition capacity of the tumor cells, S_Travel represents the viability of the tumor cells in the blood circulation, and S_comm represents the ability of the tumor cells to invade a tissue or organ; the metastasis risk coefficient S_Meta of the tumor cells to be tested is then calculated. The method can help doctors diagnose cancer patients and give them targeted treatment in a timely manner at an early stage of metastasis, and can guide the development of new anticancer drugs, thereby helping to improve cancer prevention and cure rates.

Description

Method and system for predicting tumor cell metastasis risk
Technical Field
The invention belongs to the field of bioinformatics and particularly relates to a method and a system for predicting tumor cell metastasis risk.
Background
Cancer metastasis is the process by which cancer spreads from its primary site to other tissues and organs, and it is a major cause of cancer-related death. Because of the limitations of current detection techniques, metastasis is difficult to detect clinically at an early stage, so by the time it is found the optimal window for treatment has often been missed. Predicting the risk of metastasis and the likely metastatic sites by statistical and analytical means is therefore of great importance for cancer treatment.
Many clinical DNA sequencing techniques are now used to sequence the genomes of cancer patients, and the prior art has also proposed metastasis prediction models based on copy number variation, tumor burden and similar factors. However, these studies ignore the complex heterogeneity of the tumor microenvironment and do not examine how different cell types function in cancer metastasis.
Disclosure of Invention
With the development of second-generation sequencing technologies, especially single-cell sequencing, transcriptome sequencing data are now available for many more patients with primary and metastatic cancer. Based on these data, the present invention proposes deeper and more interpretable models for predicting metastasis risk and organ tropism across all cancer types (pan-cancer), and for gaining a deeper understanding of the mechanisms of cancer metastasis.
Specifically, starting from the mechanism of cancer metastasis, the invention combines the key factors that influence metastasis, collects and mines transcriptome sequencing data from multiple cancers, and, after statistical analysis and validation, provides a systems-biology method for predicting the likelihood of cancer metastasis and its distribution across organs. The method is based on new transcriptome sequencing data and is applicable to all cancer types. Its predictions can accurately estimate a patient's risk of cancer metastasis as well as the patient's risk coefficient for metastasis to each organ.
Based on the above findings, the present invention provides a method for predicting the risk of metastasis of tumor cells, comprising:
obtaining gene expression information of the tumor cells to be tested;
predicting S_EMT, S_Travel and S_comm of the tumor cells to be tested based on the gene expression information, where S_EMT represents the epithelial-mesenchymal transition (EMT) capability of the tumor cells, S_Travel represents the viability of the tumor cells in the blood circulation, and S_comm represents the ability of the tumor cells to invade a tissue or organ;
then calculating the metastasis risk coefficient S_Meta of the tumor cells to be tested using the following formula:
the higher the S_Meta of the tumor cells to be tested, the higher their predicted risk of metastasis.
The invention further provides a method of predicting the organ-specific metastasis tendency of tumor cells, comprising predicting the metastasis risk of the tumor cells using the method described above;
wherein S_comm of the tumor cells to be tested is obtained for invasion of each of several tissues or organs; the higher S_comm is for a given tissue or organ, the higher the predicted propensity of the tumor cells to metastasize to that tissue or organ.
The invention further provides a method of predicting the prognosis of a tumor patient, comprising predicting the metastasis risk of tumor cells using the method described above;
wherein S_Meta of the tumor cells to be tested is obtained; the higher S_Meta is for tumor cells derived from a tumor patient, the worse the predicted prognosis of that patient.
In a second aspect, the present invention also provides a tumor cell detection system comprising:
an acquisition module configured to acquire gene expression information of the tumor cells to be tested;
a calculation module configured to predict S_EMT, S_Travel and S_comm of the tumor cells to be tested based on the gene expression information, where S_EMT represents the epithelial-mesenchymal transition (EMT) capability of the tumor cells, S_Travel represents the viability of the tumor cells in the blood circulation, and S_comm represents the ability of the tumor cells to invade a tissue or organ, and then to calculate the metastasis risk coefficient S_Meta of the tumor cells to be tested using the following formula:
and an output module configured to output a prediction that the tumor cells to be tested have a higher risk of metastasis when their S_Meta is higher.
In a third aspect, the invention also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of predicting the metastasis risk of tumor cells, the method of predicting the organ-specific metastasis tendency of tumor cells, or the method of predicting the prognosis of a tumor patient, each as described above.
The invention further provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method of predicting the metastasis risk of tumor cells, the method of predicting the organ-specific metastasis tendency of tumor cells, or the method of predicting the prognosis of a tumor patient, each as described above.
The invention further provides a computer program product comprising a computer program which, when read and executed by a computing device, causes the computing device to perform the method of predicting the metastasis risk of tumor cells, the method of predicting the organ-specific metastasis tendency of tumor cells, or the method of predicting the prognosis of a tumor patient, each as described above.
The method of the invention has at least the following advantages:
1) Comprehensiveness: the invention can predict metastasis patterns from different cancers to different organs at the pan-cancer level, which is more comprehensive than previous studies of single cancer types.
2) Interpretability: the method can trace the cells and genes involved in cancer metastasis in a way that is consistent with known biology, which gives it significant value for biological research.
3) Accuracy: for most cancer types and patients, the predictions of the invention agree closely with long-term clinical statistics and established knowledge.
The method can help doctors diagnose cancer patients and give them targeted treatment in a timely manner at an early stage of metastasis, can guide the development of new anticancer drugs, and helps to improve cancer prevention and cure rates.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of the prediction principle in an embodiment of the present invention; panel A is a schematic diagram of the metastasis path of tumor cells, and panel B shows the design concept of the embodiment.
FIG. 2 shows the distribution of metastasis tendencies across the overall population in an embodiment of the present invention; panel A shows the predicted metastasis proportions, and panel B compares the predicted results with clinical statistics.
FIG. 3 shows the results of metastasis analysis of lung adenocarcinoma patients in an embodiment of the present invention; panel A shows the metastasis profiles of 4 patients predicted by the method of the present invention; panel B shows, for each metastatic site, the proportions of patients with high, medium and low predisposition to metastasis; panel C shows the results of survival analysis for the different groups.
FIG. 4 shows the results of analyzing a single-cell dataset of colorectal cancer liver metastasis in an embodiment of the present invention; panel A shows the classification of CD45− cell populations; panel B shows the copy number variation of the cell populations; panels C and D show the results of interaction analysis between each of two cancer cell datasets and the normal liver microenvironment; panel E shows ligand-receptor pairs that play important roles between cancer cells and endothelial cells, macrophages and neutrophils.
FIG. 5 shows the results of analyzing a single-cell dataset of lung adenocarcinoma liver metastasis in an embodiment of the present invention; panel A shows the classification of single cells; panel B shows the copy number variation of the cell populations; panel C shows information on each sample; panel D shows the results of interaction difference analysis between each of two cancer cell datasets and the normal bone microenvironment; panel E shows ligand-receptor pairs that play important roles between cancer cells and endothelial cells, macrophages and neutrophils; panel F compares the predicted results.
Detailed Description
The following describes specific embodiments of the present invention in detail. It will be understood that the embodiments described herein are for the purpose of illustration and explanation only and are not intended to limit the present invention, as many modifications and variations of the present invention may be made by those skilled in the art without departing from the scope or spirit thereof. For example, features illustrated or described as part of one embodiment can be used on another embodiment to yield still a further embodiment.
Unless otherwise defined, all terms (including technical and scientific terms) used to describe the invention have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. By way of further guidance, the following definitions are used to better understand the teachings of the present invention. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
The term "and/or" (including its variant form "or/and") as used herein covers any one of the listed items, as well as any and all combinations of them, including any two, any more, or all of the listed items. When at least three items are joined by at least two such conjunctions, the technical solutions in this application certainly include those in which all items are connected by "logical and" as well as those in which all items are connected by "logical or". For example, "A and/or B" includes three parallel schemes: A, B, and A+B. As another example, "A, and/or B, and/or C, and/or D" includes any one of A, B, C and D (the "logical or" case), any and all combinations of two or three of A, B, C and D, and the combination of all four (the "logical and" case).
The terms "comprising", "including" and "containing" as used herein are synonymous, inclusive or open-ended, and do not exclude additional, unrecited members, elements or method steps.
The recitation of numerical ranges by endpoints of the present invention includes all numbers and fractions subsumed within that range, as well as the recited endpoint.
In the present invention, the term "plurality" and similar terms mean two or more, unless otherwise specified.
In the invention, a feature described in open-ended language covers both a closed technical solution consisting of the listed features and an open technical solution comprising them.
In the present invention, "preferred", "better" and similar terms merely describe better embodiments or examples and do not limit the scope of the present invention. "Optional" or "optionally" means that the feature may or may not be present, i.e. either of the two parallel schemes "with" or "without" may be selected. If a technical solution contains several "optional" features, each is independent unless otherwise specified and provided there is no contradiction or mutual constraint.
In the present invention, the term "tumor" refers to a new growth formed by the local proliferation of tissue cells under the action of various tumorigenic factors; because it often appears as a space-occupying mass, it is also called a neoplasm. "Tumor" in the present invention includes both benign and malignant tumors; however, benign tumors generally do not metastasize and usually have a good prognosis, so the present invention is more applicable to detecting malignant tumor cells.
The term "malignant tumor" as used herein, also called "cancer" and including the corresponding concept in traditional Chinese medicine, refers to a disease caused by dysregulation of the mechanisms that control cell division and proliferation, in which cells proliferate abnormally and may invade other parts of the body. Besides dividing uncontrollably, cancer cells can invade the surrounding normal tissue locally and even metastasize to other parts of the body through the circulatory or lymphatic system.
The term "tumor cell metastasis" as used herein refers to the process by which tumor cells leave their site of origin, invade the circulatory system, and continue to grow in other parts of the body. Benign tumors usually do not produce distant metastases, and patients with metastases have a very poor prognosis. Metastasis of cancer cells to various parts of the body also makes cancer therapy more difficult.
In the present invention, the term "tumor microenvironment" refers to the environment surrounding tumor cells, including the surrounding blood vessels, immune cells, fibroblasts, bone marrow-derived inflammatory cells, various signaling molecules and the extracellular matrix. A tumor is closely linked to and continuously interacts with its surroundings: it can influence the microenvironment by releasing signaling molecules, promoting tumor angiogenesis and inducing immune tolerance, while immune cells in the microenvironment can in turn affect the growth and development of cancer cells.
Method for predicting tumor cell metastasis risk
The present invention found that the following three steps have a key impact on tumor cell metastasis (as shown in FIG. 1A). First, tumor cells invade the surrounding tissue from the primary site and enter the blood circulation through epithelial-mesenchymal transition (EMT), a process regulated by many EMT-associated transcription factors. Next, the tumor cells in the bloodstream must evade immune cells as they circulate with the blood to other organs. Finally, on reaching another organ, the tumor cells must adapt to and invade the microenvironment of the new organ to complete metastasis. Following these steps, the invention further identifies several key factors that determine cancer metastasis, such as epithelial cells, blood cells and the tissue microenvironment, for further study. Data from multiple databases were collected for statistical analysis, a prediction model was established, and its predictions were compared with collected clinical statistics to verify their accuracy.
Based on the above findings, the present invention provides a method for predicting the risk of metastasis of tumor cells, comprising:
obtaining gene expression information of the tumor cells to be tested;
predicting S_EMT, S_Travel and S_comm of the tumor cells to be tested based on the gene expression information, where S_EMT represents the epithelial-mesenchymal transition (EMT) capability of the tumor cells, S_Travel represents the viability of the tumor cells in the blood circulation, and S_comm represents the ability of the tumor cells to invade a tissue or organ;
then calculating the metastasis risk coefficient S_Meta of the tumor cells to be tested using the following formula:
the higher the S_Meta of the tumor cells to be tested, the higher their predicted risk of metastasis.
In some embodiments, S_EMT of the tumor cells to be tested is obtained as follows: the gene expression information of the tumor cells to be tested is compared with an EMT-related gene set and the gene expression similarity is calculated, which gives S_EMT of the tumor cells to be tested.
In some specific embodiments, the EMT-related gene set is derived from the dbEMT 2.0 database.
In some specific embodiments, the similarity between the gene expression of the tumor cells and the EMT-related gene set is calculated with the ssGSEA (single-sample gene set enrichment analysis) algorithm.
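As an illustrative sketch only, the ssGSEA enrichment score used to obtain S_EMT can be computed for a single sample as below. The three-gene list is a toy stand-in rather than the dbEMT 2.0 set, `alpha = 0.25` is the usual ssGSEA default rather than a value stated in this document, and the cross-sample score normalization applied by packages such as GSVA is omitted.

```python
import numpy as np


def ssgsea_score(expr, genes, gene_set, alpha=0.25):
    """Single-sample GSEA enrichment score of `gene_set` in one expression profile.

    expr     -- 1-D sequence of expression values for one tumor sample or cell
    genes    -- gene names aligned with `expr`
    gene_set -- collection of gene names (e.g. an EMT signature)
    alpha    -- exponent applied to the rank weights of the gene-set members
    """
    expr = np.asarray(expr, dtype=float)
    n = expr.size
    # Rank-normalize: lowest expression gets rank 1, highest gets rank n.
    ranks = np.empty(n)
    ranks[np.argsort(expr, kind="stable")] = np.arange(1, n + 1)
    # Walk the genes from highest to lowest expression.
    order = np.argsort(-expr, kind="stable")
    in_set = np.array([genes[i] in gene_set for i in order])
    n_hit = int(in_set.sum())
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must contain some, but not all, profiled genes")
    # Gene-set members are weighted by rank**alpha; all other genes weigh the same.
    hit_w = np.where(in_set, ranks[order] ** alpha, 0.0)
    p_hit = np.cumsum(hit_w) / hit_w.sum()
    p_miss = np.cumsum(~in_set) / (n - n_hit)
    # ssGSEA sums the running difference instead of taking its maximum deviation.
    return float(np.sum(p_hit - p_miss))


genes = ["VIM", "SNAI1", "ZEB1", "CDH1", "EPCAM", "KRT8"]
emt_set = {"VIM", "SNAI1", "ZEB1"}  # toy EMT markers, not the dbEMT 2.0 gene set
mesenchymal = [9.0, 8.0, 7.0, 1.0, 2.0, 3.0]
epithelial = [1.0, 2.0, 3.0, 9.0, 8.0, 7.0]
print(ssgsea_score(mesenchymal, genes, emt_set) > ssgsea_score(epithelial, genes, emt_set))  # True
```

A profile in which the EMT genes sit at the top of the ranking scores positive, and one in which they sit at the bottom scores negative, which is the behavior a similarity-based S_EMT relies on.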
In some embodiments, S_Travel of the tumor cells to be tested is obtained as follows: the interaction strengths between the tumor cells to be tested and different immune cells in the blood are predicted from the gene expression information of the tumor cells and of the blood cells; weights are set according to the proportion of each immune cell type in the blood; and the weighted sum of the interaction strengths gives S_Travel of the tumor cells to be tested.
In some preferred embodiments, the immune cells in the blood include one or more of neutrophils, platelets and natural killer cells. S_Travel of the tumor cells to be tested is positively correlated with the strength of the interaction between the tumor cells and neutrophils, positively correlated with the strength of the interaction between the tumor cells and platelets, and negatively correlated with the strength of the interaction between the tumor cells and natural killer cells.
This is because neutrophils and platelets help cancer cells survive, whereas natural killer cells kill cancer cells.
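A minimal sketch of the S_Travel weighted sum, under stated assumptions: interaction strength is scored here as the sum, over ligand-receptor pairs, of ligand expression in one cell type times receptor expression in the other (a simplified scheme, not necessarily the interaction tool used by the inventors); the three gene pairs, the expression values and the blood proportions are hypothetical; and the opposing roles of neutrophils and platelets versus natural killer cells are encoded as signed weights.

```python
# Hypothetical ligand-receptor pairs; a real analysis would load a curated database.
LR_PAIRS = [("SELP", "SELPLG"), ("ITGB2", "ICAM1"), ("MICA", "KLRK1")]

# Sign encodes the direction described in the text: neutrophils and platelets
# protect circulating tumor cells (+1), natural killer cells kill them (-1).
SIGN = {"neutrophil": 1.0, "platelet": 1.0, "nk_cell": -1.0}


def interaction_strength(expr_a, expr_b, pairs=LR_PAIRS):
    """Sum of ligand(a)*receptor(b) + ligand(b)*receptor(a) over all pairs.

    expr_a, expr_b -- dicts mapping gene name -> mean expression in each cell type
    """
    total = 0.0
    for ligand, receptor in pairs:
        total += expr_a.get(ligand, 0.0) * expr_b.get(receptor, 0.0)
        total += expr_b.get(ligand, 0.0) * expr_a.get(receptor, 0.0)
    return total


def s_travel(tumor_expr, blood_expr, blood_proportion):
    """Signed, abundance-weighted sum of tumor-blood cell interaction strengths."""
    score = 0.0
    for cell_type, sign in SIGN.items():
        strength = interaction_strength(tumor_expr, blood_expr[cell_type])
        score += sign * blood_proportion[cell_type] * strength
    return score


tumor = {"SELPLG": 2.0, "ICAM1": 1.0, "MICA": 0.5}
blood = {"platelet": {"SELP": 3.0}, "neutrophil": {"ITGB2": 2.0}, "nk_cell": {"KLRK1": 4.0}}
proportions = {"platelet": 0.5, "neutrophil": 0.4, "nk_cell": 0.1}
print(round(s_travel(tumor, blood, proportions), 6))  # 3.6
```

Keeping the sign separate from the abundance weight makes it easy to swap in learned weights later without changing how cell-type proportions enter the score.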
In some embodiments, S_Travel of the tumor cells to be tested is obtained as follows:
a first prediction model relating the gene expression information of tumor cells to S_Travel is generated from an existing tumor metastasis dataset, a tumor non-metastasis dataset and a normal blood immune cell dataset;
the first prediction model is used to extract key data from the gene expression information of the tumor cells to be tested and to assign weights, so as to calculate S_Travel of the tumor cells to be tested.
In some embodiments, S_comm of the tumor cells to be tested is obtained as follows:
the interactions between the tumor cells and the main cells of the tissue microenvironment are predicted from the gene expression information of the tumor cells to be tested and of those main cells; weights are set according to the proportion of each main cell type in the tissue microenvironment; and S_comm of the tumor cells to be tested is obtained as the difference between weighted sums of the interaction strengths between the tumor cells and the main cells of the tissue microenvironment.
In practice, a person skilled in the art can determine the main cell types of each tissue microenvironment from common general knowledge.
In some embodiments, the main cell types of each tissue microenvironment are determined as follows: the R package Harmony is used to integrate human single-cell atlas datasets from different studies, and the cell types of each tissue or organ are re-annotated separately.
By way of example, the tissue or organ may be bone, brain, lung, liver or another common site of metastasis.
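To illustrate how S_comm could be computed for each candidate organ and then used to rank organ tropism, the sketch below takes precomputed tumor-host interaction strengths and cell-type proportions as input. The cell types, the numbers and the optional per-cell-type signs (reflecting that individual weights may be negative) are all hypothetical.

```python
def s_comm(interactions, proportions, signs=None):
    """Weighted sum of tumor-host interaction strengths for one organ.

    interactions -- {cell_type: interaction strength between tumor cells and that type}
    proportions  -- {cell_type: fraction of that type in the organ's microenvironment}
    signs        -- optional {cell_type: +1.0 or -1.0}; hypothetical, defaults to +1.0
    """
    signs = signs or {}
    return sum(signs.get(ct, 1.0) * proportions[ct] * strength
               for ct, strength in interactions.items())


def rank_organs(per_organ):
    """Return (organ, S_comm) pairs sorted from highest to lowest S_comm."""
    scores = {organ: s_comm(**args) for organ, args in per_organ.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


per_organ = {
    "liver": {"interactions": {"hepatocyte": 2.0, "macrophage": 1.0},
              "proportions": {"hepatocyte": 0.7, "macrophage": 0.3}},
    "bone": {"interactions": {"osteoblast": 1.0, "macrophage": 0.5},
             "proportions": {"osteoblast": 0.6, "macrophage": 0.4}},
    "brain": {"interactions": {"astrocyte": 0.5, "microglia": 2.0},
              "proportions": {"astrocyte": 0.8, "microglia": 0.2},
              "signs": {"microglia": -1.0}},
}
print([organ for organ, _ in rank_organs(per_organ)])  # ['liver', 'bone', 'brain']
```

The organ at the head of the ranking is the one to which the tumor cells would be predicted to have the highest propensity to metastasize.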
In some embodiments, S_comm of the tumor cells to be tested is obtained as follows:
a second prediction model relating the gene expression information of tumor cells to S_comm is generated from an existing tumor metastasis dataset, a tumor non-metastasis dataset and a normal tissue cell dataset;
the second prediction model is used to extract key data from the gene expression information of the tumor cells to be tested and to assign weights, so as to calculate S_comm of the tumor cells to be tested.
In particular implementations, the interaction values entering the weighted sums described herein, which may differ in scale, can be combined with corresponding scalar weights by linear or nonlinear, algebraic, trigonometric or correlation methods, and further merged into a single scalar value using algebraic, statistical-learning, Bayesian, regression or similar algorithms. Together with a mathematically derived decision function on that scalar value, this provides the first or second prediction model, through which the gene expression information of the tumor cells to be tested can be converted into the corresponding S_Travel or S_comm value. A person skilled in the art may also develop the first or second prediction model from a set of representative expression profiles of an existing tumor metastasis dataset (or other representative expression profiles with relevant characteristics) by learning weights and decision thresholds under cross-validation, bootstrapping or similar resampling techniques, so as to optimize sensitivity, specificity, negative and positive predictive values, hazard ratios, or any combination thereof.
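One of the statistical-learning routes mentioned above can be sketched as follows: L2-regularized logistic regression, fitted by plain gradient descent, learns one weight per interaction feature from metastatic (label 1) versus non-metastatic (label 0) samples, and k-fold cross-validation estimates out-of-fold accuracy. The choice of logistic regression, the learning rate, epoch count, regularization strength and the fixed 0.5 probability threshold are illustrative assumptions, and the data are synthetic.

```python
import numpy as np


def fit_logistic(X, y, lr=0.1, epochs=2000, l2=1e-3):
    """Gradient-descent logistic regression; returns (weights, bias).

    X -- samples x features matrix (e.g. tumor-host interaction strengths)
    y -- 1.0 for metastatic samples, 0.0 for non-metastatic samples
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probability of metastasis
        err = p - y
        w -= lr * (X.T @ err / len(y) + l2 * w)  # log-loss gradient plus L2 penalty
        b -= lr * err.mean()
    return w, b


def cross_validated_accuracy(X, y, k=5, seed=0):
    """Mean out-of-fold accuracy using a 0.5 probability (logit 0) threshold."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    accuracies = []
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        w, b = fit_logistic(X[train], y[train])
        predicted = (X[test] @ w + b) > 0.0
        accuracies.append(np.mean(predicted == y[test]))
    return float(np.mean(accuracies))


# Synthetic example: features 0 and 2 push metastasis risk in opposite directions.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = (X @ np.array([2.0, 0.0, -2.0]) + rng.normal(scale=0.5, size=200) > 0).astype(float)
weights, bias = fit_logistic(X, y)
print(weights, cross_validated_accuracy(X, y))
```

The learned weights can be positive or negative, matching the signed weighted sums described in this document; a bootstrap loop over the same functions would be a drop-in alternative to k-fold splitting.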
In the weighted sums of interactions referred to in the invention, the individual weights may be positive or negative.
Because of the sequencing technology used, each sample in some existing datasets contains many different cell types, and removing the gene expression information of the other cells (including epithelial cells, immune cells, fibroblasts, etc.) helps to improve the accuracy of the prediction model. Thus, in some embodiments, when gene expression information of tumor cells is obtained from existing tumor metastasis and tumor non-metastasis datasets, only one sample is selected per individual and only the gene expression information of the primary tumor cells is retained, while that of the other cells is removed.
In some preferred embodiments, the gene expression information of the other cells is identified by marker genes and/or copy number variation.
In some cases, it is not possible to distinguish tumor cells from other cells by the marker gene, in which case copy number variation can be used for differentiation.
In some specific embodiments, the CIBERSORTx algorithm is used to remove the gene expression information of the other cells.
In some embodiments, the InferCNV algorithm is used to infer the copy number variation of cells, and cells with larger inferred copy number variation are identified as tumor cells.
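The copy-number-based separation of tumor cells can be sketched in the spirit of InferCNV, with heavy simplification: expression relative to reference (non-tumor) cells is smoothed along the genome, and cells whose smoothed signal deviates strongly are called tumor cells. This is not the InferCNV package itself (which, among other things, smooths within each chromosome and offers HMM-based calling); the window size, the quantile cut-off and the synthetic data are assumptions.

```python
import numpy as np


def cnv_scores(log_expr, ref_mask, window=51):
    """Per-cell copy-number signal (simplified, InferCNV-like).

    log_expr -- cells x genes log-normalized expression, genes sorted by genomic position
    ref_mask -- boolean array, True for reference (known non-tumor) cells
    window   -- number of neighbouring genes averaged when smoothing
    """
    rel = log_expr - log_expr[ref_mask].mean(axis=0)       # expression relative to normal cells
    kernel = np.ones(window) / window
    smoothed = np.vstack([np.convolve(row, kernel, mode="same") for row in rel])
    smoothed -= np.median(smoothed, axis=1, keepdims=True)  # re-center each cell
    return np.mean(smoothed ** 2, axis=1)                   # larger = stronger CNV signal


def call_tumor_cells(log_expr, ref_mask, window=51, quantile=0.99):
    """True for cells whose CNV score exceeds a high quantile of the reference cells."""
    scores = cnv_scores(log_expr, ref_mask, window)
    cutoff = np.quantile(scores[ref_mask], quantile)
    return scores > cutoff


# Synthetic data: 100 normal cells, then 50 tumor cells with a gain over genes 50-99.
rng = np.random.default_rng(0)
expr = rng.normal(0.0, 0.3, size=(150, 200))
expr[100:, 50:100] += 1.0
reference = np.zeros(150, dtype=bool)
reference[:80] = True
calls = call_tumor_cells(expr, reference)
print(calls[100:].mean(), calls[80:100].mean())
```

Smoothing over neighbouring genes is what lets a regional gain or loss stand out against gene-level noise; a single highly expressed gene does not move the score much.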
The "existing tumor metastasis datasets" referred to herein include data from The Cancer Genome Atlas (TCGA).
When detecting a particular type of cancer, a person skilled in the art can also introduce existing "tumor metastasis datasets" and "tumor non-metastasis datasets" of the corresponding type. For example, when predicting the metastasis risk of liver cancer cells, existing "liver cancer metastasis datasets" and "liver cancer non-metastasis datasets" may be combined to generate or refine the corresponding prediction model.
The term "normal blood immune cell dataset" as used herein means a dataset containing expression information of normal blood immune cells, and the term "normal tissue cell dataset" means a dataset containing expression information of normal tissue cells. In some embodiments, a person skilled in the art can obtain both kinds of information from well-known human normal-cell datasets.
In some embodiments, the method further comprises:
providing a threshold of S_Meta associated with metastasis risk;
comparing S_Meta of the tumor cells to be tested with the threshold, wherein, when S_Meta is above the threshold, the tumor cells to be tested are predicted to be at risk of metastasis.
In some embodiments, the threshold may be validated based on a comparative analysis of existing tumor metastasis data sets and tumor non-metastasis data sets.
In some embodiments, multiple cut-off values may be validated based on a comparative analysis of existing tumor metastasis and tumor non-metastasis datasets, and S_Meta of the tumor cells to be tested is compared with these cut-off values to determine the degree of metastasis risk of the tumor cells to be tested (for example, a lower, medium or higher metastasis risk).
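A threshold or set of cut-offs for S_Meta could be derived from labeled metastasis and non-metastasis data in many ways. The sketch below assumes Youden's J statistic (sensitivity + specificity − 1) for choosing a single threshold, applies the "above the threshold" rule described in this document, and maps scores to lower/medium/higher tiers given two ascending cut-offs. The example scores are made up.

```python
import numpy as np


def youden_threshold(s_meta, metastatic):
    """Return (threshold, J) maximizing Youden's J = sensitivity + specificity - 1.

    A sample is called at risk when its S_Meta is strictly above the threshold.
    """
    s_meta = np.asarray(s_meta, dtype=float)
    metastatic = np.asarray(metastatic, dtype=bool)
    if metastatic.all() or not metastatic.any():
        raise ValueError("need both metastatic and non-metastatic samples")
    best_t, best_j = None, -np.inf
    for t in np.unique(s_meta):
        at_risk = s_meta > t
        sensitivity = at_risk[metastatic].mean()
        specificity = (~at_risk[~metastatic]).mean()
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return float(best_t), float(best_j)


def risk_tier(s_meta, cutoffs, labels=("lower", "medium", "higher")):
    """Map S_Meta values to risk tiers using ascending cut-offs."""
    if len(labels) != len(cutoffs) + 1:
        raise ValueError("need exactly one more label than cut-offs")
    return np.asarray(labels)[np.digitize(s_meta, cutoffs)]


scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
is_metastatic = [False, False, False, True, True, True]
print(youden_threshold(scores, is_metastatic))   # (0.3, 1.0)
print(risk_tier([0.1, 0.5, 0.95], [0.3, 0.9]))   # ['lower' 'medium' 'higher']
```

Youden's J weighs sensitivity and specificity equally; if missing a metastasis is judged costlier than a false alarm, a weighted variant or a fixed-sensitivity rule would be the natural substitute.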
In some embodiments, the gene expression information is transcriptome information.
Those skilled in the art can, based on common general knowledge, determine specific methods for obtaining gene expression information, such as microarrays, Q-PCR, immunohistochemistry, ELISA or other techniques capable of quantifying mRNA or protein expression.
In some embodiments, the gene expression information in the present invention is obtained by ribonucleic acid sequencing or single cell sequencing.
The term "ribonucleic acid sequencing" as used herein is abbreviated as RNA sequencing (RNA-seq) and is also called whole transcriptome shotgun sequencing. It is a transcriptomics research method based on second-generation sequencing technology, which uses the capacity of such sequencing to reveal a snapshot of the presence and quantity of RNA from a genome at a given moment.
The term "single cell sequencing" as used herein refers to sequencing individual cells using optimized next-generation sequencing (NGS) technology, which reveals sequence differences between cells in a specific microenvironment and facilitates the study of their functional differences. DNA sequencing of individual cells can help reveal variation within small cell populations, for example in cancer; RNA sequencing of individual cells can help identify different cell types and the genes they express, which is of great benefit to research in fields such as developmental biology.
In some embodiments, the test tumor cells are derived from a biological sample of an individual to be tested that is suspected of having a tumor or has been diagnosed with a tumor.
As referred to herein, an individual "suspected of having a tumor" includes an individual in an observation period after a tumor has healed.
In some embodiments, the nucleic acid may comprise RNA or DNA, e.g., mRNA, cRNA, cDNA, etc., provided that the sample retains the expression information of the host cell or tissue from which it was obtained. The sample may be prepared in a number of ways known in the art, for example by isolating mRNA from cells, as is known in the art of differential gene expression, wherein the isolated mRNA is used as isolated, amplified, or used to prepare cDNA, cRNA and the like. Thus, determining the level of mRNA in a sample includes preparing cDNA or cRNA from the mRNA and then measuring the cDNA or cRNA. Samples are typically prepared using standard protocols from cells or tissues harvested from a subject in need of treatment, wherein the cells or tissues from which the nucleic acid may be produced include any tissue in which the phenotype to be determined is expressed, including, but not limited to, diseased cells or tissues, body fluids, and the like.
"Biological sample," "sample," and "test sample" are used interchangeably herein to refer to any substance, biological fluid, tissue or cell obtained or otherwise derived from an individual. It includes blood (including whole blood, white blood cells, peripheral blood mononuclear cells, buffy coat, plasma and serum), sputum, tears, mucus, nasal washes, nasal aspirates, breath, urine, semen, saliva, meningeal fluid, amniotic fluid, glandular fluid, lymph, nipple aspirates, bronchial aspirates, synovial fluid, joint aspirates, ascites, cells, cell extracts and cerebrospinal fluid. It also includes experimentally separated fractions of all of the foregoing. For example, a blood sample may be fractionated into serum or into fractions containing specific types of blood cells, such as red blood cells or white blood cells (leukocytes). If desired, the sample may be a combination of samples from an individual, such as a combination of a tissue sample and a fluid sample. The term "biological sample" also includes materials containing homogenized solid material, such as from a fecal sample, a tissue sample or a tissue biopsy. The term "biological sample" also includes materials derived from tissue culture or cell culture. Any suitable method for obtaining a biological sample may be used; exemplary methods include phlebotomy, swabbing (e.g., buccal swabbing) and fine needle aspiration biopsy procedures. Samples may also be collected by, for example, microdissection (e.g., laser capture microdissection (LCM) or laser microdissection (LMD)), bladder washing, smear (e.g., PAP smear) or ductal lavage. A "biological sample" obtained or derived from an individual includes any such sample that has been processed in any suitable manner after being obtained from the individual.
In some embodiments, the tumor comprises one or more of breast cancer, colorectal cancer, lung cancer, liver cancer, pancreatic cancer, skin cancer.
Method for predicting organ metastasis tendencies of tumor cells
The present invention also provides a method of predicting the organ metastasis tendency of tumor cells, comprising: predicting the metastasis risk of the tumor cells using the method described above;
wherein the S_comm of the tumor cells to be tested for invading each of several different tissues or organs is obtained, and the higher the S_comm of the tumor cells to be tested for a given tissue or organ, the higher their predicted tendency to metastasize to that tissue or organ.
In some embodiments, the method further comprises:
providing a threshold of S_comm associated with metastasis tendency;
comparing the S_comm of the tumor cells to be tested with the threshold, wherein when S_comm is above the threshold, the tumor cells to be tested are predicted to have a tendency to metastasize to the tissue or organ.
In some embodiments, the threshold may be validated based on a comparative analysis of existing tumor metastasis data sets and tumor non-metastasis data sets.
In some embodiments, multiple cut-off values may be validated based on a comparative analysis of existing tumor metastasis data sets and tumor non-metastasis data sets, and the S_comm of the tumor cells to be tested is compared with the cut-off values to conclude that the tumor cells to be tested have a given degree of tendency (e.g., high, medium or low metastasis tendency) to metastasize to the tissue or organ.
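Because S_comm is obtained separately for each candidate tissue or organ, the organ-tendency step amounts to ranking those per-organ scores and flagging the ones above the threshold. The sketch below assumes the per-organ S_comm values are already available in a dictionary; the organ names and threshold in the test are hypothetical.

```python
def organ_tendency(s_comm_by_organ, threshold):
    """Rank organs by S_comm (highest first) and flag those above the threshold.

    s_comm_by_organ maps an organ name to the S_comm of the tumor cells
    for that organ; returns (organ, score, above_threshold) tuples.
    """
    ranked = sorted(s_comm_by_organ.items(), key=lambda item: item[1], reverse=True)
    return [(organ, score, score > threshold) for organ, score in ranked]
```

Ranking keeps the relative comparison ("higher S_comm, higher tendency") even when no organ crosses the threshold.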
Method for predicting prognosis of tumor patient
The invention also provides a method of predicting the prognosis of a tumor patient, comprising: predicting the metastasis risk of tumor cells using the method described above;
wherein the S_Meta of the tumor cells to be tested is obtained, and when the tumor cells to be tested are derived from a tumor patient, the higher their S_Meta, the worse the predicted prognosis of the tumor patient.
In some embodiments, the method further comprises:
providing a threshold of S_Meta associated with prognosis;
comparing the S_Meta of the tumor cells to be tested with the threshold, wherein when S_Meta is above the threshold, the tumor patient from whom the tumor cells to be tested are derived is predicted to have a poor prognosis.
In some embodiments, the threshold may be validated based on a comparative analysis of existing tumor metastasis data sets and tumor non-metastasis data sets.
In some embodiments, multiple cut-off values may be validated based on a comparative analysis of existing tumor metastasis data sets and tumor non-metastasis data sets, and the S_Meta of the tumor cells to be tested is compared with the cut-off values to conclude the degree of prognosis associated with the tumor cells to be tested (e.g., good, poor or worse prognosis).
Other applications
The method for predicting tumor cell metastasis risk can be applied to the auxiliary diagnosis of tumor patients. It can be used to measure the metastasis risk of tumor cells clinically and, in combination with the patient's age, medical history, clinical symptoms, relevant test indicators and the like, to comprehensively assess the patient's tumor metastasis risk, organ metastasis tendency, prognosis and so on.
The method for predicting tumor cell metastasis risk can also be applied to the research and development of new anticancer drugs. The effect of a drug in reducing the metastasis risk of tumor cells can be determined by measuring the metastasis risk of tumor cells before and after dosing (or in a dosed experimental group versus an undosed control group).
Tumor cell detection system
The present invention further provides a tumor cell detection system comprising:
an acquisition module configured to acquire gene expression information of the tumor cells to be tested;
a calculation module configured to predict S_EMT, S_Travel and S_comm of the tumor cells to be tested based on the gene expression information, wherein S_EMT represents the epithelial-mesenchymal transition (EMT) capability of the tumor cells, S_Travel represents the viability of the tumor cells in the blood circulation, and S_comm represents the ability of the tumor cells to invade a tissue or organ; and then to calculate the metastasis risk coefficient S_Meta of the tumor cells to be tested using the following formula:
and an output module configured to output a prediction that the tumor cells to be tested have a higher risk of metastasis when their S_Meta is higher.
In some embodiments, the calculation module includes an S_EMT calculation unit configured to compare the gene expression information of the tumor cells to be tested with an EMT-related gene set and calculate the gene expression similarity, thereby obtaining the S_EMT of the tumor cells to be tested.
In some embodiments, the calculation module includes an S_Travel calculation unit configured to predict the interaction strength between the tumor cells to be tested and different immune cells in the blood based on the gene expression information of the tumor cells to be tested and that of cells in the blood, set weights based on the proportion of each immune cell type present in the blood, and obtain the S_Travel of the tumor cells to be tested by weighted summation.
In some preferred embodiments, the immune cells in the blood include one or more of neutrophils, platelets and natural killer cells, wherein the S_Travel of the tumor cells to be tested is positively correlated with the interaction strength between the tumor cells and neutrophils, positively correlated with the interaction strength between the tumor cells and platelets, and negatively correlated with the interaction strength between the tumor cells and natural killer cells.
In some embodiments, the calculation module includes an S_comm calculation unit configured to predict the interactions between the tumor cells and the main cells of the tissue microenvironment, and among the main cells themselves, based on the gene expression information of the tumor cells to be tested and that of the main cells of the tissue microenvironment; to set weights based on the proportion of each main cell type in the tissue microenvironment; and to subtract the weighted sum of the interaction strengths among the main cells within the tissue microenvironment from the weighted sum of the interaction strengths between the tumor cells and the main cells, thereby obtaining the S_comm of the tumor cells to be tested.
In some embodiments, the calculation module includes a model generation unit, and an S_Travel calculation unit and/or an S_comm calculation unit;
the model generation unit is configured to generate a first prediction model based on existing tumor metastasis data sets, tumor non-metastasis data sets and normal blood immune cell data sets, and/or to generate a second prediction model based on existing tumor metastasis data sets, tumor non-metastasis data sets and normal tissue cell data sets;
the first prediction model associates the gene expression information of tumor cells with S_Travel;
the second prediction model associates the gene expression information of tumor cells with S_comm;
the S_Travel calculation unit is configured to use the first prediction model to extract key data from the gene expression information of the tumor cells to be tested and assign weights to calculate the S_Travel of the tumor cells to be tested;
the S_comm calculation unit is configured to use the second prediction model to extract key data from the gene expression information of the tumor cells to be tested and assign weights to calculate the S_comm of the tumor cells to be tested.
In some embodiments, when the gene expression information of tumor cells is obtained from existing tumor metastasis data sets and tumor non-metastasis data sets, only one sample is selected per individual, and only the gene expression information of primary tumor cells is retained while that of other cells is removed; more preferably, the gene expression information of other cells is screened out using marker genes and/or copy number variation.
In some embodiments, the output module includes a first output unit configured to provide a threshold of S_Meta associated with metastasis risk and to compare the S_Meta of the tumor cells to be tested with the threshold, predicting that the tumor cells to be tested have a risk of metastasis when S_Meta is above the threshold.
In some embodiments, the output module further comprises a second output unit configured to obtain the S_comm of the tumor cells to be tested for invading different tissues or organs, wherein the higher the S_comm for a given tissue or organ, the higher the predicted tendency of the tumor cells to metastasize to that tissue or organ.
In some embodiments, the output module further comprises a third output unit configured to obtain the S_Meta of the tumor cells to be tested, wherein when the tumor cells to be tested are derived from a tumor patient, the higher their S_Meta, the worse the predicted prognosis of the tumor patient.
In some embodiments, the tumor cell detection system further comprises:
a detection end configured to detect gene expression information of tumor cells;
the detection end being communicatively connected to the acquisition module.
In practice, the skilled artisan can set up the corresponding detection end in combination with conventional methods of detecting gene expression information, such as microarray, Q-PCR, immunohistochemistry, ELISA, or other techniques capable of quantifying mRNA or protein expression, and the like.
In some preferred embodiments, the gene expression information is transcriptome information.
In some preferred embodiments, the detection end comprises a ribonucleic acid sequencing device or a single cell sequencing device.
In some embodiments, the test tumor cells are derived from a biological sample of an individual to be tested who is suspected of having a tumor or has been diagnosed with a tumor.
In some specific embodiments, the tumor comprises one or more of breast cancer, colorectal cancer, lung cancer, liver cancer, pancreatic cancer, skin cancer.
In some alternative embodiments, the tumor cell detection system further comprises:
a device control module configured to control the operation of the detection end, the acquisition module, the calculation module and the output module, the device control module being communicatively connected to each of them.
In practice, the person skilled in the art can also configure the corresponding tumor cell detection system based on the embodiments described in the methods and applications section above.
Non-transitory computer readable storage medium
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements: the method of predicting the metastasis risk of tumor cells as described above, or the method of predicting the organ metastasis tendency of tumor cells as described above, or the method of predicting the prognosis of a tumor patient as described above.
Electronic equipment
The invention further provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing, when executing the computer program: the method of predicting the metastasis risk of tumor cells as described above, or the method of predicting the organ metastasis tendency of tumor cells as described above, or the method of predicting the prognosis of a tumor patient as described above.
Computer program product
The invention further provides a computer program product comprising a computer program which, when read and executed by a computing device, causes the computing device to perform: the method of predicting the metastasis risk of tumor cells as described above, or the method of predicting the organ metastasis tendency of tumor cells as described above, or the method of predicting the prognosis of a tumor patient as described above.
Embodiments of the present invention will be described in detail below with reference to examples. It should be understood that these examples are illustrative of the present invention and are not intended to limit its scope. For experimental methods in the following examples for which specific conditions are not noted, reference is preferably made to the guidance given in the present invention; alternatively, they may be carried out according to laboratory manuals or conventional conditions in the art, with reference to other experimental methods known in the art, or according to the conditions recommended by the manufacturer.
In the specific examples below, unless otherwise specified, measured parameters relating to raw material components may deviate slightly within the accuracy of weighing. Where temperature and time parameters are involved, deviations within the accuracy of the instruments or of the operation are acceptable.
Example 1
1. Research method
The research method of the invention mainly comprises the following steps (shown in Fig. 1B). First, The Cancer Genome Atlas (TCGA) data and human single-cell atlas data were collected as gene expression data for cancer cells and normal tissue cells, respectively. Next, separate scores were defined for several key steps in the development of metastasis and used to measure the likelihood of metastasis. Finally, the model was applied to single-cell datasets of colorectal and breast cancer liver metastases, respectively, and the differences between patients with and without metastasis were compared.
1.1 TCGA data collection and deconvolution
Transcriptome sequencing data and patient clinical information were downloaded from the TCGA Genomic Data Commons portal (https://portal.gdc.cancer.gov/). Based on cancer types known to metastasize frequently, 7 important cancer types were selected for prediction: breast cancer (BRCA), colorectal cancer (COAD), lung cancer (LUAD, LUSC), liver cancer (LIHC), pancreatic cancer (PAAD) and skin cancer (SKCM). For each cancer, only primary tumors were retained for predicting metastatic status, and only one sample was selected per patient. Because bulk sequencing is used, each sample contains many different kinds of cells. To obtain purer tumor cell expression, the expression contributions of epithelial cells, immune cells and fibroblasts in each sample were removed using the CIBERSORTx algorithm, and the remaining tumor cell expression was used for subsequent study. In addition, clinical data were collected for these patients, including the size or extent of the primary tumor, whether regional lymph node invasion had occurred, and whether distant metastasis had occurred.
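Selecting one primary-tumor sample per patient can be done from TCGA sample barcodes. The sketch below relies on the standard barcode layout, in which the first 12 characters identify the participant and characters 14-15 give the sample type ("01" = primary solid tumor). Keeping the lexicographically first matching barcode per patient is an assumption, since the text does not say which sample was kept; the function name is hypothetical and the barcodes in the test are made-up examples.

```python
PRIMARY_SOLID_TUMOR = "01"  # TCGA sample-type code for a primary solid tumor


def one_primary_sample_per_patient(barcodes):
    """Keep one primary-tumor barcode per TCGA participant.

    Participant ID = first 12 characters (e.g. "TCGA-A1-A0SB");
    sample type = characters 14-15, i.e. the 0-based slice [13:15].
    The lexicographically first matching barcode is kept (an assumption).
    """
    chosen = {}
    for barcode in sorted(barcodes):
        participant, sample_type = barcode[:12], barcode[13:15]
        if sample_type == PRIMARY_SOLID_TUMOR and participant not in chosen:
            chosen[participant] = barcode
    return chosen
```

Normal-tissue ("11") and metastatic ("06") samples are dropped by the sample-type check, and a participant with no primary-tumor sample simply does not appear in the result.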
1.2 Human single-cell atlas collection and preprocessing
To understand transcriptome expression in the microenvironments of different tissues and organs under normal conditions, expression data for tissues and organs that are common metastasis sites, including bone, brain, lung and liver, were collected from several important human single-cell atlas studies. Single-cell atlas datasets from different studies were integrated using the R package Harmony. The cell types of each tissue or organ were then re-annotated separately, yielding the main cell types of each tissue microenvironment.
1.3 Calculation of metastasis likelihood scores
The epithelial-mesenchymal transition (EMT) process plays a vital role in the development of tumor metastasis. Important EMT-related gene sets were obtained from the dbEMT 2.0 database. The similarity of each tumor's gene expression to these gene sets was calculated with the single-sample gene set enrichment analysis (ssGSEA) algorithm and denoted S_EMT.
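The ssGSEA step can be sketched as a rank-based enrichment score computed within one sample: genes are ranked by expression, a rank-weighted running fraction of gene-set hits is compared with the running fraction of non-hits, and the differences are summed. This is a simplified version for illustration only; implementations such as the GSVA R package typically also normalize scores across samples. The gene names in the test are illustrative and are not the dbEMT 2.0 list.

```python
def ssgsea_score(expression, gene_set, alpha=0.25):
    """Simplified single-sample GSEA enrichment score.

    expression: dict gene -> expression value for one sample.
    gene_set: set of genes (e.g. an EMT-related set).
    alpha: rank-weight exponent (0.25 is the commonly used default).
    """
    genes = sorted(expression, key=expression.get, reverse=True)
    n = len(genes)
    in_set = [g in gene_set for g in genes]
    n_in = sum(in_set)
    if n_in == 0 or n_in == n:
        raise ValueError("gene set must partially overlap the profile")
    # Highest-expressed gene gets rank n, so top-ranked hits weigh more.
    weights = [(n - i) ** alpha if hit else 0.0 for i, hit in enumerate(in_set)]
    total_weight = sum(weights)
    cum_in = cum_out = es = 0.0
    for hit, w in zip(in_set, weights):
        if hit:
            cum_in += w / total_weight
        else:
            cum_out += 1.0 / (n - n_in)
        es += cum_in - cum_out  # ssGSEA sums the walk instead of taking its maximum
    return es
```

A positive score means the set's genes sit near the top of the sample's expression ranking, which is the sense in which S_EMT measures similarity to the EMT program.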
Another factor that determines tumor metastasis is the ability of cancer cells to travel to different tissues and organs through the blood circulation. Here, the interactions of cancer cells with immune cells in the blood are calculated to determine whether cancer cells can survive in the circulation, denoted S_Travel. Specifically, neutrophils and platelets in the blood help protect cancer cells and promote their survival in the blood, whereas natural killer cells kill cancer cells. The interaction strengths of cancer cells with each of these cell types are therefore considered separately: the interaction strengths between the tumor cells to be tested and the different immune cells in the blood are predicted from the gene expression information of the tumor cells and of the blood cells, weights are set based on the proportion of each immune cell type in the blood, and the weighted sum gives S_Travel.
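A minimal sketch of the S_Travel weighted sum, assuming the per-cell-type interaction strengths have already been predicted. The text only states the direction of each correlation, so encoding it as a +1/-1 sign per cell type is an assumption, as are the cell-type keys and the numbers in the test.

```python
# Hypothetical signs implementing the stated correlations: neutrophils and
# platelets protect circulating tumor cells (+1); NK cells kill them (-1).
DEFAULT_SIGN = {"neutrophil": 1.0, "platelet": 1.0, "nk_cell": -1.0}


def s_travel(interaction, proportion, sign=None):
    """Signed, proportion-weighted sum of tumor/blood-cell interaction strengths.

    interaction: predicted interaction strength of the tumor cells with each
    blood cell type; proportion: abundance of that cell type (the weight).
    """
    sign = DEFAULT_SIGN if sign is None else sign
    if set(interaction) != set(proportion):
        raise ValueError("interaction and proportion must cover the same cell types")
    return sum(sign[c] * proportion[c] * interaction[c] for c in interaction)
```

With this encoding, stronger neutrophil or platelet interactions raise the score and stronger NK-cell interactions lower it, which is all the text specifies.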
Finally, the present invention considers that cancer cells can invade a new microenvironment only when their interactions with the cells of that tissue microenvironment are stronger than the interactions among the tissue's own cells. Based on the gene expression information of the tumor cells to be tested and of the main cells of the tissue microenvironment, the interactions between cancer cells and the main cells of the microenvironment, and the interactions within the microenvironment, are calculated separately. Weights are set based on the proportion of each main cell type in the tissue microenvironment, and the weighted sum of the interaction strengths among the main cells is subtracted from the weighted sum of the interaction strengths between the tumor cells and the main cells. This tests whether the tumor cells interact with the tissue cells more strongly than the tissue cells interact with one another, and measures the ability of cancer cells to invade the tissue or organ, denoted S_comm.
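The S_comm subtraction can be sketched as follows, assuming the interaction strengths are already available. The text specifies only "weighted sums"; weighting tumor-to-host terms by each host cell's proportion and host-to-host terms by the product of the two proportions is an assumption made here so that both sides are expected strengths for a randomly chosen partner. The cell-type names and values in the test are hypothetical.

```python
def s_comm(tumor_host, host_host, proportion):
    """S_comm sketch: tumor-to-tissue interaction minus within-tissue interaction.

    tumor_host[c]: interaction strength of the tumor cells with host cell type c.
    host_host[(a, b)]: interaction strength between host cell types a and b.
    proportion[c]: fraction of cell type c in the tissue (weights sum to 1).
    """
    # Expected strength when a tumor cell meets a randomly chosen host cell.
    invading = sum(proportion[c] * tumor_host[c] for c in proportion)
    # Expected strength between two randomly chosen host cells (assumed w_a * w_b weights).
    resident = sum(proportion[a] * proportion[b] * host_host[(a, b)]
                   for a in proportion for b in proportion)
    return invading - resident
```

A positive value corresponds to the condition in the text that tumor-to-tissue interactions outcompete the tissue's internal interactions.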
Since cancer metastasis is determined by the combination of these three effects, S_Meta is used to represent the risk coefficient for a patient developing metastasis.
Because the score is calculated separately for each tissue or organ, it also measures organ metastasis tendency. All samples of a given cancer in the TCGA database were used to model all patients with that cancer, so that the tendency to metastasize to different sites could be calculated to predict the overall metastatic rate in the population.
2. Results of the study
2.1 Prediction of population-level organ metastasis tendency
The model was applied to predict the organ metastasis tendencies of six common cancers (breast, lung, liver, colorectal, pancreatic and skin cancer) toward seven tissues and organs where metastasis frequently occurs (brain, bone marrow, liver, lung, bladder, kidney and lymph nodes). To assess the accuracy of the predictions, a recent study reporting the clinical distribution of metastases of various cancers to different organs was selected, and these clinical statistics were used as the gold standard for evaluating the algorithm. Fig. 2A shows the predicted metastasis proportions as a Sankey diagram, with the primary cancers on the left and the metastatic sites on the right. The figure shows that lung cancer is the most metastatic cancer, followed by colorectal and pancreatic cancer. In addition, the lung and liver are the sites where metastasis is most likely to occur, and cancers of the digestive system are more prone to metastasize to the liver. The comparison of the predicted results with the clinical statistics is shown in Fig. 2B, in which each dot represents the probability of one cancer metastasizing to one organ; the color of the dot indicates the primary cancer and its shape indicates the tissue or organ of metastasis. The X-axis gives the clinical statistics and the Y-axis the predicted results. Most points lie near the diagonal; the average error is 13.62%, and the p value of the paired Wilcoxon test is 0.854, indicating no significant difference between the predicted results and the statistics observed in the actual population.
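The text does not state how the 13.62% average error was computed. One plausible reading, assumed here, is the mean absolute difference between predicted and clinically observed proportions over all (primary cancer, metastatic organ) pairs. The function name and the values in the test are hypothetical.

```python
def mean_absolute_error(predicted, observed):
    """Average absolute difference between predicted and observed proportions.

    Both arguments map a (primary cancer, metastatic organ) pair to a proportion.
    """
    if predicted.keys() != observed.keys():
        raise ValueError("predicted and observed must cover the same pairs")
    return sum(abs(predicted[k] - observed[k]) for k in predicted) / len(predicted)
```

The paired Wilcoxon test on the same pairs could be run with a statistics library, for example `scipy.stats.wilcoxon(x, y)` on the two aligned lists of proportions.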
2.2 Metastasis analysis of lung adenocarcinoma patients
Lung adenocarcinoma patients are taken as an example below. For each lung adenocarcinoma patient in the TCGA database, the S_Meta described above can be calculated to obtain the patient's risk scores for metastasis to different tissues and organs. Fig. 3A shows the metastasis tendencies of the first 4 patients. All of them tend to metastasize to bone marrow, liver and lymph nodes, with lower probabilities of brain and kidney metastasis. At the same time, their metastasis tendencies differ considerably: the first patient tends to metastasize to the lymph nodes and the second to the liver. Based on the metastasis risk coefficient, all patients are divided into three categories: high metastasis tendency (greater than 0.4), medium metastasis tendency (between 0.2 and 0.4) and low metastasis tendency (less than 0.2). The proportions of the three categories of patients for each metastatic site are shown in Fig. 3B. A large percentage of patients may develop at least one metastasis, whereas bone metastasis and kidney metastasis are more often at lower risk than the other sites. Survival analysis of the different groups was performed in combination with clinical data (Fig. 3C). Patients at high risk of metastasis clearly have a worse prognosis, while patients at lower risk of metastasis survive longer. This indirectly shows that the classification can reasonably distinguish patients' metastasis risk and identify patients with a low metastasis risk.
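The survival comparison between risk groups is typically based on Kaplan-Meier curves. The sketch below is a plain Kaplan-Meier estimator, assuming follow-up times and event indicators (1 = death, 0 = censored) for the patients of one risk group; the text does not say which software or test was used, and a log-rank test for comparing groups is not included.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up time per patient; events: 1 = death observed, 0 = censored.
    Returns [(t, S(t))] at each distinct time with at least one death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing this time (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # censored patients leave the risk set after time t
    return curve
```

Calling it separately for the high, medium and low groups yields the per-group curves whose separation is described above.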
2.3 Analysis of colorectal cancer liver metastasis data
To verify that the model of the invention is also applicable to single-cell datasets, data from two recent studies on colorectal cancer were selected. No liver metastasis was found in any patient in one study (Zhang, Lei, et al. Cell 181.2 (2020): 442-459), whereas the patients in the other study (Liu, Yeman, et al. Cancer Cell 40.4 (2022): 424-437) had liver metastasis. The present invention compares the differences between the primary cancer data of the two studies. To this end, the CD45− cells from (Liu, Yeman, et al. Cancer Cell 40.4 (2022): 424-437) were first selected for clustering and divided, by the expression of major marker genes, into 4 major classes: epithelial/cancer cells, endothelial cells, fibroblasts and other cells (Fig. 4A). Because cancer cells clustered together with epithelial cells and have no specific marker gene, the InferCNV algorithm was used to infer their copy number variation, which was compared with that of endothelial cells (Fig. 4B). Cells with large copy number variation were regarded as cancer cells. Through these two screening steps, relatively pure cancer cells were identified among the CD45− cells. Cancer cells in the other dataset (Zhang, Lei, et al. Cell 181.2 (2020): 442-459) were identified by the same method. The interactions of each of the two cancer cell datasets with the normal liver microenvironment were analyzed, as shown in Fig. 4C and Fig. 4D. The comparison shows that cancer cells from the liver metastasis patients interacted more with macrophages, monocytes, neutrophils and other cells in the liver, indicating that the interaction of cancer cells with these cells may contribute to the invasion of cancer cells into the liver and promote tumor metastasis. Further, CellChat calculations identified ligand-receptor pairs that play important roles between cancer cells and endothelial cells, macrophages and neutrophils (Fig. 4E).
It can be seen that ANXA family and COL family ligands and the ITGAM receptor play key roles in the interaction of cancer cells with the liver microenvironment, which is consistent with previous reports. These ligand-receptor pairs are important for further revealing the mechanism of tumor metastasis.
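To make the idea of scoring ligand-receptor pairs concrete, the sketch below uses a deliberately simple proxy: mean ligand expression in the sender cells times mean receptor expression in the receiver cells, ranked from strongest to weakest. This is not CellChat's method, which uses a mass-action model with permutation testing; the function name, gene pairs and values are hypothetical.

```python
def lr_scores(ligand_expr, receptor_expr, pairs):
    """Rank ligand-receptor pairs by a simple product score.

    ligand_expr: mean expression of each gene in the sender (e.g. cancer) cells.
    receptor_expr: mean expression of each gene in the receiver (e.g. macrophage) cells.
    pairs: iterable of (ligand, receptor) gene-name tuples.
    Pairs with a gene missing from either profile are skipped.
    """
    scores = {}
    for ligand, receptor in pairs:
        if ligand in ligand_expr and receptor in receptor_expr:
            scores[(ligand, receptor)] = ligand_expr[ligand] * receptor_expr[receptor]
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Summing the returned scores over all pairs gives one possible aggregate "interaction strength" between two cell types, the kind of quantity the S_Travel and S_comm steps consume.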
2.4 Analysis of lung adenocarcinoma metastasis data
To investigate whether the method of the invention applies to different cancer types, this example further collected a dataset from a recent study of lung adenocarcinoma (Laughney, Ashley M., et al. Nature Medicine 26.2 (2020): 259-269). Likewise, the main cell types in the data were first identified with a single-cell clustering algorithm (Fig. 5A), and copy number variation was inferred with the CopyKAT algorithm, from which the proportion of cancer cells was inferred (Fig. 5B). Of the 16 patient samples in the original data, 10 were derived from primary lung cancer, 4 from brain metastases and 2 from bone metastases, each being a different sample (Fig. 5C). By studying cell-cell interactions, the interactions of cancer cells from primary and metastatic samples were compared (Fig. 5D). Cancer cells from the metastatic samples interact with the microenvironment more markedly than those from the primary samples, especially with neutrophils and macrophages, and this interaction promotes faster metastasis of cancer cells to the brain or bone. Among the ligand-receptor pairs, CCL2-CXCR1 and ITGA2-IGBRE2 are particularly prominent in the interaction between cancer cells and macrophages (Fig. 5E). To verify the soundness of the method, the primary samples and the samples with known metastases were predicted separately, and their probabilities of metastasis to bone and brain were inferred. The bone metastasis samples had a predicted probability of bone metastasis of 57.4%, far higher than the 32.1% predicted for samples without bone metastasis. Likewise, the brain metastasis samples had a predicted probability of brain metastasis of 62.3%, higher than the 36.7% for samples without brain metastasis (Fig. 5F).
These results demonstrate that the prediction method of the present invention provides a good basis for estimating the likelihood of metastasis to different sites in a patient and helps to predict metastasis.
The above examples illustrate only a few embodiments of the invention; although described in detail, they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the invention.

Claims (24)

1. A method of predicting the risk of metastasis of a tumor cell, comprising:
obtaining gene expression information of a tumor cell to be tested;
predicting S_EMT, S_Travel and S_comm of the tumor cell to be tested based on the gene expression information; wherein S_EMT represents the epithelial-mesenchymal transition (EMT) capability of the tumor cell, S_Travel represents the viability of the tumor cell in the blood circulation, and S_comm represents the ability of the tumor cell to invade a tissue or organ;
then calculating the metastasis risk coefficient S_Meta of the tumor cell to be tested using the following formula:
when S_Meta of the tumor cell to be tested is higher, the tumor cell is predicted to have a higher risk of metastasis.
2. The method of predicting the risk of metastasis of tumor cells according to claim 1, wherein S_EMT of the tumor cells to be tested is obtained as follows:
comparing the gene expression information of the tumor cells to be tested with an EMT-related gene set and calculating the gene expression similarity, thereby obtaining S_EMT of the tumor cells to be tested.
3. The method of predicting the risk of metastasis of tumor cells according to claim 1 or 2, wherein S_Travel of the tumor cells to be tested is obtained as follows:
predicting the interaction strength between the tumor cells to be tested and different immune cells in the blood based on the gene expression information of the tumor cells to be tested and the gene expression information of the cells in the blood, setting weights based on the proportions of the immune cells present in the blood, and obtaining S_Travel of the tumor cells to be tested by weighted summation;
wherein the immune cells in the blood include neutrophils, platelets and natural killer cells; S_Travel of the tumor cells to be tested is positively correlated with the interaction strength between the tumor cells and neutrophils, positively correlated with the interaction strength between the tumor cells and platelets, and negatively correlated with the interaction strength between the tumor cells and natural killer cells.
4. The method according to claim 1 or 3, wherein S_Travel of the tumor cells to be tested is obtained as follows:
generating, based on an existing tumor metastasis dataset, a tumor non-metastasis dataset and a normal blood immune cell dataset, a first prediction model that associates the gene expression information of tumor cells with S_Travel;
using the first prediction model to obtain key data from the gene expression information of the tumor cells to be tested and assigning weights to calculate S_Travel of the tumor cells to be tested.
5. The method for predicting the risk of metastasis of tumor cells according to any one of claims 1 to 4, wherein S_comm of the tumor cells to be tested is obtained as follows:
predicting the interactions between the tumor cells and the main cells of the tissue microenvironment based on the gene expression information of the tumor cells to be tested and the gene expression information of the main cells of the tissue microenvironment, setting weights based on the proportions of the main cells in the tissue microenvironment, and subtracting the weighted sum of the interaction intensities of the tumor cells and the main cells in the tissue microenvironment from the weighted sum of the interaction intensities of the tumor cells and the main cells in the tissue microenvironment to obtain S_comm of the tumor cells to be tested.
6. The method of predicting the risk of metastasis of tumor cells according to claim 1 or 5, wherein S_comm of the tumor cells to be tested is obtained as follows:
generating, based on an existing tumor metastasis dataset, a tumor non-metastasis dataset and a normal tissue cell dataset, a second prediction model that associates the gene expression information of tumor cells with S_comm;
using the second prediction model to obtain key data from the gene expression information of the tumor cells to be tested and assigning weights to calculate S_comm of the tumor cells to be tested.
7. The method for predicting the risk of metastasis of tumor cells according to claim 4 or 6, wherein
when the gene expression information of tumor cells is obtained from the existing tumor metastasis dataset and tumor non-metastasis dataset, only one sample is selected per individual, only the gene expression information of the primary tumor cells is retained, and the gene expression information of other cells is removed;
preferably, the other cells are screened out by marker genes and/or copy number variation.
8. The method of predicting the risk of metastasis of a tumor cell according to any one of claims 1-7, wherein the method further comprises:
providing a threshold relating S_Meta to metastasis risk;
comparing S_Meta of the tumor cells to be tested with the threshold, and predicting that the tumor cells to be tested have a metastasis risk when S_Meta is above the threshold.
9. The method of predicting the risk of metastasis of a tumor cell of any one of claims 1-8, wherein the gene expression information is transcriptome information.
10. The method of predicting the risk of metastasis of tumor cells according to any one of claims 1-9, wherein the tumor cells to be tested are derived from a biological sample of an individual to be tested who is suspected of having a tumor or has been diagnosed with a tumor;
preferably, the tumor comprises one or more of breast cancer, colorectal cancer, lung cancer, liver cancer, pancreatic cancer, skin cancer.
11. A method of predicting the organ metastasis tendency of tumor cells, comprising: predicting the risk of metastasis of tumor cells using the method of any one of claims 1-10;
wherein S_comm of the tumor cells to be tested for invading different tissues or organs is obtained, and when S_comm of the tumor cells to be tested for a given tissue or organ is higher, the tumor cells are predicted to have a higher tendency to metastasize to that tissue or organ.
12. A method of predicting the prognosis of a tumor patient, comprising: predicting the risk of metastasis of tumor cells using the method of any one of claims 1-10;
wherein S_Meta of the tumor cells to be tested is obtained, and when S_Meta of tumor cells to be tested derived from a tumor patient is higher, the tumor patient is predicted to have a worse prognosis.
13. A tumor cell detection system, comprising:
an acquisition module configured to acquire gene expression information of tumor cells to be tested;
a calculation module configured to predict S_EMT, S_Travel and S_comm of the tumor cells to be tested based on the gene expression information, wherein S_EMT represents the epithelial-mesenchymal transition (EMT) capability of the tumor cells, S_Travel represents the viability of the tumor cells in the blood circulation, and S_comm represents the ability of the tumor cells to invade a tissue or organ; and then to calculate the metastasis risk coefficient S_Meta of the tumor cells to be tested using the following formula:
and an output module configured to predict that the tumor cells to be tested have a higher risk of metastasis when their S_Meta is higher.
14. The tumor cell detection system according to claim 13, wherein
the calculation module comprises an S_EMT calculation unit configured to compare the gene expression information of the tumor cells to be tested with an EMT-related gene set and to calculate the gene expression similarity, thereby obtaining S_EMT of the tumor cells to be tested.
15. The tumor cell detection system according to claim 13 or 14, wherein
the calculation module comprises an S_Travel calculation unit configured to predict the interaction strength between the tumor cells to be tested and different immune cells in the blood based on the gene expression information of the tumor cells to be tested and the gene expression information of the cells in the blood, to set weights based on the proportions of the immune cells present in the blood, and to obtain S_Travel of the tumor cells to be tested by weighted summation;
wherein the immune cells in the blood include neutrophils, platelets and natural killer cells; S_Travel of the tumor cells to be tested is positively correlated with the interaction strength between the tumor cells and neutrophils, positively correlated with the interaction strength between the tumor cells and platelets, and negatively correlated with the interaction strength between the tumor cells and natural killer cells.
16. The tumor cell detection system according to any one of claims 13 to 15, wherein
the calculation module comprises an S_comm calculation unit configured to predict the interactions between the tumor cells and the main cells of the tissue microenvironment based on the gene expression information of the tumor cells to be tested and the gene expression information of the main cells of the tissue microenvironment, to set weights based on the proportions of the main cells in the tissue microenvironment, and to subtract the weighted sum of the interaction intensities between the tumor cells and the main cells in the tissue microenvironment to obtain S_comm of the tumor cells to be tested.
17. The tumor cell detection system according to claim 15 or 16, wherein
the calculation module comprises a model generation unit, and an S_Travel calculation unit and/or an S_comm calculation unit;
the model generation unit is configured to generate a first prediction model based on the existing tumor metastasis dataset, tumor non-metastasis dataset and normal blood immune cell dataset, and/or to generate a second prediction model based on the existing tumor metastasis dataset, tumor non-metastasis dataset and normal tissue cell dataset;
the first prediction model associates the gene expression information of tumor cells with S_Travel;
the second prediction model associates the gene expression information of tumor cells with S_comm;
the S_Travel calculation unit is configured to use the first prediction model to obtain key data from the gene expression information of the tumor cells to be tested and to assign weights to calculate S_Travel of the tumor cells to be tested;
the S_comm calculation unit is configured to use the second prediction model to obtain key data from the gene expression information of the tumor cells to be tested and to assign weights to calculate S_comm of the tumor cells to be tested;
preferably, when the gene expression information of tumor cells is obtained from the existing tumor metastasis dataset and tumor non-metastasis dataset, only one sample is selected per individual, only the gene expression information of the primary tumor cells is retained, and the gene expression information of other cells is removed; more preferably, the other cells are screened out by marker genes and/or copy number variation.
18. The tumor cell detection system according to any one of claims 13 to 17, wherein
the output module comprises a first output unit configured to provide a threshold relating S_Meta to metastasis risk, to compare S_Meta of the tumor cells to be tested with the threshold, and to predict that the tumor cells to be tested have a risk of metastasis when S_Meta is above the threshold.
19. The tumor cell detection system according to any one of claims 13 to 18, wherein
the output module further comprises a second output unit configured to obtain S_comm of the tumor cells to be tested for invading different tissues or organs, and to predict that the tumor cells have a higher tendency to metastasize to a given tissue or organ when their S_comm for that tissue or organ is higher.
20. The tumor cell detection system according to any one of claims 13 to 19, wherein
the output module further comprises a third output unit configured to obtain S_Meta of the tumor cells to be tested, and to predict that a tumor patient has a worse prognosis when S_Meta of tumor cells to be tested derived from that patient is higher.
21. The tumor cell detection system according to any one of claims 13-20, further comprising:
a detection end for detecting gene expression information of tumor cells; preferably, the detection end comprises RNA sequencing equipment or single-cell sequencing equipment;
the detection end being communicatively connected to the acquisition module.
22. A non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements: a method of predicting the risk of metastasis of a tumor cell according to any one of claims 1 to 10, or a method of predicting the propensity of a tumor cell to metastasize to an organ according to claim 11, or a method of predicting the prognosis of a tumor patient according to claim 12.
23. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing when the computer program is executed: a method of predicting the risk of metastasis of a tumor cell according to any one of claims 1 to 10, or a method of predicting the propensity of a tumor cell to metastasize to an organ according to claim 11, or a method of predicting the prognosis of a tumor patient according to claim 12.
24. A computer program product comprising a computer program that, when read and executed by a computing device, causes the computing device to perform: a method of predicting the risk of metastasis of a tumor cell according to any one of claims 1 to 10, or a method of predicting the propensity of a tumor cell to metastasize to an organ according to claim 11, or a method of predicting the prognosis of a tumor patient according to claim 12.
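As an illustration only, the scoring steps recited in claims 2, 3, 5, 7, 8 and 11 can be sketched in Python under several stated assumptions. The claims do not specify the similarity measure for S_EMT, so a simplified gene-set score (mean expression of the EMT genes minus mean expression of all genes) is substituted here. Interaction strengths are taken as precomputed inputs, each cell type is weighted by its proportion, and the signs for S_Travel follow claim 3. Claim 5 does not make clear which two weighted sums are subtracted for S_comm, so the two interaction sets carry the hypothetical names `gain` and `loss`. The formula combining the three scores into S_Meta is not reproduced in this text, so the sketch takes S_Meta as a given value. All function names, dictionary keys and values are hypothetical.

```python
from math import fsum

# Signs from claim 3: interactions with neutrophils and platelets raise S_Travel,
# interactions with natural killer (NK) cells lower it. Keys are hypothetical labels.
BLOOD_SIGNS = {"neutrophil": 1.0, "platelet": 1.0, "nk_cell": -1.0}


def emt_score(expr: dict[str, float], emt_genes: set[str]) -> float:
    """Stand-in for S_EMT (assumed measure, not taken from the patent):
    mean expression of the EMT genes present minus mean expression of all genes."""
    present = [expr[g] for g in emt_genes if g in expr]
    if not present:
        return 0.0
    return fsum(present) / len(present) - fsum(expr.values()) / len(expr)


def weighted_sum(strengths: dict[str, float], proportions: dict[str, float]) -> float:
    """Interaction strengths weighted by each cell type's proportion (missing -> 0)."""
    return fsum(s * proportions.get(cell, 0.0) for cell, s in strengths.items())


def travel_score(strengths: dict[str, float], proportions: dict[str, float],
                 signs: dict[str, float] = BLOOD_SIGNS) -> float:
    """Stand-in for S_Travel: signed, proportion-weighted sum over blood cell types."""
    return fsum(signs[cell] * s * proportions.get(cell, 0.0)
                for cell, s in strengths.items() if cell in signs)


def comm_score(gain: dict[str, float], loss: dict[str, float],
               proportions: dict[str, float]) -> float:
    """Stand-in for S_comm: one proportion-weighted sum minus another.
    Which interactions belong in `gain` and `loss` is an assumption."""
    return weighted_sum(gain, proportions) - weighted_sum(loss, proportions)


def select_training_cells(cells: list[dict]) -> list[dict]:
    """Claim 7 filter: keep one sample per individual (here, the first one seen)
    and only cells already flagged as malignant, e.g. by marker genes or CNV."""
    chosen: dict[str, str] = {}
    kept = []
    for cell in cells:
        sample = chosen.setdefault(cell["individual"], cell["sample"])
        if cell["sample"] == sample and cell["is_malignant"]:
            kept.append(cell)
    return kept


def has_metastasis_risk(s_meta: float, threshold: float) -> bool:
    """Claim 8: predict a metastasis risk when S_Meta is above the threshold."""
    return s_meta > threshold


def likely_target_organ(s_comm_by_organ: dict[str, float]) -> str:
    """Claim 11: the organ with the highest S_comm is the predicted preferred site."""
    return max(s_comm_by_organ, key=s_comm_by_organ.get)
```

For example, `likely_target_organ({"bone": 0.2, "brain": 0.5})` returns `"brain"`. Keeping each score in its own function mirrors the separate calculation units of claims 14 to 17, so any one stand-in can be replaced by the actual measure without touching the others.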
CN202311246385.5A 2023-09-25 2023-09-25 Method and system for predicting tumor cell metastasis risk Pending CN117437974A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311246385.5A CN117437974A (en) 2023-09-25 2023-09-25 Method and system for predicting tumor cell metastasis risk

Publications (1)

Publication Number Publication Date
CN117437974A true CN117437974A (en) 2024-01-23

Family

ID=89547104

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311246385.5A Pending CN117437974A (en) 2023-09-25 2023-09-25 Method and system for predicting tumor cell metastasis risk

Country Status (1)

Country Link
CN (1) CN117437974A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108474723A (en) * 2015-12-02 2018-08-31 Clearlight Diagnostics LLC Method for preparing and analyzing tumor tissue samples for detecting and monitoring cancer
US20200071773A1 (en) * 2017-04-12 2020-03-05 Massachusetts Eye And Ear Infirmary Tumor signature for metastasis, compositions of matter methods of use thereof
CN113930506A (en) * 2021-09-23 2022-01-14 Affiliated Hospital of Jiangsu University Glutamine metabolism gene signature scoring system for predicting hepatocellular carcinoma prognosis and treatment resistance
CN115798723A (en) * 2023-01-18 2023-03-14 Beijing Zeqiao Medical Technology Co., Ltd. Construction method of cancer recurrence risk prediction model
RU2802141C1 (en) * 2023-01-09 2023-08-22 Federal State Budgetary Institution "A.M. Granov Russian Research Center of Radiology and Surgical Technologies" of the Ministry of Health of the Russian Federation Method of predicting the development of metastases in patients with unresectable triple-negative breast cancer
CN116699137A (en) * 2023-06-26 2023-09-05 Shenzhen Ruige Biotechnology Co., Ltd. Method for assessing risk of suffering from tumor or specific tumor

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
KAIGE YANG ET AL.: "M2 tumor‑associated macrophage mediates the maintenance of stemness to promote cisplatin resistance by secreting TGF‑β1 in esophageal squamous cell carcinoma", 《JOURNAL OF TRANSLATIONAL MEDICINE》, 14 January 2023 (2023-01-14) *
SNAHLATA SINGH ET AL.: "Consequences of EMT-Driven Changes in the Immune Microenvironment of Breast Cancer and Therapeutic Response of Cancer Cells", 《J. CLIN. MED.》, 9 May 2019 (2019-05-09) *
VANESSA BARRIGA ET AL.: "The Complex Interaction between the Tumor Micro-Environment and Immune Checkpoints in Breast Cancer", 《CANCERS》, 19 August 2019 (2019-08-19) *
XIAOCUI ZHENG ET AL.: "Single-cell analyses implicate ascites in remodeling the ecosystems of primary and metastatic tumors in ovarian cancer", 《NATURE CANCER》, 24 July 2023 (2023-07-24) *
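LIU Shuzhong: "Construction of a ceRNA network in breast cancer with concomitant bone metastasis and transcriptomic study of breast cancer bone metastases", China Doctoral Dissertations Full-text Database, Medicine and Health Sciences, 15 February 2022 (2022-02-15) *
ZHOU Haijun; QIN Lunxiu: "Several issues worthy of attention in research on tumor metastasis mechanisms", Fudan University Journal of Medical Sciences, no. 05, 15 September 2011 (2011-09-15) *
XUE Ke: "Research and implementation of a machine-learning-based method for predicting pancreatic cancer metastasis", China Master's Theses Full-text Database, Medicine and Health Sciences, 15 June 2022 (2022-06-15) *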

Similar Documents

Publication Publication Date Title
EP3013986B1 (en) Assessment of the pi3k cellular signaling pathway activity using mathematical modelling of target gene expression
US11776661B2 (en) Determination of MAPK-AP-1 pathway activity using unique combination of target genes
JP2019076096A (en) Assessment of cellular signaling pathway activity using linear combinations of target gene expressions
US20140040264A1 (en) Method for estimation of information flow in biological networks
US8911940B2 (en) Methods of assessing a risk of cancer progression
CN104560697A (en) Detection device for instability of genome copy number
CN113061655B (en) Gene labels for predicting breast cancer neoadjuvant chemotherapy sensitivity and application thereof
CN113444804B (en) Cervical cancer prognosis related gene and application thereof in preparation of cervical cancer prognosis prediction and diagnosis product
CN114093517A (en) Cancer screening method and system based on blood indexes and cfDNA
CN108588230A Marker for breast cancer diagnosis and screening method therefor
CN104694384B (en) Mitochondrial DNA copy index variability detecting device
KR101914348B1 (en) Method of detecting a risk of cancer
CN113345592B (en) Construction and diagnosis equipment for acute myeloid leukemia prognosis risk model
CN117437974A (en) Method and system for predicting tumor cell metastasis risk
US20220042106A1 (en) Systems and methods of using cell-free nucleic acids to tailor cancer treatment
Yu et al. Association between SNAP25 and human glioblastoma multiform: a comprehensive bioinformatic analysis
Shin et al. TC-VGC: a tumor classification system using variations in genes’ correlation
WO2020260226A1 (en) Identification of the cellular function of an active nfkb pathway
US11614434B2 (en) Genetic information analysis platform oncobox
CN111172285A (en) miRNA group for early diagnosis and/or prognosis monitoring of pancreatic cancer and application thereof
CN113969315A (en) Marker for assessing responsiveness of colorectal cancer patient to immunotherapeutic drug
Deaglio et al. A new taxonomy for splenic marginal zone lymphoma
Zhang et al. Sequencing and validation of exosomal miRNAs panel as novel plasma biomarkers for early diagnosis and prognosis prediction in laryngeal cancer
Gim et al. Evaluation of the severity of nonalcoholic fatty liver disease by the analysis of serum exosomal miRNA expression
Malapelle et al. Liquid biopsy in lung cancer: tertiary prevention potential

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination