WO2023245827A1 - Method for identifying tissue sources of mesenchymal stem cells in sample and use thereof - Google Patents

Method for identifying tissue sources of mesenchymal stem cells in sample and use thereof

Info

Publication number
WO2023245827A1
Authority
WO
WIPO (PCT)
Prior art keywords
mscs
sample
feature vector
machine learning
hmscs
Prior art date
Application number
PCT/CN2022/110507
Other languages
French (fr)
Chinese (zh)
Inventor
张可华
孟淑芳
纳涛
贾春翠
韩晓燕
吴婷婷
张丽霞
吴雪伶
Original Assignee
中国食品药品检定研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国食品药品检定研究院
Publication of WO2023245827A1

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Definitions

  • the present application relates to the fields of biology and medicine, specifically to a method of constructing a model for identifying the tissue origin of mesenchymal stem cells (MSCs), a method and device for identifying the tissue origin of MSCs in a sample, and the use of a reagent for determining biomarker levels in a sample in the preparation of a kit.
  • MSCs mesenchymal stem cells
  • hMSCs Human mesenchymal stem cells
  • GvHD graft-versus-host disease
  • hMSCs were initially isolated and identified in bone marrow. Subsequent studies have shown that hMSCs are widely present in various tissues of the human body, such as adipose tissue, dental tissues such as dental pulp and dental follicles, hair follicles, and perinatal tissues such as fetal umbilical cord and placenta. Many studies have observed that hMSCs derived from different tissues have great differences in cell properties in addition to their different origins.
  • hMSCs derived from bone marrow have strong osteogenic differentiation ability and weak proliferation ability
  • hMSCs derived from fat have strong adipogenic differentiation ability, proliferation ability, and stronger IDO1 activity
  • hMSCs derived from perinatal tissues have the strongest proliferation ability but weaker osteogenic and adipogenic differentiation abilities (Front. Med., 20 September 2021 | https://doi.org/10.3389/fmed.2021.728496).
  • omics studies have shown that hMSCs from different tissue sources have distinct transcriptome expression profiles (Biotechnol Lett. 2020 Jul;42(7):1287-1304. doi:10.1007/s10529-020-02898-x. Epub 2020 May 5.).
  • tissue source-specific identification method for hMSCs.
  • the tissue origin of hMSC preparations or intermediate cell banks for clinical use can currently only be traced through collection and preparation records. There is no effective method for identification during the quality control process, so confusion or cross-contamination, once it occurs, cannot be correctly identified.
  • regulatory agencies/laboratories responsible for quality review of stem cell products also need to use test data to identify and review the tissue source of hMSCs submitted by the production unit for inspection.
  • some researchers are trying to induce differentiation of pluripotent stem cells into cell products with properties similar or identical to those of hMSCs derived from various tissues, for treatment research on specific indications. In this case, identifying the tissue source of hMSCs becomes particularly important.
  • CD146 is expressed in both bone marrow MSCs and umbilical cord hMSCs.
  • CD271 is expressed in both bone marrow-derived and adipose-derived hMSCs (Stem Cells, Volume 32, Issue 6, June 2014, Pages 1408–1419).
  • some individual studies have tried to explore protein expression or secretion profiles characteristic of the tissue source, but such studies are often constrained by sample availability: they compare hMSCs from only one or two tissues and cannot cover the various hMSCs commonly used in clinical research.
  • This application conducted transcriptome sequencing of hMSCs and used machine learning methods to screen out a combination of biomarkers that can identify the tissue source of hMSCs.
  • machine learning was used to train and validate on the expression levels of the biomarker genes of 137 hMSC strains and their tissue source classification, thereby constructing a machine learning model that identifies the tissue source of hMSCs based on a combination of biomarkers. This model can accurately identify the tissue origin of hMSCs from the different tissues commonly used in clinical research.
  • the present application provides a method of constructing a model for identifying the tissue origin of mesenchymal stem cells (MSCs), which includes:
  • Step (1) Provide n strains of MSCs derived from different tissues, and collect transcriptome sequencing information of the MSCs, where n is an integer greater than or equal to 10;
  • Step (2) Obtain the mRNA information from the transcriptome sequencing information
  • Step (3) Obtain genes with TPMmax greater than 10 from the mRNA information
  • Step (4) Use the expression level of the gene obtained in step (3) as a feature vector, filter the feature vector through a machine learning method, and obtain the target feature vector;
  • Step (5) Use the expression amount of the target feature vector to train the machine learning model to build a model for identifying the tissue source of mesenchymal stem cells (MSCs).
  • the expression level of the gene is the TPM value of the gene.
  • the expression amount of the target feature vector is the TPM value of the target feature vector.
  • in step (5), 55% to 95% of the samples are randomly extracted from the n strains of MSCs derived from different tissues as a training set, and the target feature vector of the training set is used to train the machine learning model, so as to build a model for identifying the tissue source of mesenchymal stem cells (MSCs).
  • the method further includes step (6): using the MSCs not extracted into the training set as a test set, and testing the machine learning model using the target feature vector of the test set to determine the accuracy, sensitivity and specificity of the model.
  • the machine learning model is selected from Lasso regression, ridge regression, support vector machine, or linear discriminant.
  • the machine learning model is Lasso regression.
  • in step (5), 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95% of the samples are randomly extracted from the n strains of MSCs derived from different tissues to serve as the training set.
  • the target feature vector includes the following genes or includes transcripts of the following genes: ACVRL1, ARMC9, BCHE, CD55, EBP, FN1, FST, HOTAIRM1, LIMK2, MECOM, METTL26, MSX1, NBPF3, NECTIN3, NRXN2, PDE5A, RIN3, RPA2, RSL24D1, TSSC2 and ZIC1.
  • the target feature vector is selected from the following genes, or from transcripts of the following genes: ACVRL1, ARMC9, BCHE, CD55, EBP, FN1, FST, HOTAIRM1, LIMK2, MECOM, METTL26, MSX1, NBPF3, NECTIN3, NRXN2, PDE5A, RIN3, RPA2, RSL24D1, TSSC2, ZIC1, or any combination thereof.
  • n is an integer between 10 and 50, between 51 and 100, between 101 and 150, between 151 and 200, between 201 and 250, between 251 and 300, between 301 and 500, or between 501 and 1000.
  • the source of the n strains of MSCs derived from different tissues is selected from bone marrow, umbilical cord, placenta or parts thereof (e.g., placental amniotic membrane), fat, dental pulp, hair follicles, skin, blood, or any combination thereof.
  • the MSCs are MSCs derived from a mammal (e.g., mouse, human).
  • the MSCs are human-derived MSCs (hMSCs).
  • the gene ACVRL1 has an Entrez Gene ID of 94.
  • the Entrez Gene ID of ARMC9 is 80210.
  • the gene BCHE has an Entrez Gene ID of 590.
  • the gene CD55 has an Entrez Gene ID of 1604.
  • the Entrez Gene ID of the gene EBP is 10682.
  • the gene FN1 has an Entrez Gene ID of 2335.
  • the Entrez Gene ID of the gene FST is 10468.
  • the Entrez Gene ID of the gene HOTAIRM1 is 100506311.
  • the gene LIMK2 has an Entrez Gene ID of 3985.
  • the Entrez Gene ID of MECOM is 2122.
  • the Entrez Gene ID of the gene METTL26 is 84326.
  • the gene MSX1 has an Entrez Gene ID of 4487.
  • the Entrez Gene ID of the gene NBPF3 is 84224.
  • the Entrez Gene ID of NECTIN3 is 25945.
  • the Entrez Gene ID of the gene NRXN2 is 9379.
  • the Entrez Gene ID of the gene PDE5A is 8654.
  • the Entrez Gene ID of the gene RIN3 is 79890.
  • the gene RPA2 has an Entrez Gene ID of 6118.
  • the Entrez Gene ID of the gene RSL24D1 is 51187.
  • the Entrez Gene ID of TSSC2 is 650368.
  • the gene ZIC1 has an Entrez Gene ID of 7545.
  • the Ensembl Gene ID of the gene ACVRL1 is ENSG00000139567.
  • the Ensembl Gene ID of the gene ARMC9 is ENSG00000135931.
  • the Ensembl Gene ID of the gene BCHE is ENSG00000114200.
  • the Ensembl Gene ID of the gene CD55 is ENSG00000196352.
  • the Ensembl Gene ID of the gene EBP is ENSG00000147155.
  • the Ensembl Gene ID of gene FN1 is ENSG00000115414.
  • the Ensembl Gene ID of the gene FST is ENSG00000134363.
  • the Ensembl Gene ID of the gene HOTAIRM1 is ENSG00000233429.
  • the Ensembl Gene ID of the gene LIMK2 is ENSG00000182541.
  • the Ensembl Gene ID of the gene MECOM is ENSG00000085276.
  • the Ensembl Gene ID of the gene METTL26 is ENSG00000130731.
  • the Ensembl Gene ID of the gene MSX1 is ENSG00000163132.
  • the Ensembl Gene ID of the gene NBPF3 is ENSG00000142794.
  • the Ensembl Gene ID of the gene NECTIN3 is ENSG00000177707.
  • the Ensembl Gene ID of the gene NRXN2 is ENSG00000110076.
  • the Ensembl Gene ID of the gene PDE5A is ENSG00000138735.
  • the Ensembl Gene ID of the gene RIN3 is ENSG00000100599.
  • the Ensembl Gene ID of the gene RPA2 is ENSG00000117748.
  • the Ensembl Gene ID of the gene RSL24D1 is ENSG00000137876.
  • the Ensembl Gene ID of the gene TSSC2 is ENSG00000223756.
  • the Ensembl Gene ID of the gene ZIC1 is ENSG00000152977.
  • the present application provides a machine learning model, which is constructed by the method as described above.
  • the machine learning model is used to identify the tissue of origin of one or more MSCs in a sample, the tissue of origin being, e.g., bone marrow, umbilical cord, placenta or a portion thereof (e.g., placental amniotic membrane), fat, dental pulp, hair follicles, skin, blood, or any combination thereof.
  • the present application provides the use of a machine learning model as previously described to identify the tissue origin of one or more MSCs in a sample.
  • the present application provides a method for identifying the tissue origin of MSCs in a sample, including:
  • in step (a), the expression level is a TPM value.
  • the TPM value is obtained by transcriptome sequencing.
  • the target feature vector includes the following genes or includes proteins expressed by the following genes: ACVRL1, ARMC9, BCHE, CD55, EBP, FN1, FST, HOTAIRM1, LIMK2, MECOM, METTL26, MSX1, NBPF3, NECTIN3, NRXN2, PDE5A, RIN3, RPA2, RSL24D1, TSSC2 and ZIC1.
  • the sample contains one or more MSCs.
  • in step (a), transcriptome sequencing is performed on the MSCs in the sample to obtain the expression level of the target feature vector of the MSCs in the sample; or, in step (a), the expression level of the target feature vector of the MSCs in the sample is obtained by performing expression profile chip detection, single-cell transcriptome sequencing, RT-qPCR measurement, and digital PCR measurement on the target feature vector of the MSCs in the sample.
  • the tissue source of the MSCs is selected from bone marrow, umbilical cord, placenta or parts thereof (e.g., placental amniotic membrane), fat, dental pulp, hair follicles, tissue from other parts of the placenta, skin, blood, or any combination thereof.
  • the MSCs are MSCs derived from a mammal (e.g., mouse, human).
  • the MSCs are human-derived MSCs (hMSCs).
  • the sample contains a proportion of adipose hMSCs of greater than or equal to 30%.
  • the sample contains bone marrow hMSCs in a proportion of greater than or equal to 40%.
  • the sample contains dental pulp hMSCs in a proportion of greater than or equal to 40%.
  • the sample contains hair follicle hMSCs in a proportion of greater than or equal to 30%.
  • the sample contains umbilical cord hMSCs in a proportion of greater than or equal to 20%.
  • the sample contains placenta-amniotic hMSCs in a proportion of greater than or equal to 40%.
  • the present application provides a device for identifying the tissue source of mesenchymal stem cells, including:
  • memory configured to store instructions
  • a processor is coupled to the memory, and the processor is configured to execute the method as described above based on instructions stored in the memory.
  • the present application provides a computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions, and when the instructions are executed by a processor, the method as described above is implemented.
  • the present application provides a kit for identifying the tissue origin of one or more MSCs in a sample, the kit comprising a reagent for determining the level of a biomarker in the sample, wherein the biomarkers include ACVRL1, ARMC9, BCHE, CD55, EBP, FN1, FST, HOTAIRM1, LIMK2, MECOM, METTL26, MSX1, NBPF3, NECTIN3, NRXN2, PDE5A, RIN3, RPA2, RSL24D1, TSSC2 and ZIC1.
  • the level of the biomarker is the protein or mRNA level of the biomarker.
  • the MSCs are MSCs derived from a mammal (e.g., mouse, human).
  • the MSCs are human-derived MSCs (hMSCs).
  • the present application provides the use of a reagent for determining the level of a biomarker in a sample in the preparation of a kit for identifying the tissue origin of one or more MSCs in the sample, wherein the biomarkers include ACVRL1, ARMC9, BCHE, CD55, EBP, FN1, FST, HOTAIRM1, LIMK2, MECOM, METTL26, MSX1, NBPF3, NECTIN3, NRXN2, PDE5A, RIN3, RPA2, RSL24D1, TSSC2 and ZIC1.
  • the level of the biomarker is the protein or mRNA level of the biomarker.
  • the MSCs are MSCs derived from a mammal (e.g., mouse, human).
  • the MSCs are human-derived MSCs (hMSCs).
  • sample refers to a biological sample obtained from a subject, which sample may be a sample that contains or is presumed to contain human mesenchymal stem cells.
  • "machine learning model", "machine learning method" and "statistical learning method" have the same meaning and may be used interchangeably. They refer to a set of parameters and functions from which a corresponding training model can be established using the measured features (target feature vectors) of the training samples.
  • during training, the training model learns from the training samples and optimizes its parameters so as to provide the best quality measure (e.g., accuracy) when classifying new samples.
  • the parameters and functions may be a collection of linear algebraic operations, nonlinear algebraic operations, and tensor algebraic operations.
  • the parameters and functions may include statistical functions, tests, and probability models.
  • the measured characteristic in the training sample is the expression of a gene.
  • sensitivity refers to the proportion of actual positives that are correctly identified.
  • transcriptome sequencing refers to the rapid and comprehensive acquisition, through a sequencing platform (e.g., a second-generation sequencing platform), of almost all transcripts and gene sequences of a specific cell or tissue of a species in a certain state. It can be used to study gene expression, gene function, structure, alternative splicing, and prediction of new transcripts. In transcriptome sequencing analysis there are usually three classic values: count, FPKM and TPM.
  • count refers to the total number of reads in the sequencing data that map to a given gene; that is, the sequenced reads are mapped to the reference genome, and software then counts the reads that map to the gene.
  • FPKM fragments per kilobase of transcript per million mapped fragments
  • TPM transcripts per million
  • TPMmax refers to the maximum TPM value of a gene in a set of samples.
  • This application provides a model for identifying the tissue origin of mesenchymal stem cells (MSCs) and a method for constructing the model, which can accurately identify the tissue origin of MSCs from different tissues commonly used in clinical research.
  • the model has been verified multiple times with training sets, test sets and external data sets, and the accuracy, sensitivity and specificity can all reach 95% and above (even as high as 100%).
  • the model established in this application can also identify the tissue sources of various mixed mesenchymal stem cells in the sample, and the accuracy, sensitivity and specificity can also reach 100%, which has high clinical application value.
  • Figure 1 shows the detection results of the accuracy, sensitivity and specificity of the machine learning model on the training set in Example 2.
  • Figure 2 shows the detection results of the accuracy, sensitivity and specificity of the machine learning model on the test set in Example 2.
  • Figure 3 shows the detection results of the accuracy, sensitivity and specificity of the machine learning model on the external data set in Example 2.
  • Figure 4 shows the prediction ability of the machine learning model for mixed cells in Example 3.
  • Figure 4A shows the detection results of hMSCs from two different sources in simulated mixed sample 1.
  • Figure 4B shows the detection results of hMSCs from two different sources in simulated mixed sample 2.
  • Figure 4C shows the detection results of hMSCs from two different sources in simulated mixed sample 3.
  • Figure 5 shows the detection results of hMSCs from three different sources in simulated mixed sample 4 by the machine learning model in Example 3.
  • approximately 6 G of clean bases were obtained for each sample.
  • the analysis process is shown in Table 4 below.
  • the gene transcription expression levels of the 137 hMSC strains were obtained, including transcript count, FPKM and TPM values.
  • mRNAs were selected from the transcripts based on Official Symbol recognition, yielding a total of 38,735 genes.
  • the R package tidyverse was used to select the highly expressed genes (TPMmax > 10), yielding a total of 13,315 genes.
  • the 137 hMSCs transcriptome data were divided into a training set (70%) and a test set (30%).
  • the 13,315 genes were used as feature vectors, and Lasso regression (10-fold cross-validation) was used to screen the feature vectors.
  • the expression levels (i.e., TPM values) of the 21 genes in the training set (70% of the total sample size) were input into the machine learning model established above, and the accuracy, sensitivity and specificity of the model's predictions were tested.
  • the accuracy test results are shown in Table 6 and Figure 1.
  • the results showed that the established machine learning model achieved 100% prediction accuracy for the tissue origin of hMSCs in the training set; its sensitivity and specificity were also 100%.
  • the results showed that the established machine learning model achieved 100% prediction accuracy for the tissue origin of hMSCs in the test set; its sensitivity and specificity were also 100%.
  • the external data set described in Example 1 (a total of 99 hMSCs) was subjected to transcriptome sequencing as described above, and the expression levels of the 21 genes obtained were input into the machine learning model established above. The accuracy, sensitivity and specificity of the model's predictions were tested. The accuracy test results are shown in Table 8 and Figure 3.
  • hMSCs from several different tissue sources may be mixed in the detection sample, so this embodiment simulates the situation in which hMSCs from several different tissue sources are mixed.
  • the first set of simulation data includes 11 samples in which adipose-derived hMSCs and bone marrow-derived hMSCs were mixed in different proportions, as shown in Table 9:
  • the second set of simulation data includes 11 samples of dental pulp-derived hMSCs and hair follicle-derived hMSCs mixed in different proportions, as shown in Table 10:
  • the third set of simulation data includes 11 samples of umbilical cord-derived hMSCs and placental amnion-derived hMSCs mixed in different proportions, as shown in Table 11:
  • hMSCs from three different tissue sources were mixed in different proportions.
  • the mixed samples contain hMSCs derived from fat, bone marrow and hair follicles. Mixing yields the 11 mixed samples of the fourth group, as shown in Table 12:
  • Figure 5 shows the detection results of hMSCs from three different sources in the fourth group of simulated mixed samples. The results showed that the model established in this application accurately predicted the different tissue origins of multiple hMSCs in mixed samples. Therefore, the model established in this application can be used for the detection of mixed samples of hMSCs (containing one or more hMSCs from different sources).
  • in order to compare the impact of different machine learning models on the accuracy of the established model for identifying the tissue source of mesenchymal stem cells (MSCs), 5 different machine learning models/methods were selected, and models for identifying the tissue source of MSCs were established according to the method described in Example 2 (the only difference from Example 2 being the machine learning model/method used). The differences in accuracy between the established models were then verified.
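
The simulated mixed samples described above (groups of 11 samples in which hMSCs from two tissue sources are combined in different proportions) can be illustrated with a short sketch. The text does not say how the mixtures were generated, so everything here is an assumption: the proportions run from 0% to 100% in steps of 10%, a mixture is modelled as a weighted average of two TPM profiles, and the gene values are invented for illustration only.

```python
from typing import Dict, List, Tuple

Profile = Dict[str, float]  # gene symbol -> TPM value


def mix_profiles(profile_a: Profile, profile_b: Profile, fraction_a: float) -> Profile:
    """Weighted average of two expression profiles (assumed mixing model)."""
    if not 0.0 <= fraction_a <= 1.0:
        raise ValueError("fraction_a must lie between 0 and 1")
    return {
        gene: fraction_a * profile_a[gene] + (1.0 - fraction_a) * profile_b[gene]
        for gene in profile_a
    }


def simulate_series(profile_a: Profile, profile_b: Profile, steps: int = 10) -> List[Tuple[float, Profile]]:
    """Return steps + 1 mixtures, with the share of profile_a rising from 0 to 1."""
    return [
        (i / steps, mix_profiles(profile_a, profile_b, i / steps))
        for i in range(steps + 1)
    ]


# Hypothetical TPM values for two marker genes; not data from the application.
adipose = {"FN1": 100.0, "ZIC1": 0.0}
bone_marrow = {"FN1": 20.0, "ZIC1": 40.0}
series = simulate_series(adipose, bone_marrow)
```

Each element of `series` pairs the adipose fraction with the mixed profile, which could then be passed to a trained classifier to see at which proportion each source is still detected.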

Abstract

Provided are a method for constructing a model for identifying tissue sources of mesenchymal stem cells (MSCs), a method and device for identifying tissue sources of MSCs in a sample, and the use of a reagent, which is useful for determining a biomarker level in a sample, in the preparation of a kit.

Description

A method for identifying the tissue origin of mesenchymal stem cells in a sample and its use
Technical field
The present application relates to the fields of biology and medicine, specifically to a method of constructing a model for identifying the tissue origin of mesenchymal stem cells (MSCs), to a method and device for identifying the tissue origin of MSCs in a sample, and to the use of a reagent for determining biomarker levels in a sample in the preparation of a kit.
Background art
Human mesenchymal stem cells (hMSCs) are a class of multipotent adult stem cells with the potential to differentiate into cells of the mesodermal lineage. They also have strong immunomodulatory, anti-apoptotic and anti-fibrotic effects and promote tissue repair and regeneration. Because hMSCs are present in many tissues of the body and are easy to isolate and culture in vitro, they have high clinical application value. More than 100 clinical studies aimed at evaluating the safety and efficacy of MSCs are currently under way in China, covering indications such as osteoarthritis, graft-versus-host disease (GvHD), diabetes and premature ovarian failure.
hMSCs were first isolated and identified in bone marrow. Numerous subsequent studies have shown that hMSCs are widely present in human tissues, such as adipose tissue, dental tissues (e.g., dental pulp and dental follicle), hair follicles, and perinatal tissues such as the fetal umbilical cord and placenta. Many studies have observed that, beyond their different origins, hMSCs from different tissues differ considerably in their cellular properties. For example, bone marrow-derived hMSCs have strong osteogenic differentiation ability and weak proliferation ability; adipose-derived hMSCs have strong adipogenic differentiation ability, strong proliferation ability and stronger IDO1 activity; perinatal hMSCs have the strongest proliferation ability but weaker osteogenic and adipogenic differentiation abilities (Front. Med., 20 September 2021 | https://doi.org/10.3389/fmed.2021.728496). In addition to these reported differences, omics studies have shown that hMSCs from different tissue sources have distinct transcriptome expression profiles (Biotechnol Lett. 2020 Jul;42(7):1287-1304. doi:10.1007/s10529-020-02898-x. Epub 2020 May 5.). As understanding of the biological properties of hMSCs from different tissue sources deepens and clinical research data for various indications accumulate, purposefully selecting MSCs from an appropriate tissue source to explore treatment of the corresponding disease becomes more rational and effective.
Establishing a tissue source-specific identification method for hMSCs is of great significance. First, the tissue origin of hMSC preparations or intermediate cell banks for clinical use can currently only be traced through collection and preparation records. There is no effective method for identification during the quality control process, so confusion or cross-contamination, once it occurs, cannot be correctly identified. Second, regulatory agencies and laboratories responsible for quality review of stem cell products also need test data to identify and verify the tissue source of hMSCs submitted for inspection by the production unit. Third, some researchers are trying to induce differentiation of pluripotent stem cells into cell products with properties similar or identical to those of hMSCs derived from various tissues, for treatment research on specific indications. In this case, identifying the tissue source of hMSCs becomes particularly important.
However, no method for tissue source-specific identification of hMSCs is yet available internationally. Although the International Society for Cellular Therapy (ISCT) proposed minimal criteria for hMSCs in 2006, that definition is non-specific and does not address the differences between hMSCs from different tissues or between hMSCs and fibroblasts. Subsequent studies have tried to find characteristic ways of distinguishing hMSCs from different tissue sources through surface marker molecules, transcriptomic expression profiles, secretome characteristics and the like, but no effective method has been established so far. First, many marker molecules are not very specific. For example, CD29 is usually considered a surface marker of adipose stem cells but is also highly expressed in placenta-derived hMSCs; CD146 is expressed in both bone marrow MSCs and umbilical cord hMSCs; and CD271 is expressed in both bone marrow-derived and adipose-derived hMSCs (Stem Cells, Volume 32, Issue 6, June 2014, Pages 1408–1419). Second, some individual studies have tried to explore protein expression or secretion profiles characteristic of the tissue source, but such studies are often constrained by sample availability: they compare hMSCs from only one or two tissues and cannot cover the various hMSCs commonly used in clinical research.
Therefore, there is a need for a method that can accurately identify the tissue origin of hMSCs from the different tissue sources commonly used in clinical research.
Summary of the invention
In the present application, transcriptome sequencing was performed on hMSCs, and machine learning methods were used to screen out a combination of biomarkers that can identify the tissue source of hMSCs. Machine learning was then used to train and validate on the expression levels of the biomarker genes of 137 hMSC strains and their tissue source classification, thereby constructing a machine learning model that identifies the tissue source of hMSCs based on a combination of biomarkers. This model can accurately identify the tissue origin of hMSCs from the different tissue sources commonly used in clinical research.
Therefore, in a first aspect, the present application provides a method of constructing a model for identifying the tissue origin of mesenchymal stem cells (MSCs), which includes:
Step (1): provide n strains of MSCs derived from different tissues, and collect transcriptome sequencing information of the MSCs, where n is an integer greater than or equal to 10;
Step (2): obtain the mRNA information from the transcriptome sequencing information;
Step (3): obtain genes with TPMmax greater than 10 from the mRNA information;
Step (4): use the expression levels of the genes obtained in step (3) as feature vectors, screen the feature vectors through a machine learning method, and obtain the target feature vector;
Step (5): use the expression level of the target feature vector to train a machine learning model, so as to build a model for identifying the tissue source of mesenchymal stem cells (MSCs).
在某些实施方案中,在步骤(4)中,所述基因的表达量为基因的TPM值。In certain embodiments, in step (4), the expression level of the gene is the TPM value of the gene.
在某些实施方案中,在步骤(5)中,所述目标特征向量的表达量为目标特征向量的TPM值。In some embodiments, in step (5), the expression amount of the target feature vector is the TPM value of the target feature vector.
在某些实施方案中,在步骤(5)中,从所述n株来源于不同组织的MSCs中随机提取55%至95%的样本作为训练集,利用训练集的目标特征向量对机器学习模型进行训练,以构建鉴定间充质干细胞(MSCs)组织来源的模型。In some embodiments, in step (5), 55% to 95% of samples are randomly extracted from the n strains of MSCs derived from different tissues as a training set, and the target feature vector of the training set is used to train the machine learning model Perform training to build a model for identifying the tissue source of mesenchymal stem cells (MSCs).
在某些实施方案中,所述方法还包括步骤(6):将提取至训练集以外的MSCs作为测试集,利用测试集的目标特征向量对机器学习模型进行测试,以确定所述模型的准确度、灵敏度和特异性。In some embodiments, the method further includes step (6): using the MSCs extracted outside the training set as a test set, and testing the machine learning model using the target feature vector of the test set to determine the accuracy of the model. degree, sensitivity and specificity.
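The random split of step (5) and the hold-out of step (6) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicants' implementation (the examples use R): the function name, the sample IDs and the fixed seed are hypothetical, and a plain shuffle such as this one does not stratify by tissue source.

```python
import random


def split_train_test(sample_ids, train_fraction=0.7, seed=0):
    """Randomly split sample IDs into a training set and a test set."""
    # The embodiments describe training fractions between 55% and 95%.
    if not 0.55 <= train_fraction <= 0.95:
        raise ValueError("train_fraction is expected to lie in [0.55, 0.95]")
    rng = random.Random(seed)          # fixed seed only for reproducibility
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    # Everything not drawn into the training set becomes the test set.
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical identifiers for 137 MSC strains.
samples = [f"MSC_{i:03d}" for i in range(137)]
train, test = split_train_test(samples, train_fraction=0.7)
print(len(train), len(test))
```

With 137 strains and a 70% training fraction this yields 96 training and 41 test samples; a stratified split per tissue source may be preferable when some sources have few strains.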
In certain embodiments, the machine learning model is selected from Lasso regression, ridge regression, a support vector machine, or linear discriminant analysis.
In certain embodiments, the machine learning model is Lasso regression.
In certain embodiments, in step (5), 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95% of the n strains of MSCs derived from different tissues are randomly selected as the training set.
In certain embodiments, the target feature vectors comprise the following genes or transcripts of the following genes: ACVRL1, ARMC9, BCHE, CD55, EBP, FN1, FST, HOTAIRM1, LIMK2, MECOM, METTL26, MSX1, NBPF3, NECTIN3, NRXN2, PDE5A, RIN3, RPA2, RSL24D1, TSSC2 and ZIC1.
In certain embodiments, the target feature vectors are selected from the following genes or from transcripts of the following genes: ACVRL1, ARMC9, BCHE, CD55, EBP, FN1, FST, HOTAIRM1, LIMK2, MECOM, METTL26, MSX1, NBPF3, NECTIN3, NRXN2, PDE5A, RIN3, RPA2, RSL24D1, TSSC2, ZIC1, or any combination thereof.
In certain embodiments, n is an integer from 10 to 50, from 51 to 100, from 101 to 150, from 151 to 200, from 201 to 250, from 251 to 300, from 301 to 500, or from 501 to 1000.
In certain embodiments, the tissue source of the n strains of MSCs derived from different tissues is selected from bone marrow, umbilical cord, placenta or a part thereof (e.g., placental amnion), adipose tissue, dental pulp, hair follicle, skin, blood, or any combination thereof.
In certain embodiments, the MSCs are MSCs derived from a mammal (e.g., mouse, human).
In certain embodiments, the MSCs are human-derived MSCs (hMSCs).
In certain embodiments, the Entrez Gene ID of the gene ACVRL1 is 94.
In certain embodiments, the Entrez Gene ID of the gene ARMC9 is 80210.
In certain embodiments, the Entrez Gene ID of the gene BCHE is 590.
In certain embodiments, the Entrez Gene ID of the gene CD55 is 1604.
In certain embodiments, the Entrez Gene ID of the gene EBP is 10682.
In certain embodiments, the Entrez Gene ID of the gene FN1 is 2335.
In certain embodiments, the Entrez Gene ID of the gene FST is 10468.
In certain embodiments, the Entrez Gene ID of the gene HOTAIRM1 is 100506311.
In certain embodiments, the Entrez Gene ID of the gene LIMK2 is 3985.
In certain embodiments, the Entrez Gene ID of the gene MECOM is 2122.
In certain embodiments, the Entrez Gene ID of the gene METTL26 is 84326.
In certain embodiments, the Entrez Gene ID of the gene MSX1 is 4487.
In certain embodiments, the Entrez Gene ID of the gene NBPF3 is 84224.
In certain embodiments, the Entrez Gene ID of the gene NECTIN3 is 25945.
In certain embodiments, the Entrez Gene ID of the gene NRXN2 is 9379.
In certain embodiments, the Entrez Gene ID of the gene PDE5A is 8654.
In certain embodiments, the Entrez Gene ID of the gene RIN3 is 79890.
In certain embodiments, the Entrez Gene ID of the gene RPA2 is 6118.
In certain embodiments, the Entrez Gene ID of the gene RSL24D1 is 51187.
In certain embodiments, the Entrez Gene ID of the gene TSSC2 is 650368.
In certain embodiments, the Entrez Gene ID of the gene ZIC1 is 7545.
In certain embodiments, the Ensembl Gene ID of the gene ACVRL1 is ENSG00000139567.
In certain embodiments, the Ensembl Gene ID of the gene ARMC9 is ENSG00000135931.
In certain embodiments, the Ensembl Gene ID of the gene BCHE is ENSG00000114200.
In certain embodiments, the Ensembl Gene ID of the gene CD55 is ENSG00000196352.
In certain embodiments, the Ensembl Gene ID of the gene EBP is ENSG00000147155.
In certain embodiments, the Ensembl Gene ID of the gene FN1 is ENSG00000115414.
In certain embodiments, the Ensembl Gene ID of the gene FST is ENSG00000134363.
In certain embodiments, the Ensembl Gene ID of the gene HOTAIRM1 is ENSG00000233429.
In certain embodiments, the Ensembl Gene ID of the gene LIMK2 is ENSG00000182541.
In certain embodiments, the Ensembl Gene ID of the gene MECOM is ENSG00000085276.
In certain embodiments, the Ensembl Gene ID of the gene METTL26 is ENSG00000130731.
In certain embodiments, the Ensembl Gene ID of the gene MSX1 is ENSG00000163132.
In certain embodiments, the Ensembl Gene ID of the gene NBPF3 is ENSG00000142794.
In certain embodiments, the Ensembl Gene ID of the gene NECTIN3 is ENSG00000177707.
In certain embodiments, the Ensembl Gene ID of the gene NRXN2 is ENSG00000110076.
In certain embodiments, the Ensembl Gene ID of the gene PDE5A is ENSG00000138735.
In certain embodiments, the Ensembl Gene ID of the gene RIN3 is ENSG00000100599.
In certain embodiments, the Ensembl Gene ID of the gene RPA2 is ENSG00000117748.
In certain embodiments, the Ensembl Gene ID of the gene RSL24D1 is ENSG00000137876.
In certain embodiments, the Ensembl Gene ID of the gene TSSC2 is ENSG00000223756.
In certain embodiments, the Ensembl Gene ID of the gene ZIC1 is ENSG00000152977.
In another aspect, the present application provides a machine learning model constructed by the method described above.
In certain embodiments, the machine learning model is used to identify the tissue source (e.g., bone marrow, umbilical cord, placenta or a part thereof (e.g., placental amnion), adipose tissue, dental pulp, hair follicle, skin, blood, or any combination thereof) of one or more kinds of MSCs in a sample.
In another aspect, the present application provides use of the machine learning model described above for identifying the tissue source of one or more kinds of MSCs in a sample.
In another aspect, the present application provides a method for identifying the tissue source of MSCs in a sample, comprising:
Step (a): providing the expression levels of target feature vectors of the MSCs in the sample, wherein the target feature vectors comprise the following genes or transcripts of the following genes: ACVRL1, ARMC9, BCHE, CD55, EBP, FN1, FST, HOTAIRM1, LIMK2, MECOM, METTL26, MSX1, NBPF3, NECTIN3, NRXN2, PDE5A, RIN3, RPA2, RSL24D1, TSSC2 and ZIC1;
Step (b): inputting the expression levels of the target feature vectors into the machine learning model constructed as described above, so as to identify the tissue source of the MSCs in the sample.
In certain embodiments, in step (a), the expression level is a TPM value.
In certain embodiments, the TPM value is obtained by transcriptome sequencing.
In certain embodiments, the target feature vectors comprise the following genes or proteins expressed by the following genes: ACVRL1, ARMC9, BCHE, CD55, EBP, FN1, FST, HOTAIRM1, LIMK2, MECOM, METTL26, MSX1, NBPF3, NECTIN3, NRXN2, PDE5A, RIN3, RPA2, RSL24D1, TSSC2 and ZIC1.
In certain embodiments, the sample contains one or more kinds of MSCs.
In certain embodiments, in step (a) above, the expression levels of the target feature vectors of the MSCs in the sample are obtained by transcriptome sequencing of the MSCs in the sample; alternatively, in step (a) above, the expression levels are obtained by subjecting the target feature vectors of the MSCs in the sample to expression profiling microarray detection, single-cell transcriptome sequencing, RT-qPCR or digital PCR.
In certain embodiments, the tissue source of the MSCs is selected from bone marrow, umbilical cord, placenta or a part thereof (e.g., placental amnion), adipose tissue, dental pulp, hair follicle, tissue from other parts of the placenta, skin, blood, or any combination thereof.
In certain embodiments, the MSCs are MSCs derived from a mammal (e.g., mouse, human).
In certain embodiments, the MSCs are human-derived MSCs (hMSCs).
In certain embodiments, the proportion of adipose hMSCs in the sample is greater than or equal to 30%.
In certain embodiments, the proportion of bone marrow hMSCs in the sample is greater than or equal to 40%.
In certain embodiments, the proportion of dental pulp hMSCs in the sample is greater than or equal to 40%.
In certain embodiments, the proportion of hair follicle hMSCs in the sample is greater than or equal to 30%.
In certain embodiments, the proportion of umbilical cord hMSCs in the sample is greater than or equal to 20%.
In certain embodiments, the proportion of placental amnion hMSCs in the sample is greater than or equal to 40%.
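The proportions listed in the preceding embodiments can be collected into a small lookup, sketched below in Python. Reading them as per-source minimum fractions that a mixed sample should meet is an interpretation made for this illustration; the dictionary keys, the function name and the example read counts are hypothetical.

```python
# Minimum fraction of each hMSC source, taken from the embodiments above.
MIN_FRACTION = {
    "adipose": 0.30,
    "bone_marrow": 0.40,
    "dental_pulp": 0.40,
    "hair_follicle": 0.30,
    "umbilical_cord": 0.20,
    "placental_amnion": 0.40,
}


def sources_meeting_minimum(mixture):
    """Return, sorted by name, the sources whose share meets the stated minimum.

    `mixture` maps a source name to any non-negative amount (e.g. read counts);
    amounts are converted to fractions of the total before comparison.
    """
    total = sum(mixture.values())
    if total <= 0:
        raise ValueError("mixture must contain a positive total amount")
    met = []
    for source, amount in mixture.items():
        if source not in MIN_FRACTION:
            raise KeyError(f"no minimum fraction listed for {source!r}")
        if amount / total >= MIN_FRACTION[source]:
            met.append(source)
    return sorted(met)


# Hypothetical mixture: 3M umbilical cord reads and 7M bone marrow reads.
example = {"umbilical_cord": 3_000_000, "bone_marrow": 7_000_000}
print(sources_meeting_minimum(example))
```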
In another aspect, the present application provides a device for identifying the tissue source of mesenchymal stem cells, comprising:
a memory configured to store instructions; and
a processor coupled to the memory, the processor being configured to carry out the method described above based on the instructions stored in the memory.
In another aspect, the present application provides a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the method described above.
In another aspect, the present application provides a kit for identifying the tissue source of one or more kinds of MSCs in a sample, the kit comprising reagents for determining the levels of biomarkers in the sample, wherein the biomarkers comprise ACVRL1, ARMC9, BCHE, CD55, EBP, FN1, FST, HOTAIRM1, LIMK2, MECOM, METTL26, MSX1, NBPF3, NECTIN3, NRXN2, PDE5A, RIN3, RPA2, RSL24D1, TSSC2 and ZIC1.
In certain embodiments, the level of a biomarker is the protein level or mRNA level of the biomarker.
In certain embodiments, the MSCs are MSCs derived from a mammal (e.g., mouse, human).
In certain embodiments, the MSCs are human-derived MSCs (hMSCs).
In another aspect, the present application provides use of reagents for determining the levels of biomarkers in a sample in the preparation of a kit for identifying the tissue source of one or more kinds of MSCs in the sample, wherein the biomarkers comprise ACVRL1, ARMC9, BCHE, CD55, EBP, FN1, FST, HOTAIRM1, LIMK2, MECOM, METTL26, MSX1, NBPF3, NECTIN3, NRXN2, PDE5A, RIN3, RPA2, RSL24D1, TSSC2 and ZIC1.
In certain embodiments, the level of a biomarker is the protein level or mRNA level of the biomarker.
In certain embodiments, the MSCs are MSCs derived from a mammal (e.g., mouse, human).
In certain embodiments, the MSCs are human-derived MSCs (hMSCs).
Definition of Terms
In the present disclosure, unless otherwise stated, the scientific and technical terms used herein have the meanings commonly understood by those skilled in the art. In addition, the cell culture, molecular genetics, nucleic acid chemistry and immunology laboratory procedures used herein are routine procedures widely used in the respective fields. For a better understanding of the present disclosure, definitions and explanations of relevant terms are provided below.
As used herein, the term "sample" refers to a biological sample obtained from a subject; the sample may be one that contains, or is presumed to contain, human mesenchymal stem cells.
As used herein, the terms "machine learning model", "machine learning method" and "statistical learning method" have the same meaning and are used interchangeably. They refer to a collection of parameters and functions from which a trained model can be built using the features measured in training samples (the target feature vectors). In certain embodiments, the trained model learns from the training samples during a training process in which the parameters are optimized, so as to provide the best quality metric (e.g., accuracy) for classifying new samples. In certain embodiments, the parameters and functions may be a collection of linear algebraic operations, nonlinear algebraic operations and tensor algebraic operations. In certain embodiments, the parameters and functions may include statistical functions, tests and probability models. In certain embodiments, the feature measured in the training samples is the expression level of a gene.
As used herein, the term "specificity" refers to the proportion of actual negatives that are correctly identified as negative.
As used herein, the term "sensitivity" refers to the proportion of actual positives that are correctly identified as positive.
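For a multi-class problem such as tissue-source identification, these two definitions are usually applied one class at a time, treating one tissue source as "positive" and all others as "negative". The Python sketch below illustrates this under that assumption; the function name and the toy labels are hypothetical and do not come from the application's data.

```python
def one_vs_rest_metrics(true_labels, predicted_labels, positive_class):
    """Accuracy, sensitivity and specificity for one class versus all others."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == positive_class and pred == positive_class:
            tp += 1            # actual positive, called positive
        elif truth == positive_class:
            fn += 1            # actual positive, missed
        elif pred == positive_class:
            fp += 1            # actual negative, wrongly called positive
        else:
            tn += 1            # actual negative, called negative
    return {
        "accuracy": (tp + tn) / len(true_labels),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Toy labels for illustration only.
truth = ["bone_marrow", "bone_marrow", "adipose", "umbilical_cord", "adipose"]
pred = ["bone_marrow", "adipose", "adipose", "umbilical_cord", "adipose"]
bm = one_vs_rest_metrics(truth, pred, "bone_marrow")
ad = one_vs_rest_metrics(truth, pred, "adipose")
print(bm, ad)
```

In the toy data one bone marrow sample is misclassified as adipose, which lowers the sensitivity for bone marrow and the specificity for adipose while leaving the other two values at 1.0.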
As used herein, the term "transcriptome sequencing" or "RNA-seq" refers to using a sequencing platform (e.g., a next-generation sequencing platform) to obtain, rapidly and comprehensively, almost all of the transcripts and gene sequences of a specific cell or tissue of a species in a given state. It can be used to study gene expression levels, gene function, gene structure, alternative splicing, prediction of novel transcripts and the like. In the analysis of transcriptome sequencing data, three values are commonly used: count, FPKM and TPM.
As used herein, the term "count" refers to the total number of reads in the sequencing data that align to a given gene; that is, the sequenced reads are aligned to the reference genome, and software then counts the total number of reads aligned to that gene.
As used herein, the term "FPKM (fragments per kilobase of transcript per million mapped fragments)" refers to the number of fragments aligned to a given gene, normalized first for sequencing depth and then for gene length, so as to remove the effects of sequencing depth and gene length on the results when comparing sequenced samples.
As used herein, the term "TPM (transcripts per million)" refers to the number of fragments aligned to a given gene, normalized first for gene length and then for sequencing depth, so as to remove the effects of sequencing depth and gene length on the results when comparing sequenced samples. In certain embodiments, TPM can be used as a measure of gene expression level.
As used herein, the term "TPMmax" refers to the maximum TPM value of a given gene across a set of samples.
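The difference in normalization order between FPKM and TPM can be made concrete with a short Python sketch. It uses the standard textbook formulas rather than the output of any particular quantification tool; the gene names, counts and lengths are made-up values for illustration.

```python
def fpkm(counts, lengths_bp):
    """FPKM: scale by sequencing depth (per million), then by length (per kb)."""
    total = sum(counts.values())
    return {g: counts[g] * 1e9 / (lengths_bp[g] * total) for g in counts}


def tpm(counts, lengths_bp):
    """TPM: scale by length (per kb) first, then rescale so values sum to 1e6."""
    rate = {g: counts[g] / (lengths_bp[g] / 1000) for g in counts}
    scale = sum(rate.values())
    return {g: rate[g] / scale * 1e6 for g in counts}


# Made-up fragment counts and gene lengths (bp) for three genes in one sample.
counts = {"GENE_A": 100, "GENE_B": 300, "GENE_C": 600}
lengths = {"GENE_A": 1000, "GENE_B": 1000, "GENE_C": 3000}

fpkm_values = fpkm(counts, lengths)
tpm_values = tpm(counts, lengths)
print(fpkm_values)
print(tpm_values)
```

Because the final step of TPM divides by the sum of the length-normalized rates, TPM values always add up to one million within a sample, which is not true of FPKM; this makes TPM more directly comparable between samples.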
Beneficial Effects
The present application provides a model for identifying the tissue source of mesenchymal stem cells (MSCs) and a method for constructing the model, which can accurately identify the tissue source of the MSCs from different tissues that are commonly seen in clinical research. Validated repeatedly on a training set, a test set and an external dataset, the model reaches an accuracy, sensitivity and specificity of 95% or higher (up to 100%). In addition, the model established in the present application can identify the respective tissue sources of several kinds of mesenchymal stem cells mixed in one sample, again with accuracy, sensitivity and specificity of up to 100%, and is therefore of considerable value for clinical application.
Description of the Drawings
Figure 1 shows the accuracy, sensitivity and specificity of the machine learning model on the training set in Example 2.
Figure 2 shows the accuracy, sensitivity and specificity of the machine learning model on the test set in Example 2.
Figure 3 shows the accuracy, sensitivity and specificity of the machine learning model on the external dataset in Example 2.
Figure 4 shows the predictive ability of the machine learning model for mixed cells in Example 3: Figure 4A shows the detection results for the hMSCs from two different sources in simulated mixed sample 1, Figure 4B shows the detection results for the hMSCs from two different sources in simulated mixed sample 2, and Figure 4C shows the detection results for the hMSCs from two different sources in simulated mixed sample 3.
Figure 5 shows the detection results of the machine learning model in Example 3 for the hMSCs from three different sources in simulated mixed sample 4.
Detailed Description
The invention is now described with reference to the following examples, which are intended to illustrate the invention and not to limit it. Unless otherwise specified, the experiments and methods described in the examples were carried out essentially according to conventional methods well known in the art and described in various references.
Techniques, methods and equipment known to those of ordinary skill in the relevant art may not be discussed in detail but, where appropriate, should be regarded as part of the granted specification. In all examples shown and discussed herein, any specific value should be construed as merely illustrative and not as limiting. Accordingly, other examples of the exemplary embodiments may have different values.
In addition, where specific conditions are not indicated in the examples, conventional conditions or the conditions recommended by the manufacturer were followed. Reagents or instruments for which no manufacturer is indicated are conventional, commercially available products. Those skilled in the art will appreciate that the examples describe the invention by way of illustration and are not intended to limit the scope of protection claimed. All publications and other references mentioned herein are incorporated herein by reference in their entirety.
Example 1. Test materials and equipment
1. A total of 137 hMSC strains were used to establish the machine learning model, all obtained from the Cell Resource Preservation Research Center of the National Institutes for Food and Drug Control (NIFDC, China); their tissue sources and numbers are shown in Table 1 below.
Table 1. hMSCs used to establish the machine learning model
Figure PCTCN2022110507-appb-000001
2. Data from a total of 99 hMSC strains were collected as an external dataset for further testing; their tissue sources and numbers are shown in Table 2 below.
Table 2. hMSCs in the external dataset
Figure PCTCN2022110507-appb-000002
Figure PCTCN2022110507-appb-000003
3. The specific materials and equipment used in the tests are listed in Table 3 below.
Table 3. Materials and equipment
Figure PCTCN2022110507-appb-000004
Example 2. Establishment of the machine learning model
I. Establishing the machine learning model
1. Transcriptome sequencing
RNA was extracted from the 137 hMSC strains of Example 1 by the Trizol method. Using the cDNA library construction kit listed in Table 3, the extracted RNA was reverse-transcribed into cDNA and a cDNA library was built; transcriptome sequencing was then performed with the sequencing kit listed in Table 3 to obtain the transcriptome sequencing information of the 137 hMSC strains.
2. Data analysis after transcriptome sequencing
After transcriptome sequencing, approximately 6 Gb of clean bases were obtained for each sample. The analysis workflow is shown in Table 4 below.
Table 4. Transcriptome sequencing analysis workflow
Figure PCTCN2022110507-appb-000005
After the above analysis, the transcriptional expression levels of the genes in the 137 hMSC strains were obtained, including the count, FPKM and TPM values of the transcripts.
3. Data preprocessing for statistical learning
Preprocessing software: R (version 4.1.3) and the R package tidyverse (version 1.3.1).
mRNAs were identified and filtered from the transcripts according to their Official Symbol, giving 38,735 genes in total. The R package tidyverse was then used to retain the highly expressed genes (TPMmax > 10), giving 13,315 genes.
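The TPMmax > 10 filter described above can be sketched as follows. The application performed this step in R with tidyverse; this Python version is only an illustration of the same rule, and the function name, sample IDs and TPM values (including the placeholder gene `GENE_X`) are hypothetical.

```python
def filter_by_tpm_max(tpm_by_sample, threshold=10.0):
    """Keep genes whose maximum TPM across all samples exceeds `threshold`.

    `tpm_by_sample` maps a sample ID to a {gene: TPM} dictionary.
    Returns the kept gene names in sorted order.
    """
    tpm_max = {}
    for gene_tpm in tpm_by_sample.values():
        for gene, value in gene_tpm.items():
            # Track the largest TPM seen for each gene over the sample set.
            tpm_max[gene] = max(value, tpm_max.get(gene, float("-inf")))
    return sorted(g for g, m in tpm_max.items() if m > threshold)


# Made-up TPM values for two samples.
tpm_by_sample = {
    "sample_1": {"FN1": 850.0, "ZIC1": 0.2, "GENE_X": 9.9},
    "sample_2": {"FN1": 910.0, "ZIC1": 35.0, "GENE_X": 10.0},
}
kept = filter_by_tpm_max(tpm_by_sample)
print(kept)
```

Note that the comparison is strict: a gene whose highest TPM is exactly 10 is dropped, whereas a gene that is low in most samples but above 10 in at least one (ZIC1 in this toy data) is kept, which preserves tissue-specific genes.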
4. Statistical learning modeling
Statistical learning modeling platform: R; software: glmnet and tidymodel.
The transcriptome data of the 137 hMSC strains were divided into a training set (70%) and a test set (30%). The 13,315 genes were used as feature vectors, and Lasso regression with 10-fold cross-validation was used to screen them, using the call: cv.glmnet(x, y, type.measure = "class", nfolds = 10, family = "multinomial", alpha = 1, type.multinomial = "grouped"). The 10-fold cross-validation showed that at λ = 0.02552 the misclassification rate was 0, and the number of target feature vectors was reduced to 21.
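For reference, the objective that a multinomial glmnet fit with alpha = 1 and type.multinomial = "grouped" minimizes can be written as below. This follows the formulation in the glmnet documentation; the symbols are chosen here for illustration and are not notation from the application. $N$ is the number of training samples, $p$ the number of candidate genes (13,315 here), $K$ the number of tissue-source classes, $x_i$ the expression vector of sample $i$, $y_{ik}$ the class indicator, $B$ the $p \times K$ coefficient matrix with columns $\beta_k$ and rows $B_{j\cdot}$, and $\beta_{0k}$ the intercepts.

```latex
\min_{\beta_0,\,B}\;
-\frac{1}{N}\sum_{i=1}^{N}\left[
  \sum_{k=1}^{K} y_{ik}\left(\beta_{0k} + x_i^{\top}\beta_k\right)
  - \log\sum_{l=1}^{K}\exp\left(\beta_{0l} + x_i^{\top}\beta_l\right)
\right]
+ \lambda \sum_{j=1}^{p} \left\lVert B_{j\cdot} \right\rVert_2
```

The grouped penalty acts on each gene's whole row of coefficients, so a gene is either retained for all classes or dropped for all classes; this is why the fit at the selected $\lambda$ yields a single list of genes rather than a separate list per tissue source.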
The above screening finally yielded 21 target feature vectors, that is, 21 genes, as shown in Table 5.
Table 5. The 21 genes obtained by screening
No.  Symbol     Entrez Gene ID  Ensembl Gene ID
1    ACVRL1     94              ENSG00000139567
2    ARMC9      80210           ENSG00000135931
3    BCHE       590             ENSG00000114200
4    CD55       1604            ENSG00000196352
5    EBP        10682           ENSG00000147155
6    FN1        2335            ENSG00000115414
7    FST        10468           ENSG00000134363
8    HOTAIRM1   100506311       ENSG00000233429
9    LIMK2      3985            ENSG00000182541
10   MECOM      2122            ENSG00000085276
11   METTL26    84326           ENSG00000130731
12   MSX1       4487            ENSG00000163132
13   NBPF3      84224           ENSG00000142794
14   NECTIN3    25945           ENSG00000177707
15   NRXN2      9379            ENSG00000110076
16   PDE5A      8654            ENSG00000138735
17   RIN3       79890           ENSG00000100599
18   RPA2       6118            ENSG00000117748
19   RSL24D1    51187           ENSG00000137876
20   TSSC2      650368          ENSG00000223756
21   ZIC1       7545            ENSG00000152977
Lasso regression was then performed again on the above 21 genes to establish the machine learning model.
II. Preliminary establishment and evaluation of the machine learning model
The expression levels (i.e., TPM values) of the 21 genes in the training set (70% of all samples) were input into the machine learning model established above, and the accuracy, sensitivity and specificity of the model's predictions were assessed. The accuracy results are shown in Table 6 and Figure 1.
Table 6. Accuracy of the model
Figure PCTCN2022110507-appb-000006
The results show that, on the training set, the established machine learning model predicted the tissue source of the hMSCs with 100% accuracy. The sensitivity and specificity of the model were likewise both 100%.
Next, the expression levels of the 21 genes in the test set (30% of all samples) were input into the machine learning model established above, and the accuracy, sensitivity and specificity of the model's predictions were assessed. The accuracy results are shown in Table 7 and Figure 2.
Table 7. Accuracy of the model
Figure PCTCN2022110507-appb-000007
The results show that the established machine learning model predicted the tissue source of the hMSCs in the test set with 100% accuracy. The sensitivity and specificity of the model were likewise both 100%.
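The 70%/30% partition into training and test sets described above can be sketched as a simple random split. This is only an assumption about the procedure: the source does not say whether the split was stratified by tissue or which random seed was used, and the function name and sample identifiers below are hypothetical.

```python
import random


def split_samples(sample_ids, train_fraction=0.7, seed=0):
    """Randomly partition sample IDs into a training set and a test set."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(sample_ids)    # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical identifiers for 10 hMSC strains.
strain_ids = [f"hMSC_{i:02d}" for i in range(1, 11)]
train_ids, test_ids = split_samples(strain_ids, train_fraction=0.7)
```

Every strain falls into exactly one of the two sets, so the test set stays unseen during training.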
Further, the external dataset described in Example 1 (99 hMSC strains in total) was subjected to transcriptome sequencing as described above, and the resulting expression levels of the 21 genes were input into the machine learning model established above to evaluate the accuracy, sensitivity, and specificity of its predictions. The accuracy results are shown in Table 8 and Figure 3.
Table 8. Accuracy of the model
Figure PCTCN2022110507-appb-000008
The results show that the established machine learning model predicted the tissue source of the hMSCs in the external dataset with 100% accuracy. The sensitivity and specificity of the model were likewise both 100%.
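For a classifier with more than two tissue classes, sensitivity and specificity are commonly computed per class in a one-vs-rest fashion. The source does not state how its figures were derived, so the following is a minimal sketch under that assumption; the function name and the toy labels are illustrative and are not data from this application.

```python
def evaluate_predictions(y_true, y_pred):
    """Return overall accuracy and per-class (sensitivity, specificity)."""
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / n
    per_class = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        tn = n - tp - fn - fp
        # One-vs-rest: the class of interest is "positive", all others "negative".
        sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
        specificity = tn / (tn + fp) if (tn + fp) else float("nan")
        per_class[label] = (sensitivity, specificity)
    return accuracy, per_class


# Toy labels (hypothetical), with one adipose sample misclassified.
true_tissue = ["adipose", "adipose", "bone marrow", "umbilical cord"]
pred_tissue = ["adipose", "bone marrow", "bone marrow", "umbilical cord"]
accuracy, per_class = evaluate_predictions(true_tissue, pred_tissue)
```

A model that classifies every sample correctly, as reported for the three datasets above, yields 1.0 for accuracy and for every per-class sensitivity and specificity.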
Example 3. Predictive ability of the machine learning model for mixed cells
In practical applications, a test sample may contain a mixture of hMSCs from several different tissue sources. This example therefore simulates mixtures of hMSCs from different tissue sources.
First, 1,000,000 (1M), 2,000,000 (2M), … 10,000,000 (10M) reads were drawn from the transcriptome sequencing reads of the hMSCs, and the sequencing reads of hMSCs from different tissue sources were mixed in different proportions to generate new mixed samples. The mixed samples are as follows:
The first set of simulated data comprises 11 samples in which adipose-derived hMSCs and bone marrow-derived hMSCs are mixed in different proportions, as shown in Table 9:
Table 9. First set of simulated data
Sample             Adipose hMSC reads    Bone marrow hMSC reads
Mixed sample 1     0M                    10M
Mixed sample 2     1M                    9M
Mixed sample 3     2M                    8M
Mixed sample 4     3M                    7M
Mixed sample 5     4M                    6M
Mixed sample 6     5M                    5M
Mixed sample 7     6M                    4M
Mixed sample 8     7M                    3M
Mixed sample 9     8M                    2M
Mixed sample 10    9M                    1M
Mixed sample 11    10M                   0M
The second set of simulated data comprises 11 samples in which dental pulp-derived hMSCs and hair follicle-derived hMSCs are mixed in different proportions, as shown in Table 10:
Table 10. Second set of simulated data
Sample             Dental pulp hMSC reads    Hair follicle hMSC reads
Mixed sample 1     0M                        10M
Mixed sample 2     1M                        9M
Mixed sample 3     2M                        8M
Mixed sample 4     3M                        7M
Mixed sample 5     4M                        6M
Mixed sample 6     5M                        5M
Mixed sample 7     6M                        4M
Mixed sample 8     7M                        3M
Mixed sample 9     8M                        2M
Mixed sample 10    9M                        1M
Mixed sample 11    10M                       0M
The third set of simulated data comprises 11 samples in which umbilical cord-derived hMSCs and placental amnion-derived hMSCs are mixed in different proportions, as shown in Table 11:
Table 11. Third set of simulated data
Sample             Umbilical cord hMSC reads    Placental amnion hMSC reads
Mixed sample 1     0M                           10M
Mixed sample 2     1M                           9M
Mixed sample 3     2M                           8M
Mixed sample 4     3M                           7M
Mixed sample 5     4M                           6M
Mixed sample 6     5M                           5M
Mixed sample 7     6M                           4M
Mixed sample 8     7M                           3M
Mixed sample 9     8M                           2M
Mixed sample 10    9M                           1M
Mixed sample 11    10M                          0M
The three sets of mixed samples above were analyzed for tissue source as described above; the results are shown in Figure 4. The different tissue sources of the hMSCs in each set of mixed samples were predicted accurately.
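The two-source mixing scheme of Tables 9 to 11 can be sketched as follows. The source does not name the subsampling tool or say whether reads were drawn with or without replacement, so sampling without replacement, the fixed seed, and the function names are assumptions; the toy read lists stand in for real sequencing reads, and the counts are in single reads rather than millions.

```python
import random


def mixing_design(total=10, step=1):
    """Return (reads_from_a, reads_from_b) pairs: (0, 10), (1, 9), ..., (10, 0)."""
    return [(a, total - a) for a in range(0, total + 1, step)]


def mix_reads(reads_a, reads_b, n_a, n_b, seed=0):
    """Draw n_a reads from source A and n_b from source B, without replacement."""
    rng = random.Random(seed)
    return rng.sample(reads_a, n_a) + rng.sample(reads_b, n_b)


# Toy stand-ins for the reads of two hMSC sources (hypothetical identifiers).
adipose_reads = [f"adipose_read_{i}" for i in range(20)]
marrow_reads = [f"marrow_read_{i}" for i in range(20)]

design = mixing_design()  # 11 mixing ratios, as in Tables 9 to 11
mixed_samples = [
    mix_reads(adipose_reads, marrow_reads, n_a, n_b) for n_a, n_b in design
]
```

Each simulated sample has the same total depth, so only the source proportions vary between samples. Expression values for the mixed reads would then be quantified in the same way as for an unmixed sample.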
Further, hMSCs from three different tissue sources were mixed in different proportions. The mixed samples contain adipose-, bone marrow-, and hair follicle-derived hMSCs; mixing yielded a fourth set of 11 mixed samples, as shown in Table 12:
Table 12. Mixed samples of hMSCs from three different sources
Figure PCTCN2022110507-appb-000009
Tissue source identification was performed as described above. Figure 5 shows the results for the three hMSC sources in the fourth set of simulated mixed samples. The results show that the model established in this application accurately predicted the different tissue sources of the hMSCs in the mixed samples. The model can therefore be used to test hMSC samples that contain hMSCs from one or more sources.
Example 4. Comparison of different machine learning methods
To compare how the choice of machine learning model affects the accuracy of the model for identifying the tissue source of mesenchymal stem cells (MSCs), this example selected 5 different machine learning models/methods and built the MSC tissue source identification model as described in Example 2 (the only difference from Example 2 being the machine learning model/method used), then compared the accuracy of the resulting models in identifying the tissue source of MSCs.
The experimental results are shown in Table 9. Compared with the lasso regression method of Example 2 (whose model identified the training set, test set, and external dataset with 100% accuracy in each case), ridge regression, support vector machine, and linear discriminant methods also achieved high accuracy. They can therefore serve as alternatives to lasso regression for building the model of this application for identifying the tissue source of MSCs.
Table 9. Machine learning methods used in this example
Figure PCTCN2022110507-appb-000010
Although specific embodiments of the present disclosure have been described in detail, those skilled in the art will understand that various modifications and changes can be made to the details in light of all the teachings that have been published, and that these changes all fall within the protection scope of the present disclosure. The full scope of the present disclosure is given by the appended claims and any equivalents thereof.

Claims (11)

  1. A method of constructing a model for identifying the tissue source of mesenchymal stem cells (MSCs), comprising:
    Step (1): providing n strains of MSCs derived from different tissues and collecting transcriptome sequencing information of the MSCs, wherein n is an integer greater than or equal to 10;
    Step (2): obtaining mRNA information from the transcriptome sequencing information;
    Step (3): obtaining, from the mRNA information, genes with a TPMmax greater than 10;
    Step (4): using the expression levels of the genes obtained in step (3) as feature vectors, screening the feature vectors by a machine learning method, and obtaining target feature vectors;
    Step (5): training a machine learning model with the expression levels of the target feature vectors to construct the model for identifying the tissue source of MSCs;
    preferably, in step (5), 55% to 95% of the samples are randomly drawn from the n strains of MSCs derived from different tissues as a training set, and the machine learning model is trained with the target feature vectors of the training set to construct the model for identifying the tissue source of MSCs;
    more preferably, the method further comprises step (6): using the MSCs not drawn into the training set as a test set, and testing the machine learning model with the target feature vectors of the test set to determine the accuracy, sensitivity, and specificity of the model;
    preferably, in step (4), the expression level of a gene is the TPM value of the gene;
    preferably, in step (5), the expression level of a target feature vector is the TPM value of the target feature vector.
  2. The method of claim 1, wherein the machine learning model is selected from lasso regression, ridge regression, support vector machine, or linear discriminant;
    preferably, the machine learning model is lasso regression.
  3. The method of claim 1 or 2, wherein the method has one or more features selected from the following:
    (1) in step (5), 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95% of the samples are randomly drawn from the n strains of MSCs derived from different tissues as the training set;
    (2) the target feature vectors comprise the following genes or transcripts of the following genes: ACVRL1, ARMC9, BCHE, CD55, EBP, FN1, FST, HOTAIRM1, LIMK2, MECOM, METTL26, MSX1, NBPF3, NECTIN3, NRXN2, PDE5A, RIN3, RPA2, RSL24D1, TSSC2, and ZIC1;
    (3) the target feature vectors are selected from the following genes or transcripts of the following genes: ACVRL1, ARMC9, BCHE, CD55, EBP, FN1, FST, HOTAIRM1, LIMK2, MECOM, METTL26, MSX1, NBPF3, NECTIN3, NRXN2, PDE5A, RIN3, RPA2, RSL24D1, TSSC2, ZIC1, or any combination thereof;
    (4) n is an integer between 10 and 50, between 51 and 100, between 101 and 150, between 151 and 200, between 201 and 250, between 251 and 300, between 301 and 500, or between 501 and 1000;
    (5) the sources of the n strains of MSCs derived from different tissues are selected from bone marrow, umbilical cord, placenta or a part thereof (e.g., placental amnion), adipose tissue, dental pulp, hair follicle, skin, blood, or any combination thereof;
    (6) the MSCs are MSCs derived from a mammal (e.g., mouse, human);
    (7) the MSCs are human-derived MSCs (hMSCs);
    preferably, the transcripts are selected from rRNA, tRNA, mRNA, or non-coding RNA;
    preferably, the transcripts are mRNA.
  4. A machine learning model constructed by the method of any one of claims 1-3;
    preferably, the machine learning model is used to identify the tissue source of one or more MSCs in a sample (e.g., bone marrow, umbilical cord, placenta or a part thereof (e.g., placental amnion), adipose tissue, dental pulp, hair follicle, skin, blood, or any combination thereof).
  5. Use of the machine learning model of claim 4 for identifying the tissue source of one or more MSCs in a sample.
  6. A method for identifying the tissue source of MSCs in a sample, comprising:
    Step (a): providing the expression levels of target feature vectors of the MSCs in the sample, the target feature vectors comprising the following genes or transcripts of the following genes: ACVRL1, ARMC9, BCHE, CD55, EBP, FN1, FST, HOTAIRM1, LIMK2, MECOM, METTL26, MSX1, NBPF3, NECTIN3, NRXN2, PDE5A, RIN3, RPA2, RSL24D1, TSSC2, and ZIC1;
    Step (b): inputting the expression levels of the target feature vectors into the machine learning model of claim 4 to identify the tissue source of the MSCs in the sample;
    preferably, in step (a), the expression level is a TPM value;
    preferably, the TPM value is obtained by transcriptome sequencing.
  7. The method of claim 6, wherein the sample contains one or more MSCs;
    preferably, in step (a), the expression levels of the target feature vectors of the MSCs in the sample are obtained by transcriptome sequencing of the MSCs in the sample; or, in step (a), the expression levels of the target feature vectors of the MSCs in the sample are obtained by expression profiling chip detection, single-cell transcriptome sequencing, RT-qPCR, or digital PCR of the target feature vectors of the MSCs in the sample;
    preferably, the tissue source of the MSCs is selected from bone marrow, umbilical cord, placenta or a part thereof (e.g., placental amnion), adipose tissue, dental pulp, hair follicle, tissue from other parts of the placenta, skin, blood, or any combination thereof;
    preferably, the MSCs are MSCs derived from a mammal (e.g., mouse, human);
    preferably, the MSCs are human-derived MSCs (hMSCs).
  8. A device for identifying the tissue source of mesenchymal stem cells, comprising:
    a memory configured to store instructions;
    a processor coupled to the memory, the processor being configured to carry out the method of claim 6 or 7 based on the instructions stored in the memory.
  9. A computer-readable storage medium storing computer instructions which, when executed by a processor, implement the method of claim 6 or 7.
  10. A kit for identifying the tissue source of one or more MSCs in a sample, the kit comprising reagents for determining the levels of biomarkers in the sample, the biomarkers comprising ACVRL1, ARMC9, BCHE, CD55, EBP, FN1, FST, HOTAIRM1, LIMK2, MECOM, METTL26, MSX1, NBPF3, NECTIN3, NRXN2, PDE5A, RIN3, RPA2, RSL24D1, TSSC2, and ZIC1;
    preferably, the level of a biomarker is the protein or mRNA level of the biomarker;
    preferably, the MSCs are MSCs derived from a mammal (e.g., mouse, human);
    preferably, the MSCs are human-derived MSCs (hMSCs).
  11. Use of reagents for determining the levels of biomarkers in a sample in the preparation of a kit, the kit being for identifying the tissue source of one or more MSCs in the sample; wherein the biomarkers comprise ACVRL1, ARMC9, BCHE, CD55, EBP, FN1, FST, HOTAIRM1, LIMK2, MECOM, METTL26, MSX1, NBPF3, NECTIN3, NRXN2, PDE5A, RIN3, RPA2, RSL24D1, TSSC2, and ZIC1;
    preferably, the level of a biomarker is the protein or mRNA level of the biomarker;
    preferably, the MSCs are MSCs derived from a mammal (e.g., mouse, human);
    preferably, the MSCs are human-derived MSCs (hMSCs).
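Two of the data-handling steps recited in the claims lend themselves to a short illustration: the expression filter of claim 1, step (3), and the assembly of the 21-gene input of claim 6, step (a). The sketch below is not the applicant's implementation. It assumes that "TPMmax" means the maximum TPM of a gene across all samples, which the claims do not define, and the function names, the placeholder gene names `GENE_X` and `GENE_Y`, and all TPM values are hypothetical.

```python
# The 21 target genes listed in the claims, kept in one fixed order so that
# every sample is presented to the model with the same feature layout.
TARGET_GENES = [
    "ACVRL1", "ARMC9", "BCHE", "CD55", "EBP", "FN1", "FST", "HOTAIRM1",
    "LIMK2", "MECOM", "METTL26", "MSX1", "NBPF3", "NECTIN3", "NRXN2",
    "PDE5A", "RIN3", "RPA2", "RSL24D1", "TSSC2", "ZIC1",
]


def filter_by_tpm_max(tpm_by_gene, threshold=10.0):
    """Keep genes whose highest TPM across samples exceeds the threshold.

    Assumption: TPMmax is the maximum TPM of a gene over all samples.
    """
    return [g for g, values in tpm_by_gene.items() if max(values) > threshold]


def build_feature_vector(sample_tpm):
    """Return the TPM values of the target genes in the fixed order above."""
    missing = [g for g in TARGET_GENES if g not in sample_tpm]
    if missing:
        raise KeyError(f"missing TPM values for: {', '.join(missing)}")
    return [float(sample_tpm[g]) for g in TARGET_GENES]


# Toy cohort (hypothetical values): TPM of each gene in three samples.
cohort_tpm = {
    "FN1": [250.0, 310.5, 5.0],
    "GENE_X": [0.0, 3.2, 9.9],      # never exceeds 10, so it is dropped
    "GENE_Y": [10.0, 10.0, 10.0],   # exactly 10 is not "greater than 10"
}
kept_genes = filter_by_tpm_max(cohort_tpm)

# Toy sample (hypothetical values): extra genes are ignored, order is enforced.
sample_tpm = {gene: float(i) for i, gene in enumerate(TARGET_GENES)}
sample_tpm["GAPDH"] = 999.0
features = build_feature_vector(sample_tpm)
```

Raising an error on a missing gene, rather than filling in zero, avoids silently passing an incomplete profile to the model.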
PCT/CN2022/110507 2022-06-22 2022-08-05 Method for identifying tissue sources of mesenchymal stem cells in sample and use thereof WO2023245827A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210710572.3 2022-06-22
CN202210710572.3A CN115565608A (en) 2022-06-22 2022-06-22 Method for identifying tissue source of mesenchymal stem cells in sample and application thereof

Publications (1)

Publication Number Publication Date
WO2023245827A1 true WO2023245827A1 (en) 2023-12-28

Family

ID=84737399

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/110507 WO2023245827A1 (en) 2022-06-22 2022-08-05 Method for identifying tissue sources of mesenchymal stem cells in sample and use thereof

Country Status (2)

Country Link
CN (1) CN115565608A (en)
WO (1) WO2023245827A1 (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1886658A (en) * 2003-09-29 2006-12-27 帕斯沃克斯资讯有限公司 Systems and methods for detecting biological features
CN103459592A (en) * 2010-12-09 2013-12-18 银丰生物工程技术有限公司 Sub-totipotent stem cell product and apparent hereditary modifying label thereof
US20170258843A1 (en) * 2016-03-14 2017-09-14 AngioStem, Inc. Stem cell mediated neuroregeneration and neuroprotection
CN107513571A (en) * 2017-09-30 2017-12-26 首都医科大学附属北京口腔医院 MiRNA application
WO2018013703A1 (en) * 2016-07-12 2018-01-18 Mindshare Medical, Inc. Medical analytics system
CN110402146A (en) * 2016-11-03 2019-11-01 埃克森蒂姆生物技术公司 Mescenchymal stem cell group, its product and application thereof
CN113196404A (en) * 2018-12-19 2021-07-30 格瑞尔公司 Cancer tissue origin prediction using multi-tier analysis of small variations in cell-free DNA samples
CN113286883A (en) * 2018-12-18 2021-08-20 格里尔公司 Methods for detecting disease using RNA analysis
CN113826167A (en) * 2019-05-13 2021-12-21 格瑞尔公司 Model-based characterization and classification

Also Published As

Publication number Publication date
CN115565608A (en) 2023-01-03

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22947568

Country of ref document: EP

Kind code of ref document: A1