WO2020154885A1 - Single-cell type detection method, apparatus, device and storage medium - Google Patents

Single-cell type detection method, apparatus, device and storage medium

Info

Publication number
WO2020154885A1
Authority
WO
WIPO (PCT)
Prior art keywords
expression
entropy
data set
cell
data
Prior art date
Application number
PCT/CN2019/073647
Other languages
English (en)
French (fr)
Inventor
李辰威 (Li Chenwei)
刘宝琳 (Liu Baolin)
康博熙 (Kang Boxi)
刘烨丹 (Liu Yedan)
任仙文 (Ren Xianwen)
张泽民 (Zhang Zemin)
Original Assignee
北京大学 (Peking University)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority to PCT/CN2019/073647
Priority to CN201980000101.XA (CN109891508B)
Publication of WO2020154885A1

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00: ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10: Sequence alignment; Homology search
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Definitions

  • The embodiments of the present invention relate to the field of single-cell transcriptome sequencing data analysis, and in particular to a single-cell type detection method, apparatus, device and storage medium.
  • The selected genes are poorly interpretable, and biological knowledge is needed to annotate the clusters using the marker genes returned by the algorithm.
  • All of the above existing analysis methods place high demands on the user's biological background and computing hardware.
  • The invention provides a single-cell type detection method, apparatus, device and storage medium that improve the efficiency and accuracy of single-cell expression data analysis and enable rapid, accurate detection of cell types.
  • An embodiment of the present invention provides a single-cell type detection method, including:
  • inputting reference data into an expression entropy model to determine the informative genes contained in each cell type in the reference data, where the reference data include an expression profile data set of M genes in N single cells and the expression entropy model is obtained by training on the reference data;
  • determining the cell type of the single cell to be tested according to the occurrence probability and the expression level.
  • Before the reference data are input into the expression entropy model to determine the informative genes contained in each cell type in the reference data, the method further includes:
  • the expression entropy is the degree of dispersion of messenger ribonucleic acid (mRNA) expression.
  • Inputting the reference data into the expression entropy model to determine the informative genes contained in each cell type in the reference data includes:
  • Training the expression entropy model according to the first expression entropy data set to complete the construction of the expression entropy model includes:
  • The expression entropy model is constructed according to the adjusted reference coefficients.
  • The method further includes:
  • Performing gene screening according to the first expression entropy data set and the second expression entropy data set to determine the informative genes contained in each cell type in the reference data includes:
  • X difference values are selected from the difference value set, and the genes corresponding to these X difference values are used as the informative genes contained in each cell type in the reference data.
  • An embodiment of the present invention also provides a single-cell type detection apparatus, including:
  • an informative gene determination module, configured to input reference data into the expression entropy model to determine the informative genes contained in each cell type in the reference data;
  • the reference data include the expression profile data set of M genes in N single cells;
  • the expression entropy model is generated by training on the reference data;
  • a probability calculation module, configured to calculate the occurrence probability of the informative genes in each cell type;
  • a cell type determination module, configured to, upon receiving the expression levels of the informative genes obtained by detecting the single cell to be tested, determine the cell type of that cell according to the occurrence probability and the expression levels.
  • The apparatus further includes:
  • an expression entropy calculation module, configured to perform expression entropy calculation on the gene expression data set to generate a first expression entropy data set, the expression entropy being the degree of dispersion of mRNA expression;
  • a model construction module, configured to train the expression entropy model according to the first expression entropy data set to complete the construction of the expression entropy model.
  • An embodiment of the present invention also provides a device, the device including:
  • one or more processors;
  • a storage device for storing one or more programs;
  • when the one or more programs are executed by the one or more processors, the one or more processors implement the single-cell type detection method provided in the first aspect.
  • An embodiment of the present invention also provides a storage medium including a stored computer program, wherein, when the computer program runs, the device where the storage medium is located is controlled to execute the single-cell type detection method described in the first aspect.
  • Reference data are input into an expression entropy model to determine the informative genes contained in each cell type in the reference data; the expression entropy model is obtained by training on the reference data; the occurrence probability of the informative genes in each cell type is calculated; and, when the expression levels of the informative genes obtained by detecting the single cell to be tested are received, the cell type of that cell is determined according to the occurrence probability and the expression levels.
  • FIG. 1 is a flowchart of a single-cell analysis method in the prior art.
  • FIG. 2 is a schematic flowchart of a first embodiment of a single-cell type detection method according to an embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of a second embodiment of a single-cell type detection method according to an embodiment of the present invention.
  • FIG. 4 is a schematic flowchart of a third embodiment of a single-cell type detection method according to an embodiment of the present invention.
  • FIG. 5 is a schematic flowchart of a fourth embodiment of a single-cell type detection method according to an embodiment of the present invention.
  • FIG. 6 is a schematic flowchart of a fifth embodiment of a single-cell type detection method according to an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a single-cell type detection apparatus according to an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of a device according to an embodiment of the present invention.
  • The present invention provides a single-cell type detection method that, by constructing and using an expression entropy model, improves the efficiency and accuracy of single-cell expression data analysis and enables rapid, accurate detection of cell types.
  • FIG. 2 is a schematic flowchart of the first embodiment of the single-cell type detection method according to an embodiment of the present invention.
  • This embodiment is applicable to single-cell transcriptome sequencing data analysis, and the method can be executed by a processor.
  • The single-cell type detection method provided by the embodiment of the present invention further includes a construction process for the expression entropy model.
  • The construction process of the expression entropy model includes:
  • The reference data include an expression profile data set of M genes in N single cells; the expression entropy model is obtained by training on the reference data.
  • The reference data are generated by many different sequencing platforms (Smart-seq2, 10X Genomics, etc.) and comprise 26 published single-cell expression profile data sets. Because data from different platforms use inconsistent measurement standards, the expression profile data set is standardized so that it uniformly uses TPM (Transcripts Per Million) as the measure of gene expression, yielding a gene expression data set.
  • TPM: Transcripts Per Million
  • The expression entropy describes the degree of dispersion of mRNA (messenger ribonucleic acid) expression.
  • The expression values of each gene are divided into bins of 120 TPM each, so that each gene's expression in the gene expression data set is distributed among different bins; cells whose values for a gene fall into the same bin are considered to have the same expression level for that gene.
  • The expression entropy is calculated as follows:
  • where S is the expression entropy
  • and b_k is the number of cells in the k-th bin.
  • The first expression entropy data set is generated by substituting the number of cells in each bin, obtained after dividing the gene expression data set, into the expression entropy formula.
  • The construction of the expression entropy model is completed by training on the first expression entropy data set.
  • The process of training on the first expression entropy data set and constructing the expression entropy model includes:
  • The average gene expression E_m of the M genes in the reference data is calculated from the total expression of the M genes in the gene expression data set.
  • S320: Perform regression analysis on the first expression entropy data set and the average gene expression to adjust the reference coefficients of the expression entropy model;
  • FIG. 4 is a schematic flowchart of the third embodiment of the single-cell type detection method according to an embodiment of the present invention. This embodiment is applicable to single-cell transcriptome sequencing data analysis. After the expression entropy model is constructed, single-cell type detection using the expression entropy model specifically includes the following steps:
  • Inputting the reference data into the expression entropy model enables more biologically meaningful gene screening.
  • The process of inputting the reference data into the expression entropy model and determining the informative genes contained in each cell type in the reference data is as follows:
  • where E_mi is the average expression level of informative gene i in cell type j.
  • S430: When the expression levels of the informative genes obtained by detecting the single cell to be tested are received, determine the cell type of that cell according to the occurrence probability and the expression levels.
  • The probability that the single cell to be tested belongs to each cell type is calculated from the expression levels and the occurrence probabilities of the informative genes in each cell type, using the following formula:
  • where E_i is the expression level of informative gene i in the single cell to be tested, in log2(TPM + 1).
  • Once the set of probabilities that the single cell to be tested belongs to each cell type is calculated, the cell type with the highest probability (i.e., the highest P_j) is taken as the cell type of the single cell to be tested.
  • The informative genes of each cell type are determined by inputting reference data into the expression entropy model, their occurrence probability in each cell type is calculated, and the probability that the received single cell to be tested belongs to each cell type is then calculated to determine its cell type. The cell is thus rapidly assigned to an existing cell type without the tedious existing single-cell analysis workflow, the type of each cell is given directly, and the time and resources needed for single-cell data analysis are greatly reduced.
  • In summary, the single-cell type detection method inputs reference data into an expression entropy model to determine the informative genes contained in each cell type in the reference data, the expression entropy model being obtained by training on the reference data; calculates the occurrence probability of the informative genes in each cell type; and, when the expression levels of the informative genes obtained by detecting the single cell to be tested are received, determines the cell type of that cell according to the occurrence probability and the expression levels.
  • FIG. 5 is a schematic flowchart of the fourth embodiment of the single-cell type detection method according to an embodiment of the present invention.
  • This embodiment adds a screening method in which reference data are input into the expression entropy model to perform gene screening.
  • The present invention performs unsupervised gene screening based on the expression entropy model; the specific steps include:
  • The first expression entropy data set is generated by substituting the number of cells in each bin, obtained after dividing the gene expression data set, into the expression entropy formula; the second expression entropy data set is the set of expression entropies of the M genes generated by inputting the reference data into the expression entropy model. The first and second expression entropy data of each of the M genes are then obtained.
  • The first and second expression entropy data of each gene are substituted into the above formula to obtain the difference set of the M genes.
  • S530: Select X difference values from the difference value set according to a selection rule, and use the genes corresponding to the X difference values as the informative genes contained in each cell type in the reference data.
  • The user can select, as required, the X largest values of d_s from the difference set and use the corresponding genes as the informative genes contained in each cell type in the reference data.
  • The present invention performs supervised gene screening (E-test) based on the expression entropy model.
  • The specific steps include using entropy reduction as a statistic for supervised gene selection.
  • The entropy reduction of each gene is defined as:
  • where E_m1 is the average expression of gene i in T1 cells
  • and E_m2 is the average expression of gene i in T2 cells. Accordingly, for more than two cell types, the entropy reduction of each gene is defined as:
  • The average expression data of each gene across the cell types in the reference data are substituted into the above formula to obtain the difference set of the M genes; the user can select, as required, the X largest values of d_s from the difference set and use the corresponding genes as the informative genes contained in each cell type in the reference data.
  • FIG. 6 is a schematic flowchart of the fifth embodiment of the single-cell type detection method according to an embodiment of the present invention.
  • This embodiment adds an application scenario for unsupervised gene screening.
  • The present invention performs unsupervised gene screening based on the expression entropy model to determine the purity of a cell population; the specific steps include:
  • S610: When gene data obtained by detecting the single cell to be tested are received, input the gene data into the expression entropy model to obtain a virtual expression entropy data set;
  • S620: Perform expression entropy calculation on the gene data to generate an actual expression entropy data set;
  • S630: Perform a calculation on the virtual and actual expression entropy data sets to determine the purity of the cells to be tested.
  • The average expression of each gene in the gene data is input into the expression entropy model to obtain the virtual expression entropy data set, i.e., the expression entropy S′_i; expression entropy calculation on the gene data yields the actual expression entropy data set, i.e., the standardized expression entropy S_i of each gene.
  • The formula for determining cell purity is:
  • where S_i is the standardized expression entropy
  • and S′_i is the expression entropy obtained by substituting the gene's average expression into the formula.
  • FIG. 7 is a schematic structural diagram of a single-cell type detection apparatus according to an embodiment of the present invention.
  • The present invention also provides a single-cell type detection apparatus suitable for performing the single-cell type detection method of any one of Embodiments 1 to 3; the apparatus includes:
  • the informative gene determination module 701, configured to input reference data into an expression entropy model to determine the informative genes contained in each cell type in the reference data;
  • the reference data include an expression profile data set of M genes in N single cells;
  • the expression entropy model is generated by training on the reference data;
  • the probability calculation module 702, configured to calculate the occurrence probability of the informative genes in each cell type;
  • the cell type determination module 703, configured to, upon receiving the expression levels of the informative genes obtained by detecting the single cell to be tested, determine the cell type of that cell according to the occurrence probability and the expression levels.
  • The apparatus further includes:
  • the data standardization module 704, configured to standardize the reference data to obtain a gene expression data set;
  • the expression entropy calculation module 705, configured to perform expression entropy calculation on the gene expression data set to generate a first expression entropy data set, the expression entropy being the degree of dispersion of mRNA expression;
  • the model construction module 706, configured to train the expression entropy model according to the first expression entropy data set to complete the construction of the expression entropy model.
  • The single-cell type detection apparatus inputs reference data into an expression entropy model to determine the informative genes contained in each cell type in the reference data, the expression entropy model being obtained by training on the reference data; calculates the occurrence probability of the informative genes in each cell type; and, when the expression levels of the informative genes obtained by detecting the single cell to be tested are received, determines the cell type of that cell according to the occurrence probability and the expression levels.
  • An embodiment of the present invention also provides a device, including:
  • one or more processors;
  • a storage device for storing one or more programs;
  • when the one or more programs are executed by the one or more processors, the one or more processors implement the single-cell type detection method of any one of Embodiments 1 to 3.
  • FIG. 8 is a schematic structural diagram of a device provided by Embodiment 5 of the present invention.
  • The device includes a processor 801 and a storage device 802; the device may contain one or more processors 801, and one processor 801 is taken as an example in FIG. 8. The processor 801 and the storage device 802 may be connected by a bus or in other ways; a bus connection is taken as an example in FIG. 8.
  • The storage device 802 can store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the command processing method in the embodiments of the present invention (for example, the informative gene determination module 701, probability calculation module 702, cell type determination module 703, data standardization module 704, expression entropy calculation module 705 and model construction module 706).
  • The processor 801 executes the various functional applications and data processing of the device by running the software programs, instructions and modules stored in the storage device 802, thereby implementing the command processing method described above.
  • An embodiment of the present invention also provides a storage medium including a stored computer program, wherein, when the computer program runs, the device where the storage medium is located is controlled to execute the single-cell type detection method of any one of Embodiments 1 to 3.
  • The processor-executable instructions of the storage medium provided by the embodiments of the present invention are not limited to the method operations described above and can also perform related operations of the single-cell type detection method provided by any embodiment of the present invention.
  • The single-cell type detection method, apparatus, device and storage medium input reference data into an expression entropy model to determine the informative genes contained in each cell type in the reference data; the expression entropy model is obtained by training on the reference data; the occurrence probability of the informative genes in each cell type is calculated; and, when the expression levels of the informative genes obtained by detecting the single cell to be tested are received, the cell type of that cell is determined according to the occurrence probability and the expression levels.
  • The present invention can be implemented by software plus the necessary general-purpose hardware, or by hardware alone, although in many cases the former is the better implementation.
  • The technical solution of the present invention can be embodied as a software product stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk or optical disk, and including instructions that cause a computer device (a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments of the present invention.
  • ROM: Read-Only Memory
  • RAM: Random Access Memory
  • FLASH: Flash memory
  • The included units and modules are divided only according to functional logic and are not limited to this division, as long as the corresponding functions can be realized;
  • the specific names of the functional units serve only to distinguish them from one another and do not limit the protection scope of the present invention.

Landscapes

  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Genetics & Genomics (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)

Abstract

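A single-cell type detection method, apparatus, device and storage medium. The method includes: inputting reference data into an expression entropy model to determine the informative genes contained in each cell type in the reference data, where the reference data include an expression profile data set of M genes in N single cells and the expression entropy model is obtained by training on the reference data; calculating the occurrence probability of the informative genes in each cell type; and, when the expression levels of the informative genes obtained by detecting a single cell to be tested are received, determining the cell type of the single cell to be tested according to the occurrence probability and the expression levels.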

Description

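Single-cell type detection method, apparatus, device and storage medium
Technical Field
The embodiments of the present invention relate to the field of single-cell transcriptome sequencing data analysis, and in particular to a single-cell type detection method, apparatus, device and storage medium.
Background
Over the past few years, single-cell capture technology has improved markedly, and scientists can now capture hundreds of thousands or even millions of cells with existing techniques. The resulting volume of information brings great opportunities and challenges to bioinformatic analysis, and clustering single-cell expression data is a fundamental analysis step that is essential for reaching biological conclusions. FIG. 1 shows a flowchart of a prior-art single-cell analysis method. When identifying highly variable genes, existing clustering methods use the Gini index, dropout rates, variance and similar measures, which describe gene expression with serious bias. When selecting marker genes for clusters, they use methods such as principal component analysis (PCA) scores (Seurat) and neural networks (scQuery); the selected genes are poorly interpretable, and biological knowledge is needed to annotate the clusters from the marker genes the algorithm returns. Cell classification algorithms (Seurat3, scmap, etc.) have also appeared recently, but they lack good false-positive control and require a great deal of time and memory from training to prediction. All of these existing methods place high demands on the user's biological background and computing hardware.
With the continuing development of single-cell transcriptome sequencing, two problems urgently need solving: how to integrate the massive data generated by different sequencing platforms (Smart-seq2, 10X Genomics, etc.), and how to analyze more single-cell data accurately and quickly when resources and time are limited.
Summary of the Invention
The present invention provides a single-cell type detection method, apparatus, device and storage medium that improve the efficiency and accuracy of single-cell expression data analysis and enable rapid, accurate detection of cell types.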
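In a first aspect, an embodiment of the present invention provides a single-cell type detection method, including:
inputting reference data into an expression entropy model to determine the informative genes contained in each cell type in the reference data, where the reference data include an expression profile data set of M genes in N single cells and the expression entropy model is obtained by training on the reference data;
calculating the occurrence probability of the informative genes in each cell type;
when the expression levels of the informative genes obtained by detecting a single cell to be tested are received, determining the cell type of the single cell to be tested according to the occurrence probability and the expression levels.
Further, before inputting the reference data into the expression entropy model to determine the informative genes contained in each cell type in the reference data, the method also includes:
standardizing the expression profile data set to obtain a gene expression data set;
performing expression entropy calculation on the gene expression data set to generate a first expression entropy data set, the expression entropy being the degree of dispersion of mRNA expression;
training the expression entropy model according to the first expression entropy data set to complete the construction of the expression entropy model.
Further, inputting the reference data into the expression entropy model to determine the informative genes contained in each cell type in the reference data includes:
inputting the reference data into the expression entropy model to generate a second expression entropy data set for the M genes;
performing gene screening according to the first and second expression entropy data sets to determine the informative genes contained in each cell type in the reference data.
Further, training the expression entropy model according to the first expression entropy data set to complete its construction includes:
obtaining the average gene expression of the M genes from the gene expression data set;
performing regression analysis on the first expression entropy data set and the average gene expression to adjust the reference coefficients of the expression entropy model;
constructing the expression entropy model according to the adjusted reference coefficients.
Further, the method also includes:
when gene data obtained by detecting the single cell to be tested are received, inputting the gene data into the expression entropy model to obtain a virtual expression entropy data set;
performing expression entropy calculation on the gene data to generate an actual expression entropy data set;
performing a calculation on the virtual and actual expression entropy data sets to determine the purity of the cells to be tested.
Further, performing gene screening according to the first and second expression entropy data sets to determine the informative genes contained in each cell type in the reference data includes:
obtaining, from the first and second expression entropy data sets, the first expression entropy data and second expression entropy data of each gene;
calculating, for each gene, the difference between its second and first expression entropy data to obtain a difference set for the M genes;
selecting X differences from the difference set according to a selection rule, and using the genes corresponding to the X differences as the informative genes contained in each cell type in the reference data.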
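In a second aspect, an embodiment of the present invention also provides a single-cell type detection apparatus, including:
an informative gene determination module, configured to input reference data into an expression entropy model to determine the informative genes contained in each cell type in the reference data, where the reference data include an expression profile data set of M genes in N single cells and the expression entropy model is generated by training on the reference data;
a probability calculation module, configured to calculate the occurrence probability of the informative genes in each cell type;
a cell type determination module, configured to, upon receiving the expression levels of the informative genes obtained by detecting a single cell to be tested, determine the cell type of the single cell to be tested according to the occurrence probability and the expression levels.
Further, the apparatus also includes:
a data standardization module, configured to standardize the reference data to obtain a gene expression data set;
an expression entropy calculation module, configured to perform expression entropy calculation on the gene expression data set to generate a first expression entropy data set, the expression entropy being the degree of dispersion of mRNA expression;
a model construction module, configured to train the expression entropy model according to the first expression entropy data set to complete the construction of the expression entropy model.
In a third aspect, an embodiment of the present invention also provides a device, the device including:
one or more processors;
a storage device for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the single-cell type detection method provided in the first aspect.
In a fourth aspect, an embodiment of the present invention also provides a storage medium including a stored computer program, wherein, when the computer program runs, the device where the storage medium is located is controlled to execute the single-cell type detection method described in the first aspect.
The single-cell type detection method, apparatus, device and storage medium provided by the embodiments of the present invention input reference data into an expression entropy model to determine the informative genes contained in each cell type in the reference data, the expression entropy model being obtained by training on the reference data; calculate the occurrence probability of the informative genes in each cell type; and, when the expression levels of the informative genes obtained by detecting a single cell to be tested are received, determine the cell type of the single cell to be tested according to the occurrence probability and the expression levels. Because the informative genes of each cell type are determined by inputting reference data into the expression entropy model, their occurrence probability in each cell type is calculated, and the probability that the received single cell belongs to each cell type is then calculated, the single cell can be rapidly assigned to an existing cell type. The tedious existing single-cell analysis workflow is unnecessary, the type of every cell is given directly, and the time and resources needed for single-cell data analysis are greatly reduced.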
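Brief Description of the Drawings
FIG. 1 is a flowchart of a single-cell analysis method in the prior art;
FIG. 2 is a schematic flowchart of a first embodiment of the single-cell type detection method according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of a second embodiment of the single-cell type detection method according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of a third embodiment of the single-cell type detection method according to an embodiment of the present invention;
FIG. 5 is a schematic flowchart of a fourth embodiment of the single-cell type detection method according to an embodiment of the present invention;
FIG. 6 is a schematic flowchart of a fifth embodiment of the single-cell type detection method according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of the single-cell type detection apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a device according to an embodiment of the present invention.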
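Detailed Description
The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and do not limit it. Note also that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Prior-art single-cell analysis methods use the Gini index, dropout rates, variance and similar measures, which describe gene expression with serious bias; to select cluster marker genes they use principal component analysis, neural networks and similar methods, whose selected genes are poorly interpretable; and existing cell classification algorithms consume a great deal of time and memory from training to prediction. To overcome the low efficiency and accuracy of existing single-cell analysis, the present invention provides a single-cell type detection method that, by constructing and using an expression entropy model, improves the efficiency and accuracy of single-cell expression data analysis and enables rapid, accurate detection of cell types.
Embodiment 1
As shown in FIG. 2, which is a schematic flowchart of the first embodiment of the single-cell type detection method according to an embodiment of the present invention. This embodiment is applicable to single-cell transcriptome sequencing data analysis, and the method can be executed by a processor.
It should be noted that, before the reference data are input into the expression entropy model for single-cell type detection, the single-cell type detection method provided by the embodiment of the present invention also includes a construction process for the expression entropy model.
In the embodiment of the present invention, the construction process of the expression entropy model includes:
S210: Standardize the expression profile data set to obtain a gene expression data set;
Specifically, the reference data include an expression profile data set of M genes in N single cells, and the expression entropy model is obtained by training on the reference data. In this embodiment the reference data are generated by many different sequencing platforms (Smart-seq2, 10X Genomics, etc.) and comprise 26 published single-cell expression profile data sets. Because data from different platforms use inconsistent measurement standards, the expression profile data set is standardized so that it uniformly uses TPM (Transcripts Per Million) as the measure of gene expression, yielding the gene expression data set.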
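The text states only that every data set is converted to TPM. A minimal sketch of the standard TPM conversion for one cell follows, assuming raw read counts and gene lengths in kilobases are available; the helper name `counts_to_tpm` and its inputs are hypothetical and not taken from the patent.

```python
from typing import List


def counts_to_tpm(counts: List[float], lengths_kb: List[float]) -> List[float]:
    """Convert one cell's raw read counts to TPM (hypothetical helper).

    counts[g] is the read count of gene g; lengths_kb[g] is its length in kilobases.
    """
    if len(counts) != len(lengths_kb):
        raise ValueError("counts and lengths must have the same length")
    # Step 1: reads per kilobase (RPK) corrects for gene length.
    rpk = [c / length for c, length in zip(counts, lengths_kb)]
    # Step 2: rescale so that the RPK values of this cell sum to one million.
    total = sum(rpk)
    if total == 0:
        return [0.0] * len(counts)
    return [r / total * 1e6 for r in rpk]
```

For example, counts `[10, 20, 30]` on genes of 1, 2 and 3 kb give equal RPK values, so each gene receives one third of a million TPM.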
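S220: Perform expression entropy calculation on the gene expression data set to generate a first expression entropy data set, the expression entropy being the degree of dispersion of mRNA expression;
Specifically, the expression entropy describes the degree of dispersion of mRNA (messenger ribonucleic acid) expression. The gene expression data set consisting of M genes and N single cells is passed to downstream analysis; that is, the expression of each gene in the gene expression data set is represented as a vector:
Figure PCTCN2019073647-appb-000001
The expression values of each gene are divided into bins of 120 TPM each, so that each gene's expression in the gene expression data set is distributed among different bins; cells whose expression of a gene falls into the same bin are considered to have the same expression level for that gene. Based on this binning of the gene expression data set, the expression entropy is calculated as:
Figure PCTCN2019073647-appb-000002
where S is the expression entropy and b_k is the number of cells in the k-th bin.
The number of cells in each bin after dividing the gene expression data set is substituted into the expression entropy formula to generate the first expression entropy data set.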
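The binning step can be sketched as follows. The formula image is not reproduced in this text, so the sketch assumes a Shannon entropy over the fraction of cells in each bin, S = -Σ_k (b_k/N)·ln(b_k/N); that exact form, the natural logarithm, and the function name are assumptions. Only the 120 TPM bin width comes from the text.

```python
import math
from collections import Counter
from typing import Sequence

BIN_WIDTH_TPM = 120.0  # bin width stated in the text


def expression_entropy(tpm_values: Sequence[float], bin_width: float = BIN_WIDTH_TPM) -> float:
    """Expression entropy of one gene across N cells (assumed Shannon form).

    Each cell's TPM value goes into bin floor(tpm / bin_width); b_k is the number
    of cells in bin k and p_k = b_k / N. The entropy is taken here as
    -sum(p_k * ln p_k); this form is an assumption, not the patent's formula image.
    """
    n = len(tpm_values)
    if n == 0:
        raise ValueError("need at least one cell")
    bin_counts = Counter(int(v // bin_width) for v in tpm_values)
    return -sum((b / n) * math.log(b / n) for b in bin_counts.values())
```

Applying the function to each gene's expression vector would give the first expression entropy data set; a gene whose values all fall into one bin has entropy 0, and one split evenly across two bins has entropy ln 2.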
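S230: Train the expression entropy model according to the first expression entropy data set to complete the construction of the expression entropy model.
Specifically, once the first expression entropy data set is obtained, the expression entropy model is constructed by training on the first expression entropy data set.
As shown in FIG. 3, which is a schematic flowchart of the second embodiment of the single-cell type detection method according to an embodiment of the present invention. Further, in one implementation example of the embodiment, the process of training on the first expression entropy data set and constructing the expression entropy model includes:
S310: Obtain the average gene expression of the M genes from the gene expression data set;
Specifically, the average gene expression E_m of the M genes in the reference data is calculated from the total expression of the M genes in the gene expression data set.
S320: Perform regression analysis on the first expression entropy data set and the average gene expression to adjust the reference coefficients of the expression entropy model;
Specifically, regression analysis on the first expression entropy data set and the average gene expression yields the relation S(E_m) = a·ln(b·E_m + 1), where S is the expression entropy and E_m is the average gene expression. The first expression entropy data set and the average gene expression are substituted into this relation to adjust its reference coefficients a and b. Averaging the values of a and b obtained by fitting the relation to the first expression entropy data set gives the adjusted reference coefficients a = 0.18 and b = 0.03.
S330: Construct the expression entropy model according to the adjusted reference coefficients.
Specifically, the adjusted reference coefficients a = 0.18 and b = 0.03 give a unified expression entropy model:
S(E_m) = 0.18·ln(0.03·E_m + 1)
This completes the construction of the expression entropy model.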
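The regression in S320 and the resulting model in S330 can be sketched in plain Python. The coefficients 0.18 and 0.03 are from the text; the fitting procedure is an assumption, not the patent's. For a fixed b the model is linear in a, so a has a closed-form least-squares value, and b is chosen by a grid search. The function names and the grid range are hypothetical.

```python
import math
from typing import Sequence, Tuple

A_REF, B_REF = 0.18, 0.03  # adjusted reference coefficients reported in S330


def model_entropy(mean_expr: float, a: float = A_REF, b: float = B_REF) -> float:
    """Expression entropy predicted from mean expression: S(E_m) = a * ln(b * E_m + 1)."""
    return a * math.log(b * mean_expr + 1.0)


def fit_entropy_model(mean_expr: Sequence[float], entropy: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of S = a * ln(b * E + 1) (a sketch, not the patent's procedure).

    For each candidate b on a log-spaced grid from 1e-4 to 1e1, the best a has a
    closed form; the (a, b) pair with the smallest squared error is returned.
    """
    best = (float("inf"), 0.0, 0.0)
    for step in range(501):
        b = 10 ** (-4 + 5 * step / 500)  # grid from 1e-4 to 1e1
        x = [math.log(b * e + 1.0) for e in mean_expr]
        sxx = sum(v * v for v in x)
        if sxx == 0:
            continue
        a = sum(v * s for v, s in zip(x, entropy)) / sxx  # closed-form a for this b
        sse = sum((s - a * v) ** 2 for v, s in zip(x, entropy))
        if sse < best[0]:
            best = (sse, a, b)
    return best[1], best[2]
```

Fitting noise-free data generated from `model_entropy` recovers values close to a = 0.18 and b = 0.03; the grid step limits how close b can get.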
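FIG. 4 is a schematic flowchart of the third embodiment of the single-cell type detection method according to an embodiment of the present invention. This embodiment is applicable to single-cell transcriptome sequencing data analysis. Further, after the expression entropy model is constructed, single-cell type detection using the expression entropy model specifically includes the following steps:
S410: Input reference data into the expression entropy model to determine the informative genes contained in each cell type in the reference data, where the reference data include an expression profile data set of M genes in N single cells and the expression entropy model is obtained by training on the reference data;
Specifically, inputting the reference data into the expression entropy model enables more biologically meaningful gene screening. In one implementation example of the embodiment, the process of inputting the reference data into the expression entropy model and determining the informative genes contained in each cell type in the reference data is as follows:
the reference data are input into the expression entropy model to generate a second expression entropy data set for the M genes, and gene screening is performed according to the first and second expression entropy data sets to determine the informative genes contained in each cell type in the reference data.
S420: Calculate the occurrence probability of the informative genes in each cell type;
Note that, once the informative genes contained in each cell type in the reference data are determined, the occurrence probability of informative gene i is calculated for each cell type in the reference data as:
Figure PCTCN2019073647-appb-000003
where E_mi is the average expression of informative gene i in cell type j. Calculating the occurrence probability of each informative gene i in each cell type yields a probability vector for each cell type.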
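The occurrence-probability formula is an image that is not reproduced here, so the exact definition is unknown. The sketch below assumes one natural reading of "a probability vector for each cell type": each type's mean expression profile over the informative genes is normalized to sum to 1. The pseudocount and the function name are hypothetical additions.

```python
from typing import Dict, List


def occurrence_probabilities(mean_expr_by_type: Dict[str, List[float]],
                             pseudocount: float = 1e-6) -> Dict[str, List[float]]:
    """Per-cell-type probability vectors over the informative genes (assumed form).

    mean_expr_by_type[j][i] is E_mi, the mean expression of informative gene i in
    cell type j. Here p_ij = (E_mi + c) / sum over i' of (E_mi' + c), i.e. each
    type's profile is normalized to sum to 1. The small pseudocount c is a
    hypothetical addition that keeps every probability strictly positive.
    """
    probs = {}
    for cell_type, means in mean_expr_by_type.items():
        shifted = [m + pseudocount for m in means]
        total = sum(shifted)
        probs[cell_type] = [v / total for v in shifted]
    return probs
```

Keeping every probability positive matters if the probabilities are later used inside a logarithm.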
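S430: When the expression levels of the informative genes obtained by detecting the single cell to be tested are received, determine the cell type of the single cell to be tested according to the occurrence probability and the expression levels.
Specifically, when the expression levels of the informative genes obtained by detecting the single cell to be tested are received, the probability that the single cell belongs to each cell type is calculated from the expression levels and the occurrence probabilities of the informative genes in each cell type, using the formula:
Figure PCTCN2019073647-appb-000004
where E_i is the expression level of informative gene i in the single cell to be tested, in log2(TPM + 1). Once the set of probabilities that the single cell to be tested belongs to each cell type is calculated, the cell type with the highest probability (i.e., the highest P_j) is the cell type of the single cell to be tested. Because the informative genes of each cell type are determined by inputting reference data into the expression entropy model, their occurrence probability in each cell type is calculated, and the probability that the received single cell belongs to each cell type is then calculated, the single cell can be rapidly assigned to an existing cell type. The tedious existing single-cell analysis workflow is unnecessary, the type of every cell is given directly, and the time and resources needed for single-cell data analysis are greatly reduced.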
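The scoring formula for P_j is also an image, so the sketch below substitutes a multinomial log-likelihood, P_j = Σ_i E_i·ln(p_ij), which is one common way to combine per-type gene probabilities with a cell's expression levels. This choice is an assumption, not the patent's formula. The input E_i = log2(TPM + 1) and the argmax rule come from the text, and the function name is hypothetical.

```python
import math
from typing import Dict, List, Tuple


def classify_cell(expr_log2: List[float],
                  probs: Dict[str, List[float]]) -> Tuple[str, Dict[str, float]]:
    """Assign a cell to the type with the highest score P_j (assumed scoring).

    expr_log2[i] is E_i = log2(TPM + 1) of informative gene i in the query cell;
    probs[j][i] is the occurrence probability p_ij (must be > 0). The score
    P_j = sum_i E_i * ln(p_ij) is a multinomial log-likelihood used as a
    stand-in for the patent's formula.
    """
    scores = {
        cell_type: sum(e * math.log(p) for e, p in zip(expr_log2, p_vec))
        for cell_type, p_vec in probs.items()
    }
    best_type = max(scores, key=scores.get)
    return best_type, scores
```

For example, a cell with TPM 1023 for the first informative gene and 0 for the second (log2 values 10 and 0) scores higher for a type whose probability vector is `[0.75, 0.25]` than for one with `[0.25, 0.75]`.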
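It should be noted that, in the single-cell type detection method provided by this embodiment, reference data are input into an expression entropy model obtained by training on the reference data to determine the informative genes of each cell type; the occurrence probability of the informative genes in each cell type is calculated; and, on receiving the expression values of the informative genes measured in a single cell under test, the cell type of that cell is determined from the occurrence probabilities and the expression values.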
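Embodiment 2
Fig. 5 is a schematic flowchart of a fourth embodiment of the single-cell type detection method according to an embodiment of the present invention. Building on Embodiment 1, this embodiment adds a gene selection method in which reference data are input into the expression entropy model. In one implementation example, unsupervised gene selection is performed on the basis of the expression entropy model, with the following steps: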
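S510. From the first and second expression entropy datasets, obtain the first and second expression entropy values of each gene;
Specifically, the first expression entropy dataset is generated by substituting the number of cells in each bin of the partitioned gene expression dataset into the expression entropy formula, and the second expression entropy dataset is generated for the M genes by inputting the reference data into the expression entropy model. The first and second expression entropy values of each of the M genes are then obtained.
S520. For each gene, calculate the difference between its second and first expression entropy values to obtain a set of differences for the M genes;
Specifically, the difference is calculated from each gene's first and second expression entropy values as $d_s(i) = S'_i - S_i$, where $S_i$ is the first expression entropy value of gene $i$ and $S'_i$ is its second expression entropy value. Applying this formula to every gene gives the set of differences for the M genes.
S530. Select X differences from the set of differences according to a selection rule, and take the genes corresponding to these X differences as the informative genes of each cell type in the reference data.
Specifically, the user may select, as required, the X largest values of $d_s$ from the set of differences and take the corresponding genes as the informative genes of each cell type in the reference data.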
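The unsupervised selection in S510 to S530 can be sketched as below. It uses the fitted model $S'(E_m) = 0.18\ln(0.03E_m + 1)$ from the text to produce the second expression entropy and keeps the X largest $d_s(i) = S'_i - S_i$ as the selection rule. The function names and the input layout (parallel lists of gene ids, mean expression, and observed first expression entropy) are assumptions for illustration.

```python
import math

A, B = 0.18, 0.03  # reference coefficients reported in the text

def model_entropy(mean_expr, a=A, b=B):
    """Second (model-predicted) expression entropy S'(E_m) = a*ln(b*E_m + 1)."""
    return a * math.log(b * mean_expr + 1.0)

def select_informative_genes(gene_ids, mean_expr, observed_entropy, top_x):
    """Rank genes by d_s(i) = S'_i - S_i and return the top_x gene ids.

    observed_entropy holds S_i (first expression entropy dataset);
    S'_i is predicted from each gene's mean expression.
    """
    ds = {
        g: model_entropy(e) - s
        for g, e, s in zip(gene_ids, mean_expr, observed_entropy)
    }
    ranked = sorted(ds, key=ds.get, reverse=True)  # largest d_s first
    return ranked[:top_x], ds
```

A positive $d_s$ means a gene's observed entropy is lower than the model predicts for its mean expression, i.e. under the binning interpretation its expression is concentrated in fewer levels than expected, which is why the largest differences are kept.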
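In another implementation example, supervised gene selection (E-test) is performed on the basis of the expression entropy model, using entropy reduction as the test statistic. For any two cell types T1 and T2, the entropy reduction of each gene is defined as: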
Figure PCTCN2019073647-appb-000005
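where $E_{m1}$ is the mean expression of gene $i$ in T1 cells and $E_{m2}$ is its mean expression in T2 cells. Accordingly, for more than two cell types, the entropy reduction of each gene is defined as: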
Figure PCTCN2019073647-appb-000006
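Applying this formula to the mean expression of each gene in each of the cell types contained in the reference data gives a set of differences for the M genes. The user may then select, as required, the X largest values of $d_s$ and take the corresponding genes as the informative genes of each cell type in the reference data.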
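The E-test formulas appear only as images (Figures PCTCN2019073647-appb-000005 and PCTCN2019073647-appb-000006), so this sketch rests on a labeled assumption: the entropy reduction of a gene is taken as the model entropy at the pooled mean expression minus the average of the model entropies at each cell type's mean, $S(\bar{E}) - \frac{1}{K}\sum_k S(E_{mk})$, with equal weight per type. Because $S(E) = 0.18\ln(0.03E + 1)$ is concave, this gap is non-negative and is zero when the gene has the same mean in every type. It illustrates entropy reduction as a statistic and should not be read as the patent's exact definition.

```python
import math

def model_entropy(e, a=0.18, b=0.03):
    """Fitted model S(E) = a*ln(b*E + 1) from the text."""
    return a * math.log(b * e + 1.0)

def entropy_reduction(type_means, a=0.18, b=0.03):
    """Assumed E-test statistic for one gene across K >= 2 cell types.

    type_means: mean expression E_mk of the gene in each cell type.
    Assumption: entropy at the (equally weighted) pooled mean minus the
    average per-type entropy; >= 0 because S(E) is concave.
    """
    k = len(type_means)
    if k < 2:
        raise ValueError("need at least two cell types")
    pooled = model_entropy(sum(type_means) / k, a, b)
    per_type = sum(model_entropy(e, a, b) for e in type_means) / k
    return pooled - per_type

def e_test_select(gene_to_type_means, top_x):
    """Return the top_x genes by entropy reduction, plus all scores."""
    ds = {g: entropy_reduction(m) for g, m in gene_to_type_means.items()}
    return sorted(ds, key=ds.get, reverse=True)[:top_x], ds
```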
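Embodiment 3
Fig. 6 is a schematic flowchart of a fifth embodiment of the single-cell type detection method according to an embodiment of the present invention. Building on Embodiment 2, this embodiment adds an application scenario for unsupervised gene selection. In one implementation example, unsupervised gene selection based on the expression entropy model is used to assess the purity of a cell population, with the following steps:
S610. On receiving gene data obtained by testing the single cells under test, input the gene data into the expression entropy model to obtain a virtual expression entropy dataset;
S620. Calculate expression entropy from the gene data to generate an actual expression entropy dataset;
S630. Determine the purity of the cells under test from the virtual and actual expression entropy datasets.
Specifically, on receiving the gene data, the mean expression of each gene in the data is input into the expression entropy model to obtain the virtual expression entropy dataset, i.e., the expression entropy $S'_i$. Expression entropy is also calculated directly from the gene data to obtain the actual expression entropy dataset, i.e., the normalized expression entropy $S_i$ of each gene. The purity of the cells under test is then calculated from $S'_i$ and $S_i$ as: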
Figure PCTCN2019073647-appb-000007
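where $S_i$ is the normalized expression entropy and $S'_i$ is the expression entropy obtained by substituting the gene's mean expression into the model. Determining purity in this way provides a quantitative description of the purity or heterogeneity of a cell population, for which no measure previously existed.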
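The purity formula (Figure PCTCN2019073647-appb-000007) is likewise not reproduced. The sketch below assumes one plausible form in which purity falls as the total positive gap between virtual and actual entropy grows: $\text{purity} = 1 - D/(D + k)$ with $D = \sum_i \max(S'_i - S_i, 0)$. The constant `k` is a hypothetical scaling parameter chosen for illustration, not a value from the text.

```python
import math

def model_entropy(e, a=0.18, b=0.03):
    """Virtual expression entropy S'_i from the fitted model in the text."""
    return a * math.log(b * e + 1.0)

def purity_score(mean_expr, actual_entropy, k=1.0):
    """Hypothetical purity index in (0, 1] for one cell population.

    mean_expr: mean expression of each gene in the population.
    actual_entropy: normalized expression entropy S_i of each gene.
    Assumption: D = sum of positive gaps S'_i - S_i; purity = 1 - D/(D + k).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    d = sum(
        max(model_entropy(e) - s, 0.0)
        for e, s in zip(mean_expr, actual_entropy)
    )
    return 1.0 - d / (d + k)
```

Under this assumption, a population in which no gene has lower entropy than the model predicts scores 1.0, and each additional gene with an entropy deficit pulls the score toward 0.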
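Embodiment 4
Fig. 7 is a schematic structural diagram of the single-cell type detection apparatus according to an embodiment of the present invention. The present invention further provides a single-cell type detection apparatus, which can be used to execute the single-cell type detection method of any one of Embodiments 1 to 3 and includes:
an informative gene determination module 701, configured to input reference data into the expression entropy model and determine the informative genes of each cell type in the reference data, where the reference data comprise an expression profile dataset of M genes in N single cells and the expression entropy model is generated by training on the reference data;
a probability calculation module 702, configured to calculate the occurrence probability of the informative genes in each cell type;
a cell type determination module 703, configured to determine, on receiving the expression values of the informative genes measured in a single cell under test, the cell type of that cell from the occurrence probabilities and the expression values.
Further, the apparatus also includes:
a data normalization module 704, configured to normalize the reference data to obtain a gene expression dataset;
an expression entropy calculation module 705, configured to calculate expression entropy from the gene expression dataset to generate the first expression entropy dataset, where expression entropy is the dispersion of messenger ribonucleic acid (mRNA) expression;
a model construction module 706, configured to train the expression entropy model on the first expression entropy dataset to complete construction of the expression entropy model.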
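It should be noted that the single-cell type detection apparatus provided by this embodiment inputs reference data into an expression entropy model obtained by training on the reference data to determine the informative genes of each cell type, calculates the occurrence probability of the informative genes in each cell type, and, on receiving the expression values of the informative genes measured in a single cell under test, determines the cell type of that cell from the occurrence probabilities and the expression values. The cell under test is thereby rapidly assigned to one of the existing cell types, and each cell's type is reported directly without the cumbersome conventional single-cell analysis workflow, substantially reducing the time and resources required for single-cell data analysis.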
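Embodiment 5
An embodiment of the present invention further provides a device, including:
one or more processors; and
a storage apparatus configured to store one or more programs;
where, when the one or more programs are executed by the one or more processors, the one or more processors implement the single-cell type detection method of any one of Embodiments 1 to 3.
Fig. 8 is a schematic structural diagram of a device provided by Embodiment 5 of the present invention. The device includes a processor 801 and a storage apparatus 802. The device may have one or more processors 801; one processor 801 is shown in Fig. 8 as an example. The processor 801 and the storage apparatus 802 may be connected by a bus or in other ways; a bus connection is shown in Fig. 8 as an example.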
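As a computer-readable storage medium, the storage apparatus 802 can store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the single-cell type detection method in the embodiments of the present invention (for example, the informative gene determination module 701, probability calculation module 702, cell type determination module 703, data normalization module 704, expression entropy calculation module 705, and model construction module 706). By running the software programs, instructions, and modules stored in the storage apparatus 802, the processor 801 executes the device's functional applications and data processing, that is, implements the single-cell type detection method described above.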
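Embodiment 6
An embodiment of the present invention further provides a storage medium containing a stored computer program, where, when the computer program runs, it controls the device on which the storage medium resides to execute the single-cell type detection method of any one of Embodiments 1 to 3.
Of course, for the storage medium containing processor-executable instructions provided by this embodiment, the instructions are not limited to the method operations described above and may also perform related operations in the single-cell type detection method provided by any embodiment of the present invention.
In summary, the single-cell type detection method, apparatus, device, and storage medium provided by the embodiments of the present invention input reference data into an expression entropy model obtained by training on the reference data to determine the informative genes of each cell type, calculate the occurrence probability of the informative genes in each cell type, and, on receiving the expression values of the informative genes measured in a single cell under test, determine the cell type of that cell from the occurrence probabilities and the expression values. The cell under test is thereby rapidly assigned to one of the existing cell types, and each cell's type is reported directly without the cumbersome conventional single-cell analysis workflow, substantially reducing the time and resources required for single-cell data analysis.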
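From the above description of the embodiments, those skilled in the art will clearly understand that the present invention can be implemented by software plus the necessary general-purpose hardware, or by hardware alone, although in many cases the former is the preferred implementation. On this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, may be embodied as a software product. The computer software product may be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk, or optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments of the present invention.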
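It is worth noting that, in the above embodiment of the single-cell type detection apparatus, the units and modules included are divided only according to functional logic; the division is not limited to the above so long as the corresponding functions can be implemented. In addition, the specific names of the functional units serve only to distinguish them from one another and are not intended to limit the scope of protection of the present invention.
Note that the above are merely preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made without departing from the scope of protection of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to them and may include further equivalent embodiments without departing from the inventive concept; the scope of the present invention is determined by the appended claims.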

Claims (10)

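  1. A single-cell type detection method, characterized by comprising:
    inputting reference data into an expression entropy model and determining the informative genes of each cell type in the reference data, wherein the reference data comprise an expression profile dataset of M genes in N single cells, and the expression entropy model is obtained by training on the reference data;
    calculating the occurrence probability of the informative genes in each cell type; and
    on receiving the expression values of the informative genes obtained by testing a single cell under test, determining the cell type of the single cell under test from the occurrence probabilities and the expression values.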
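  2. The single-cell type detection method according to claim 1, characterized in that, before inputting reference data into the expression entropy model and determining the informative genes of each cell type in the reference data, the method further comprises:
    normalizing the expression profile dataset to obtain a gene expression dataset;
    calculating expression entropy from the gene expression dataset to generate a first expression entropy dataset, wherein expression entropy is the dispersion of messenger ribonucleic acid expression; and
    training the expression entropy model on the first expression entropy dataset to complete construction of the expression entropy model.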
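  3. The single-cell type detection method according to claim 2, characterized in that inputting reference data into the expression entropy model and determining the informative genes of each cell type in the reference data comprises:
    inputting the reference data into the expression entropy model to generate a second expression entropy dataset for the M genes; and
    performing gene selection on the basis of the first expression entropy dataset and the second expression entropy dataset to determine the informative genes of each cell type in the reference data.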
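  4. The single-cell type detection method according to claim 2, characterized in that training the expression entropy model on the first expression entropy dataset to complete construction of the expression entropy model comprises:
    obtaining the mean gene expression of the M genes from the gene expression dataset;
    performing regression analysis on the first expression entropy dataset and the mean gene expression to adjust reference coefficients of the expression entropy model; and
    constructing the expression entropy model from the adjusted reference coefficients.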
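  5. The single-cell type detection method according to claim 3, characterized in that the method further comprises:
    on receiving gene data obtained by testing single cells under test, inputting the gene data into the expression entropy model to obtain a virtual expression entropy dataset;
    calculating expression entropy from the gene data to generate an actual expression entropy dataset; and
    performing a calculation on the virtual expression entropy dataset and the actual expression entropy dataset to determine the purity of the cells under test.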
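  6. The single-cell type detection method according to claim 3, characterized in that performing gene selection on the basis of the first expression entropy dataset and the second expression entropy dataset to determine the informative genes of each cell type in the reference data comprises:
    obtaining, from the first expression entropy dataset and the second expression entropy dataset, first expression entropy data and second expression entropy data for each gene;
    calculating, for each gene, the difference between the second expression entropy data and the first expression entropy data to obtain a set of differences for the M genes; and
    selecting X differences from the set of differences according to a selection rule, and taking the genes corresponding to the X differences as the informative genes of each cell type in the reference data.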
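  7. A single-cell type detection apparatus, characterized by comprising:
    an informative gene determination module, configured to input reference data into an expression entropy model and determine the informative genes of each cell type in the reference data, wherein the reference data comprise an expression profile dataset of M genes in N single cells, and the expression entropy model is generated by training on the reference data;
    a probability calculation module, configured to calculate the occurrence probability of the informative genes in each cell type; and
    a cell type determination module, configured to determine, on receiving the expression values of the informative genes obtained by testing a single cell under test, the cell type of the single cell under test from the occurrence probabilities and the expression values.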
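  8. The single-cell type detection apparatus according to claim 7, characterized in that the apparatus further comprises:
    a data normalization module, configured to normalize the reference data to obtain a gene expression dataset;
    an expression entropy calculation module, configured to calculate expression entropy from the gene expression dataset to generate a first expression entropy dataset, wherein expression entropy is the dispersion of messenger ribonucleic acid expression; and
    a model construction module, configured to train the expression entropy model on the first expression entropy dataset to complete construction of the expression entropy model.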
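  9. A device, characterized in that the device comprises:
    one or more processors; and
    a storage apparatus configured to store one or more programs;
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the single-cell type detection method according to any one of claims 1 to 6.
  10. A storage medium, characterized in that the storage medium comprises a stored computer program, wherein, when the computer program runs, it controls the device on which the storage medium resides to execute the single-cell type detection method according to any one of claims 1 to 6.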
PCT/CN2019/073647 2019-01-29 2019-01-29 Single-cell type detection method, apparatus, device and storage medium WO2020154885A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
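PCT/CN2019/073647 WO2020154885A1 (zh) 2019-01-29 2019-01-29 Single-cell type detection method, apparatus, device and storage medium
CN201980000101.XA CN109891508B (zh) 2019-01-29 2019-01-29 Single-cell type detection method, apparatus, device and storage medium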

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/073647 WO2020154885A1 (zh) 2019-01-29 2019-01-29 Single-cell type detection method, apparatus, device and storage medium

Publications (1)

Publication Number Publication Date
WO2020154885A1 true WO2020154885A1 (zh) 2020-08-06

Family

ID=66938359

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/073647 WO2020154885A1 (zh) 2019-01-29 2019-01-29 Single-cell type detection method, apparatus, device and storage medium

Country Status (2)

Country Link
CN (1) CN109891508B (zh)
WO (1) WO2020154885A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114038505A (zh) * 2021-10-19 2022-02-11 清华大学 一种在线整合多来源单细胞数据的方法和系统
CN116189770A (zh) * 2022-11-02 2023-05-30 杭州链康医学检验实验室有限公司 一种单细胞转录组rna污染去除方法、介质和设备
CN116564418A (zh) * 2023-04-20 2023-08-08 深圳湾实验室 细胞类群相关性网络构建方法和装置、设备及存储介质
CN117116356A (zh) * 2023-10-25 2023-11-24 智泽童康(广州)生物科技有限公司 细胞亚群关联网络图的生成方法、存储介质和服务器
CN116564418B (zh) * 2023-04-20 2024-06-11 深圳湾实验室 细胞类群相关性网络构建方法和装置、设备及存储介质

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111243675A (zh) * 2020-01-07 2020-06-05 广州基迪奥生物科技有限公司 一种交互式细胞异质性分析可视化平台及其实现方法
CN112289379B (zh) * 2020-10-15 2022-11-22 天津诺禾致源生物信息科技有限公司 细胞类型的确定方法、装置、存储介质及电子装置
CN112837754B (zh) * 2020-12-25 2022-10-28 北京百奥智汇科技有限公司 一种基于特征基因的单细胞自动分类方法和装置
CN113889180B (zh) * 2021-09-30 2024-05-24 山东大学 一种基于动态网络熵的生物标记物识别方法与系统
CN114107512B (zh) * 2022-01-26 2022-05-13 北京大学 一种免疫治疗获得性耐药的早期筛查装置及其应用
CN115083522B (zh) * 2022-08-18 2022-10-28 天津诺禾致源生物信息科技有限公司 细胞类型的预测方法、装置及服务器

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010033777A2 (en) * 2008-09-19 2010-03-25 University Of Pittsburgh-Of The Commonwealth System Of Higher Education Discovery of t -homology in a set of sequences and production of lists of t-homologous sequences with predefined properties
CN102952854A (zh) * 2011-08-25 2013-03-06 深圳华大基因科技有限公司 单细胞分类和筛选方法及其装置
CN104598774A (zh) * 2015-02-04 2015-05-06 河南师范大学 基于logistic与相关信息熵的特征基因选择方法
CN108897988A (zh) * 2018-05-14 2018-11-27 浙江大学 一种群智能寻优的结肠癌癌细胞检测仪

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4461240B2 (ja) * 2004-09-27 2010-05-12 独立行政法人産業技術総合研究所 遺伝子発現プロファイル検索装置、遺伝子発現プロファイル検索方法およびプログラム
CN106295251A (zh) * 2015-05-25 2017-01-04 中国科学院青岛生物能源与过程研究所 基于单细胞表现型数据库的表型数据分析处理方法
CN105297142B (zh) * 2015-08-19 2018-12-07 南方科技大学 同时对单细胞基因组和转录组构库及测序的方法基于单细胞整合基因组学的测序方法及应用
CN106701995B (zh) * 2017-02-20 2019-11-26 元码基因科技(北京)股份有限公司 通过单细胞转录组测序进行细胞质量控制的方法
CN108520249A (zh) * 2018-04-19 2018-09-11 赵乐 一种细胞分类器的构建方法、装置及系统

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114038505A (zh) * 2021-10-19 2022-02-11 清华大学 一种在线整合多来源单细胞数据的方法和系统
CN116189770A (zh) * 2022-11-02 2023-05-30 杭州链康医学检验实验室有限公司 一种单细胞转录组rna污染去除方法、介质和设备
CN116189770B (zh) * 2022-11-02 2023-08-18 杭州链康医学检验实验室有限公司 一种单细胞转录组rna污染去除方法、介质和设备
CN116564418A (zh) * 2023-04-20 2023-08-08 深圳湾实验室 细胞类群相关性网络构建方法和装置、设备及存储介质
CN116564418B (zh) * 2023-04-20 2024-06-11 深圳湾实验室 细胞类群相关性网络构建方法和装置、设备及存储介质
CN117116356A (zh) * 2023-10-25 2023-11-24 智泽童康(广州)生物科技有限公司 细胞亚群关联网络图的生成方法、存储介质和服务器
CN117116356B (zh) * 2023-10-25 2024-01-30 智泽童康(广州)生物科技有限公司 细胞亚群关联网络图的生成方法、存储介质和服务器

Also Published As

Publication number Publication date
CN109891508B (zh) 2023-05-23
CN109891508A (zh) 2019-06-14

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19913762

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19913762

Country of ref document: EP

Kind code of ref document: A1