CN115440303B - Method, medium and equipment for filtering low-quality cells of unicellular transcriptome - Google Patents

Publication number
CN115440303B
Authority
CN
China
Prior art keywords
cell
cells
multicellular
gene
artificial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211367300.4A
Other languages
Chinese (zh)
Other versions
CN115440303A (en
Inventor
陈哲名
郎秋蕾
韩斐然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Lianchuan Biotechnology Co ltd
Original Assignee
Hangzhou Lianchuan Biotechnology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Lianchuan Biotechnology Co ltd
Priority to CN202211367300.4A (CN115440303B)
Priority to CN202310175918.9A (CN116486916A)
Priority to CN202310181167.1A (CN116805511A)
Publication of CN115440303A
Application granted
Publication of CN115440303B
Legal status: Active

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B35/00 ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biochemistry (AREA)
  • Molecular Biology (AREA)
  • Library & Information Science (AREA)
  • Artificial Intelligence (AREA)
  • Bioethics (AREA)
  • Chemical & Material Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a method for filtering low-quality cells from single-cell transcriptome data, and relates to biological data processing. The method comprises the following steps: grouping the cells into cell populations; averaging the expression level of each gene within each cell population to generate a characteristic expression profile of that population; randomly combining the characteristic expression profiles of the cell populations in pairs to generate artificial multi-cells; merging the artificial multi-cell expression profile with the real cell expression profile and calculating the distance between every pair of cells; setting a plurality of equidistant neighborhood sizes within a specified range and, for each neighborhood size, calculating the artificial multicellular proportion of each real cell; for each neighborhood size, computing the bimodal coefficient of the distribution of artificial multicellular proportions and taking the neighborhood size with the largest bimodal coefficient as the optimal neighborhood; and, under the optimal neighborhood, identifying a specified number of real cells with the largest artificial multicellular proportion as multi-cells and deleting them from the real cell expression profile. The method improves the filtering standard and accuracy for single-cell transcriptome data and enhances the reliability of the data.

Description

Method, medium and equipment for filtering low-quality cells of unicellular transcriptome
Technical Field
The invention relates to a biological data processing method, in particular to a method, a medium and equipment for filtering low-quality cells of a single-cell transcriptome.
Background
Single-cell transcriptome sequencing based on microfluidic technology can quantify gene expression in tens of thousands of cells in a single experiment. The approach identifies single cells by sequence tags: a unique sequence tag is added to each cell, and nucleic acid sequences carrying the same tag are treated as coming from the same cell during sequencing. The 10x Genomics single-cell transcriptome sequencing platform is currently a widely used technology. It achieves high-throughput cell sorting and capture using microfluidics, oil-droplet encapsulation and barcode labels, can separate and label from 500 to tens of thousands of single cells in one run, and yields the transcriptome of each cell after sequencing. Its advantages include high cell throughput, low library construction cost and a short capture period.
A typical single-cell transcriptome sequencing experiment proceeds as follows. First, a cell suspension is prepared; on the corresponding platform instrument, a microfluidic chip mixes the cell suspension with beads and encapsulates them in oil droplets. Each bead carries a unique nucleotide sequence, the barcode label, which marks a single cell. Each barcode label is also linked to a unique molecular identifier (UMI), itself a nucleotide sequence, and each UMI marks one mRNA transcript. After reverse transcription, PCR amplification, library construction and sequencing, the barcode and UMI in the sequencing data determine whether two sequences come from the same cell and the same mRNA molecule, which reduces the effect of PCR amplification bias between molecules. By matching and counting barcodes and UMIs, the gene expression information is summarized in a count matrix, giving the transcriptome expression profile of each individual cell.
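The barcode and UMI counting described above can be sketched as follows. This is a minimal illustration rather than the Cell Ranger implementation; the function name `build_count_matrix`, the read tuples and the barcode strings are all hypothetical.

```python
from collections import defaultdict

def build_count_matrix(reads):
    """Count unique UMIs per (cell barcode, gene).

    `reads` is an iterable of (barcode, umi, gene) tuples. Reads that share
    all three fields are treated as PCR duplicates of one mRNA molecule.
    """
    umis = defaultdict(set)
    for barcode, umi, gene in reads:
        umis[(barcode, gene)].add(umi)
    counts = defaultdict(dict)
    for (barcode, gene), umi_set in umis.items():
        counts[barcode][gene] = len(umi_set)
    return dict(counts)

# Hypothetical reads: the first two are PCR duplicates of the same molecule.
reads = [
    ("AAAC", "UMI1", "G1"),
    ("AAAC", "UMI1", "G1"),
    ("AAAC", "UMI2", "G1"),
    ("AAAC", "UMI3", "G2"),
    ("TTTG", "UMI1", "G1"),
]
matrix = build_count_matrix(reads)
```

Collapsing reads by UMI before counting is what keeps an over-amplified molecule from inflating the expression level of its gene.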
Single-cell experiments usually obtain single cells in bulk by dissociating and disrupting biological tissue, which often produces many cell fragments and apoptotic cells. In droplet-based single-cell transcriptome techniques, a droplet may also contain two or more cells (or a whole cell plus cell debris). Single-cell transcriptome data contain hundreds of thousands to millions of droplets, but the barcode of a droplet cannot by itself indicate whether the droplet contains a cell, or whether what it contains is cell debris, a dead/dying cell or multiple cells; that is, the quality of the captured cells cannot be determined automatically. Cell quality strongly affects the results of downstream analysis, so the type of droplet represented by each barcode must be determined before data analysis. Cell Ranger, the official 10x Genomics software, can only determine whether a barcode corresponds to an empty droplet and cannot assess cell quality. As a result, the analysis of the single-cell transcriptome may deviate substantially from the actual situation, or even yield the opposite biological conclusion. There is currently no systematic method for identifying and filtering low-quality cells.
Disclosure of Invention
To solve at least one of the technical problems mentioned in the background, the present invention provides a method, a medium and a device for filtering low-quality cells of a single-cell transcriptome, which identify and filter the low-quality cells in single-cell transcriptome data, improve the filtering standard and accuracy, and enhance the reliability of the data.
To achieve this purpose, the invention provides the following technical solution:
A method for filtering low-quality cells of a single-cell transcriptome, comprising the steps of:
S101, grouping cells into cell populations based on the real cell expression profile;
S1041, averaging the expression level of each gene over the cell expression profiles of each cell population to generate the characteristic expression profile of each cell population;
S1042, randomly combining the characteristic expression profiles of the cell populations in pairs to generate a certain number of artificial multi-cells;
S1043, merging the artificial multi-cell expression profile with the real cell expression profile, and calculating the distance between every pair of cells;
S1044, setting a plurality of equidistant neighborhood sizes within a specified range, and calculating the artificial multicellular proportion of each real cell under each neighborhood size;
S1045, collecting the distribution of the artificial multicellular proportion under each neighborhood size, computing the bimodal coefficient of that distribution, and taking the neighborhood size with the largest bimodal coefficient as the optimal neighborhood;
S1046, under the optimal neighborhood, identifying a specified number of real cells with the largest artificial multicellular proportion as multi-cells, and deleting them from the real cell expression profile.
Further, the characteristic expression profiles are combined in pairs as follows:
Y=a1*X1+a2*X2
where Y is the generated artificial multi-cell, and X1 and X2 are the characteristic expression profiles of two cell populations; a1 and a2 are scaling coefficients, one of which is set to 1 and the other to a random value greater than 0 and less than 1.
Further, the distance between cells is the Euclidean distance or the Manhattan distance.
Further, the artificial multicellular proportion is the ratio of the number of artificial multi-cells in the neighborhood of a real cell to the total number of cells in that neighborhood, the neighborhood being taken in the combined expression profile.
Further, the bimodal coefficient BC is calculated from s, k and N_real [formula image not reproduced], where BC is the bimodal coefficient; s and k are respectively the skewness and kurtosis of the artificial multicellular proportion distribution; and N_real is the number of real cells.
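The quantities named above (a skewness, a kurtosis and a sample count) match the inputs of the standard sample bimodality coefficient often attributed to Sarle. Assuming that this is the intended form, which the text itself does not confirm, the coefficient would read as follows, with s the skewness, k the excess kurtosis and N_real the number of real cells:

```latex
BC = \frac{s^{2} + 1}{k + \dfrac{3\,(N_{\mathrm{real}} - 1)^{2}}{(N_{\mathrm{real}} - 2)(N_{\mathrm{real}} - 3)}}
```

Under this form, a larger BC indicates a distribution that separates more clearly into two peaks; values above about 5/9, the value for a uniform distribution, are commonly read as suggesting bimodality.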
Further, the method for determining the specified number is as follows: setting a multicellular ratio, the product of the number of real cells and the multicellular ratio being a prescribed number of real cells identified as multicellular.
Further, in S1044, the neighborhood is set as follows: 100 equidistant neighbourhoods are arranged in the range of 0.0001 to 0.01.
A computer storage medium having stored thereon a computer program which, when executed by a processor, implements the method for low quality cell filtration of a single-cell transcriptome as described above.
A terminal device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of low quality cell filtration of a single-cell transcriptome as described above when executing the computer program.
Compared with the prior art, the invention has the following beneficial effects: the method generates a certain number of artificial multi-cells, computes, for each neighborhood size that is set, the distribution of the artificial multicellular proportion of the real cells, and determines the optimal neighborhood; under the optimal neighborhood, the real cells with the largest artificial multicellular proportion are identified as multi-cells (low-quality cells) and deleted from the real cell expression profile. This improves the filtering standard and accuracy for single-cell transcriptome data and enhances the reliability of the data.
Drawings
FIG. 1 is an overall flow chart of an embodiment of the present invention.
FIG. 2 is a flow chart of a multi-cell filtration process according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one:
Referring to fig. 1, this embodiment provides a method for filtering low-quality cells of a single-cell transcriptome. The implementation mainly comprises steps S100 to S104, described in detail below.
Step S100: the original expression profile (also called the real cell expression profile) may contain non-cell data, so it is first filtered with the Cell Ranger software to remove the non-cell data, generating the primary filtered cell expression profile. If the original expression profile contains no non-cell data, it can be used directly in the next step without filtering.
The original expression profile of this example is shown in Table 1 below.
Table 1: original expression profile
[Table 1 image not reproduced]
In Table 1, C1, C301, C15001 and so on are cell identifiers, G1, G2, G53 and so on are gene identifiers, and the values in the body of the table are the expression levels of the genes.
The primary filtered cell expression profile of this example is shown in Table 2 below.
Table 2: primary filtered cell expression profile
[Table 2 image not reproduced]
Compared with Table 1, cells C10001 and C15001 have been filtered out of the primary filtered cell expression profile.
Step S101: based on the primary filtered cell expression profile or the original expression profile, the cells are normalized, reduced in dimension and grouped using the Seurat software. The normalization method is LogNormalize with the scale factor parameter set to 10000; highly variable genes are selected with the "vst" method; PCA reduces the data to 50 principal components (PCs); clustering uses the FindNeighbors and FindClusters functions with the resolution parameter set to 0.8. At this point, all cells have been divided into different cell populations. The grouping results for the primary filtered cell expression profile of this example are shown in Table 3 below.
Table 3: grouping results of the primary filtered cell expression profile
[Table 3 image not reproduced]
In Table 3, cells C1 and C601 are assigned to cell population A; the other cells are likewise assigned to cell populations B, C and D.
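The normalization used in step S101 can be illustrated in isolation. The sketch below follows the commonly documented behavior of LogNormalize (divide each count by the cell total, multiply by the scale factor, then apply log(1 + x)); it is not Seurat code, and the function name `log_normalize` and the example counts are hypothetical.

```python
import math

def log_normalize(cell_counts, scale_factor=10000):
    """Scale one cell's counts to `scale_factor` total, then apply log(1 + x)."""
    total = sum(cell_counts)
    if total == 0:
        # A cell with no counts stays all zero instead of dividing by zero.
        return [0.0 for _ in cell_counts]
    return [math.log1p(c / total * scale_factor) for c in cell_counts]

# Hypothetical counts for one cell across three genes.
normalized = log_normalize([0, 5, 15])
```

Scaling every cell to the same total removes differences in sequencing depth, so that later distance calculations compare expression patterns rather than library sizes.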
Cell debris and dead/dying cells are filtered in steps S102 to S103. It is worth noting that cell debris and dead/dying cells may be filtered together or removed separately.
Step S102: four gene sets A_gene, A_mt, A_active and A_antioxi are defined, and the expressed gene score S_gene, mitochondrial score S_mt, activity score S_active and antioxidant score S_antioxi of each cell population are calculated.
A_gene is the set of all genes in the primary filtered cell expression profile (G1 to G53 in Table 3 above); S_gene is the average, over all cells in the cell population, of the number of expressed genes (that is, genes whose expression level is greater than 0).
A_mt is the set of mitochondrial genes of the corresponding species (G1 and G2 in Table 3 above); S_mt is the average, over all cells in the cell population, of the A_mt expression ratio, where the A_mt expression ratio of a single cell = (sum of the expression levels of the A_mt genes in the cell) / (sum of the expression levels of the A_gene genes in the cell) × 100%.
A_active is the set of housekeeping genes of the corresponding species (G6 and G7 in Table 3 above), such as the human ACTB and GAPDH genes; S_active is the average, over all cells in the cell population, of the mean A_active expression, where the mean A_active expression of a single cell = (sum of the expression levels of the A_active genes in the cell) / (number of A_active genes).
A_antioxi is the set of antioxidant genes of the corresponding species (G20 and G21 in Table 3 above), such as the human SOD1, SOD2, SOD3, CAT, GPX1, GPX2, GPX3, GPX4, GPX5, GPX6, GPX7, GPX8, NQO1 and NFE2L2 genes; S_antioxi is the average, over all cells in the cell population, of the mean A_antioxi expression, where the mean A_antioxi expression of a single cell = (sum of the expression levels of the A_antioxi genes in the cell) / (number of A_antioxi genes).
In summary, the expressed gene score S_gene, mitochondrial score S_mt, activity score S_active and antioxidant score S_antioxi are obtained for each cell population. Table 4 below shows the scores obtained.
Table 4: the four scores
[Table 4 image not reproduced]
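The four scores of step S102 can be computed for one cell population as in the sketch below. The data layout (one dict of gene to expression level per cell), the function name `population_scores` and the expression values are assumptions made for illustration; only the score definitions come from the description.

```python
def population_scores(cells, mt_genes, active_genes, antioxi_genes):
    """Return (S_gene, S_mt, S_active, S_antioxi) for one cell population.

    `cells` is a list of dicts mapping gene name -> expression level.
    """
    n = len(cells)
    s_gene = s_mt = s_active = s_antioxi = 0.0
    for cell in cells:
        total = sum(cell.values())
        # Number of expressed genes: expression level greater than 0.
        s_gene += sum(1 for v in cell.values() if v > 0)
        # Mitochondrial share of total expression, as a percentage.
        mt_sum = sum(cell.get(g, 0) for g in mt_genes)
        s_mt += (mt_sum / total * 100.0) if total > 0 else 0.0
        # Mean expression over the housekeeping and antioxidant gene sets.
        s_active += sum(cell.get(g, 0) for g in active_genes) / len(active_genes)
        s_antioxi += sum(cell.get(g, 0) for g in antioxi_genes) / len(antioxi_genes)
    return s_gene / n, s_mt / n, s_active / n, s_antioxi / n

# Hypothetical two-cell population with made-up expression levels.
population = [
    {"G1": 2, "G2": 2, "G6": 10, "G7": 6, "G20": 0, "G21": 0},
    {"G1": 0, "G2": 4, "G6": 4, "G7": 8, "G20": 2, "G21": 2},
]
scores = population_scores(population, ["G1", "G2"], ["G6", "G7"], ["G20", "G21"])
```

Each score is first computed per cell and then averaged, so a few extreme cells shift the population score less than they would if all counts were pooled first.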
Step S103: thresholds G_gene, G_mt, G_active and G_antioxi corresponding to S_gene, S_mt, S_active and S_antioxi are set, and the type of each cell population is determined.
When S_gene is less than G_gene, the cell population is judged to be cell debris.
When S_mt is greater than G_mt and S_antioxi is less than G_antioxi, the cell population is judged to be dead/dying cells.
When S_active is less than G_active, the cell population is judged to be dead/dying cells.
G_gene, G_mt, G_active and G_antioxi are typically set to 500, 25%, 2.
As shown in Table 4 above, cell population D (containing cell C8001) satisfies all three conditions at once, so it is judged to be both cell debris and dead/dying cells.
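The three rules of step S103 translate directly into a small classifier. In the sketch below the function name `classify_population` and the score values are hypothetical, and the thresholds passed for `g_active` and `g_antioxi` are assumptions chosen only to make the example run.

```python
def classify_population(s_gene, s_mt, s_active, s_antioxi,
                        g_gene, g_mt, g_active, g_antioxi):
    """Return the set of low-quality labels that apply to one cell population."""
    labels = set()
    if s_gene < g_gene:
        labels.add("cell debris")
    if s_mt > g_mt and s_antioxi < g_antioxi:
        labels.add("dead/dying")
    if s_active < g_active:
        labels.add("dead/dying")
    return labels

# Illustrative thresholds only; g_active and g_antioxi are assumed values.
thresholds = dict(g_gene=500, g_mt=25.0, g_active=2.0, g_antioxi=2.0)
healthy = classify_population(1800, 5.0, 12.0, 3.0, **thresholds)
damaged = classify_population(120, 40.0, 0.5, 0.3, **thresholds)
```

Returning a set of labels reflects that a population can be both cell debris and dead/dying at once; an empty set means the population passes all three rules.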
Cells judged to be cell debris or dead/dying cells (meeting either criterion is sufficient for deletion) are deleted from the primary filtered cell expression profile, generating the secondary filtered cell expression profile. Table 5 shows the secondary filtered cell expression profile of this example.
Table 5: secondary filtered cell expression profile
[Table 5 image not reproduced]
The cells in both the primary filtered and the secondary filtered cell expression profiles are all real cells, so these profiles are also called real cell expression profiles.
Step S104: multi-cells among the real cells are identified using simulated multi-cell characteristic expression profiles and a k-nearest-neighbor (kNN) algorithm, and are removed from the secondary filtered cell expression profile to generate the final filtered expression profile. Multi-cell filtering may be carried out on its own, or after the filtering of cell debris or dead/dying cells.
Referring to fig. 2, this is implemented in steps S1041 to S1046 below (the grouping step has already been completed in step S101):
S1041: for each cell population, the expression levels of each gene are averaged (and rounded to the nearest integer) to generate the characteristic expression profile of that population.
Taking cell population A as an example, if it contains only the 2 cells shown in Table 5, its characteristic expression profile is as follows:
Cell population   G1   G2   G6   G7   G20   G21   G51   G52   G53
A                 2    4    13   18   4     6     740   9     5
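A characteristic expression profile of this kind can be produced with a per-gene average followed by rounding. The sketch below assumes that "rounding off" means rounding half up; the function name `characteristic_profile` and the per-cell values are hypothetical.

```python
import math

def characteristic_profile(cells, genes):
    """Average each gene over the cells of one population, rounding half up."""
    n = len(cells)
    return {g: math.floor(sum(c[g] for c in cells) / n + 0.5) for g in genes}

# Hypothetical expression levels for the two cells of one population.
cells = [
    {"G1": 1, "G2": 5, "G6": 12},
    {"G1": 2, "G2": 3, "G6": 14},
]
profile = characteristic_profile(cells, ["G1", "G2", "G6"])
```

`math.floor(x + 0.5)` is used instead of the built-in `round`, because `round` in Python rounds halves to the nearest even integer and would turn an average of 1.5 into 2 but 2.5 into 2 as well.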
S1042: the characteristic expression profiles of all cell populations are randomly combined in pairs to generate artificial multi-cells at a ratio P_N, which is usually set to 25%, where P_N = number of artificial multi-cells / (number of artificial multi-cells + number of real cells) × 100%.
The characteristic expression profiles are combined in pairs as follows:
Y=a1*X1+a2*X2
where Y is the generated artificial multi-cell, and X1 and X2 are the characteristic expression profiles of two cell populations; a1 and a2 are scaling coefficients, one set to 1 and the other to a random value greater than 0 and less than 1; that is, when a1 is 1, a2 is a random value between 0 and 1, and when a2 is 1, a1 is a random value between 0 and 1. Each artificial multi-cell generated in this way consists of one intact cell and one incomplete cell, which simulates how multi-cells arise in actual experiments.
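The generation step can be sketched as below. Solving the definition of P_N for the number of artificial multi-cells gives N_art = P_N × N_real / (1 − P_N). The function names, the toy profiles, the choice to pair two distinct populations and the 50/50 choice of which coefficient is fixed at 1 are all assumptions for illustration.

```python
import random

def n_artificial_for_ratio(n_real, p_n):
    """Number of artificial multi-cells so that artificial / (artificial + real) is about p_n."""
    return round(p_n * n_real / (1.0 - p_n))

def make_artificial(profiles, n_artificial, rng):
    """Combine randomly chosen pairs of population profiles as Y = a1*X1 + a2*X2."""
    names = sorted(profiles)
    genes = sorted(profiles[names[0]])
    result = []
    for _ in range(n_artificial):
        # Assumption: the two profiles come from two distinct populations.
        x1, x2 = (profiles[name] for name in rng.sample(names, 2))
        # Strictly between 0 and 1: redraw on the (rare) exact 0.0.
        partial = rng.random()
        while partial == 0.0:
            partial = rng.random()
        a1, a2 = (1.0, partial) if rng.random() < 0.5 else (partial, 1.0)
        result.append({g: a1 * x1[g] + a2 * x2[g] for g in genes})
    return result

# Hypothetical characteristic profiles of three cell populations.
profiles = {"A": {"G1": 2, "G2": 4}, "B": {"G1": 10, "G2": 0}, "C": {"G1": 0, "G2": 6}}
n_art = n_artificial_for_ratio(n_real=300, p_n=0.25)
artificial = make_artificial(profiles, n_art, random.Random(0))
```

With 300 real cells and P_N = 25%, this produces 100 artificial multi-cells, since 100 / (100 + 300) = 25%.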
S1043: the artificial multi-cell expression profile and the secondary filtered cell expression profile are merged, then normalized again and reduced in dimension by PCA using the Seurat software. The distance between every pair of cells is calculated from the PCA result, usually as the Euclidean distance or the Manhattan distance; other distance measures can serve the same purpose.
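The two distance measures named in S1043 can be written out as follows. The sketch works on already-embedded points; the two-dimensional coordinates are hypothetical stand-ins for the PCA coordinates, and the function names are illustrative.

```python
import math

def euclidean(u, v):
    """Straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan(u, v):
    """Sum of absolute coordinate differences between two points."""
    return sum(abs(a - b) for a, b in zip(u, v))

def distance_matrix(points, metric=euclidean):
    """Symmetric matrix of pairwise distances between embedded cells."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = metric(points[i], points[j])
    return dist

# Hypothetical 2-dimensional embeddings of three cells.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
d_euc = distance_matrix(points)
d_man = distance_matrix(points, metric=manhattan)
```

A full pairwise matrix grows quadratically with the number of cells, so for large datasets a nearest-neighbor index would normally replace it; the matrix is used here only to keep the example short.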
S1044: 100 equidistant neighborhood sizes P_kn are set in the range 0.0001 to 0.01. For each neighborhood size P_kn, the artificial multicellular proportion P_ANN of every real cell within its neighborhood is calculated: among the (N_merge × P_kn) cells nearest to the real cell, where N_merge is the total number of cells in the combined expression profile, the number N_ANN of artificial multi-cells is counted, and P_ANN = N_ANN / (N_merge × P_kn).
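The neighborhood grid and the P_ANN calculation can be sketched as below. Rounding N_merge × P_kn to a whole number of neighbors, using at least one neighbor, and excluding the cell itself from its own neighborhood are assumptions not stated in the description; the function names and the one-dimensional toy positions are hypothetical.

```python
def neighborhood_sizes(low=0.0001, high=0.01, count=100):
    """`count` equally spaced neighborhood sizes P_kn from low to high inclusive."""
    step = (high - low) / (count - 1)
    return [low + i * step for i in range(count)]

def artificial_proportions(dist, is_artificial, p_kn):
    """P_ANN of every real cell for one neighborhood size p_kn.

    `dist` is a full pairwise distance matrix over the combined profile and
    `is_artificial[i]` is True when cell i is an artificial multi-cell.
    """
    n_merge = len(dist)
    k = max(1, round(n_merge * p_kn))  # assumption: at least one neighbor
    p_ann = {}
    for i in range(n_merge):
        if is_artificial[i]:
            continue
        others = sorted((j for j in range(n_merge) if j != i), key=lambda j: dist[i][j])
        nearest = others[:k]
        p_ann[i] = sum(is_artificial[j] for j in nearest) / k
    return p_ann

sizes = neighborhood_sizes()

# Toy data: four real cells and two artificial multi-cells on a line.
positions = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
is_artificial = [False, False, False, False, True, True]
dist = [[abs(a - b) for b in positions] for a in positions]
p_ann = artificial_proportions(dist, is_artificial, p_kn=0.5)
```

The toy call uses p_kn = 0.5 only because six cells are too few for the 0.0001 to 0.01 range to select any neighbor; the real cell at position 10.0 sits next to both artificial multi-cells and receives the highest proportion.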
S1045: for each neighborhood size P_kn, the distribution of P_ANN is collected, and its skewness s and kurtosis k are calculated [formula images for s and k not reproduced], where n is the number of P_ANN values under the same neighborhood size P_kn, p_i is the i-th P_ANN value under that neighborhood size, M is the mean of all P_ANN values under that neighborhood size, and SD is their standard deviation.
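With n, p_i, M and SD as defined above, the usual moment-based skewness and kurtosis can be written as below. This is a hedged reconstruction from the symbol list, not a copy of the original formulas; in particular, subtracting 3 to obtain the excess kurtosis is an assumption.

```latex
s = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{p_i - M}{SD} \right)^{3},
\qquad
k = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{p_i - M}{SD} \right)^{4} - 3
```

A symmetric distribution gives s = 0, and a distribution split into two well-separated groups gives a strongly negative k, which is what makes these two moments useful for detecting two peaks.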
The bimodal coefficient BC of each P_ANN distribution is then calculated from s, k and N_real [formula image not reproduced], where N_real is the number of real cells. The neighborhood size at which BC is largest is selected as the optimal neighborhood P_K used in the final step.
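The selection of the optimal neighborhood can be sketched as below. The code assumes the standard sample bimodality coefficient (often attributed to Sarle) with population-moment skewness and excess kurtosis, which may differ in detail from the formulas in the original figures; the function names and the P_ANN values are hypothetical.

```python
import math

def skewness_kurtosis(values):
    """Moment-based skewness and excess kurtosis (population standard deviation)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0.0:
        return 0.0, 0.0
    s = sum(((v - mean) / sd) ** 3 for v in values) / n
    k = sum(((v - mean) / sd) ** 4 for v in values) / n - 3.0
    return s, k

def bimodality_coefficient(values):
    """Assumed Sarle-style bimodality coefficient; needs more than 3 values."""
    n = len(values)
    s, k = skewness_kurtosis(values)
    return (s * s + 1.0) / (k + 3.0 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

def best_neighborhood(p_ann_by_size):
    """Return the neighborhood size whose P_ANN distribution has the largest BC."""
    return max(p_ann_by_size, key=lambda size: bimodality_coefficient(p_ann_by_size[size]))

# Hypothetical P_ANN values of eight real cells for two neighborhood sizes.
p_ann_by_size = {
    0.001: [0.0, 0.0, 0.1, 0.0, 0.9, 1.0, 1.0, 0.1],  # two clear groups
    0.005: [0.4, 0.5, 0.5, 0.6, 0.5, 0.4, 0.6, 0.5],  # one central group
}
p_k = best_neighborhood(p_ann_by_size)
```

The first distribution splits into a low group and a high group and therefore scores higher, so its neighborhood size is the one selected.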
Under the optimal neighborhood P_K, the boundary between single cells and multi-cells is as clear as possible: the cells separate as fully as possible into single cells and multi-cells, and the number of ambiguously classified cells in an intermediate state is minimized, giving a classification result that is as accurate as possible.
S1046: a multi-cell rate R_doub is set, and the expected number of multi-cells is calculated as E_doub = N_real * R_doub. Using the P_ANN of each real cell computed with the optimal neighborhood P_K, the E_doub cells with the largest P_ANN are identified as multi-cells and deleted from the secondary filtered cell expression profile, generating the final filtered cell expression profile.
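The final selection in S1046 amounts to ranking the real cells by P_ANN and removing the top E_doub. In the sketch below the function name `remove_multicells`, the cell identifiers, the P_ANN values and the multi-cell rate of 0.4 are hypothetical, and rounding E_doub to a whole number of cells is an assumption.

```python
def remove_multicells(p_ann, multicell_rate):
    """Drop the E_doub real cells with the largest P_ANN; return (kept, removed)."""
    e_doub = round(len(p_ann) * multicell_rate)  # E_doub = N_real * R_doub
    ranked = sorted(p_ann, key=lambda cell: p_ann[cell], reverse=True)
    removed = ranked[:e_doub]
    removed_set = set(removed)
    kept = [cell for cell in p_ann if cell not in removed_set]
    return kept, removed

# Hypothetical P_ANN values under the optimal neighborhood.
p_ann = {"cell_a": 0.05, "cell_b": 0.80, "cell_c": 0.10, "cell_d": 0.00, "cell_e": 0.60}
kept, removed = remove_multicells(p_ann, multicell_rate=0.4)
```

Because the number of removed cells is fixed by the rate and not by a cutoff on P_ANN, the outcome depends directly on how well R_doub matches the true multi-cell rate of the experiment.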
Example two:
a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the method for single-cell transcriptome RNA contamination identification as described in example one and/or the method for low-quality cell filtration of a single-cell transcriptome as described in example two.
Example three:
a terminal device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of low quality cell filtration for a single-cell transcriptome of embodiment one when executing the computer program.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims (8)

1. A method for filtering low-quality cells of a single-cell transcriptome, comprising the steps of:
S101, grouping cells into cell populations based on the real cell expression profile;
defining four gene sets A_gene, A_mt, A_active and A_antioxi, and calculating the expressed gene score S_gene, mitochondrial score S_mt, activity score S_active and antioxidant score S_antioxi of each cell population; A_gene is the set of all genes in the primary filtered cell expression profile, and S_gene is the average number of expressed genes over all cells in the cell population; A_mt is the set of mitochondrial genes of the corresponding species, and S_mt is the average A_mt expression ratio over all cells in the cell population; A_active is the set of housekeeping genes of the corresponding species, and S_active is the average, over all cells in the cell population, of the mean A_active expression level; A_antioxi is the set of antioxidant genes of the corresponding species, and S_antioxi is the average, over all cells in the cell population, of the mean A_antioxi expression level; setting thresholds G_gene, G_mt, G_active and G_antioxi corresponding to S_gene, S_mt, S_active and S_antioxi, and determining the type of each cell population; deleting the cells judged to be cell debris or dead/dying cells from the real cell expression profile;
S1041, averaging the expression level of each gene over the cell expression profiles of each cell population to generate the characteristic expression profile of each cell population;
S1042, randomly combining the characteristic expression profiles of the cell populations in pairs to generate a certain number of artificial multi-cells;
S1043, merging the artificial multi-cell expression profile with the real cell expression profile, and calculating the distance between every pair of cells;
S1044, setting a plurality of equidistant neighborhood sizes within a specified range, and calculating the artificial multicellular proportion of each real cell under each neighborhood size;
S1045, collecting the distribution of the artificial multicellular proportion under each neighborhood size, computing the bimodal coefficient of that distribution, and taking the neighborhood size with the largest bimodal coefficient as the optimal neighborhood;
S1046, under the optimal neighborhood, identifying a specified number of real cells with the largest artificial multicellular proportion as multi-cells, and deleting them from the real cell expression profile;
the bimodal coefficient BC is calculated from s, k and N_real [formula image not reproduced], where BC is the bimodal coefficient; s and k are respectively the skewness and kurtosis of the artificial multicellular proportion distribution; and N_real is the number of real cells.
2. The method for filtering low-quality cells of a single-cell transcriptome according to claim 1, wherein the characteristic expression profiles are combined in pairs as follows:
Y=a1*X1+a2*X2
wherein Y is the generated artificial multi-cell, and X1 and X2 are the characteristic expression profiles of cell populations; one of a1 and a2 is set to 1, and the other is set to a random value greater than 0 and less than 1.
3. The method of claim 1, wherein the distance between cells is the Euclidean distance or the Manhattan distance.
4. The method of claim 1, wherein the artificial multicellular proportion is the ratio of the number of artificial multi-cells in the neighborhood of a real cell to the total number of cells in that neighborhood, the neighborhood being taken in the combined expression profile.
5. The method of claim 1, wherein the specified number is determined as follows: a multi-cell rate is set, and the product of the number of real cells and the multi-cell rate is the specified number of real cells identified as multi-cells.
6. The method for filtering low-quality cells of a single-cell transcriptome according to claim 1, wherein in S1044 the neighborhoods are set as follows: 100 equidistant neighborhood sizes are set in the range of 0.0001 to 0.01.
7. A computer storage medium on which a computer program is stored which, when being executed by a processor, carries out the method for low quality cell filtration of a single-cell transcriptome of any of claims 1 to 6.
8. A terminal device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the computer program implements the method for low quality cell filtration in a single-cell transcriptome of any of claims 1 to 6.
CN202211367300.4A 2022-11-03 2022-11-03 Method, medium and equipment for filtering low-quality cells of unicellular transcriptome Active CN115440303B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202211367300.4A CN115440303B (en) 2022-11-03 2022-11-03 Method, medium and equipment for filtering low-quality cells of unicellular transcriptome
CN202310175918.9A CN116486916A (en) 2022-11-03 2022-11-03 Single cell transcriptome dying cell and multicellular filtration method, medium and equipment
CN202310181167.1A CN116805511A (en) 2022-11-03 2022-11-03 Single cell transcriptome cell debris and multicellular filtration method, medium and equipment

Related Child Applications (2)

Application Number Title Priority Date Filing Date
CN202310181167.1A Division CN116805511A (en) 2022-11-03 2022-11-03 Single cell transcriptome cell debris and multicellular filtration method, medium and equipment
CN202310175918.9A Division CN116486916A (en) 2022-11-03 2022-11-03 Single cell transcriptome dying cell and multicellular filtration method, medium and equipment

Publications (2)

Publication Number Publication Date
CN115440303A CN115440303A (en) 2022-12-06
CN115440303B true CN115440303B (en) 2023-02-10

Family

ID=84252019

Family Applications (3)

Application Number Title Priority Date Filing Date
CN202211367300.4A Active CN115440303B (en) 2022-11-03 2022-11-03 Method, medium and equipment for filtering low-quality cells of unicellular transcriptome
CN202310175918.9A Pending CN116486916A (en) 2022-11-03 2022-11-03 Single cell transcriptome dying cell and multicellular filtration method, medium and equipment
CN202310181167.1A Pending CN116805511A (en) 2022-11-03 2022-11-03 Single cell transcriptome cell debris and multicellular filtration method, medium and equipment

Family Applications After (2)

Application Number Title Priority Date Filing Date
CN202310175918.9A Pending CN116486916A (en) 2022-11-03 2022-11-03 Single cell transcriptome dying cell and multicellular filtration method, medium and equipment
CN202310181167.1A Pending CN116805511A (en) 2022-11-03 2022-11-03 Single cell transcriptome cell debris and multicellular filtration method, medium and equipment

Country Status (1)

Country Link
CN (3) CN115440303B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116312786B (en) * 2023-02-08 2023-11-28 Hangzhou Lianchuan Biotechnology Co ltd Single cell expression pattern difference evaluation method based on multi-group comparison
CN117995275A (en) * 2023-03-02 2024-05-07 Hangzhou Lianchuan Biotechnology Co ltd Single cell expression mode difference evaluation method, medium and equipment based on reliability screening

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114708910A (en) * 2022-02-24 2022-07-05 Shanghai First People's Hospital Method for calculating cell subset enrichment fraction in cell sequencing by using single cell sequencing data

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101149327B (en) * 2007-11-06 2010-06-30 Zhejiang University Antineoplastic drug evaluation and screening method based on cell microscopic image information
MX2011011947A (en) * 2009-05-11 2012-01-30 Berg Biosystems Llc Methods for the diagnosis of metabolic disorders using epimetabolic shifters, multidimensional intracellular molecules, or environmental influencers.
EP4253412A3 (en) * 2015-12-16 2023-11-22 The Walter and Eliza Hall Institute of Medical Research Inhibition of cytokine-induced sh2 protein in nk cells
JP2020527946A (en) * 2017-07-21 2020-09-17 ザ ボード オブ トラスティーズ オブ ザ レランド スタンフォード ジュニア ユニバーシティー Systems and methods for analyzing mixed cell populations
CN107368701A (en) * 2017-07-31 2017-11-21 Zhejiang Shaoxing Qianxun Biotechnology Co ltd In high volume unicellular ATAC seq data quality controls and analysis method
CN111292807B (en) * 2018-12-06 2021-10-08 Xingeyuan (Nanjing) Biotechnology Co ltd Method for analyzing double cells in single-cell transcriptome data
CN110675914B (en) * 2019-09-17 2024-01-26 Foshan First People's Hospital (Foshan Hospital Affiliated to Sun Yat-sen University) Method for screening tumor specific T cells and TCR
US20220364147A1 (en) * 2019-11-15 2022-11-17 Miltenyi Biotec B.V. & Co. KG Color and bardcoded beads for single cell indexing
CN111951892B (en) * 2020-08-04 2024-06-18 Ronglian Technology Group Co ltd Method and electronic equipment for analyzing cell track based on single-cell sequencing data
CN112700820B (en) * 2021-01-07 2021-11-19 Guangzhou Huayin Health Medical Group Co ltd Cell subset annotation method based on single cell transcriptome sequencing
CN113674800B (en) * 2021-08-25 2022-02-08 Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences Cell clustering method based on single cell transcriptome sequencing data
CN114438230A (en) * 2022-03-08 2022-05-06 Affiliated Hospital of Nantong University Cell development map and marker gene of human three-month-old embryo mandible tissue
CN116525010A (en) * 2023-04-26 2023-08-01 Hangzhou Lianchuan Biotechnology Co ltd Single-cell transcriptome double-source multi-cell filtering method, medium and equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
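Analysis of differences in brain-cell transcriptome expression in ischemic stroke based on single-cell transcriptome sequencing; Wei Wan et al.; Journal of Brain and Nervous Diseases (《脑与神经疾病杂志》); 2020-09-30 (No. 10); pp. 34-40 *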

Also Published As

Publication number Publication date
CN116805511A (en) 2023-09-26
CN116486916A (en) 2023-07-25
CN115440303A (en) 2022-12-06

Similar Documents

Publication Publication Date Title
CN115440303B (en) Method, medium and equipment for filtering low-quality cells of unicellular transcriptome
Abrego et al. Fungal communities decline with urbanization—more in air than in soil
Christie et al. Bayesian parentage analysis with systematic accountability of genotyping error, missing data and false matching
Buen Abad Najar et al. Coverage-dependent bias creates the appearance of binary splicing in single cells
US20230197196A1 (en) Allelotyping Methods for Massively Parallel Sequencing
Glusman et al. Optimal scaling of digital transcriptomes
Lewis et al. Family SES is associated with the gut microbiome in infants and children
CN111477281A (en) Pan-genome construction method and construction device based on phylogenetic tree
CN107832584B (en) Gene analysis method, device, equipment and storage medium of metagenome
CN113470743A (en) Differential gene analysis method based on BD single cell transcriptome and proteome sequencing data
Thurman et al. Differential gene expression analysis for multi-subject single-cell RNA-sequencing studies with aggregateBioVar
CN116525010A (en) Single-cell transcriptome double-source multi-cell filtering method, medium and equipment
Huang et al. treeclimbR pinpoints the data-dependent resolution of hierarchical hypotheses
Schwender et al. Identifying interesting genes with siggenes
Roche et al. The accuracy of absolute differential abundance analysis from relative count data
Sheng et al. Probabilistic machine learning ensures accurate ambient denoising in droplet-based single-cell omics
Qi et al. SDImpute: a statistical block imputation method based on cell-level and gene-level information for dropouts in single-cell RNA-seq data
Pique-Regi et al. Joint estimation of copy number variation and reference intensities on multiple DNA arrays using GADA
CN107273715A (en) A kind of detection method and device
Niederberger et al. Factor graph analysis of live cell–imaging data reveals mechanisms of cell fate decisions
Steinheuer et al. Benchmarking scRNA-seq imputation tools with respect to network inference highlights deficits in performance at high levels of sparsity
Pan et al. The Poisson distribution model fits UMI-based single-cell RNA-sequencing data
Agten et al. A compositional model to predict the aggregated isotope distribution for average DNA and RNA oligonucleotides
Aparicio et al. Quasi-universality in single-cell sequencing data
Zhang An improved nonparametric approach for detecting differentially expressed genes with replicated microarray data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant