CN116072217A - Single cell transcriptome data availability processing method, medium and equipment - Google Patents


Info

Publication number
CN116072217A
Authority
CN
China
Prior art keywords
barcode
cell
gene expression
larger
area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310126779.0A
Other languages
Chinese (zh)
Other versions
CN116072217B (en)
Inventor
陈哲名
郎秋蕾
陈志锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Lianchuan Gene Diagnosis Technology Co ltd
Original Assignee
Hangzhou Lianchuan Gene Diagnosis Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Lianchuan Gene Diagnosis Technology Co ltd
Priority to CN202310126779.0A
Publication of CN116072217A
Application granted
Publication of CN116072217B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biophysics (AREA)
  • Public Health (AREA)
  • Molecular Biology (AREA)
  • Bioethics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)

Abstract

The invention discloses a single-cell transcriptome data availability analysis method and relates to biological data analysis. The method comprises the following steps: sorting the barcodes by gene expression quantity from largest to smallest; obtaining the inflection points of the variation amplitude of the gene expression quantity; traversing all inflection points, classifying the barcodes into a cell region, an empty droplet region and a magnetic bead region, and counting the number of barcodes in each region; extracting the expression profiles of all barcodes in the cell region; counting the reads aligned to the reference genome and calculating the average read number per cell; judging that the sample data is available when the gene expression quantity corresponding to at least one inflection point is larger than G1, the gene expression quantity corresponding to at least one inflection point is larger than G2 and smaller than G1, the number of barcodes in the cell region is larger than K3, the number of barcodes in the empty droplet region is larger than K4, and the average read number per cell is larger than K6; and otherwise judging that the sample data is not available. The invention can systematically analyze the availability of single-cell transcriptome data, provide a data availability early warning before downstream analysis, and save analysts' time and effort.

Description

Single cell transcriptome data availability processing method, medium and equipment
Cross Reference to Related Applications
This application is a divisional application of application No. 202211363139.3, filed on November 2, 2022 and entitled: Single-cell transcriptome data availability analysis method, medium and equipment.
Technical Field
The invention relates to a biological data analysis method, in particular to a single cell transcriptome data availability processing method, medium and equipment.
Background
Single-cell transcriptome sequencing can capture the expression of nearly ten thousand genes in a single cell, distinguish the transcriptional characteristics of the various cell types in a biological tissue, and comprehensively reveal gene expression heterogeneity among cells. The core idea of a high-throughput single-cell sequencing platform is to add a unique sequence tag to each cell and, during sequencing, to treat nucleic acid sequences carrying the same tag as coming from the same cell. The 10X Genomics single-cell transcriptome sequencing platform is currently in wide use. It combines microfluidics, oil droplet encapsulation and barcode tags to sort and capture cells at high throughput, can isolate and label from 500 to tens of thousands of single cells in one run, and yields the transcriptome information of each cell after sequencing, with the advantages of high cell throughput, low library construction cost and a short capture cycle. The technology is mainly used for cell typing and the identification of marker factors: it can partition cell populations, detect gene expression differences between populations, and predict cell differentiation and developmental trajectories. It plays an increasingly important role in research on disease, immunity and tumors, as well as on tissues, organs and development.
A typical single-cell transcriptome sequencing workflow consists of six steps: single cell analysis, RNA isolation, reverse transcription, amplification, library generation and sequencing. The first two steps are particularly important. 10X Genomics single-cell transcriptome sequencing uses a microfluidic chip to encapsulate a microbead carrying a barcode tag together with a single cell in one droplet. Each microbead carries a unique nucleotide sequence, the barcode tag, which labels an individual cell. Each barcode tag is also linked to a unique molecular identifier (UMI), likewise a nucleotide sequence, and each UMI labels one mRNA transcript. After reverse transcription, PCR amplification, library generation and sequencing, the barcode tag and the UMI tag of each sequence in the sequencing data show whether two sequences come from the same cell and the same mRNA, so that the transcriptome expression profile of each single cell is obtained.
Although 10X Genomics single-cell transcriptome sequencing can profile thousands of cells simultaneously, it presupposes that the droplets (GEMs) encapsulating cells and microbeads are generated normally and that each cell receives a sufficient amount of sequencing data. When GEM generation fails, or when the number of cells in the experiment is excessive, the sequencing data can hardly reflect the true state of the cells. GEM generation may fail because cells or magnetic beads block a microchannel (collectively referred to as hole blockage) or because the oil droplets do not properly contain the cell suspension (collectively referred to as weighing failure). The former leads to an extremely low number of captured cells; the latter blurs the boundary of the captured cells and confuses the expression profiles. An excessive number of cells, in turn, leaves each cell with insufficient sequencing data and makes the results severely unstable. In the prior art, these data problems are not directly visible during the experiment or from the data volume, and the sequencing data is often found to be unusable only after the analysis has progressed to a certain stage, which wastes considerable manpower, computing power and time.
Disclosure of Invention
In order to solve at least one technical problem mentioned in the background art, the invention aims to provide a single-cell transcriptome data availability analysis method, medium and equipment, which can judge whether single-cell transcriptome data is unavailable due to experimental problems, provide data availability early warning before downstream analysis, save analysis time and energy of analysts, and provide basis for subsequent processing according to corresponding processing schemes.
In order to achieve the above purpose, the present invention provides the following technical solutions:
A single-cell transcriptome data availability analysis method, comprising the following steps:
S1, sorting the barcodes by gene expression quantity from largest to smallest;
S2, obtaining the inflection points of the variation amplitude of the gene expression quantity;
S3, traversing all inflection points and, according to the gene expression quantity at each inflection point, classifying the barcodes into a cell region, an empty droplet region and a magnetic bead region;
S4, counting the number of barcodes in the cell region, the empty droplet region and the magnetic bead region;
S5, extracting the expression profiles of all barcodes in the cell region;
S6, counting the reads aligned to the reference genome and calculating the average read number per cell;
S7, judging that the sample data is available when the gene expression quantity corresponding to at least one inflection point is larger than G1, the gene expression quantity corresponding to at least one inflection point is larger than G2 and smaller than G1, the number of barcodes in the cell region is larger than K3, the number of barcodes in the empty droplet region is larger than K4, and the average read number per cell is larger than K6; otherwise, judging that the sample data is not available.
Further, the inflection points of the variation amplitude of the gene expression quantity are obtained as follows:
S21, drawing a scatter plot with the barcode rank as the X axis and the gene expression quantity as the Y axis;
S22, sampling the scatter plot at a specified interval, taking the nearest point at each position, and calculating the slope between each pair of adjacent sampled points;
S23, when the slope is trending downward and falls below the set slope threshold for the first time while that trend lasts, setting the corresponding point as an inflection point.
Further, before the scatter plot is drawn in S21, the barcode rank and the gene expression quantity are log-transformed.
Further, the barcodes are classified into the cell region, the empty droplet region and the magnetic bead region as follows:
barcodes ranked before an inflection point whose corresponding gene expression quantity is larger than G1 are classified into the cell region; barcodes that are ranked before an inflection point whose corresponding gene expression quantity lies between G1 and G2 and that are not in the cell region are classified into the empty droplet region; and barcodes ranked after an inflection point whose corresponding gene expression quantity is smaller than G2 are classified into the magnetic bead region.
Further, when the sample data is not available, the reason is further determined:
calculating the expression proportion of each gene among the barcodes, and counting the first gene number (genes with an expression proportion larger than P1) and the second gene number (genes with an expression proportion larger than P2);
when only one inflection point has a corresponding gene expression quantity larger than G2, and the first gene number is larger than K1 or the second gene number is larger than K2, judging that the sample data is not available because a weighing failure occurred in the experiment;
when the number of barcodes in the cell region is smaller than K3 and the number of barcodes in the empty droplet region is smaller than K4, judging that the sample data is not available because hole blockage occurred in the experiment;
when the number of barcodes in the cell region is smaller than K3 and the number of barcodes in the empty droplet region is larger than K4, judging that the availability of the sample data is to be confirmed because the number of cells in the experiment is too small;
when the number of barcodes in the cell region is larger than K5 and the average read number per cell is smaller than K6, judging that the sample data is not available because the number of cells in the experiment is excessive.
Further, step S7 is followed by step S8, in which the data is handled according to its availability:
if the sample data is available, the subsequent data analysis proceeds normally;
if the sample data is not available because of a weighing failure or hole blockage in the experiment, the experiment is performed again with the cell suspension;
if the sample data is not available because the number of cells in the experiment is excessive, the amount of sequencing data is increased.
Further, when the amount of sequencing data is increased, the supplementary data amount is:
$Gb = (5 \times 10^{4} - Read_{cell}) \times Barc_{cell}$
where $Gb$ is the supplementary data amount, $Read_{cell}$ is the average read number per cell, and $Barc_{cell}$ is the number of barcodes in the cell region.
A computer storage medium having stored thereon a computer program, characterized in that the program when executed by a processor implements a single cell transcriptome data availability analysis method as described above.
A terminal device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements a single cell transcriptome data availability analysis method as described above when executing the computer program.
Compared with the prior art, the invention has the beneficial effects that:
1. The invention judges the availability of single-cell transcriptome data by calculating the variation amplitude of the gene expression quantity, distinguishing three types of barcodes (cell region, empty droplet region and magnetic bead region), and evaluating the number of barcodes of each type, the gene expression proportion and the amount of sequencing data per cell droplet. Compared with the prior art, the technical scheme provided by the invention can systematically analyze the availability of single-cell transcriptome data, provide a data availability early warning before downstream analysis, and save analysts' time and effort.
2. The invention further analyzes the situation that the sample data is not available, judges whether the single cell transcriptome data is not available due to experimental problems, and provides a corresponding processing method.
Drawings
FIG. 1 is a flow chart of an overall method according to an embodiment of the invention.
FIG. 2 is a scatter diagram of an embodiment of the present invention.
Fig. 3 is a schematic diagram of an inflection point according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely, and it is apparent that the described embodiments are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Embodiment one:
Referring to FIG. 1, this embodiment provides a single-cell transcriptome data availability analysis method, which includes the following steps:
S1, sorting the barcodes from largest to smallest by gene expression quantity (UMI counts, abbreviated as $C_{UMI}$) and assigning each barcode a rank $R_n$.
S2, obtaining the inflection points of the variation amplitude of the gene expression quantity, that is, the points of sharp change. The specific method is as follows:
S21, drawing a scatter plot with the barcode rank $R_n$ as the X axis and the gene expression quantity $C_{UMI}$ as the Y axis. To make the variation of the gene expression quantity $C_{UMI}$ more apparent, this embodiment also applies a $\log_{10}$ transformation to the barcode rank $R_n$ and the gene expression quantity $C_{UMI}$; that is, the scatter plot is drawn with $\log_{10} R_n$ as the X axis and $\log_{10} C_{UMI}$ as the Y axis, as shown in FIG. 2.
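Steps S1 and S21 can be illustrated with a minimal Python sketch. It assumes the UMI counts are already available as a dictionary from barcode to total count; the function name `rank_and_log` and the decision to drop barcodes with zero UMIs (whose logarithm is undefined) are assumptions made for illustration and are not stated in the embodiment.

```python
import math

def rank_and_log(umi_counts):
    """Sort barcodes by UMI count (descending) and return log10 coordinates.

    umi_counts: dict mapping barcode -> total UMI count (C_UMI).
    Returns a list of (barcode, rank, log10_rank, log10_umi) tuples.
    Barcodes with zero UMIs are skipped because log10(0) is undefined
    (an assumption of this sketch, not part of the described method).
    """
    ordered = sorted(umi_counts.items(), key=lambda kv: kv[1], reverse=True)
    points = []
    for rank, (barcode, count) in enumerate(ordered, start=1):
        if count <= 0:
            continue
        points.append((barcode, rank, math.log10(rank), math.log10(count)))
    return points

# Toy input with made-up barcodes and counts.
example = {"AAAC": 1000, "CCGT": 10, "GGTA": 100, "TTAG": 0}
points = rank_and_log(example)
```

The third and fourth elements of each tuple are the X and Y coordinates of the scatter plot described in S21.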
S22, sampling points along the X axis of the scatter plot at a fixed interval, taking the nearest point at each position; the interval length can be user-defined or set to 0.2 or 0.3.
The slope $k_n$ between two adjacent points is calculated as $k_n = (y_n - y_{n-1}) / (x_n - x_{n-1})$, where $(x_n, y_n)$ are the coordinates of the $n$-th point and $(x_{n-1}, y_{n-1})$ are the coordinates of the $(n-1)$-th point. Table 1 below shows the ranking and slope of gene expression quantities for one example.
Table 1: ranking and slope of gene expression levels
[Table 1 is provided as an image in the original publication (Figure BDA0004082367430000071).]
S23, when the slope $k_n$ is trending downward, that is, $k_n < k_{n-1}$, and $k_n$ falls below the set slope threshold for the first time while that trend lasts, the corresponding point is set as an inflection point. In this embodiment, the slope threshold is -1.
As can be seen from Table 1 above, the slope corresponding to the barcode of rank 1996 falls below -1 for the first time during a continuous decrease, so this barcode is taken as an inflection point.
As shown in FIG. 3, step S2 locates the inflection points $K_m$ on the scatter plot, and the number of inflection points is counted for use in subsequent steps.
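The inflection point search of steps S22 and S23 might be sketched in Python as follows. This is one possible reading of the text, not a definitive implementation: the function names are hypothetical, the sampling grid starts at the first point, and the end point of the segment whose slope first drops below the threshold is taken as "the corresponding point".

```python
from bisect import bisect_left

def sample_nearest(xs, step):
    """Indices of the points nearest to x0, x0 + step, x0 + 2*step, ...

    xs must be sorted in increasing order (log10 ranks are).
    """
    picked = []
    n_steps = int((xs[-1] - xs[0]) / step + 1e-9)  # tolerate float rounding
    for i in range(n_steps + 1):
        target = xs[0] + i * step
        j = bisect_left(xs, target)
        # Choose the closer of the two neighbours xs[j-1] and xs[j].
        if j == len(xs) or (j > 0 and target - xs[j - 1] <= xs[j] - target):
            j -= 1
        if not picked or j != picked[-1]:
            picked.append(j)
    return picked

def find_inflections(xs, ys, step=0.2, slope_threshold=-1.0):
    """Return indices of inflection points on the (xs, ys) curve."""
    idx = sample_nearest(xs, step)
    inflections = []
    prev_slope = None
    flagged = False  # True once the current decreasing run produced a point
    for a, b in zip(idx, idx[1:]):
        slope = (ys[b] - ys[a]) / (xs[b] - xs[a])
        decreasing = prev_slope is not None and slope < prev_slope
        if not decreasing:
            flagged = False  # the decreasing run ended
        elif slope < slope_threshold and not flagged:
            inflections.append(b)  # assumption: segment end is the point
            flagged = True
        prev_slope = slope
    return inflections

# Toy curve: plateau, sharp drop, plateau, second sharp drop.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [5.0, 5.0, 4.8, 3.0, 2.8, 2.7, 0.5]
knees = find_inflections(xs, ys, step=1.0)
```

On this toy curve the two sharp drops end at indices 3 and 6, which are the points reported in `knees`. Resetting `flagged` whenever the slope stops decreasing is what allows several inflection points to be found on one curve.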
S3, traversing all inflection points and classifying the barcodes into a cell region, an empty droplet region and a magnetic bead region according to the gene expression quantity. The regions have the following meanings:
cell region: the barcode represents a droplet that contains a cell;
empty droplet region: the barcode represents a droplet that contains no cell but contains cell suspension;
magnetic bead region: the barcode represents a droplet that contains neither a cell nor cell suspension.
The barcodes are classified into the three regions as follows:
two thresholds G1 and G2 are set (G1 > G2), and all inflection points are traversed;
when the gene expression quantity $C_{UMI}$ corresponding to an inflection point is larger than G1, all barcodes ranked before that inflection point are classified into the cell region;
when the gene expression quantity corresponding to an inflection point lies between G1 and G2, the barcodes that are ranked before that inflection point and are not in the cell region are classified into the empty droplet region;
when the gene expression quantity corresponding to an inflection point is smaller than G2, the barcodes ranked after that inflection point are classified into the magnetic bead region.
G1 and G2 can be adjusted according to the actual situation and are generally set to 500 and 80, respectively.
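A minimal Python sketch of the classification rule in S3, followed by the per-region counting of S4, is given here for illustration. Several details are assumptions of the sketch rather than statements of the embodiment: "before the inflection point" is treated as including the inflection point itself, "after" as excluding it, and barcodes not covered by any rule receive the hypothetical label `unassigned`.

```python
from collections import Counter

def classify_barcodes(umi_sorted, inflection_ranks, g1=500, g2=80):
    """Label each barcode as cell, empty, bead or unassigned.

    umi_sorted: UMI counts in descending order (index 0 is rank 1).
    inflection_ranks: 1-based ranks of the inflection points.
    """
    labels = ["unassigned"] * len(umi_sorted)
    for rank in sorted(inflection_ranks):
        count = umi_sorted[rank - 1]
        if count > g1:
            # Everything ranked up to this inflection point is a cell.
            for i in range(rank):
                labels[i] = "cell"
        elif count > g2:
            # Between G2 and G1: not-yet-cell barcodes become empty droplets.
            for i in range(rank):
                if labels[i] != "cell":
                    labels[i] = "empty"
        elif count < g2:
            # Below G2: barcodes ranked after the point are bare beads.
            for i in range(rank, len(umi_sorted)):
                if labels[i] == "unassigned":
                    labels[i] = "bead"
    return labels

# Toy data: inflection points at ranks 4, 7 and 8.
umi_sorted = [3000, 2500, 2000, 600, 300, 200, 90, 40, 10, 5]
labels = classify_barcodes(umi_sorted, [4, 7, 8])
counts = Counter(labels)  # Barc_cell, Barc_empty, Barc_bead as in S4
```

In the toy data the barcode at rank 8 is itself the low inflection point, so under the "after excludes the point" assumption it stays `unassigned`; a different boundary convention would move it into the magnetic bead region.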
S4, counting the number of barcodes in the cell region, the empty droplet region and the magnetic bead region, denoted $Barc_{cell}$, $Barc_{empty}$ and $Barc_{bead}$, respectively.
S5, extracting the expression profiles of all barcodes in the cell region and calculating the expression proportion $P$ of each gene among these barcodes. If the number of barcodes expressing a given gene (say, gene A) is $C_A$, the expression proportion is $P = C_A / Barc_{cell} \times 100\%$.
The first gene number, that is, the number of genes with an expression proportion larger than P1 (50%), and the second gene number, that is, the number of genes with an expression proportion larger than P2 (70%), are then counted.
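The expression proportion of S5 can be sketched in Python as below. The input layout (a dictionary from cell-region barcode to a gene-to-UMI dictionary), the function name, and the rule that a barcode "expresses" a gene when its count is above zero are assumptions made for the example.

```python
def count_widely_expressed_genes(expression, p1=50.0, p2=70.0):
    """Return (proportions, first_gene_number, second_gene_number).

    expression: dict barcode -> dict gene -> UMI count, restricted to
    barcodes of the cell region, so len(expression) plays Barc_cell.
    """
    n_cells = len(expression)
    if n_cells == 0:
        return {}, 0, 0
    detected = {}  # gene -> number of barcodes expressing it (C_A)
    for genes in expression.values():
        for gene, count in genes.items():
            if count > 0:
                detected[gene] = detected.get(gene, 0) + 1
    # P = C_A / Barc_cell * 100%
    proportions = {g: c / n_cells * 100.0 for g, c in detected.items()}
    first = sum(1 for p in proportions.values() if p > p1)
    second = sum(1 for p in proportions.values() if p > p2)
    return proportions, first, second

# Toy profile of five cell-region barcodes (gene names are illustrative).
expression = {
    "bc1": {"ACTB": 12, "GAPDH": 3, "CD3E": 1},
    "bc2": {"ACTB": 9, "GAPDH": 4},
    "bc3": {"ACTB": 7, "GAPDH": 2, "CD3E": 5},
    "bc4": {"ACTB": 15},
    "bc5": {"ACTB": 4, "CD3E": 0},
}
proportions, first_gene_number, second_gene_number = \
    count_widely_expressed_genes(expression)
```

Here ACTB is detected in all five barcodes, GAPDH in three and CD3E in two, so two genes exceed P1 and one gene exceeds P2.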
S6, using Cell Ranger, the official 10X Genomics software, to count the number of reads aligned to the reference genome, $Read_{total}$, and calculating the average read number per cell as $Read_{cell} = Read_{total} / Barc_{cell}$ to determine whether the sequencing amount is sufficient.
S7, judging whether the sample data is available: when the gene expression quantity $C_{UMI}$ corresponding to at least one inflection point is larger than G1, the gene expression quantity $C_{UMI}$ corresponding to at least one inflection point is larger than G2 and smaller than G1, the number of barcodes in the cell region $Barc_{cell}$ is larger than K3 (2000 in this embodiment), the number of barcodes in the empty droplet region $Barc_{empty}$ is larger than K4 (30000 in this embodiment), and the average read number per cell $Read_{cell}$ is larger than K6 (20000 in this embodiment), the sample data is judged to be available; otherwise, the sample data is judged to be not available.
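The availability test of S7, together with the average read number of S6, can be written as a short Python sketch. The function and parameter names are hypothetical; the default thresholds are the values given in this embodiment (G1 = 500, G2 = 80, K3 = 2000, K4 = 30000, K6 = 20000).

```python
def is_sample_available(inflection_umis, barc_cell, barc_empty, read_total,
                        g1=500, g2=80, k3=2000, k4=30000, k6=20000):
    """Apply the five conditions of S7 and return True if all hold.

    inflection_umis: UMI counts (C_UMI) at the inflection points.
    read_total: number of reads aligned to the reference genome.
    """
    # Read_cell = Read_total / Barc_cell (guarding against zero cells).
    read_cell = read_total / barc_cell if barc_cell else 0.0
    has_cell_knee = any(c > g1 for c in inflection_umis)
    has_empty_knee = any(g2 < c < g1 for c in inflection_umis)
    return (has_cell_knee and has_empty_knee
            and barc_cell > k3 and barc_empty > k4 and read_cell > k6)

# Made-up sample: 5000 cells, 60000 empty droplets, 150 million reads.
ok = is_sample_available([1200, 150, 20], 5000, 60000, 150_000_000)
# Same sample with only 50 million reads (10000 reads per cell).
shallow = is_sample_available([1200, 150, 20], 5000, 60000, 50_000_000)
```

With 150 million aligned reads the average is 30000 reads per cell and every condition holds; with 50 million reads the average drops to 10000, below K6, so the second call returns False.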
In this embodiment, when the sample data is not available, the reason is further determined:
when only one inflection point has a corresponding gene expression quantity $C_{UMI}$ larger than G2, and the first gene number is larger than K1 (900 in this embodiment) or the second gene number is larger than K2 (300 in this embodiment), the sample data is judged to be not available because a weighing failure occurred in the experiment;
when the number of barcodes in the cell region $Barc_{cell}$ is smaller than K3 and the number of barcodes in the empty droplet region $Barc_{empty}$ is smaller than K4, the sample data is judged to be not available because hole blockage occurred in the experiment;
when the number of barcodes in the cell region $Barc_{cell}$ is smaller than K3 and the number of barcodes in the empty droplet region $Barc_{empty}$ is larger than K4, the availability of the sample data is judged to be pending confirmation because the number of cells in the experiment is too small;
when the number of barcodes in the cell region $Barc_{cell}$ is larger than K5 (20000 in this embodiment) and the average read number per cell $Read_{cell}$ is smaller than K6, the sample data is judged to be not available because the number of cells in the experiment is excessive and the sequencing depth is insufficient.
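The four diagnostic rules can be sketched in Python as follows. The function name and the returned label strings are hypothetical, and the sketch returns every rule that matches instead of stopping at the first one, since the text does not specify an evaluation order. The phrase "only one inflection point" is read here as "exactly one inflection point with a UMI count above G2".

```python
def diagnose(inflection_umis, first_genes, second_genes,
             barc_cell, barc_empty, read_cell,
             g2=80, k1=900, k2=300, k3=2000, k4=30000, k5=20000, k6=20000):
    """Return the list of matching reasons for unavailable sample data."""
    reasons = []
    n_above_g2 = sum(1 for c in inflection_umis if c > g2)
    if n_above_g2 == 1 and (first_genes > k1 or second_genes > k2):
        reasons.append("weighing failure")
    if barc_cell < k3 and barc_empty < k4:
        reasons.append("hole blockage")
    if barc_cell < k3 and barc_empty > k4:
        reasons.append("too few cells")  # availability to be confirmed
    if barc_cell > k5 and read_cell < k6:
        reasons.append("too many cells")  # insufficient sequencing depth
    return reasons

# Made-up samples, one per rule.
r1 = diagnose([1200], 950, 100, 5000, 60000, 30000)
r2 = diagnose([1200, 150], 10, 5, 800, 5000, 30000)
r3 = diagnose([1200, 150], 10, 5, 800, 60000, 30000)
r4 = diagnose([1200, 150], 10, 5, 25000, 60000, 12000)
```

Each of the four toy samples triggers exactly one rule, in the order in which the rules are listed above.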
S8, handling the data according to its availability:
if the sample data is available, the subsequent data analysis proceeds normally;
if the sample data is not available because of a weighing failure or hole blockage in the experiment, the experiment is performed again with the cell suspension;
if the sample data is not available because the number of cells in the experiment is excessive, the amount of sequencing data is increased, and the supplementary data amount is:
$Gb = (5 \times 10^{4} - Read_{cell}) \times Barc_{cell}$
where $Gb$ is the supplementary data amount, $Read_{cell}$ is the average read number per cell, and $Barc_{cell}$ is the number of barcodes in the cell region.
If the number of cells in the experiment is too small, the experiment is performed again with the cell suspension.
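As a worked example of S8, the following Python sketch evaluates the supplementary data formula and maps each outcome to its handling. The status strings and function names are hypothetical, and clamping the result at zero is an assumption added so that a sample already above 5 × 10^4 reads per cell does not yield a negative amount. As written, the formula multiplies reads per cell by a number of cells, so the value computed here is a read count.

```python
TARGET_READS_PER_CELL = 50_000  # the 5 x 10^4 constant in the formula

def supplementary_amount(read_cell, barc_cell):
    """(5e4 - Read_cell) * Barc_cell, clamped at zero (an assumption)."""
    return max(0, (TARGET_READS_PER_CELL - read_cell) * barc_cell)

def next_step(status):
    """Map an availability outcome to the handling described in S8."""
    actions = {
        "available": "proceed with downstream analysis",
        "weighing failure": "repeat the experiment with the cell suspension",
        "hole blockage": "repeat the experiment with the cell suspension",
        "too few cells": "repeat the experiment with the cell suspension",
        "too many cells": "increase the amount of sequencing data",
    }
    return actions[status]

# 25000 cell barcodes at an average of 12000 reads per cell.
extra = supplementary_amount(12_000, 25_000)
action = next_step("too many cells")
```

For this made-up sample the shortfall is 38000 reads per cell across 25000 barcodes, that is, 950 million additional reads.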
Embodiment two:
A computer storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the single-cell transcriptome data availability analysis method according to Embodiment one.
Embodiment III:
A terminal device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the single-cell transcriptome data availability analysis method according to Embodiment one.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims (7)

1. A single-cell transcriptome data availability processing method, comprising the following steps:
S1, sorting the barcodes by gene expression quantity from largest to smallest and assigning each barcode a rank $R_n$, and applying a $\log_{10}$ transformation to the barcode rank $R_n$ and the gene expression quantity $C_{UMI}$;
S2, obtaining the inflection points of the variation amplitude of the gene expression quantity;
S3, traversing all inflection points and, according to the gene expression quantity at each inflection point, classifying the barcodes into a cell region, an empty droplet region and a magnetic bead region;
S4, counting the number of barcodes in the cell region, the empty droplet region and the magnetic bead region;
S5, extracting the expression profiles of all barcodes in the cell region;
S6, counting the reads aligned to the reference genome and calculating the average read number per cell;
S7, judging that the sample data is available when the gene expression quantity corresponding to at least one inflection point is larger than G1, the gene expression quantity corresponding to at least one inflection point is larger than G2 and smaller than G1, the number of barcodes in the cell region is larger than K3, the number of barcodes in the empty droplet region is larger than K4, and the average read number per cell is larger than K6; otherwise, judging that the sample data is not available;
when the sample data is not available, further determining the reason:
calculating the expression proportion of each gene among the barcodes, and counting the first gene number (genes with an expression proportion larger than P1) and the second gene number (genes with an expression proportion larger than P2);
when only one inflection point has a corresponding gene expression quantity larger than G2, and the first gene number is larger than K1 or the second gene number is larger than K2, judging that the sample data is not available because the oil droplets in the experiment did not correctly contain the cell suspension;
when the number of barcodes in the cell region is smaller than K3 and the number of barcodes in the empty droplet region is smaller than K4, judging that the sample data is not available because hole blockage occurred in the experiment;
when the number of barcodes in the cell region is smaller than K3 and the number of barcodes in the empty droplet region is larger than K4, judging that the availability of the sample data is to be confirmed because the number of cells in the experiment is too small;
when the number of barcodes in the cell region is larger than K5 and the average read number per cell is smaller than K6, judging that the sample data is not available because the number of cells in the experiment is excessive;
S8, handling the data according to its availability:
if the sample data is available, the subsequent data analysis proceeds normally;
if the sample data is not available because the oil droplets did not correctly contain the cell suspension or because of hole blockage in the experiment, the experiment is performed again with the cell suspension;
if the sample data is not available because the number of cells in the experiment is excessive, the amount of sequencing data is increased.
2. The single-cell transcriptome data availability processing method according to claim 1, wherein the inflection points of the variation amplitude of the gene expression quantity are obtained as follows:
S21, drawing a scatter plot with the barcode rank as the X axis and the gene expression quantity as the Y axis;
S22, sampling the scatter plot at a specified interval, taking the nearest point at each position, and calculating the slope between each pair of adjacent sampled points;
S23, when the slope is trending downward and falls below the set slope threshold for the first time while that trend lasts, setting the corresponding point as an inflection point.
3. The single-cell transcriptome data availability processing method according to claim 2, wherein the barcode rank and the gene expression quantity are log-transformed before the scatter plot is drawn in S21.
4. The single-cell transcriptome data availability processing method according to claim 1, wherein the barcodes are classified into the cell region, the empty droplet region and the magnetic bead region as follows:
barcodes ranked before an inflection point whose corresponding gene expression quantity is larger than G1 are classified into the cell region; barcodes that are ranked before an inflection point whose corresponding gene expression quantity lies between G1 and G2 and that are not in the cell region are classified into the empty droplet region; and barcodes ranked after an inflection point whose corresponding gene expression quantity is smaller than G2 are classified into the magnetic bead region.
5. The single-cell transcriptome data availability processing method according to claim 1, wherein, when the amount of sequencing data is increased, the supplementary data amount is:
$Gb = (5 \times 10^{4} - Read_{cell}) \times Barc_{cell}$
where $Gb$ is the supplementary data amount, $Read_{cell}$ is the average read number per cell, and $Barc_{cell}$ is the number of barcodes in the cell region.
6. A computer storage medium having stored thereon a computer program, which when executed by a processor implements the single cell transcriptome data availability processing method according to any one of claims 1 to 5.
7. A terminal device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the single cell transcriptome data availability processing method according to any one of claims 1 to 5 when the computer program is executed by the processor.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310126779.0A CN116072217B (en) 2022-11-02 2022-11-02 Single cell transcriptome data availability processing method, medium and equipment

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202310126779.0A CN116072217B (en) 2022-11-02 2022-11-02 Single cell transcriptome data availability processing method, medium and equipment
CN202211363139.3A CN115424668B (en) 2022-11-02 2022-11-02 Single-cell transcriptome data availability analysis method, medium and equipment

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN202211363139.3A Division CN115424668B (en) 2022-11-02 2022-11-02 Single-cell transcriptome data availability analysis method, medium and equipment

Publications (2)

Publication Number Publication Date
CN116072217A true CN116072217A (en) 2023-05-05
CN116072217B CN116072217B (en) 2023-07-25

Family

ID=84208051

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202211363139.3A Active CN115424668B (en) 2022-11-02 2022-11-02 Single-cell transcriptome data availability analysis method, medium and equipment
CN202310126779.0A Active CN116072217B (en) 2022-11-02 2022-11-02 Single cell transcriptome data availability processing method, medium and equipment

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202211363139.3A Active CN115424668B (en) 2022-11-02 2022-11-02 Single-cell transcriptome data availability analysis method, medium and equipment

Country Status (1)

Country Link
CN (2) CN115424668B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102952854A (en) * 2011-08-25 2013-03-06 深圳华大基因科技有限公司 Single cell sorting and screening method and device thereof
CN109979538A (en) * 2019-03-28 2019-07-05 广州基迪奥生物科技有限公司 A kind of analysis method based on the unicellular transcript profile sequencing data of 10X
CN112481273A (en) * 2020-12-29 2021-03-12 南通大学附属医院 Verification method for colorectal cancer suppressor gene and high DNA methylation of promoter region thereof
CN112522371A (en) * 2020-12-21 2021-03-19 广州基迪奥生物科技有限公司 Analysis method of spatial transcriptome sequencing data
CN113160887A (en) * 2021-04-23 2021-07-23 哈尔滨工业大学 Screening method of tumor neoantigen fused with single cell TCR sequencing data
CN113470743A (en) * 2021-07-16 2021-10-01 哈尔滨星云医学检验所有限公司 Differential gene analysis method based on BD single cell transcriptome and proteome sequencing data
CN115058503A (en) * 2022-06-24 2022-09-16 广州市碳码科技有限责任公司 Single cell sequencing method of barcode microdroplets

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017164936A1 (en) * 2016-03-21 2017-09-28 The Broad Institute, Inc. Methods for determining spatial and temporal gene expression dynamics in single cells
CN207525245U (en) * 2017-10-19 2018-06-22 江苏苏博生物医学股份有限公司 A kind of tumour individuation gene detecting kit
CN113463202B (en) * 2020-03-31 2022-04-15 广州序科码生物技术有限责任公司 Novel RNA high-throughput sequencing method, primer group and kit and application thereof
CN112599199A (en) * 2020-12-29 2021-04-02 上海派森诺生物科技股份有限公司 Analysis method suitable for 10x single cell transcriptome sequencing data
CN115050416A (en) * 2021-03-08 2022-09-13 中国科学院上海营养与健康研究所 Single cell transcriptome calculation analysis method and system fused with deep learning model
CN114944193A (en) * 2022-05-20 2022-08-26 南开大学 Analysis method and system for integrating single-cell transcriptome and spatial transcriptome data

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
AARON T. L. LUN: "EmptyDrops: distinguishing cells from empty droplets in droplet-based single-cell RNA sequencing data", 《GENOME BIOLOGY》, pages 1 - 9 *
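凌明毅: "Design of a Nextflow-based single-cell transcriptome data analysis framework", 《China Master's Theses Full-text Database, Basic Sciences》, no. 10, pages 006 - 146 *
陈志锋: "Transcriptional analysis of protocadherin gene clusters at the single-cell level", 《China Master's Theses Full-text Database, Basic Sciences》, no. 03, pages 006 - 20 *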

Also Published As

Publication number Publication date
CN115424668A (en) 2022-12-02
CN116072217B (en) 2023-07-25
CN115424668B (en) 2023-03-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant