CN110504005A - Data processing method - Google Patents

Data processing method

Info

Publication number
CN110504005A
Authority
CN
China
Prior art keywords
data
cell
gene
screening
sequence
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910795698.3A
Other languages
Chinese (zh)
Inventor
杨圳
王文山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI GENMINIX INFORMATICS CO Ltd
Original Assignee
SHANGHAI GENMINIX INFORMATICS CO Ltd
Application filed by SHANGHAI GENMINIX INFORMATICS CO Ltd
Priority to CN201910795698.3A
Publication of CN110504005A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Landscapes

  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Genetics & Genomics (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Epidemiology (AREA)
  • Software Systems (AREA)
  • Public Health (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention discloses a data processing method, in particular a method for processing the raw sequencer output data generated by ICB-scSeq technology. With this method, single-cell sequencing can be carried out at low cost, simply and rapidly, with considerable economic and safety benefits.

Description

Data processing method
Technical field
The present invention relates to a data processing method, and in particular to a method for processing the raw sequencer output data in ICB-scSeq (Intelligent Combinatorial Barcoding-single cell Sequencing) technology.
Background art
Over the past 10 years, the rapid development of next-generation sequencing (NGS) and third-generation sequencing (TGS) technologies has profoundly changed the life sciences. Earlier studies had to obtain enough nucleic acid from a large number of cells for sequencing, so the results usually characterized a cell population, while the features unique to individual cells were often overlooked. Single-cell sequencing technologies emerged to overcome this limitation.
Single-cell sequencing has produced abundant results in fields such as oncology, developmental biology and neuroscience. Single-cell research could expand scientific output even faster, but current single-cell sequencing technologies still have many problems: they require fresh cells, make poor use of samples, and depend on expensive equipment and reagents. These problems hinder the research and adoption of single-cell sequencing and the large-scale development of single-cell life science. Optimizing existing single-cell sequencing technologies and developing new ones is therefore urgent.
ICB-scSeq (Intelligent Combinatorial Barcoding-single cell Sequencing) is a single-cell sequencing technology developed by the present inventors. Based on SPLIT (split-pool ligation-based transcriptome sequencing), it labels the cell of origin of each RNA molecule by combinatorial barcoding.
Because of the above technical deficiencies, the ICB-scSeq sequencing approach also needs a better data processing method, covering everything from the raw sequencer output to downstream analysis, so that cell sequencing can be carried out at low cost, simply and rapidly.
Summary of the invention
An object of the present invention is to overcome the deficiencies of the prior art by providing a data processing method with which cell sequencing can be carried out at low cost, simply and rapidly.
To achieve the above object, the present invention proposes the following technical solution: a data processing method, characterized by comprising:
A raw data acquisition step: performing paired-end sequencing and acquiring the raw data of intelligent combinatorial barcoding single-cell sequencing (ICB-scSeq), wherein the first end is the cDNA part and the second end is the unique molecular identifier (UMI) and cell barcode part;
A quality control and filtering step: filtering the acquired raw data to obtain filtered data;
An alignment step: aligning the filtered data against a reference genome sequence to obtain aligned data;
A UMI deduplication step: removing the reads with duplicated UMIs from the aligned data to obtain deduplicated data;
A gene quantification step: quantifying genes from the deduplicated data to obtain quantified data;
An expression matrix construction step: constructing an expression matrix from the quantified data, the matrix containing the raw count of each gene in each cell;
A cell screening step: screening the expression matrix by mitochondrial content and by number of expressed genes to obtain a screened matrix;
A normalization step: normalizing the raw counts of the screened matrix to obtain a normalized matrix;
An analysis step: analyzing the normalized matrix.
With the data processing method provided by the present invention, cell sequencing can be carried out at low cost, simply and rapidly, with considerable economic and safety benefits.
Brief description of the drawings
Fig. 1 is the schematic diagram of the data processing method of first embodiment of the invention.
Fig. 2 is a schematic diagram of the sequence structure used in the data processing method of Fig. 1.
Fig. 3 is a schematic diagram of the deduplication process in the data processing method of Fig. 1.
Fig. 4 shows the clustering results of the data processing method of Fig. 1.
Fig. 5 shows the pathway enrichment analysis results of the data processing method of Fig. 1.
Fig. 6 shows another view of the pathway enrichment analysis results of the data processing method of Fig. 1.
Detailed description of embodiments
The technical solutions of the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings.
A first embodiment of the present invention is a data processing method.
Fig. 1 is a schematic diagram of the data processing method of the first embodiment of the invention. As shown in Fig. 1, the method comprises: a raw data acquisition step, a quality control and filtering step, an alignment step, a UMI deduplication step, a gene quantification step, an expression matrix construction step, a cell screening step, a normalization step and an analysis step.
In the raw data acquisition step, paired-end sequencing is performed and the raw data of intelligent combinatorial barcoding single-cell sequencing (ICB-scSeq) are acquired. The first end (read1) is the cDNA part, and the second end (read2) is the UMI and cell barcode part (UMI + cell barcode). cDNA is DNA whose base sequence is complementary to an RNA strand. A UMI (Unique Molecular Identifier) is a molecular tag that marks each original molecule.
In the quality control and filtering step, the acquired raw data are filtered to obtain filtered data. In this embodiment, the quality control and filtering step illustratively includes the following sub-steps: correcting the cell barcode part of the second end of the acquired raw data; building a whitelist of cell barcodes; extracting the first-end sequences according to the whitelist; and screening the extracted first-end sequences to obtain the filtered data. The invention is not limited to this, however, and the quality control and filtering step may also include other sub-steps.
Specifically, each read2 contains three cell barcode segments, BC1, BC2 and BC3, each 8 bp long (as shown in Fig. 2). The set of possible sequences for each barcode is fixed in every run. For example, if 96 barcode1 sequences are used in the combination, barcode1 can take only 96 sequences in total, each 8 bp long. Each read can therefore be corrected by accepting a known barcode that lies at a Hamming distance of 1.
For each read, the three 8 bp sequences at the BC1, BC2 and BC3 positions are extracted as candidate barcode sequences (labeled barcode1-new, barcode2-new and barcode3-new). barcode1-new is then compared in turn with every sequence in the list of known barcode1 sequences, and the Hamming distance, denoted hd, is calculated. If hd equals 0, nothing is changed; if hd equals 1, barcode1-new is replaced with the corresponding barcode1 sequence. This completes the correction of the barcode sequences.
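The correction rule above can be sketched in Python as follows. This is a minimal illustration, not the inventors' implementation: the function names are hypothetical, and rejecting a candidate that matches no known barcode within distance 1, or matches more than one, is an assumption, because the text does not say how such reads are handled.

```python
from typing import Iterable, Optional


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def correct_barcode(candidate: str, known: Iterable[str]) -> Optional[str]:
    """Correct one 8 bp candidate segment (e.g. barcode1-new) against the known list.

    hd == 0 -> the candidate is kept unchanged.
    hd == 1 -> the candidate is replaced by the single known barcode at distance 1.
    Anything else -> None (assumption: unmatched or ambiguous reads are discarded).
    """
    known = list(known)
    if candidate in known:
        return candidate
    matches = [bc for bc in known if hamming(candidate, bc) == 1]
    return matches[0] if len(matches) == 1 else None
```

For example, `correct_barcode("ACGTACGA", ["ACGTACGT", "TTTTGGGG"])` returns `"ACGTACGT"`. The same call would be applied to barcode2-new and barcode3-new with their own known lists.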
After the cell barcodes have been corrected, the three barcode sequences of each read are combined into a unique cell identifier (cell UID), and a cell barcode whitelist is built according to the estimated number of cells. The whitelist lists the UIDs of all cells that can be identified.
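A minimal sketch of the whitelist construction, assuming that the corrected BC1, BC2 and BC3 segments are concatenated to form the cell UID and that the estimated cell number N is used to keep the N UIDs supported by the most reads. The text does not state the ranking criterion, so ranking by read count is an assumption, and `build_whitelist` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def build_whitelist(barcode_triples: Iterable[Tuple[str, str, str]],
                    expected_cells: int) -> List[str]:
    """Return the UIDs of the `expected_cells` most frequently observed cells.

    Each element of `barcode_triples` is the corrected (BC1, BC2, BC3) of one read2;
    the cell UID is their concatenation (assumption).
    """
    counts = Counter(bc1 + bc2 + bc3 for bc1, bc2, bc3 in barcode_triples)
    return [uid for uid, _ in counts.most_common(expected_cells)]
```

Ranking by read count means that barcode combinations seen only a few times (typically ambient RNA or residual sequencing errors) fall below the cut-off.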
Using the cell barcode whitelist built in the previous step, the cDNA sequences in read1 are extracted: for any read1, if the cell UID in its corresponding read2 is in the whitelist, that read1 is extracted. The open-source tool UMI-tools can be used to build the whitelist and extract the sequences.
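The filtering rule of this extraction step can be sketched as below; in practice UMI-tools performs this step, and the sketch only illustrates the logic. It assumes a hypothetical read2 layout in which a 10 bp UMI is followed directly by BC1, BC2 and BC3 (8 bp each). The real offsets are defined by Fig. 2 and are not given in the text, so `UMI_SLICE` and `BC_SLICES` are assumptions, as is appending the UID and UMI to the read1 name.

```python
from typing import Iterable, Iterator, Set, Tuple

# Hypothetical read2 layout; the actual positions are defined by Fig. 2.
UMI_SLICE = slice(0, 10)                                      # 10 bp UMI
BC_SLICES = (slice(10, 18), slice(18, 26), slice(26, 34))     # BC1, BC2, BC3


def extract_read1(pairs: Iterable[Tuple[str, str, str]],
                  whitelist: Set[str]) -> Iterator[Tuple[str, str]]:
    """Yield (tagged_name, read1_seq) for read pairs whose cell UID is whitelisted.

    Each pair is (read_name, read1_seq, read2_seq). The cell UID is BC1+BC2+BC3
    taken from read2; UID and UMI are appended to the read name so that later
    steps (deduplication, counting) can recover them.
    """
    for name, seq1, seq2 in pairs:
        uid = "".join(seq2[s] for s in BC_SLICES)
        if uid in whitelist:
            yield f"{name}_{uid}_{seq2[UMI_SLICE]}", seq1
```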
After the read1 sequences have been extracted, they must be screened further, mainly to remove the poly(A) structure at the end (shown below) and the low-quality bases at both ends of the sequence. In the following, the upper row is the original sequence and the lower row is the sequence after removal of the terminal poly(A) structure and of the low-quality bases at both ends.
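The two trimming operations can be sketched as follows. The Phred cutoff (20), the minimum poly(A) run length (6) and the Phred+33 quality encoding are illustrative assumptions; the text gives no values. `trim_read` is a hypothetical name, and dedicated trimming tools are normally used in practice.

```python
from typing import Tuple


def trim_read(seq: str, qual: str, min_q: int = 20, min_polya: int = 6,
              offset: int = 33) -> Tuple[str, str]:
    """Trim low-quality bases from both ends, then a trailing poly(A) run.

    min_q, min_polya and the Phred+33 offset are assumptions for illustration.
    """
    scores = [ord(c) - offset for c in qual]
    start, end = 0, len(seq)
    while start < end and scores[start] < min_q:      # low-quality 5' bases
        start += 1
    while end > start and scores[end - 1] < min_q:    # low-quality 3' bases
        end -= 1
    a_end = end
    while a_end > start and seq[a_end - 1] == "A":    # locate the trailing A run
        a_end -= 1
    if end - a_end >= min_polya:                      # remove it only if long enough
        end = a_end
    return seq[start:end], qual[start:end]
```

Trimming the 3' low-quality bases first lets a poly(A) tail that is followed by a few poor-quality calls still be recognized.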
In the alignment step, the read1 sequences obtained from the above screening are aligned against the reference genome sequence; the alignment can be performed with the aligner STAR.
The alignment produces a sorted BAM file. Each aligned read is then annotated according to the GTF file of the reference genome, i.e. assigned to a gene. The purpose is to determine, from the GTF annotation, which gene each aligned read belongs to. This assignment can be performed with the open-source tool featureCounts.
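The gene-assignment idea can be illustrated with a simplified overlap rule: a read is assigned to a gene only if it overlaps exactly one gene interval taken from the `gene` records of the GTF file. This is a sketch, not featureCounts' algorithm, which also handles strandedness, exon-level features and many options. Leaving reads that overlap several genes unassigned is similar in spirit to featureCounts' default treatment of ambiguous reads. Function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# (chrom, start, end, gene_id); GTF coordinates are 1-based and inclusive.
GeneInterval = Tuple[str, int, int, str]

GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def parse_gtf_genes(gtf_text: str) -> List[GeneInterval]:
    """Collect gene intervals from the 'gene' lines of a GTF file's text."""
    genes = []
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        match = GENE_ID_RE.search(fields[8])
        if match:
            genes.append((fields[0], int(fields[3]), int(fields[4]), match.group(1)))
    return genes


def assign_gene(chrom: str, start: int, end: int,
                genes: List[GeneInterval]) -> Optional[str]:
    """Return the gene a read overlaps, or None if it overlaps zero or several genes."""
    hits = {gid for c, s, e, gid in genes if c == chrom and start <= e and end >= s}
    return hits.pop() if len(hits) == 1 else None
```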
In the UMI deduplication step, the result of the previous step tells which gene each aligned read belongs to. To eliminate the PCR bias introduced during library construction after sampling, ICB-scSeq introduces a 10 bp UMI into every sequence. Thus, if two reads fall within the same gene and their 10 bp UMIs are also identical, the two reads are considered to come from the same cDNA molecule and must be deduplicated. As shown in Fig. 3, five reads are shown on the left; of these, the top three are duplicates of one another and the bottom two are also duplicates, so after deduplication only two reads remain on the right.
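The deduplication rule can be sketched as below: reads are collapsed when they share the same gene and the same UMI, and the remaining unique molecules are counted per cell and gene. Keying on the cell UID as well is an assumption made explicit here, since the text states the rule per gene. Exact UMI matching is used; tools such as UMI-tools can additionally merge UMIs that differ only by sequencing errors. `deduplicate` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, Tuple

Read = Tuple[str, str, str]  # (cell_uid, gene_id, umi)


def deduplicate(reads: Iterable[Read]) -> Counter:
    """Return the number of unique molecules per (cell UID, gene).

    Reads sharing cell UID, gene and UMI are treated as copies of one cDNA molecule.
    """
    molecules = set(reads)
    return Counter((cell, gene) for cell, gene, _umi in molecules)
```

Applied to the situation of Fig. 3 (three reads with one UMI and two with another, all in the same gene and cell), the result is a count of 2.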
In the gene quantification step, genes are quantified from the deduplicated data to obtain quantified data.
In the expression matrix construction step, an expression matrix is constructed from the quantified data; it contains the raw count (raw counts) of each gene in each cell. In this matrix, each column represents a cell UID and each row represents a gene ID, as shown in the table.
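Given unique-molecule counts per (cell UID, gene), the gene-by-cell matrix described above (rows are gene IDs, columns are cell UIDs) can be assembled as in this sketch. Sorting the labels and filling absent combinations with 0 are illustrative choices; `build_matrix` is a hypothetical name, and real pipelines usually store this matrix in a sparse format.

```python
from typing import Dict, List, Tuple


def build_matrix(counts: Dict[Tuple[str, str], int]
                 ) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (gene_ids, cell_uids, matrix) with matrix[i][j] = raw count of gene i in cell j."""
    genes = sorted({gene for _, gene in counts})
    cells = sorted({cell for cell, _ in counts})
    row = {g: i for i, g in enumerate(genes)}
    col = {c: j for j, c in enumerate(cells)}
    matrix = [[0] * len(cells) for _ in genes]       # genes absent from a cell stay 0
    for (cell, gene), n in counts.items():
        matrix[row[gene]][col[cell]] = n
    return genes, cells, matrix
```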
In the cell screening step, the expression matrix is screened by mitochondrial content and by number of expressed genes to obtain a screened matrix. Specifically, for each cell in the expression matrix, the fraction of its total expression contributed by mitochondrial genes is calculated; if this fraction exceeds the set threshold, the cell is removed. The threshold is, for example, 5%, but is not limited to this and may be set to other values. In addition, the number of expressed genes in each cell of the expression matrix is screened; a typical criterion is a minimum of 200 and a maximum of 2500 expressed genes, but other ranges may also be used. The screening can be performed with Seurat. These two screenings yield a screened expression matrix, which is passed to the next step.
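The two screening criteria can be sketched with NumPy as below, using the example thresholds from the text as defaults (mitochondrial fraction at most 5%, 200 to 2500 expressed genes). Recognizing mitochondrial genes by an `MT-` name prefix (the human convention, matched case-insensitively so that mouse `mt-` genes also match) and treating the gene-count bounds as inclusive are assumptions; `screen_cells` is a hypothetical name.

```python
from typing import Sequence

import numpy as np


def screen_cells(matrix: np.ndarray, gene_ids: Sequence[str], mito_prefix: str = "MT-",
                 max_mito_frac: float = 0.05, min_genes: int = 200,
                 max_genes: int = 2500) -> np.ndarray:
    """Return a boolean mask over cells (columns) of a genes x cells raw count matrix.

    A cell passes if its mitochondrial fraction is <= max_mito_frac and its number
    of expressed genes lies in [min_genes, max_genes] (inclusive bounds: assumption).
    """
    is_mito = np.array([g.upper().startswith(mito_prefix.upper()) for g in gene_ids])
    totals = matrix.sum(axis=0)
    mito_frac = matrix[is_mito].sum(axis=0) / np.maximum(totals, 1)
    n_genes = (matrix > 0).sum(axis=0)
    return (mito_frac <= max_mito_frac) & (n_genes >= min_genes) & (n_genes <= max_genes)
```

The screened matrix is then `matrix[:, mask]`.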
In the normalization step, the raw counts of the screened expression matrix are normalized to obtain a normalized matrix. Because the number of reads measured per cell varies in single-cell sequencing, the raw counts must be normalized to eliminate quantification errors caused by differences in sequencing depth. The normalization can be performed with Seurat, using the following formula:
where CountOfGene represents the raw count of each gene in each cell, and AllCount represents the sum of the raw counts of all genes in each cell.
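The formula itself is not reproduced in this text. Assuming it is Seurat's default `LogNormalize` method, which uses exactly these two quantities, each value would be ln(CountOfGene / AllCount × 10000 + 1), where 10000 is Seurat's default scale factor. The sketch below implements that assumed form; if the patent's formula differs, only the last line needs to change. `log_normalize` is a hypothetical name.

```python
import numpy as np


def log_normalize(matrix: np.ndarray, scale_factor: float = 10000.0) -> np.ndarray:
    """Assumed LogNormalize-style normalization of a genes x cells raw count matrix.

    Each entry becomes ln(CountOfGene / AllCount * scale_factor + 1), where AllCount
    is the total count of the entry's cell (column).
    """
    all_count = matrix.sum(axis=0, keepdims=True)        # AllCount per cell
    fractions = matrix / np.maximum(all_count, 1)        # CountOfGene / AllCount
    return np.log1p(fractions * scale_factor)
```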
In the analysis step, the normalized matrix is analyzed. In this embodiment, the analysis at the cell level is a clustering step, and the analysis at the gene level consists of a differential expression analysis step and a pathway enrichment analysis step. The analysis is not limited to these, however, and other suitable analysis steps may also be used.
The clustering analysis is performed as follows.
First, feature extraction is performed so that all measured single cells can be clustered. Highly variable features are computed from the normalized expression matrix, and these features are extracted for subsequent analysis.
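A simplified illustration of highly variable feature selection: genes are ranked by the variance of their normalized expression across cells, and the top `n_features` are kept. Seurat's default (`vst`) method models the mean-variance relationship before ranking, so this is only a stand-in for the idea; `n_features` and the function name are hypothetical.

```python
import numpy as np


def top_variable_genes(norm: np.ndarray, n_features: int) -> np.ndarray:
    """Return row indices of the n_features genes with the highest variance across cells.

    norm: genes x cells normalized matrix.
    """
    variances = norm.var(axis=1)
    return np.argsort(variances)[::-1][:n_features]
```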
Next, the matrix data are scaled. To remove as far as possible the errors from various sources (including technical noise, batch effects and some biological sources of variation), the matrix data are processed by regression so that these errors are excluded, which improves the subsequent dimensionality reduction and clustering.
Linear dimensionality reduction is then performed: the scaled data are reduced using PCA (principal component analysis).
The cells are then clustered. Based on the significant principal components (PCs) identified in the previous step, a graph-based clustering method is used: a KNN (k-nearest neighbor) graph is built and the Louvain algorithm is applied iteratively, finally grouping all cells into different clusters. The above analysis can be performed with Seurat.
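The regression-and-scaling step can be sketched as ordinary least squares per gene: each gene's expression is regressed on per-cell covariates (for example sequencing depth or mitochondrial fraction, chosen here as illustrative unwanted sources of variation), the residuals are kept, and each gene is centered and scaled to unit variance. This follows the spirit of Seurat's `ScaleData` with `vars.to.regress` but is not its exact implementation; the covariates and the function name are assumptions.

```python
import numpy as np


def regress_and_scale(norm: np.ndarray, covariates: np.ndarray) -> np.ndarray:
    """Regress covariates out of each gene, then z-score each gene across cells.

    norm: genes x cells normalized matrix.
    covariates: cells x k matrix (or length-cells vector) of nuisance variables.
    """
    n_cells = norm.shape[1]
    design = np.column_stack([np.ones(n_cells), covariates])   # intercept + covariates
    # Least-squares fit for all genes at once: design @ beta ~ expression.
    beta, *_ = np.linalg.lstsq(design, norm.T, rcond=None)
    residuals = norm.T - design @ beta                           # cells x genes
    std = residuals.std(axis=0)
    # Near-constant genes are left unscaled to avoid dividing by ~0.
    scaled = (residuals - residuals.mean(axis=0)) / np.where(std > 1e-12, std, 1.0)
    return scaled.T                                              # back to genes x cells
```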
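PCA itself can be computed from a singular value decomposition of the centered cell-by-gene matrix, as in this sketch. Seurat's `RunPCA` uses a truncated, approximate SVD by default, so its results can differ slightly; the full SVD here is chosen for clarity. The function name and return values are illustrative.

```python
import numpy as np


def pca(scaled: np.ndarray, n_components: int):
    """PCA of a genes x cells scaled matrix.

    Returns (cell_embeddings, explained_variance_ratio) for the first n_components PCs.
    """
    x = scaled.T - scaled.T.mean(axis=0)             # cells x genes, column-centered
    u, s, _vt = np.linalg.svd(x, full_matrices=False)
    embeddings = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return embeddings, explained[:n_components]
```

The explained-variance ratios are one way to judge which PCs are significant enough to pass to the clustering step.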
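The first half of this graph-based clustering, building the k-nearest-neighbor graph in PC space, can be sketched with NumPy as below. Seurat additionally converts the KNN graph into a shared-nearest-neighbor (SNN) graph before running Louvain modularity optimization; neither of those steps is shown here. `k` and the function name are illustrative assumptions.

```python
import numpy as np


def knn_graph(embeddings: np.ndarray, k: int) -> np.ndarray:
    """Return an n x n symmetric 0/1 adjacency matrix linking each cell to its k nearest neighbors.

    Distances are Euclidean in PC space; a cell is never its own neighbor.
    """
    sq = np.sum(embeddings**2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2 * embeddings @ embeddings.T   # squared distances
    np.fill_diagonal(dist, np.inf)                                     # exclude self-matches
    neighbors = np.argsort(dist, axis=1)[:, :k]
    n = embeddings.shape[0]
    adj = np.zeros((n, n), dtype=int)
    adj[np.repeat(np.arange(n), k), neighbors.ravel()] = 1
    return np.maximum(adj, adj.T)                                      # make the graph undirected
```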
Finally, a two-dimensional UMAP display is produced: as shown in Fig. 4, the clustering result of the previous step is displayed in two dimensions using UMAP (uniform manifold approximation and projection). This visualization can be performed with Seurat.
The differential expression analysis is performed as follows.
Based on the clustering result, differentially expressed genes are identified for all clusters using the Wilcoxon rank-sum test, yielding a list of differentially expressed genes for every cluster. The results are shown in the table.
The pathway enrichment analysis is performed as follows.
The first pathway enrichment analysis is GO (Gene Ontology) enrichment analysis. As shown in Fig. 5, GO enrichment analysis is performed on the differentially expressed genes of each cluster obtained in the previous step.
In addition to GO enrichment analysis, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway enrichment analysis can be performed. As shown in Fig. 6, it identifies the pathways in which the differentially expressed genes of each cluster are significantly enriched, and the result is displayed as a bubble chart.
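The test itself can be sketched in plain Python for one gene, comparing its expression in the cells of one cluster with all remaining cells (the one-versus-rest design used by Seurat's `FindAllMarkers`). This version uses average ranks for ties and a normal approximation without tie or continuity correction, so its p-values are approximate; in practice the p-values of all genes would also be corrected for multiple testing. `rank_sum_test` is a hypothetical name.

```python
import math
from typing import Sequence, Tuple


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    x: expression in the cluster's cells; y: expression in all other cells.
    Returns (U statistic for x, approximate p-value).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):                      # assign average ranks to tied values
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1                   # 1-based average rank of the tie block
        for t in range(i, j + 1):
            ranks[t] = avg
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, grp) in zip(ranks, pooled) if grp == 0)
    u = r1 - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    return u, math.erfc(abs(z) / math.sqrt(2))
```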
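The text does not name the statistical test behind the GO and KEGG enrichment. A common choice, used for example by the clusterProfiler R package, is the one-sided hypergeometric over-representation test, sketched below as an assumption. The same function serves for a GO term or a KEGG pathway; only the annotated gene set changes. `enrichment_p_value` is a hypothetical name.

```python
from math import comb


def enrichment_p_value(total_genes: int, pathway_genes: int,
                       de_genes: int, overlap: int) -> float:
    """Upper-tail hypergeometric p-value P(X >= overlap).

    total_genes: background size N; pathway_genes: genes annotated to the GO term or
    KEGG pathway (K); de_genes: differentially expressed genes of the cluster (n);
    overlap: differentially expressed genes that fall in the pathway (x).
    """
    upper = min(pathway_genes, de_genes)
    numerator = sum(comb(pathway_genes, k) * comb(total_genes - pathway_genes, de_genes - k)
                    for k in range(overlap, upper + 1))
    return numerator / comb(total_genes, de_genes)
```

For instance, if all 5 differentially expressed genes of a cluster fall in a 5-gene pathway out of a 20-gene background, the p-value is 1/C(20, 5) = 1/15504.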
As described above, with the data processing method of the first embodiment, cell sequencing can be carried out at low cost, simply and rapidly, with considerable economic and safety benefits.
It should be noted that the units mentioned in the device embodiments of the present invention are all logical units. Physically, a logical unit may be one physical unit, part of a physical unit, or a combination of several physical units. The physical implementation of these logical units is not what matters most; the combination of functions they implement is the key to solving the technical problem addressed by the invention. In addition, to highlight the innovative part of the invention, the above device embodiments do not introduce units that are less closely related to solving the technical problem addressed by the invention; this does not mean that the above device embodiments contain no other units.
It should be noted that, in the claims and description of this patent, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.
Although the present invention has been shown and described with reference to certain preferred embodiments, those skilled in the art will understand that various changes in form and detail may be made without departing from the spirit and scope of the invention.

Claims (7)

1. A data processing method, characterized by comprising:
A raw data acquisition step: performing paired-end sequencing and acquiring the raw data of intelligent combinatorial barcoding single-cell sequencing, wherein the first end is the cDNA part and the second end is the unique molecular identifier and cell barcode part;
A quality control and filtering step: filtering the acquired raw data to obtain filtered data;
An alignment step: aligning the filtered data against a reference genome sequence to obtain aligned data;
A unique molecular identifier deduplication step: removing the reads with duplicated unique molecular identifiers from the aligned data to obtain deduplicated data;
A gene quantification step: quantifying genes from the deduplicated data to obtain quantified data;
An expression matrix construction step: constructing an expression matrix from the quantified data, the matrix containing the raw count of each gene in each cell;
A cell screening step: screening the expression matrix by mitochondrial content and by number of expressed genes to obtain a screened matrix;
A normalization step: normalizing the raw counts of the screened matrix to obtain a normalized matrix;
An analysis step: analyzing the normalized matrix.
2. The data processing method according to claim 1, characterized in that
the quality control and filtering step comprises the following sub-steps:
correcting the cell barcode part of the second end of the acquired raw data;
building a whitelist of cell barcodes;
extracting the first-end sequences according to the whitelist;
screening the extracted first-end sequences to obtain the filtered data.
3. The data processing method according to claim 1, characterized in that
in the cell screening step, the threshold for screening by mitochondrial content is 5%.
4. The data processing method according to claim 1, characterized in that
in the cell screening step, the range for screening by number of expressed genes is 200-2500.
5. The data processing method according to claim 1, characterized in that
in the normalization step, normalization is performed using the following formula, wherein CountOfGene represents the raw count of each gene in each cell, and AllCount represents the sum of the raw counts of all genes in each cell:
6. The data processing method according to claim 1, characterized in that
the analysis step at the cell level is a clustering step.
7. The data processing method according to claim 1, characterized in that
the analysis step at the gene level consists of a differential expression analysis step and a pathway enrichment analysis step.
CN201910795698.3A 2019-08-27 2019-08-27 Data processing method Pending CN110504005A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910795698.3A CN110504005A (en) 2019-08-27 2019-08-27 Data processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910795698.3A CN110504005A (en) 2019-08-27 2019-08-27 Data processing method

Publications (1)

Publication Number Publication Date
CN110504005A true CN110504005A (en) 2019-11-26

Family

ID=68589756

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910795698.3A Pending CN110504005A (en) 2019-08-27 2019-08-27 Data processing method

Country Status (1)

Country Link
CN (1) CN110504005A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112270953A (en) * 2020-10-29 2021-01-26 哈尔滨因极科技有限公司 Analysis method, device and equipment based on BD single cell transcriptome sequencing data

Citations (10)

Publication number Priority date Publication date Assignee Title
US20090222506A1 (en) * 2008-02-29 2009-09-03 Evident Software, Inc. System and method for metering and analyzing usage and performance data of a virtualized compute and network infrastructure
CN102952854A (en) * 2011-08-25 2013-03-06 深圳华大基因科技有限公司 Single cell sorting and screening method and device thereof
CN106156541A (en) * 2015-03-27 2016-11-23 深圳华大基因科技有限公司 The method and apparatus analyzing the immunity difference of individual two class states
CN107273711A (en) * 2017-06-22 2017-10-20 宁波大学 A kind of shrimp disease quantitative forecasting technique based on enteron aisle bacterial indicator
CN107463801A (en) * 2017-07-31 2017-12-12 浙江绍兴千寻生物科技有限公司 A kind of Drop seq data quality controls and analysis method
CN107723343A (en) * 2017-11-28 2018-02-23 宜昌美光硅谷生命科技股份有限公司 A kind of method of gene quantification analysis
CN108319817A (en) * 2018-01-15 2018-07-24 臻和(北京)科技有限公司 The processing method and processing device of Circulating tumor DNA repetitive sequence
CN109072193A (en) * 2015-04-03 2018-12-21 达纳-法伯癌症研究所有限公司 The composition and method of B cell genome editor
CN109658981A (en) * 2018-12-10 2019-04-19 海南大学 A kind of data classification method of unicellular sequencing
CN109979538A (en) * 2019-03-28 2019-07-05 广州基迪奥生物科技有限公司 A kind of analysis method based on the unicellular transcript profile sequencing data of 10X

Non-Patent Citations (4)

Title
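BENJAMÍN SIGURGEIRSSON et al.: "Analysis of stranded information using an automated procedure for strand specific RNA sequencing", BMC GENOMICS *
P VAN LOO et al.: "Allele-specific copy number analysis of tumors", PNAS *
ZHAOCHENXU: "PCR Array: a simple and practical high-throughput method for detecting gene expression", HTTPS://WWW.ANTPEDIA.COM/NEWS/76/N-2283776.HTML *
戚礼兴 (Qi Lixing): "High-throughput sequencing analysis of mRNA and miRNA transcriptomes of Pieris rapae at different developmental stages", China Master's Theses Full-text Database, Agricultural Science and Technology *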

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20191126