CN103049680A - gene sequencing data reading method and system - Google Patents

Gene sequencing data reading method and system

Info

Publication number
CN103049680A
Authority
CN
China
Prior art keywords
gene sequencing
blocks
files
sequencing data
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012105920612A
Other languages
Chinese (zh)
Other versions
CN103049680B (en)
Inventor
孟金涛
魏延杰
成杰峰
冯圣中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Senris Biotechnology Shenzhen Co ltd
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS
Priority to CN201210592061.2A
Publication of CN103049680A
Application granted
Publication of CN103049680B
Legal status: Active
Anticipated expiration

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to the technical field of bioinformatics, and provides a gene sequencing data reading method, which comprises the following steps: analyzing the user parameters to determine the number of tasks; dividing sequencing data into file blocks with the same size according to the number of tasks; adjusting the starting address and the ending address of each file block; and each task reads the adjusted file block result. The invention also provides a gene sequencing data reading system and a gene sequencing data analysis device with the system. The invention realizes the parallel reading of gene sequencing data, and each file block has uniform size, and avoids dividing a sequence into two different file blocks.

Description

Gene sequencing data reading method and system
Technical field
The present invention relates to the technical field of bioinformatics, and in particular to a gene sequencing data reading method and system.
Background
The sequencing of biological macromolecules, especially nucleic acids and proteins, has run through the entire development of bioinformatics. The genome of an organism contains the genetic information for all cellular structures and life activities, and fundamentally directs the development of the organism. Obtaining the genetic information of an organism accurately and in real time can effectively guide life science research. Sequencing technology can rapidly obtain the genetic information carried on DNA and comprehensively reveal the diversity and complexity of genomes, and it plays an increasingly important role in bioinformatics research.
In recent years, next-generation sequencing technology has brought dramatic change to bioinformatics, with remarkable progress in sequencing principles, operational details, and technical extensions. Compared with traditional Sanger sequencing, next-generation sequencing platforms avoid the cloning step and use adapters directly for parallel PCR and sequencing reactions, so the data volume is greatly increased and more DNA can be sequenced in less time. For example, drawing the first human genome map with Sanger sequencing took 13 years in total and hundreds of sequencers, whereas next-generation sequencing can now complete the same work within a few months. In addition, the cost of next-generation sequencing has fallen greatly.
Genome source sequences vary in length from 100,000 bases (for example swinepox virus and Escherichia coli) to 1 billion bases (for example the cucumber and panda genomes), and metagenomic data from complex environments (such as seawater or the human large intestine) can even exceed ten billion bases. Because such samples are sequenced at a coverage of 30 to 100 times, the number of sequence fragments produced increases sharply. Processing massive sequence data consumes a huge amount of memory, so the data are usually split and processed in parallel. In the prior art, a suitable sequence splitting strategy has to be selected before the gene sequencing data are split, so as to avoid splitting one sequence into two different file blocks.
Summary of the invention
The present invention aims to solve the above problems in the prior art and proposes a gene sequencing data reading method comprising the following steps:
Step a: parsing user parameters to determine the number of tasks;
Step b: dividing the sequencing data into file blocks of the same size according to the number of tasks;
Step c: adjusting the start address and the end address of each file block;
Step d: each task reading the adjusted file block.
Preferably, the method further comprises, before step a: initializing the tasks, establishing connections between all nodes, and collecting node information and task information.
Preferably, step b is specifically: dividing the sequencing data into file blocks of the same size according to the number of tasks, and obtaining the start position and the end position of each file block. Step c is specifically: adjusting the start position of each file block obtained in step b to the start of the first sequence after that start position; and adjusting the end position of each file block obtained in step b to the start of the first sequence after that end position, or to the end of file after that end position.
Preferably, in step d, each task performs multi-viewport parallel file reading on the adjusted file block.
Preferably, the task is a process, or a thread within a program.
Preferably, the process is an MPI process.
Preferably, the user parameters include hardware performance, the total size of the gene sequencing data, and the length of a homologous gene reference sequence.
Preferably, the format of the gene sequencing data is the FASTA format or the FASTQ format.
The present invention also provides a gene sequencing data reading system, comprising:
a parameter parsing unit, configured to parse user parameters and determine the number of tasks;
a splitting unit, configured to divide the sequencing data into file blocks of the same size according to the number of tasks;
an adjustment unit, configured to adjust the start address and the end address of each file block;
a result reading unit, configured so that each task reads the adjusted file block.
Preferably, the system further comprises an initialization unit, configured to initialize the tasks, establish connections between all nodes, and collect node information and task information.
The present invention further provides a gene sequencing data analysis device equipped with the above gene sequencing data reading system.
The beneficial effect of the present invention is that gene sequencing data are read in parallel, the file blocks are of uniform size, and no sequence is split into two different file blocks.
Description of drawings
Fig. 1 is a flowchart of the gene sequencing data reading method provided by Embodiment 1 of the present invention.
Fig. 2 is an example of the FASTA data format.
Fig. 3 is an example of the FASTQ data format.
Fig. 4 is a flowchart of the gene sequencing data reading method provided by Embodiment 2 of the present invention.
Fig. 5 is a schematic diagram of multi-viewport parallel reading in Embodiment 2 of the present invention.
Fig. 6 shows the distribution of the number of reads across file blocks in Application Example 1 of the present invention.
Fig. 7 shows how the read time varies with the number of tasks in Application Example 2 of the present invention.
Fig. 8 is a structural block diagram of the gene sequencing data reading system provided by Embodiment 4 of the present invention.
Fig. 9 is a structural block diagram of the gene sequencing data reading system provided by Embodiment 5 of the present invention.
Detailed description of the embodiments
To enable those skilled in the art to better understand the technical solution of the present application, the technical solution in the embodiments is described clearly and completely below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein serve only to explain the present invention and are not intended to limit it.
The embodiments of the present invention first divide the gene sequencing data evenly into file blocks of the same size according to the number of tasks, then adjust the start address and the end address of each file block, and finally have parallel tasks each read a different file block of the gene sequencing data. This not only achieves parallel reading of gene sequencing data with file blocks of uniform size, but also avoids splitting one sequence into two different file blocks.
Embodiment 1
Embodiment 1 of the present invention provides a gene sequencing data reading method. As shown in Fig. 1, the method comprises the following steps:
Step S101: parse the user parameters and determine the number of tasks. The user parameters in this embodiment include hardware performance, the total size of the gene sequencing data, the length of a homologous gene reference sequence, and so on; a reasonable number of tasks is chosen according to these parameters. The task in this embodiment is an MPI process.
Step S102: divide the sequencing data into file blocks of the same size according to the number of tasks. Specifically, in this embodiment the sequencing data are divided into file blocks of the same size according to the number of tasks, and the start position and the end position of each file block are obtained. For example, if the number of tasks is n and the total size of the gene sequencing data is S, then the start position of the i-th file block (i = 0, 1, 2, ..., n-1) is i*S/n and its end position is (i+1)*S/n.
Step S103: adjust the start address and the end address of each file block. Specifically, in this embodiment the start position of each file block obtained in step S102 is adjusted to the start of the first sequence after that start position; the end position of each file block obtained in step S102 is adjusted to the start of the first sequence after that end position, or to the end of file after that end position. That is, the start of the first sequence after the start position i*S/n is start(i), and the start of the first sequence after the end position (i+1)*S/n, or the end of file after that end position, is end(i).
Step S104: each task reads the adjusted file block. In this embodiment the n tasks correspond one-to-one with the n file blocks; each task knows the exact location of its file block and reads it sequentially from the start position start(i) to the end position end(i).
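Steps S102 and S103 can be illustrated with a minimal Python sketch for FASTA input. It is an illustration under stated assumptions, not the implementation used in the embodiment: the function names `split_offsets`, `next_record_start`, and `adjusted_blocks` are hypothetical, offsets are byte offsets, i*S/n is computed with integer division, a sequence is taken to start at a line beginning with ">", and "after the position" is interpreted as "at or after" so that block 0 begins at offset 0.

```python
import os

def split_offsets(total_size, n_tasks):
    """Nominal byte range [i*S/n, (i+1)*S/n) for each of the n tasks (step S102)."""
    return [(i * total_size // n_tasks, (i + 1) * total_size // n_tasks)
            for i in range(n_tasks)]

def next_record_start(f, pos, total_size):
    """Offset of the first line starting with b'>' at or after pos, else total_size."""
    if pos >= total_size:
        return total_size
    if pos == 0:
        f.seek(0)
    else:
        # Step back one byte and consume the rest of that line, so that the
        # file position lands on the first line boundary at or after pos.
        f.seek(pos - 1)
        f.readline()
    while True:
        here = f.tell()
        line = f.readline()
        if not line:                 # reached end of file without finding a header
            return total_size
        if line.startswith(b">"):    # FASTA header line marks a sequence start
            return here

def adjusted_blocks(path, n_tasks):
    """Adjusted (start(i), end(i)) pairs (step S103); end(i) equals start(i+1)."""
    total = os.path.getsize(path)
    with open(path, "rb") as f:
        cuts = [next_record_start(f, start, total)
                for start, _ in split_offsets(total, n_tasks)]
    cuts.append(total)               # the last block always ends at end of file
    return list(zip(cuts[:-1], cuts[1:]))
```

Because each adjusted end position is computed by the same rule as the next block's start position, the blocks in this sketch tile the file without gaps or overlaps. A block can come out empty when a single sequence is longer than the nominal block size.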
The gene sequencing data format in this embodiment is specifically the FASTA format (query sequence file) or the FASTQ format (quality information file). A FASTA file, as shown in Fig. 2, stores the descriptive information and the sequence data of each sequence. For each sequence, the first line begins with the identifier ">" and serves as the label of that sequence, recording the chromosome position in the species from which the sequence comes and other biological information; the lines that follow record the sequence itself. A FASTQ file, as shown in Fig. 3, is stored in units of sequencing reads, and each read occupies four lines. The first and third lines consist of an identifier character and the read name (ID): the first line begins with "@" and the third line begins with "+". The second line is the base sequence, and the fourth line holds the corresponding sequencing quality scores.
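The four-line FASTQ layout described above can be sketched as a small Python reader. This is an illustrative sketch only: the name `read_fastq` is hypothetical, and it assumes every read occupies exactly four lines with no wrapped sequence lines and no blank lines.

```python
def read_fastq(handle):
    """Yield (name, bases, qualities) from a text handle of four-line FASTQ reads."""
    while True:
        header = handle.readline()
        if not header:                      # end of input
            return
        bases = handle.readline().rstrip("\n")
        plus = handle.readline()
        quals = handle.readline().rstrip("\n")
        # Line 1 must start with '@' and line 3 with '+'.
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ read near: " + header.strip())
        # One quality character is expected per base.
        if len(bases) != len(quals):
            raise ValueError("base/quality length mismatch in: " + header.strip())
        yield header[1:].strip(), bases, quals
```

One point worth noting when locating read boundaries in FASTQ data: "@" is also a valid quality character, so a line beginning with "@" is not necessarily the first line of a read. A boundary search would therefore likely need to check the neighbouring lines as well, for instance that the line two below begins with "+".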
Those of ordinary skill in the art will appreciate that all or part of the steps in the method of this embodiment can be carried out by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc.
The method of this embodiment not only achieves parallel reading of gene sequencing data with file blocks of uniform size, but also avoids splitting one sequence into two different file blocks.
Embodiment 2
Embodiment 2 of the present invention provides a gene sequencing data reading method. As shown in Fig. 4, the method comprises the following steps:
Step S201: initialize the tasks, establish connections between all nodes, and collect node information and process information. In this embodiment, task initialization obtains information on all compute nodes taking part in the computation, and collects the task identifiers within the communication group and the number of all potential tasks that can take part in group communication. The task in this embodiment is an MPI process.
Step S202: parse the user parameters and determine the number of tasks.
Step S203: divide the sequencing data into file blocks of the same size according to the number of tasks.
Step S204: adjust the start address and the end address of each file block.
Steps S202 to S204 are described in detail in Embodiment 1 and are not repeated here.
Step S205: each task performs multi-viewport parallel file reading on the adjusted file block. In this embodiment, the file types of the data are classified according to actual requirements, and for each file type it is then specified whether data of that type can be accessed through a viewport; data that cannot be accessed through a viewport are invisible to the viewport. For example, the file types can be divided into basic types and other types derived from the basic types, with the basic types specified as accessible through a viewport and the other types as invisible to the viewport; as shown in Fig. 5, the data seen through the viewports are of the basic type. Of course, the data can also be classified in other ways. For example, file types of interest in gene sequencing data analysis can be specified as accessible through a viewport, while file types with low relevance to gene sequencing data analysis are specified as invisible to the viewport.
Those of ordinary skill in the art will appreciate that all or part of the steps in the method of this embodiment can be carried out by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc.
Application examples 1
Yeast sequencing data produced by a Solexa sequencer were read using the gene sequencing data reading method of Embodiment 2. First, taking into account user parameters such as hardware performance, the total size of the yeast gene sequencing data, and the length of the homologous gene reference sequence, the number of tasks was set to 16. The sequencing data were then divided into file blocks of the same size according to the number of tasks, and the start address and the end address of each file block were adjusted. The file block information is shown in Table 1, where each position represents one character in the data file and the unit is the bit.
Table 1 Yeast sequencing data file block information
[Table 1 is provided as an image (BDA00002698586000061) in the original publication.]
The number of reads in each file block was counted. As shown in Fig. 6, the number of reads differs little from one file block to another, so the reads are evenly distributed across the file blocks.
Application examples 2
Yeast sequencing data produced by a Solexa sequencer were read using the gene sequencing data reading method of Embodiment 2. The number of tasks was set to 1, 2, 3, 4, 5, ..., 16 in turn, and the time taken to read the gene sequencing data was measured for each setting. As shown in Fig. 7, when the number of processes is between 1 and 10, the read time decreases as the number of processes increases. Once the number of processes reaches 10, the read time levels off, because at 10 tasks the I/O capacity of the storage system is saturated.
Embodiment 3
Embodiment 3 of the present invention provides a gene sequencing data reading method. In this embodiment, a single high-performance mainframe computer reads the gene sequencing data using different threads within one program. The method comprises the following steps:
Step S301: parse the user parameters and determine the number of threads of the program.
Step S302: divide the sequencing data into file blocks of the same size according to the number of threads of the program.
Step S303: adjust the start address and the end address of each file block.
Step S304: each thread reads the adjusted file block.
Steps S302 to S304 are described in detail in Embodiment 1 and are not repeated here.
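The thread-per-block reading of step S304 can be sketched in Python with the standard library. This is a hedged illustration rather than the embodiment's implementation: the function names are hypothetical, the (start, end) byte offsets are assumed to have been adjusted already so that each block begins at a sequence boundary, and the data are assumed to be FASTA. In CPython, threads mainly overlap I/O waits, so the sketch shows the access pattern rather than a performance claim.

```python
from concurrent.futures import ThreadPoolExecutor

def read_block(path, start, end):
    """Read bytes [start, end) through a file handle private to the calling thread."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start)

def read_blocks_parallel(path, blocks):
    """Read every (start, end) block in its own thread; results keep block order."""
    if not blocks:
        return []
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        return list(pool.map(lambda b: read_block(path, b[0], b[1]), blocks))

def count_fasta_records(chunk):
    """Number of FASTA sequences in a block, counted by header lines starting with b'>'."""
    return sum(1 for line in chunk.splitlines() if line.startswith(b">"))
```

Each thread opens its own handle so that seeks in one thread do not disturb another. Applying `count_fasta_records` to every returned block gives a per-block sequence count, which is one way to check how evenly the sequences are distributed.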
Embodiment 4
Embodiment 4 of the present invention provides a gene sequencing data reading system. As shown in Fig. 8, for ease of description only the parts relevant to this embodiment are shown.
Referring to Fig. 8, the gene sequencing data reading system 1 comprises a parameter parsing unit 11, a splitting unit 12, an adjustment unit 13, and a result reading unit 14.
During the reading of gene sequencing data, the parameter parsing unit 11 parses the user parameters and determines the number of tasks. The splitting unit 12 divides the sequencing data into file blocks of the same size according to the number of tasks determined by the parameter parsing unit 11. The adjustment unit 13 adjusts the start address and the end address of each file block produced by the splitting unit 12. The result reading unit 14 reads the file blocks adjusted by the adjustment unit 13.
Because the splitting unit 12 divides the gene sequencing data into file blocks of the same size, the number of reads contained in each file block is comparable, which ensures that the reads are evenly distributed across the file blocks. The adjustment unit 13 adjusts the start address and the end address of each file block produced by the splitting unit 12, ensuring that no sequence is split into two different file blocks. The result reading unit 14 maps the n tasks one-to-one onto the n file blocks; each task knows the exact location of its file block and reads it sequentially from the adjusted start position to the adjusted end position. For example, multi-viewport parallel file reading can be performed on the adjusted file blocks.
Embodiment 5
Embodiment 5 of the present invention provides a gene sequencing data reading system. As shown in Fig. 9, for ease of description only the parts relevant to this embodiment are shown.
Referring to Fig. 9, the gene sequencing data reading system 1 comprises an initialization unit 10, a parameter parsing unit 11, a splitting unit 12, an adjustment unit 13, and a result reading unit 14.
The initialization unit 10 initializes the MPI program, establishes connections between all nodes, and collects node information and process information. In this embodiment, the initialization unit 10 initializes the MPI program, obtains information on all compute nodes taking part in the computation, and collects the process identifiers within the communication group and the number of all potential processes that can take part in group communication.
The parameter parsing unit 11, the splitting unit 12, the adjustment unit 13, and the result reading unit 14 are described in detail in Embodiment 4 and are not repeated here.
Embodiment 6
Embodiment 6 of the present invention provides a gene sequencing data analysis device equipped with the gene sequencing data reading system provided by Embodiment 4 or Embodiment 5. Its working principle is as described above and is not repeated here.
The gene sequencing data analysis device provided by this embodiment achieves parallel analysis of gene sequencing data with file blocks of uniform size, and also avoids splitting one sequence into two different file blocks.
The embodiments of the present invention described above do not limit the scope of protection of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the claims of the present invention.

Claims (11)

1. A gene sequencing data reading method, characterized in that it comprises the following steps:
Step a: parsing user parameters to determine the number of tasks;
Step b: dividing the sequencing data into file blocks of the same size according to the number of tasks;
Step c: adjusting the start address and the end address of each file block;
Step d: each task reading the adjusted file block.
2. The gene sequencing data reading method according to claim 1, characterized in that it further comprises, before step a: initializing the tasks, establishing connections between all nodes, and collecting node information and task information.
3. The gene sequencing data reading method according to claim 1, characterized in that step b is specifically: dividing the sequencing data into file blocks of the same size according to the number of tasks, and obtaining the start position and the end position of each file block; and step c is specifically: adjusting the start position of each file block obtained in step b to the start of the first sequence after that start position; and adjusting the end position of each file block obtained in step b to the start of the first sequence after that end position, or to the end of file after that end position.
4. The gene sequencing data reading method according to claim 1, characterized in that, in step d, each task performs multi-viewport parallel file reading on the adjusted file block.
5. The gene sequencing data reading method according to any one of claims 1-4, characterized in that the task is a process, or a thread within a program.
6. The gene sequencing data reading method according to claim 5, characterized in that the process is an MPI process.
7. The gene sequencing data reading method according to claim 1, characterized in that the user parameters include hardware performance, the total size of the gene sequencing data, and the length of a homologous gene reference sequence.
8. The gene sequencing data reading method according to claim 1, characterized in that the format of the gene sequencing data is the FASTA format or the FASTQ format.
9. A gene sequencing data reading system, characterized in that it comprises:
a parameter parsing unit, configured to parse user parameters and determine the number of tasks;
a splitting unit, configured to divide the sequencing data into file blocks of the same size according to the number of tasks;
an adjustment unit, configured to adjust the start address and the end address of each file block;
a result reading unit, configured so that each task reads the adjusted file block.
10. The gene sequencing data reading system according to claim 9, characterized in that the system further comprises an initialization unit, configured to initialize the tasks, establish connections between all nodes, and collect node information and task information.
11. A gene sequencing data analysis device, characterized in that the gene sequencing data analysis device is equipped with the gene sequencing data reading system according to any one of claims 9-10.
CN201210592061.2A 2012-12-29 2012-12-29 gene sequencing data reading method and system Active CN103049680B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210592061.2A CN103049680B (en) 2012-12-29 2012-12-29 gene sequencing data reading method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210592061.2A CN103049680B (en) 2012-12-29 2012-12-29 gene sequencing data reading method and system

Publications (2)

Publication Number Publication Date
CN103049680A true CN103049680A (en) 2013-04-17
CN103049680B CN103049680B (en) 2016-09-07

Family

ID=48062314

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210592061.2A Active CN103049680B (en) 2012-12-29 2012-12-29 gene sequencing data reading method and system

Country Status (1)

Country Link
CN (1) CN103049680B (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103559020A (en) * 2013-11-07 2014-02-05 中国科学院软件研究所 Method for realizing parallel compression and parallel decompression on FASTQ file containing DNA (deoxyribonucleic acid) sequence read data
CN104657627A (en) * 2013-11-18 2015-05-27 广州中国科学院软件应用技术研究所 Searching and determining method and system started from FASTQ format read segment
CN106096332A (en) * 2016-06-28 2016-11-09 深圳大学 Parallel fast matching method and system thereof towards the DNA sequence stored
CN106407743A (en) * 2016-08-31 2017-02-15 上海美吉生物医药科技有限公司 Cluster-based high-throughput data analyzing method
CN106603591A (en) * 2015-10-14 2017-04-26 北京聚道科技有限公司 Processing method and system facing transmission and preprocessing of genome detection data
CN107145766A (en) * 2017-03-27 2017-09-08 中国科学院深圳先进技术研究院 Gene order read method and reading system
CN107169313A (en) * 2017-03-29 2017-09-15 中国科学院深圳先进技术研究院 The read method and computer-readable recording medium of DNA data files
CN109616156A (en) * 2018-12-03 2019-04-12 郑州云海信息技术有限公司 A kind of gene sequencing date storage method and device
CN109997194A (en) * 2016-11-03 2019-07-09 伊路米纳有限公司 The system and method for exceptional value conspicuousness evaluation
CN110506272A (en) * 2016-10-11 2019-11-26 基因组系统公司 For accessing with the method and apparatus of the biological data of access unit structuring
CN110750362A (en) * 2019-12-19 2020-02-04 深圳华大基因科技服务有限公司 Method and apparatus for analyzing biological information, and storage medium
CN111326216A (en) * 2020-02-27 2020-06-23 中国科学院计算技术研究所 Rapid partitioning method for big data gene sequencing file
CN113192558A (en) * 2021-05-26 2021-07-30 北京自由猫科技有限公司 Reading and writing method for third-generation gene sequencing data and distributed file system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
曹宗雁 等: "超大规模序列比对计算的并行优化", 《计算机应用》 *
郭新 等: "基于大规模序列比对软件的并行优化方案", 《计算机工程》 *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103559020B (en) * 2013-11-07 2016-07-06 中国科学院软件研究所 A kind of DNA reads ordinal number according to the compression of FASTQ file in parallel and decompression method
CN103559020A (en) * 2013-11-07 2014-02-05 中国科学院软件研究所 Method for realizing parallel compression and parallel decompression on FASTQ file containing DNA (deoxyribonucleic acid) sequence read data
CN104657627B (en) * 2013-11-18 2017-12-05 广州中国科学院软件应用技术研究所 The searching of FASTQ forms read beginning and determination methods and system
CN104657627A (en) * 2013-11-18 2015-05-27 广州中国科学院软件应用技术研究所 Searching and determining method and system started from FASTQ format read segment
CN106603591A (en) * 2015-10-14 2017-04-26 北京聚道科技有限公司 Processing method and system facing transmission and preprocessing of genome detection data
CN106603591B (en) * 2015-10-14 2020-02-07 北京聚道科技有限公司 Processing method and system for genome detection data transmission and preprocessing
CN106096332A (en) * 2016-06-28 2016-11-09 深圳大学 Parallel fast matching method and system thereof towards the DNA sequence stored
CN106407743A (en) * 2016-08-31 2017-02-15 上海美吉生物医药科技有限公司 Cluster-based high-throughput data analyzing method
CN106407743B (en) * 2016-08-31 2019-03-05 上海美吉生物医药科技有限公司 A kind of high-throughput data analysing method based on cluster
CN110506272A (en) * 2016-10-11 2019-11-26 基因组系统公司 For accessing with the method and apparatus of the biological data of access unit structuring
CN110506272B (en) * 2016-10-11 2023-08-01 基因组系统公司 Method and device for accessing bioinformatic data structured in access units
CN109997194A (en) * 2016-11-03 2019-07-09 伊路米纳有限公司 The system and method for exceptional value conspicuousness evaluation
CN107145766A (en) * 2017-03-27 2017-09-08 中国科学院深圳先进技术研究院 Gene order read method and reading system
CN107169313A (en) * 2017-03-29 2017-09-15 中国科学院深圳先进技术研究院 The read method and computer-readable recording medium of DNA data files
CN109616156A (en) * 2018-12-03 2019-04-12 郑州云海信息技术有限公司 A kind of gene sequencing date storage method and device
CN110750362A (en) * 2019-12-19 2020-02-04 深圳华大基因科技服务有限公司 Method and apparatus for analyzing biological information, and storage medium
CN111326216A (en) * 2020-02-27 2020-06-23 中国科学院计算技术研究所 Rapid partitioning method for big data gene sequencing file
CN113192558A (en) * 2021-05-26 2021-07-30 北京自由猫科技有限公司 Reading and writing method for third-generation gene sequencing data and distributed file system

Also Published As

Publication number Publication date
CN103049680B (en) 2016-09-07

Similar Documents

Publication Publication Date Title
CN103049680A (en) gene sequencing data reading method and system
Zhang et al. Comprehensive profiling of circular RNAs with nanopore sequencing and CIRI-long
Wu et al. Detection of differentially methylated regions from whole-genome bisulfite sequencing data without replicates
US20200232029A1 (en) Systems and methods for mitochondrial analysis
Slatko et al. Overview of next‐generation sequencing technologies
Laver et al. Assessing the performance of the Oxford Nanopore Technologies MinION
Davis et al. Kraken: a set of tools for quality control and analysis of high-throughput sequence data
Adiconis et al. Comparative analysis of RNA sequencing methods for degraded or low-input samples
Tulin et al. A quantitative reference transcriptome for Nematostella vectensis early embryonic development: a pipeline for de novo assembly in emerging model systems
Ozsolak et al. RNA sequencing: advances, challenges and opportunities
Lake et al. Deriving the genomic tree of life in the presence of horizontal gene transfer: conditioned reconstruction
Izuogu et al. PTESFinder: a computational method to identify post-transcriptional exon shuffling (PTES) events
Ye et al. Utilizing de Bruijn graph of metagenome assembly for metatranscriptome analysis
Kruse et al. A complex network framework for unbiased statistical analyses of DNA–DNA contact maps
Deschamps et al. Characterization, correction and de novo assembly of an Oxford Nanopore genomic dataset from Agrobacterium tumefaciens
CN103902852A (en) Gene expression quantitative method and device
CN105760706A (en) Compression method for next generation sequencing data
McDonald et al. The evolutionary dynamics of tRNA-gene copy number and codon-use in E. coli.
Shiau et al. High throughput single cell long-read sequencing analyses of same-cell genotypes and phenotypes in human tumors
Sauvage et al. Promising prospects of nanopore sequencing for algal hologenomics and structural variation discovery
Theis et al. RNA 3D modules in genome-wide predictions of RNA 2D structure
Florea Bioinformatics of alternative splicing and its regulation
Wang et al. UNI-RNA: universal pre-trained models revolutionize RNA research
Galata et al. Functional meta-omics provide critical insights into long- and short-read assemblies
Wong et al. SpliceWiz: interactive analysis and visualization of alternative splicing in R

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Meng Jintao

Inventor after: Wei Yanjie

Inventor after: Cheng Jiefeng

Inventor after: Feng Shengzhong

Inventor before: Meng Jintao

Inventor before: Wei Yanjie

Inventor before: Cheng Jiefeng

Inventor before: Feng Shengzhong

TR01 Transfer of patent right

Effective date of registration: 20211202

Address after: 518000 A-301, office building, Shenzhen Institute of advanced technology, No. 1068, Xue Yuan Avenue, Shenzhen University Town, Shenzhen, Guangdong, Nanshan District, China

Patentee after: Shenzhen shen-tech advanced Cci Capital Ltd.

Address before: 518055 No. 1068 Xueyuan Avenue, Xili University Town, Nanshan District, Shenzhen, Guangdong

Patentee before: SHENZHEN INSTITUTES OF ADVANCED TECHNOLOGY

TR01 Transfer of patent right

Effective date of registration: 20220118

Address after: 518000 b402, blocks a and B, Nanshan medical device Industrial Park, No. 1019, Nanhai Avenue, Yanshan community, merchants street, Nanshan District, Shenzhen, Guangdong

Patentee after: Shenzhen hongzhituoxin venture capital enterprise (L.P.)

Address before: 518000 A-301, office building, Shenzhen Institute of advanced technology, No. 1068, Xue Yuan Avenue, Shenzhen University Town, Shenzhen, Guangdong, Nanshan District, China

Patentee before: Shenzhen shen-tech advanced Cci Capital Ltd.

TR01 Transfer of patent right

Effective date of registration: 20220429

Address after: 518000 b402, blocks a and B, Nanshan medical device Industrial Park, No. 1019, Nanhai Avenue, Yanshan community, merchants street, Nanshan District, Shenzhen, Guangdong

Patentee after: Senris Biotechnology (Shenzhen) Co.,Ltd.

Address before: 518000 b402, blocks a and B, Nanshan medical device Industrial Park, No. 1019, Nanhai Avenue, Yanshan community, merchants street, Nanshan District, Shenzhen, Guangdong

Patentee before: Shenzhen hongzhituoxin venture capital enterprise (L.P.)