CN103714263B - Method for identifying and removing erroneous bidirectional edges of a bidirectional multistep De Bruijn graph - Google Patents

Info

Publication number
CN103714263B
Authority
CN
China
Prior art keywords
way
bruijns
coverage
multistep
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310672170.XA
Other languages
Chinese (zh)
Other versions
CN103714263A (en)
Inventor
孟金涛
张慧琳
彭丰斌
魏彦杰
冯圣中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS
Priority to CN201310672170.XA
Publication of CN103714263A
Application granted
Publication of CN103714263B
Active
Anticipated expiration

Abstract

The present invention discloses a method for identifying and removing erroneous bidirectional edges of a bidirectional multistep De Bruijn graph, comprising the steps of: S1, reading the sequencing data source file and constructing the bidirectional multistep De Bruijn graph; S2, computing the weighted average coverage W of all bidirectional edges of the whole bidirectional multistep De Bruijn graph; S3, traversing all edges of the whole bidirectional multistep De Bruijn graph, classifying every bidirectional edge whose coverage is less than 0.25 times the average coverage W as an erroneous bidirectional edge, and removing it. The invention can remove part of the errors commonly present in De Bruijn graphs, including erroneous links, tip-type errors, and bubble-type errors. This allows contigs to keep extending and the De Bruijn graph to keep contracting. Erroneous bidirectional edges are found effectively while correct bidirectional edges are left undamaged, so that contig length, and with it contig quality, can be improved to some extent.

Description

Method for identifying and removing erroneous bidirectional edges of a bidirectional multistep De Bruijn graph
Technical field
The present invention relates to the field of gene sequencing, and in particular to a method for identifying and removing erroneous bidirectional edges of a bidirectional multistep De Bruijn graph.
Background
Gene sequence analysis has algorithms and mathematical models at its core and includes: storage and retrieval of gene data, sequence alignment, sequencing and assembly, gene prediction, biological evolution and phylogenetic analysis, protein structure prediction, RNA structure prediction, molecular design and drug design, metabolic network analysis, gene chips, DNA computing, and so on. The close combination of biotechnology with computer information processing technology has accelerated the processing of biological data, so that biological data can be annotated accurately in the shortest possible time, which in turn has accelerated the development of bioinformatics.
Gene sequence analysis examines massive gene sequence data in order to extract and mine new biological information and knowledge. It draws on machine learning, pattern recognition, data analysis and mining, combinatorics, stochastic models, string and graph algorithms, distributed computing, high-performance computing, parallel computing, and other areas of computer technology.
Genes are the most basic genetic code of human beings and carry each person's life information. Gene sequences differ subtly at genetic loci, and these polymorphisms of the genetic code are closely related to human health, pathogenesis, and medical treatment.
Since Sanger sequencing technology appeared in 1977, DNA sequencing technology has advanced rapidly over more than 30 years of development. Second-generation sequencing technologies, characterized by high throughput and short reads, have gradually come to dominate the market, and third-generation sequencing technologies, characterized by single-molecule sequencing, are also emerging; each has its own advantages in sequencing characteristics. After nearly 10 years of research and development, the data extraction and analysis software for traditional gene sequencing methods is now fairly mature. However, the development of sequencing technology has changed the sequencing data, so that existing data processing software can no longer meet the needs of current biomedical research.
The new generation of high-throughput sequencing methods can determine the data of a whole genome in a short time. The rapid progress of high-throughput sequencing also challenges the methods used to analyze and process the resulting gene data. At present, there is an urgent need to develop bioinformatics platforms that can handle the massive data produced by high-throughput sequencing technologies. In view of personal genome projects and the future prospects of personalized medicine, efficient, low-cost sequencing technology is an inevitable trend. At the same time, complete sequencing solutions, such as streamlined, efficient, one-stop bioinformatics data analysis platforms, are an equally important and indispensable direction of development.
However, although the new generation of high-throughput sequencing methods has high throughput, it introduces sequencing errors. In addition, gene mutations and SNPs in the sequenced sample itself, together with uneven sequencing, introduce erroneous vertices and erroneous bidirectional edges into the bidirectional multistep De Bruijn graph constructed during genome assembly. These erroneous vertices and erroneous bidirectional edges easily introduce branches into the whole De Bruijn graph and hinder the subsequent processing of the graph.
Assembling the short gene fragments produced by the new generation of high-throughput sequencing methods involves a large number of sequencing errors, which increases the computational load of the assembly algorithm. The large number of sequencing errors raises the assembly error rate and seriously degrades the assembly result.
Current assembly algorithm strategies fall into two classes: algorithms based on Overlap-Layout-Consensus (OLC) and algorithms based on the De Bruijn graph. Software developed on OLC assembly algorithms, such as SSAKE, VCAKE, and SHARCGS, has the advantage when assembling long gene sequences but is not fully suited to assembling new-generation short reads. Unlike OLC assembly algorithms, De Bruijn algorithms no longer organize data in units of reads but assemble data in units of k-mers. Their main advantages are as follows. First, assembling sequences in units of k-mers does not affect node quality and reduces the amount of redundant data. Second, a repeat region appears only once in the graph, which makes it easy to identify, avoids erroneous assembly, and lowers the error rate. Finally, overlapping regions are mapped onto the same arc, which simplifies path searching. Many short-read assembly algorithms currently adopt this framework, such as Velvet, IDBA, SOAPdenovo, and ABySS.
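As a rough illustration of organizing data in units of k-mers rather than reads, the following Python sketch splits reads into k-mers and records each adjacency between consecutive k-mers once, with a count, however many reads contain it. The function names `kmers` and `de_bruijn_edges` are hypothetical and are not taken from any of the tools named above; the sketch works on a single strand and ignores reverse complements.

```python
from collections import Counter

def kmers(read, k):
    """Return the k-mers of a read in order, using a sliding window."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def de_bruijn_edges(reads, k):
    """Count each adjacency between consecutive k-mers across all reads."""
    edges = Counter()
    for read in reads:
        ks = kmers(read, k)
        for a, b in zip(ks, ks[1:]):
            edges[(a, b)] += 1
    return edges

# Two overlapping reads share the region CGTAC, which is stored only once.
reads = ["ACGTAC", "CGTACG"]
edges = de_bruijn_edges(reads, 3)
```

The shared region contributes the same edges from both reads, so its edges end up with a count of 2 instead of being stored twice.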
Velvet makes effective use of the De Bruijn graph to achieve efficient short-read assembly. Velvet builds the De Bruijn graph with the k-mer as its basic unit and uses the structure of the graph, combined with the corresponding sequence features, to simplify the graph and finally find an optimal path that completes the assembly. Velvet concentrates on three structures produced by erroneous data: tips, bubbles, and erroneous connections. Tips shorter than 2k are removed according to a length criterion and a minority-count criterion; bubbles are merged using the depth-first search strategy of the Tour Bus algorithm; finally, erroneous connections are eliminated with a coverage threshold. The method also makes full use of paired-end information to further resolve repeats and optimize the assembly. Velvet makes full use of the structural properties of the graph and reduces data redundancy, and its speed is a great improvement over earlier algorithms. Although it does not correct sequence errors in a preprocessing stage, its error-handling mechanism largely makes up for this shortcoming, which makes it well suited to assembling large genome sequences.
IDBA is also based on the De Bruijn graph and achieves convenient, efficient short-read assembly. IDBA takes the k-mer as its basic unit and uses a varying range of k values (Kmin to Kmax) instead of a fixed k to determine k-mer length. Because the genome is assembled in units of k-mers, many overlapping elements usually form, so that assembly faces misplaced assemblies, missing vertices, and low coverage. Choosing the right k therefore becomes a key factor in assembly. Erroneous reads also generate a large amount of branching. The smaller k is, the more serious the branching problem; the larger k is, the fewer repeat regions appear, which directly affects assembly quality. By assembling with a non-fixed k, IDBA handles the branching problem well and improves assembly quality. In addition, IDBA deletes erroneous low-coverage k-mers, which greatly reduces its memory usage and also improves its processing speed.
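The effect of k on branching can be seen on a toy sequence. The Python sketch below is only an illustration, not IDBA's implementation: the helper `branching_kmers` and the example sequence are hypothetical, and the sketch works on a single strand without reverse complements. It counts k-mers that are followed by more than one distinct k-mer.

```python
from collections import defaultdict

def branching_kmers(sequence, k):
    """Count k-mers that are followed by more than one distinct k-mer."""
    successors = defaultdict(set)
    for i in range(len(sequence) - k):
        successors[sequence[i:i + k]].add(sequence[i + 1:i + 1 + k])
    return sum(1 for nxt in successors.values() if len(nxt) > 1)

# ACG occurs twice, once followed by T and once followed by A.
sequence = "TACGTGACGA"
small_k = branching_kmers(sequence, 3)  # ACG branches to CGT and CGA
large_k = branching_kmers(sequence, 4)  # every 4-mer is unique here
```

With k = 3 the repeated ACG creates one branching vertex; with k = 4 the two occurrences become the distinct k-mers ACGT and ACGA and the branch disappears.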
SOAPdenovo can assemble hundreds of millions of reads efficiently and with high quality. It inherits the advantages of both OLC algorithms and De Bruijn algorithms, which greatly improves its assembly quality. SOAPdenovo uses a preset k-mer threshold to filter and correct reads, reducing the number of erroneous sequences. It also borrows Velvet's method to handle bubbles successfully, which raises its average coverage. In addition, SOAPdenovo uses paired-end information to match overlapping regions, merges reads into contigs, and builds a contig-based graph structure, which greatly reduces the complexity of the contig graph.
ABySS introduces the idea of parallel computing: it runs on a Linux cluster, builds a distributed De Bruijn graph structure on the cluster, and stores the data in distributed fashion across the nodes. It uses the MPI communication mechanism for communication between nodes. From graph construction through error correction to the subsequent vertex merging, and finally to reconstruction of the whole genome sequence, it has great advantages in running time and memory consumption, and its error rate is very low. Its performance, particularly the per-machine memory usage within the cluster, is greatly improved, and it is increasingly widely used.
However, the error correction and error removal strategies above all rest on the assumption of uniform abundance. In actual sequencing, some vertices are mistaken for erroneous vertices because their sequencing abundance is relatively low, while SNPs or mutations located in repetitive sequences may be treated as correct vertices because their abundance is too high. Moreover, the strategies above all classify and remove errors at the vertices of the De Bruijn graph; no work has yet addressed classifying and removing errors on the edges of a De Bruijn graph, in particular of a bidirectional multistep De Bruijn graph.
Summary of the invention
The object of the present invention is to solve the problems of the prior art by providing a method for identifying and removing erroneous bidirectional edges of a bidirectional multistep De Bruijn graph.
The technical solution of the invention includes a method for identifying and removing erroneous bidirectional edges of a bidirectional multistep De Bruijn graph, comprising the steps of:
S1, reading the sequencing data source file and constructing the bidirectional multistep De Bruijn graph;
S2, computing the weighted average coverage W of all bidirectional edges of the whole bidirectional multistep De Bruijn graph;
S3, traversing all edges of the whole bidirectional multistep De Bruijn graph, classifying every bidirectional edge whose coverage is less than 0.25 times the average coverage W as an erroneous bidirectional edge, and removing it.
Preferably, the De Bruijn graph is constructed by the following steps:
S11, reading one sequence s;
S12, cutting the sequence s into multiple fragments t with a sliding window; selecting a fragment t, denoting its canonical number as cur, and denoting the canonical numbers of the fragments immediately before and after it as pre and lat respectively;
S13, if the code of t is less than the code of its complementary fragment, swapping the values of pre and lat;
S14, setting the corresponding bit in the forward bitmap of cur to 1 to represent the edge pointing to pre;
S15, setting the corresponding bit in the reverse bitmap of cur to 1 to represent the edge pointing to lat;
S16, repeating steps S12-S15 to process the other fragments t of sequence s until all fragments t of sequence s have been processed, then performing step S17;
S17, reading a new sequence s and repeating steps S12-S16 until all sequences have been processed, then performing step S18;
S18, completing the construction of the bidirectional multistep De Bruijn graph.
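The construction steps S11-S18 can be sketched in Python under several assumptions that the text does not state: bases are encoded with 2 bits each (A=0, C=1, G=2, T=3); the canonical number of a fragment is taken to be the larger of its own code and the code of its reverse complement, so that the swap in S13 happens exactly when a fragment is stored in flipped orientation; and Python sets stand in for the forward and reverse bitmaps. All function and variable names other than `cur`, `pre`, and `lat` are hypothetical.

```python
from collections import defaultdict

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit encoding

def encode(fragment):
    """Integer code of a fragment, 2 bits per base."""
    value = 0
    for base in fragment:
        value = value * 4 + BASE_CODE[base]
    return value

def reverse_complement(fragment):
    return "".join(COMPLEMENT[b] for b in reversed(fragment))

def canonical_number(fragment):
    """Assumed convention: the larger of the two strand codes."""
    return max(encode(fragment), encode(reverse_complement(fragment)))

def build_graph(sequences, k):
    forward = defaultdict(set)  # stands in for each vertex's forward bitmap
    reverse = defaultdict(set)  # stands in for each vertex's reverse bitmap
    for s in sequences:                                          # S11, S17
        fragments = [s[i:i + k] for i in range(len(s) - k + 1)]  # S12
        for i, t in enumerate(fragments):                        # S16
            cur = canonical_number(t)
            pre = canonical_number(fragments[i - 1]) if i > 0 else None
            lat = (canonical_number(fragments[i + 1])
                   if i + 1 < len(fragments) else None)
            if encode(t) < encode(reverse_complement(t)):        # S13
                pre, lat = lat, pre
            if pre is not None:
                forward[cur].add(pre)                            # S14
            if lat is not None:
                reverse[cur].add(lat)                            # S15
    return forward, reverse                                      # S18

forward, reverse = build_graph(["ACGG"], 3)
```

For the single sequence ACGG with k = 3, the fragment ACG is stored under the code of its reverse complement CGT, so its following fragment is recorded on the forward side after the swap.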
Preferably, step S2 comprises:
S21, initializing the weighted average coverage W to 0, the coverage sum Sum of all edges to 0, and the length Len of all edges to 0;
S22, traversing the whole bidirectional multistep De Bruijn graph, visiting each vertex V in the graph, and performing step S23 for each edge of the vertex V;
S23, when processing the i-th edge of vertex V, multiplying the coverage V.multiplicity[i] of the i-th edge by the weight value V.arcs[i].length() of the edge and adding the product to the coverage sum Sum; at the same time adding the length value V.arcs[i].length() of the edge to the edge length Len, until all edges of vertex V have been accumulated;
S24, calculating the weighted average coverage W of the whole bidirectional multistep De Bruijn graph as:
W = Sum / Len.
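A minimal Python sketch of steps S21-S24 follows. The `Vertex` class is a hypothetical stand-in for the vertex structure, keeping only the two fields named in the text (`multiplicity` and `arcs`); the guard against an empty graph is an addition that the text does not mention.

```python
from dataclasses import dataclass

@dataclass
class Vertex:
    multiplicity: list  # coverage of each edge slot
    arcs: list          # sequence string of each edge slot

def weighted_average_coverage(vertices):
    total = 0   # S21: Sum
    length = 0  # S21: Len
    for v in vertices:                     # S22: visit every vertex
        for i, arc in enumerate(v.arcs):   # S23: every edge of the vertex
            total += v.multiplicity[i] * len(arc)
            length += len(arc)
    # S24: W = Sum / Len (returning 0.0 for an empty graph is an assumption)
    return total / length if length else 0.0

vertices = [
    Vertex(multiplicity=[10, 2], arcs=["ACGT", "AC"]),
    Vertex(multiplicity=[6], arcs=["ACGTAC"]),
]
w = weighted_average_coverage(vertices)  # (10*4 + 2*2 + 6*6) / (4 + 2 + 6)
```

Because each coverage value is weighted by the edge length, long edges influence W more than short ones.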
Preferably, step S3 comprises:
S31, traversing the whole bidirectional multistep De Bruijn graph, visiting each vertex V in the graph, and performing step S32 for each edge of the vertex V;
S32, when processing the i-th edge of vertex V, taking the coverage V.multiplicity[i] of the i-th edge to obtain the weighted coverage of the i-th edge, wi = V.multiplicity[i];
S33, if wi < W*0.25, deleting the i-th edge, that is, setting the coverage V.multiplicity[i] = 0 and the weight value of the edge V.arcs[i] = string("");
S34, if i < 8, continuing with step S32; otherwise proceeding to step S35;
S35, judging whether all edges have been processed; if so, ending; otherwise returning to step S31 to continue processing.
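Steps S31-S35 can be sketched in Python as below. The `Vertex` class is a hypothetical stand-in with only the two fields named in the text. The bound i < 8 in S34 suggests that each vertex has eight edge slots; the sketch simply iterates over however many slots a vertex holds. Skipping slots that are already empty and returning a count of removed edges are additions made here for illustration.

```python
from dataclasses import dataclass

@dataclass
class Vertex:
    multiplicity: list  # coverage of each edge slot
    arcs: list          # sequence string of each edge slot ("" if empty)

def remove_erroneous_edges(vertices, w, factor=0.25):
    """Delete every edge whose coverage is below factor * w; return the count."""
    removed = 0
    for v in vertices:                    # S31: visit every vertex
        for i in range(len(v.arcs)):      # S32/S34: every edge slot
            wi = v.multiplicity[i]
            if v.arcs[i] and wi < w * factor:   # S33
                v.multiplicity[i] = 0
                v.arcs[i] = ""
                removed += 1
    return removed                        # S35: all edges processed

v = Vertex(multiplicity=[10, 1, 0], arcs=["ACGT", "AC", ""])
removed = remove_erroneous_edges([v], w=6.0)  # threshold is 6.0 * 0.25 = 1.5
```

Only the edge with coverage 1 falls below the threshold of 1.5; the edge with coverage 10 is kept and the already empty slot is left alone.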
The beneficial effects of the invention include: part of the errors commonly present in De Bruijn graphs can be removed, including erroneous links, tip-type errors, and bubble-type errors; contigs can therefore keep extending and the De Bruijn graph can keep contracting; erroneous bidirectional edges are found effectively while correct bidirectional edges are left undamaged, so that contig length, and with it contig quality, can be improved to some extent.
Brief description of the drawings
Fig. 1 is a flowchart of identifying erroneous bidirectional edges according to an embodiment of the invention.
Fig. 2 is a flowchart of removing erroneous bidirectional edges according to an embodiment of the invention.
Detailed description
The invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
The embodiments of the invention address the erroneous sequencing reads produced in high-throughput gene sequencing data by the inaccuracy of sequencing instruments. These errors introduce erroneous bidirectional edges into the bidirectional multistep De Bruijn graph constructed by the subsequent sequence assembly algorithm, and these erroneous bidirectional edges produce in the target graph: (1) erroneous links, (2) tip-type errors, and (3) bubble-type errors. Mixed together with the repetitive sequences and gene mutation sites of the source genome itself, these errors prevent contigs from extending and the De Bruijn graph from contracting further, so that subsequent gene sequence analysis cannot proceed effectively.
A method for identifying and removing erroneous bidirectional edges of a bidirectional multistep De Bruijn graph is provided, comprising the steps of:
S1, reading the sequencing data source file and constructing the bidirectional multistep De Bruijn graph;
S2, computing the weighted average coverage W of all bidirectional edges of the whole bidirectional multistep De Bruijn graph;
S3, traversing all edges of the whole bidirectional multistep De Bruijn graph, classifying every bidirectional edge whose coverage is less than 0.25 times the average coverage W as an erroneous bidirectional edge, and removing it. The coefficient may also take other feasible values and is not limited to 0.25.
The embodiments of the invention can remove part of the errors commonly present in De Bruijn graphs, including erroneous links, tip-type errors, and bubble-type errors. This allows contigs to keep extending and the De Bruijn graph to keep contracting. Erroneous bidirectional edges are found effectively while correct bidirectional edges are left undamaged, so that contig length, and with it contig quality, can be improved to some extent.
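Since the coefficient is not fixed at 0.25, its effect can be explored on a list of edge coverages. The Python sketch below uses a hypothetical helper `flag_low_coverage` and made-up coverage values; the value of W is assumed rather than computed from these edges.

```python
def flag_low_coverage(coverages, w, factor=0.25):
    """Return the indices of edges whose coverage is below factor * w."""
    threshold = w * factor
    return [i for i, c in enumerate(coverages) if c < threshold]

coverages = [30, 28, 9, 6, 2]  # hypothetical edge coverages
w = 20.0                       # assumed weighted average coverage

default_flags = flag_low_coverage(coverages, w)       # threshold 5.0
strict_flags = flag_low_coverage(coverages, w, 0.5)   # threshold 10.0
```

A larger coefficient flags more edges as erroneous, so it removes more errors at the cost of a higher risk of deleting correct low-coverage edges.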
Preferably, the De Bruijn graph is constructed by the following steps:
S11, reading one sequence s;
S12, cutting the sequence s into multiple fragments t with a sliding window; selecting a fragment t, denoting its canonical number as cur, and denoting the canonical numbers of the fragments immediately before and after it as pre and lat respectively;
S13, if the code of t is less than the code of its complementary fragment, swapping the values of pre and lat;
S14, setting the corresponding bit in the forward bitmap of cur to 1 to represent the edge pointing to pre;
S15, setting the corresponding bit in the reverse bitmap of cur to 1 to represent the edge pointing to lat;
S16, repeating steps S12-S15 to process the other fragments t of sequence s until all fragments t of sequence s have been processed, then performing step S17;
S17, reading a new sequence s and repeating steps S12-S16 until all sequences have been processed, then performing step S18;
S18, completing the construction of the bidirectional multistep De Bruijn graph.
As shown in Fig. 1, step S2 comprises:
S21, initializing the weighted average coverage W to 0, the coverage sum Sum of all edges to 0, and the length Len of all edges to 0;
S22, traversing the whole bidirectional multistep De Bruijn graph, visiting each vertex V in the graph, and performing step S23 for each edge of the vertex V;
S23, when processing the i-th edge of vertex V, multiplying the coverage V.multiplicity[i] of the i-th edge by the weight value V.arcs[i].length() of the edge and adding the product to the coverage sum Sum; at the same time adding the length value V.arcs[i].length() of the edge to the edge length Len, until all edges of vertex V have been accumulated;
S24, calculating the weighted average coverage W of the whole bidirectional multistep De Bruijn graph as:
W = Sum / Len.
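As a worked example of this formula, write c_i for the coverage V.multiplicity[i] of an edge and ℓ_i for its length V.arcs[i].length(); both symbols are notation introduced here for illustration. For three hypothetical edges with coverages 12, 1, and 8 and lengths 5, 3, and 2:

```latex
W = \frac{\sum_i c_i \,\ell_i}{\sum_i \ell_i}
  = \frac{12 \cdot 5 + 1 \cdot 3 + 8 \cdot 2}{5 + 3 + 2}
  = \frac{79}{10} = 7.9,
\qquad
0.25\,W = 1.975
```

With the coefficient 0.25, only the edge with coverage 1 falls below 1.975 and would be treated as erroneous. The unweighted mean of the three coverages would be 7.0, so weighting by length shifts the threshold toward the coverage of the longer edges.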
As shown in Fig. 2, step S3 comprises:
S31, traversing the whole bidirectional multistep De Bruijn graph, visiting each vertex V in the graph, and performing step S32 for each edge of the vertex V;
S32, when processing the i-th edge of vertex V, taking the coverage V.multiplicity[i] of the i-th edge to obtain the weighted coverage of the i-th edge, wi = V.multiplicity[i];
S33, if wi < W*0.25, deleting the i-th edge, that is, setting the coverage V.multiplicity[i] = 0 and the weight value of the edge V.arcs[i] = string("");
S34, if i < 8, continuing with step S32; otherwise proceeding to step S35;
S35, judging whether all edges have been processed; if so, ending; otherwise returning to step S31 to continue processing.
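Steps S2 and S3 run as two separate passes: W is fixed over the whole graph before any edge is deleted, so removing an edge never shifts the threshold applied to the remaining edges. The Python sketch below illustrates this ordering on a hypothetical layout in which the graph is a dict mapping a vertex name to a list of [coverage, sequence] edge slots; the layout, the function name `clean_graph`, and the sample values are all assumptions.

```python
def clean_graph(graph, factor=0.25):
    """Compute W over all edges, then delete edges with coverage < factor * W."""
    # First pass (S2): weighted average coverage over the untouched graph.
    total = sum(cov * len(seq) for slots in graph.values() for cov, seq in slots)
    length = sum(len(seq) for slots in graph.values() for _, seq in slots)
    if length == 0:
        return 0.0, 0
    w = total / length
    # Second pass (S3): delete low-coverage edges against the fixed W.
    removed = 0
    for slots in graph.values():
        for slot in slots:
            if slot[1] and slot[0] < w * factor:
                slot[0], slot[1] = 0, ""
                removed += 1
    return w, removed

graph = {
    "v1": [[12, "ACGTA"], [1, "ACG"]],
    "v2": [[8, "AC"]],
}
w, removed = clean_graph(graph)
```

Here W is 79 / 10 = 7.9, the threshold is 1.975, and only the edge with coverage 1 is cleared.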
The specific embodiments of the invention described above do not limit the scope of protection of the invention. Any other corresponding changes and variations made according to the technical concept of the invention shall fall within the scope of protection of the claims of the invention.

Claims (3)

1. A method for identifying and removing erroneous bidirectional edges of a bidirectional multistep De Bruijn graph, characterized by comprising the steps of:
S1, reading the sequencing data source file and constructing the bidirectional multistep De Bruijn graph;
S2, computing the weighted average coverage W of all bidirectional edges of the whole bidirectional multistep De Bruijn graph, wherein step S2 comprises:
S21, initializing the weighted average coverage W to 0, the coverage sum Sum of all edges to 0, and the length Len of all edges to 0;
S22, traversing the whole bidirectional multistep De Bruijn graph, visiting each vertex V in the graph, and performing step S23 for each edge of the vertex V;
S23, when processing the i-th edge of vertex V, multiplying the coverage V.multiplicity[i] of the i-th edge by the weight value V.arcs[i].length() of the edge and adding the product to the coverage sum Sum; at the same time adding the length value V.arcs[i].length() of the edge to the edge length Len, until all edges of vertex V have been accumulated;
S24, calculating the weighted average coverage W of the whole bidirectional multistep De Bruijn graph as:
W = Sum / Len;
S3, traversing all edges of the whole bidirectional multistep De Bruijn graph, classifying every bidirectional edge whose coverage is less than 0.25 times the average coverage W as an erroneous bidirectional edge, and removing it.
2. The method for identifying and removing erroneous bidirectional edges according to claim 1, characterized in that the De Bruijn graph is constructed by the following steps:
S11, reading one sequence s;
S12, cutting the sequence s into multiple fragments t with a sliding window; selecting a fragment t, denoting its canonical number as cur, and denoting the canonical numbers of the fragments immediately before and after it as pre and lat respectively;
S13, if the code of t is less than the code of its complementary fragment, swapping the values of pre and lat;
S14, setting the corresponding bit in the forward bitmap of cur to 1 to represent the edge pointing to pre;
S15, setting the corresponding bit in the reverse bitmap of cur to 1 to represent the edge pointing to lat;
S16, repeating steps S12-S15 to process the other fragments t of sequence s until all fragments t of sequence s have been processed, then performing step S17;
S17, reading a new sequence s and repeating steps S12-S16 until all sequences have been processed, then performing step S18;
S18, completing the construction of the bidirectional multistep De Bruijn graph.
3. The method for identifying and removing erroneous bidirectional edges according to claim 1, characterized in that step S3 comprises:
S31, traversing the whole bidirectional multistep De Bruijn graph, visiting each vertex V in the graph, and performing step S32 for each edge of the vertex V;
S32, when processing the i-th edge of vertex V, taking the coverage V.multiplicity[i] of the i-th edge to obtain the weighted coverage of the i-th edge, wi = V.multiplicity[i];
S33, if wi < W*0.25, deleting the i-th edge, that is, setting the coverage V.multiplicity[i] = 0 and the weight value of the edge V.arcs[i] = string("");
S34, if i < 8, continuing with step S32; otherwise proceeding to step S35;
S35, judging whether all edges have been processed; if so, ending; otherwise returning to step S31 to continue processing.
CN201310672170.XA 2013-12-10 2013-12-10 Method for identifying and removing erroneous bidirectional edges of a bidirectional multistep De Bruijn graph Active CN103714263B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310672170.XA CN103714263B (en) 2013-12-10 2013-12-10 Method for identifying and removing erroneous bidirectional edges of a bidirectional multistep De Bruijn graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310672170.XA CN103714263B (en) 2013-12-10 2013-12-10 Method for identifying and removing erroneous bidirectional edges of a bidirectional multistep De Bruijn graph

Publications (2)

Publication Number Publication Date
CN103714263A (en) 2014-04-09
CN103714263B (en) 2017-06-13

Family

ID=50407229

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310672170.XA Active CN103714263B (en) 2013-12-10 2013-12-10 Method for identifying and removing erroneous bidirectional edges of a bidirectional multistep De Bruijn graph

Country Status (1)

Country Link
CN (1) CN103714263B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103093121A (en) * 2012-12-28 2013-05-08 深圳先进技术研究院 Compressed storage and construction method of two-way multi-step deBruijn graph
CN103258145A (en) * 2012-12-22 2013-08-21 中国科学院深圳先进技术研究院 Parallel gene splicing method based on De Bruijn graph

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100063742A1 (en) * 2008-09-10 2010-03-11 Hart Christopher E Multi-scale short read assembly
CN103080333B (en) * 2010-09-14 2015-06-24 深圳华大基因科技服务有限公司 Methods and systems for detecting genomic structure variations
US20120191356A1 (en) * 2011-01-21 2012-07-26 International Business Machines Corporation Assembly Error Detection

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DGraph: Algorithms for Shortgun Reads Assembly Using De Bruijn Graph; Jintao Meng et al.; IFIP International Federation for Information Processing 2012; 2012-09-08; main text, page 17, section 2.2.1 *

Also Published As

Publication number Publication date
CN103714263A (en) 2014-04-09

Similar Documents

Publication Publication Date Title
US10600217B2 (en) Methods for the graphical representation of genomic sequence data
US11043285B2 (en) Bioinformatics systems, apparatus, and methods executed on an integrated circuit processing platform
Koren et al. Hybrid error correction and de novo assembly of single-molecule sequencing reads
US9014989B2 (en) Bioinformatics systems, apparatuses, and methods executed on an integrated circuit processing platform
Martin et al. Next-generation transcriptome assembly
CN103093121B (en) The compression storage of two-way multistep deBruijn figure and building method
CN107133493B (en) Method for assembling genome sequence, method for detecting structural variation and corresponding system
WO2010129301A2 (en) Method, computer-accessible medium and system for base-calling and alignment
JP2006075162A (en) Transcript mapping method of gene and system therefor
Yan et al. Scaling logical density of DNA storage with enzymatically-ligated composite motifs
CN103699819B (en) The summit extended method of elongated kmer based on multistep two-way De Bruijn inquiry
CN103699813B (en) Method for identifying and removing repeated bidirectional edges of bidirectional multistep De Bruijn graph
Garrison Graphical pangenomics
WO2016141077A1 (en) Bioinformatics systems, apparatuses, and methods executed on an integrated circuit processing platform
El-Mabrouk et al. Gene family evolution—an algorithmic framework
CN103714263B (en) The wrong two-way side identification of two-way multistep De Bruijns and minimizing technology
CN103699818B (en) Two-way side extended method based on the elongated kmer inquiries of the two-way De Bruijns of multistep
CN103699817B (en) Method for identifying and removing self-loop bidirectional edges of bidirectional multistep De Bruijn graph
CN103699814B (en) Method for identifying and removing tips of bidirectional multistep De Bruijn graph
Saggese et al. STAble: a novel approach to de novo assembly of RNA-seq data and its application in a metabolic model network based metatranscriptomic workflow
Martin Algorithms and tools for the analysis of high throughput DNA sequencing data
Jain et al. GAMS: genome assembly on Multi-GPU using string graph
Pham et al. ac4C-AFL: A high-precision identification of human mRNA N4-acetylcytidine sites based on adaptive feature representation learning
Chen Gene Sequence Assembly and Application
Masoudi-Nejad et al. De novo assembly algorithms

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant