CN107841542A - A second-generation sequence assembly method and system for genome contigs - Google Patents

A second-generation sequence assembly method and system for genome contigs

Info

Publication number
CN107841542A
Authority
CN
China
Prior art keywords
sequence
contig
library
genome
generation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610832844.1A
Other languages
Chinese (zh)
Inventor
邓天全
贺丽娟
杨林峰
刘亚斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BGI Technology Solutions Co Ltd
Original Assignee
BGI Technology Solutions Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BGI Technology Solutions Co Ltd
Priority to CN201610832844.1A
Publication of CN107841542A
Legal status: Pending

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1093 General methods of preparing gene libraries, not provided for in other subgroups

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Organic Chemistry (AREA)
  • Engineering & Computer Science (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biotechnology (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Microbiology (AREA)
  • Biomedical Technology (AREA)
  • Physics & Mathematics (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Plant Pathology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Analytical Chemistry (AREA)
  • Immunology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a second-generation sequence assembly method and system for genome contigs. The method comprises: fragmenting sample genomic DNA to a first predetermined length range; selecting, by gel cutting, fragments within a second predetermined length range from the fragmented DNA to build libraries with different insert sizes; performing paired-end sequencing on the DNA fragments within the second predetermined length range, so that an overlapping first read and second read are obtained for each DNA fragment; merging the first read and the second read obtained from paired-end sequencing of each library to obtain the merged sequences of each library; and assembling the merged sequences of each library to obtain genome contig sequences. By choosing the library construction and the sequencing read length together and merging the overlapping reads, the method extends the effective read length; assembling with the extended sequences improves the metrics and accuracy of genome contig assembly.

Description

A second-generation sequence assembly method and system for genome contigs
Technical field
The present invention relates to the field of gene sequencing technology, and in particular to a second-generation sequence assembly method and system for genome contigs.
Background
At present, whole-genome shotgun sequencing (WGS) is the mainstream design in genome assembly projects. Depending on the repeat-sequence characteristics of the genome, DNA insert fragments of different lengths are combined for paired-end sequencing; provided the average whole-genome sequencing depth is sufficient, this ensures single-base accuracy and genome completeness. As second-generation sequencing technology (next-generation sequencing, NGS) has matured and spread, sequencing costs have fallen substantially, and whole-genome shotgun sequencing based on second-generation technology has become the mainstream scheme for genome projects. Whether high-quality contigs can be assembled is often an important factor in the quality of the subsequent genome scaffold assembly.
Sort the assembled contigs from longest to shortest and accumulate their lengths: when the cumulative length first exceeds 50% of the total length of all assembled sequences, the length of the last contig added is the N50. N50 is an important measure for evaluating the completeness of a genome assembly. N60 is defined in the same way, with the cumulative length first exceeding 60% of the total assembled length. N70, N80 and N90 are defined analogously.
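The Nx definition above can be sketched in a few lines of Python. This is a minimal illustration and not part of the patent: the function name `nx` and the example contig lengths are assumptions, and the cutoff is taken as "reaches or exceeds x% of the total", which is the commonly used convention.

```python
def nx(contig_lengths, x=50):
    """Return the Nx value of an assembly (x=50 gives N50).

    Contigs are taken from longest to shortest; Nx is the length of the
    contig at which the running total first reaches x% of the total length.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the cutoff.
        if running * 100 >= total * x:
            return length


# Hypothetical contig lengths in bp, for illustration only.
lengths = [100, 80, 60, 40, 20]
for x in (50, 60, 70, 80, 90):
    print(f"N{x} = {nx(lengths, x)} bp")
```

Because the contigs are visited from longest to shortest, Nx can only stay the same or shrink as x grows, which is why N90 is never larger than N50 for the same assembly.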
Summary of the invention
Sequence read length has a major effect on genome assembly. The method and system of the present invention choose the library construction and the sequencing read length together, merge the overlapping reads to extend the effective read length, and finally assemble the extended sequences into contigs, which improves the metrics and accuracy of genome contig assembly.
According to a first aspect of the invention, there is provided a second-generation sequence assembly method for genome contigs, comprising: fragmenting sample genomic DNA to a first predetermined length range; selecting, by gel cutting, fragments within a second predetermined length range from the fragmented DNA to build libraries with different insert sizes; performing paired-end sequencing on the DNA fragments within the second predetermined length range, so that an overlapping first read and second read are obtained for each DNA fragment; merging the first read and the second read obtained from paired-end sequencing of each library to obtain the merged sequences of each library; and assembling the merged sequences of each library to obtain genome contig sequences.
Further, the sample genomic DNA is fragmented by ultrasound.
Further, the first predetermined length range is 100bp-600bp or 100bp-500bp.
Further, the second predetermined length range is 170bp-180bp, 260bp-280bp, 450bp-470bp or 550bp-570bp.
Further, the first read and the second read are each 100bp-300bp long.
Further, the first read and the second read are each 100bp, 150bp, 250bp or 300bp long.
Further, assembling the merged sequences of each library specifically comprises: cutting the second-generation sequencing reads successively into short sequences of length K (K-mers); storing the K-mers in a hash table to form the vertices of a de Bruijn graph; connecting K-mers that are adjacent within a read to form the edges of the de Bruijn graph; processing all reads to obtain the whole de Bruijn graph; removing paths in the de Bruijn graph caused by sequencing errors and heterozygous sites; and joining linear K-mer paths to form first-level contigs.
Further, the K-mer length is 30bp-500bp.
Further, the method also comprises: before the merging, filtering out adapter-containing sequences and low-quality sequences.
According to a second aspect of the invention, there is provided a second-generation sequence assembly system for genome contigs, comprising: a fragmentation module, for fragmenting sample genomic DNA to a first predetermined length range; a selection module, for selecting, by gel cutting, fragments within a second predetermined length range from the fragmented DNA to build libraries with different insert sizes; a sequencing module, for performing paired-end sequencing on the DNA fragments within the second predetermined length range, so that an overlapping first read and second read are obtained for each DNA fragment; a merging module, for merging the first read and the second read obtained from paired-end sequencing of each library to obtain the merged sequences of each library; and an assembly module, for assembling the merged sequences of each library to obtain genome contig sequences.
The method and system of the present invention choose the library construction and the sequencing read length together, merge the overlapping reads to extend the effective read length, and finally assemble the extended sequences into contigs, which improves the metrics and accuracy of genome contig assembly.
Brief description of the drawings
Fig. 1 is a flow chart of one embodiment of the second-generation sequence assembly method for genome contigs of the present invention;
Fig. 2 is a flow chart of one embodiment in which a gel-cut DNA molecule is sequenced through from both ends by the first read and the second read, which are then merged to obtain a longer sequence;
Fig. 3 is a structural block diagram of one embodiment of the second-generation sequence assembly system for genome contigs of the present invention.
Detailed description
The present invention is described in further detail below through embodiments with reference to the drawings.
One embodiment of the invention provides a method and system that combine second-generation sequencing with experimental library construction and gel-cutting size selection in order to improve genome contig assembly.
Fig. 1 is a flow chart of one embodiment in which genome contigs are assembled by combining second-generation sequencing with library construction and gel cutting.
As shown in Fig. 1, in step 102 the sample DNA is fragmented to a length range chosen with reference to the second-generation sequencing read length (the first predetermined length range). In one embodiment of the invention, the sample genomic DNA is fragmented by ultrasound, and the first predetermined length range is 100bp-600bp or 100bp-500bp.
In step 104, the gel-cutting length range (the second predetermined length range) is determined according to the paired-end read length to be used; Table 1 gives corresponding examples from one embodiment.
Table 1
In step 106, according to the gel-cutting length range from step 104, the corresponding read length is chosen for paired-end sequencing such that read 1 and read 2 (the first read and the second read) overlap; Table 1 gives corresponding examples from one embodiment.
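The relationship between gel-cut fragment length, read length and overlap can be checked with simple arithmetic: two reads of length R taken from opposite ends of a fragment of length F share 2R - F bases. The sketch below is illustrative only. Table 1 is not reproduced in this text, so the pairings of size range and read length are an assumption made by matching the listed ranges to the listed read lengths in order (the 170bp-180bp range with 100bp reads is the one pairing confirmed by the application example), and `expected_overlap` is a hypothetical helper name.

```python
def expected_overlap(fragment_length, read_length):
    """Bases shared by read 1 and read 2 of one fragment (negative = gap)."""
    overlap = 2 * read_length - fragment_length
    # Reads cannot overlap by more than the fragment itself.
    return min(overlap, fragment_length)


# Assumed pairings of gel-cut range (bp) and paired-end read length (bp).
ASSUMED_PAIRINGS = [
    ((170, 180), 100),
    ((260, 280), 150),
    ((450, 470), 250),
    ((550, 570), 300),
]

for (shortest, longest), read_length in ASSUMED_PAIRINGS:
    widest = expected_overlap(shortest, read_length)
    narrowest = expected_overlap(longest, read_length)
    print(f"{shortest}-{longest} bp fragments, PE{read_length}: "
          f"overlap {narrowest}-{widest} bp")
```

Under these assumed pairings every fragment in a size range leaves a positive overlap of a few tens of bases, which is the condition step 106 asks for; a negative value would mean the two reads cannot be merged.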
In step 108, once the sequences from step 106 are available, the tail of read 1 is compared with the head of read 2, and if they align the two reads are merged to give the merged sequence. This step can use the PEAR alignment and merging software, available from http://sco.h-its.org/exelixis/web/software/pear/.
In step 110, the merged sequences obtained in step 108 are assembled into contig sequences. To assemble second-generation sequences, each read (sequencing sequence) is cut successively into short sequences of length K, called K-mers; adjacent K-mers overlap by K-1 bases. The K-mers are stored in a hash table and form the vertices of a de Bruijn graph; K-mers that are adjacent within a read are considered connected and form the edges of the graph. Once all reads have been processed, the whole de Bruijn graph is obtained. Paths caused by sequencing errors and heterozygous sites are removed from the graph, and the linear K-mer paths are joined to form the first-level contig sequences. This stage can use the assembly software SOAPdenovo or Platanus. The reference for SOAPdenovo is Li, R. et al. De novo assembly of human genomes with massively parallel short read sequencing. Genome Res (2009). The software is freely available at http://soap.genomics.org.cn/soapdenovo.html. Platanus can be obtained from http://platanus.bio.titech.ac.jp/platanus/.
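The graph construction described in step 110 can be illustrated with a toy Python sketch. It is a simplification under stated assumptions and not the SOAPdenovo or Platanus implementation: Python dictionaries stand in for the hash table, the removal of paths caused by sequencing errors and heterozygous sites is omitted, reverse-complement strands are ignored, and isolated cycles are not reported. The function names `build_de_bruijn` and `linear_contigs` are hypothetical.

```python
def build_de_bruijn(reads, k):
    """K-mers are vertices; K-mers adjacent in a read (K-1 overlap) are edges."""
    successors, predecessors = {}, {}
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        for kmer in kmers:
            # Register every K-mer as a vertex, even if it has no edges.
            successors.setdefault(kmer, set())
            predecessors.setdefault(kmer, set())
        for left, right in zip(kmers, kmers[1:]):
            successors[left].add(right)
            predecessors[right].add(left)
    return successors, predecessors


def linear_contigs(successors, predecessors):
    """Join unbranched K-mer paths into contig strings."""
    contigs = []
    for node in successors:
        preds = predecessors[node]
        if len(preds) == 1:
            (only_pred,) = preds
            if len(successors[only_pred]) == 1:
                continue  # node lies inside a linear path, not at its start
        path = [node]
        current = node
        while len(successors[current]) == 1:
            (following,) = successors[current]
            if len(predecessors[following]) != 1:
                break  # a branch merges here, so the linear path ends
            path.append(following)
            current = following
        # Consecutive K-mers share K-1 bases: add one new base per K-mer.
        contigs.append(path[0] + "".join(kmer[-1] for kmer in path[1:]))
    return contigs


# Two overlapping toy reads; K is kept tiny so the graph is easy to follow.
succ, pred = build_de_bruijn(["ATGGCGTGCA", "GCGTGCAATC"], k=4)
print(linear_contigs(succ, pred))
```

In this toy input the two reads share seven bases, all K-mers are distinct, and the graph is a single unbranched path, so the two reads collapse into one contig. Real data would need the error and heterozygosity cleanup that the patent delegates to the assembler.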
Fig. 2 is a flow chart of one embodiment in which longer sequence reads are obtained by combining second-generation sequencing with library construction and gel cutting.
Step 202 shows the DNA molecule obtained after gel cutting.
Step 204 shows that, with reference to the gel-cutting range, a matching sequencing read length is chosen so that read 1 and read 2 share an overlapping region.
Step 206 shows the sequence obtained after read 1 and read 2 are aligned and merged.
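The merging shown in Fig. 2 can be sketched as follows. This is a deliberately naive illustration and not PEAR's statistical algorithm: it assumes read 2 is reported on the opposite strand, so it is reverse-complemented first, after which its head is compared with the tail of read 1. The function names and the `min_overlap` and `max_mismatch_rate` parameters are assumptions introduced for the example.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(sequence):
    """Return the reverse complement of an upper-case DNA string."""
    return sequence.translate(_COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10, max_mismatch_rate=0.1):
    """Merge an overlapping read pair; return None if no overlap is found.

    Candidate overlaps are tried from longest to shortest, and the first
    one whose mismatch rate is acceptable is used.
    """
    read2_fwd = reverse_complement(read2)
    longest = min(len(read1), len(read2_fwd))
    for overlap in range(longest, min_overlap - 1, -1):
        tail = read1[-overlap:]
        head = read2_fwd[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_rate * overlap:
            # Keep read 1 whole and append the non-overlapping part of read 2.
            return read1 + read2_fwd[overlap:]
    return None


# Hypothetical 40 bp fragment sequenced with 25 bp paired-end reads.
fragment = "ATGCCGTAAGCTTGACCTGAATCCGACTTACCATCATTCA"
read1 = fragment[:25]
read2 = reverse_complement(fragment[-25:])
merged = merge_pair(read1, read2, max_mismatch_rate=0.0)
print(len(read1), len(read2), None if merged is None else len(merged))
```

A production merger also weighs base qualities when the two reads disagree inside the overlap and tests whether an overlap is statistically significant; those refinements are left to tools such as PEAR.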
Corresponding to the second-generation sequence assembly method for genome contigs shown in Fig. 1, the present invention also provides a second-generation sequence assembly system for genome contigs which, as shown in Fig. 3, comprises: a fragmentation module 310, for fragmenting sample genomic DNA to the first predetermined length range; a selection module 320, for selecting, by gel cutting, fragments within the second predetermined length range from the fragmented DNA to build libraries with different insert sizes; a sequencing module 330, for performing paired-end sequencing on the DNA fragments within the second predetermined length range, so that an overlapping first read and second read are obtained for each DNA fragment; a merging module 340, for merging the first read and the second read obtained from paired-end sequencing of each library to obtain the merged sequences of each library; and an assembly module 350, for assembling the merged sequences of each library to obtain genome contig sequences.
A concrete application example of the method is given below for an algae bryophyte genome of about 400 Mb. In this example, sequencing and assembly of the genome contigs were carried out in the following steps:
(1) Library construction and sequencing
Sample DNA was extracted and randomly fragmented. After electrophoresis, gel slices in the 170bp-180bp and 250bp-260bp ranges were cut out and purified. Sequencing adapters were ligated to the purified DNA fragments, followed by PCR amplification; the libraries were then sequenced on a second-generation sequencer with paired-end reads of 100bp and 150bp, respectively.
(2) Data filtering
Some raw sequences carry adapter sequence or contain a small amount of low-quality sequence. The raw data were first processed with software in a series of steps to remove such contaminating data and obtain valid data. The filtering steps were:
1) remove sequences containing adapters;
2) remove low-quality sequences (those in which bases with a quality value of 20 or less account for more than 20% of the whole sequence);
3) obtain the filtered sequences.
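The two filtering rules above can be sketched as a small Python predicate. This is an illustrative sketch only: the patent does not name the filtering software, so the function names are hypothetical, quality strings are assumed to use the Phred+33 encoding, adapter detection is reduced to an exact substring match, and the adapter string in the example is a placeholder to be replaced with the adapters of the actual library kit.

```python
def low_quality_fraction(quality_string, threshold=20, phred_offset=33):
    """Fraction of bases whose quality value is at or below the threshold."""
    if not quality_string:
        return 1.0  # treat an empty read as entirely low quality
    scores = [ord(symbol) - phred_offset for symbol in quality_string]
    return sum(score <= threshold for score in scores) / len(scores)


def keep_read(sequence, quality_string, adapters=(), max_low_fraction=0.20):
    """Apply the two rules: no adapter, and at most 20% low-quality bases."""
    if any(adapter in sequence for adapter in adapters):
        return False
    return low_quality_fraction(quality_string) <= max_low_fraction


# Hypothetical records as (sequence, quality) pairs; 'I' is Q40, '#' is Q2.
ADAPTERS = ("AGATCGGAAGAGC",)  # placeholder adapter sequence
records = [
    ("ACGTACGTAC", "IIIIIIIIII"),          # clean read
    ("ACGTACGTAC", "#####IIIII"),          # half of the bases are low quality
    ("ACGAGATCGGAAGAGCAC", "I" * 18),      # contains the adapter
]
filtered = [record for record in records if keep_read(*record, adapters=ADAPTERS)]
print(len(records), "reads in,", len(filtered), "reads kept")
```

With paired-end data the same test would normally be applied to both reads of a pair, and the pair dropped if either read fails, so that read 1 and read 2 stay synchronized for the merging step.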
(3) Sequence merging
The filtered sequences of each library were aligned and merged with the PEAR software to obtain the merged sequences.
(4) Contig construction
The merged sequences were assembled with the Platanus software, giving contig sequences with a total size of about 419Mb and a contig N50 of 1881bp. Table 2 compares the assembly results obtained with and without merging read 1 and read 2. As the table shows, assembling contigs from merged reads rather than unmerged reads raised N50, N60, N70 and N80 by more than 100% each, and N90 by 59%.
Table 2
Parameter      Read 1 and read 2 merged (bp)    Read 1 and read 2 not merged (bp)
Contig N50     2074                             607
Contig N60     947                              313
Contig N70     466                              205
Contig N80     315                              137
Contig N90     174                              109
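The improvement percentages quoted for Table 2 can be reproduced from the table values with a few lines of Python. The numbers are copied from Table 2; the helper name `relative_gain` is an assumption introduced for this check.

```python
def relative_gain(merged_value, unmerged_value):
    """Percentage by which the merged result exceeds the unmerged one."""
    return (merged_value - unmerged_value) / unmerged_value * 100


# Values from Table 2, in bp: (merged, not merged).
TABLE_2 = {
    "N50": (2074, 607),
    "N60": (947, 313),
    "N70": (466, 205),
    "N80": (315, 137),
    "N90": (174, 109),
}

for name, (merged_bp, unmerged_bp) in TABLE_2.items():
    gain = relative_gain(merged_bp, unmerged_bp)
    print(f"Contig {name}: {merged_bp} vs {unmerged_bp} bp, gain {gain:.1f}%")
```

The gains for N50 through N80 all come out above 100%, and the N90 gain is about 59.6%, in line with the 59% stated in the text.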
The above is a further detailed description of the present invention with reference to specific embodiments, and the specific implementation of the invention should not be considered limited to these descriptions. For a person of ordinary skill in the technical field of the invention, simple deductions or substitutions made without departing from the inventive concept shall all be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A second-generation sequence assembly method for genome contigs, characterised in that the method comprises:
fragmenting sample genomic DNA to a first predetermined length range;
selecting, by gel cutting, fragments within a second predetermined length range from the fragmented DNA to build libraries with different insert sizes;
performing paired-end sequencing on the DNA fragments within the second predetermined length range, so that an overlapping first read and second read are obtained for each DNA fragment;
merging the first read and the second read obtained from paired-end sequencing of each library to obtain the merged sequences of each library;
assembling the merged sequences of each library to obtain genome contig sequences.
2. The second-generation sequence assembly method for genome contigs according to claim 1, characterised in that the sample genomic DNA is fragmented by ultrasound.
3. The second-generation sequence assembly method for genome contigs according to claim 1, characterised in that the first predetermined length range is 100bp-600bp or 100bp-500bp.
4. The second-generation sequence assembly method for genome contigs according to claim 1, characterised in that the second predetermined length range is 170bp-180bp, 260bp-280bp, 450bp-470bp or 550bp-570bp.
5. The second-generation sequence assembly method for genome contigs according to claim 1, characterised in that the first read and the second read are each 100bp-300bp long.
6. The second-generation sequence assembly method for genome contigs according to claim 1, characterised in that the first read and the second read are each 100bp, 150bp, 250bp or 300bp long.
7. The second-generation sequence assembly method for genome contigs according to claim 1, characterised in that assembling the merged sequences of each library specifically comprises: cutting the second-generation sequencing reads successively into short sequences of length K (K-mers); storing the K-mers in a hash table to form the vertices of a de Bruijn graph; connecting K-mers that are adjacent within a read to form the edges of the de Bruijn graph; processing all reads to obtain the whole de Bruijn graph; removing paths in the de Bruijn graph caused by sequencing errors and heterozygous sites; and joining linear K-mer paths to form first-level contigs.
8. The second-generation sequence assembly method for genome contigs according to claim 7, characterised in that the K-mer length is 30bp-500bp.
9. The second-generation sequence assembly method for genome contigs according to claim 1, characterised in that the method further comprises: before the merging, filtering out adapter-containing sequences and low-quality sequences.
10. A second-generation sequence assembly system for genome contigs, characterised in that the system comprises:
a fragmentation module, for fragmenting sample genomic DNA to a first predetermined length range;
a selection module, for selecting, by gel cutting, fragments within a second predetermined length range from the fragmented DNA to build libraries with different insert sizes;
a sequencing module, for performing paired-end sequencing on the DNA fragments within the second predetermined length range, so that an overlapping first read and second read are obtained for each DNA fragment;
a merging module, for merging the first read and the second read obtained from paired-end sequencing of each library to obtain the merged sequences of each library;
an assembly module, for assembling the merged sequences of each library to obtain genome contig sequences.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610832844.1A CN107841542A (en) 2016-09-19 2016-09-19 A kind of generation sequence assemble method of genome contig two and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610832844.1A CN107841542A (en) 2016-09-19 2016-09-19 A kind of generation sequence assemble method of genome contig two and system

Publications (1)

Publication Number Publication Date
CN107841542A (en) 2018-03-27

Family

ID=61657308

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610832844.1A Pending CN107841542A (en) 2016-09-19 2016-09-19 A kind of generation sequence assemble method of genome contig two and system

Country Status (1)

Country Link
CN (1) CN107841542A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080243510A1 (en) * 2007-03-28 2008-10-02 Smith Lawrence C Overlapping screen reading of non-sequential text
CN103258145A (en) * 2012-12-22 2013-08-21 中国科学院深圳先进技术研究院 Parallel gene splicing method based on De Bruijn graph
CN104017883A (en) * 2014-06-18 2014-09-03 深圳华大基因科技服务有限公司 Method and system for assembling genomic sequence
CN104346539A (en) * 2013-07-29 2015-02-11 安捷伦科技有限公司 A method for finding variants from targeted sequencing panels
CN104531848A (en) * 2014-12-11 2015-04-22 杭州和壹基因科技有限公司 Method and system for assembling genome sequence
CN104850761A (en) * 2014-02-17 2015-08-19 深圳华大基因科技有限公司 Nucleotide sequence assembly method and device
CN104951672A (en) * 2015-06-19 2015-09-30 中国科学院计算技术研究所 Splicing method and system of second generation and third generation genomic sequencing data combination

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIAJIE ZHANG ET AL.: ""PEAR: a fast and accurate Illumina Paired-End reAd mergeR"", 《GENOME ANALYSIS》 *
TANJA MAGOC ET AL.: ""FLASH: fast length adjustment of short reads to improve genome assemblies"", 《GENOME ANALYSIS》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109935274A (en) * 2019-03-01 2019-06-25 河南大学 A kind of long reading overlay region detection method based on k-mer distribution characteristics
CN109935274B (en) * 2019-03-01 2021-04-30 河南大学 Long reading overlap region detection method based on k-mer distribution characteristics
CN111445948A (en) * 2020-03-27 2020-07-24 武汉古奥基因科技有限公司 Chromosome construction method for polyploid fish by using Hi-C
CN111445948B (en) * 2020-03-27 2023-09-29 武汉古奥基因科技有限公司 Chromosome construction method for polyploid fish by Hi-C

Similar Documents

Publication Publication Date Title
CN107858408A (en) A kind of generation sequence assemble method of genome two and system
Zimin et al. Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm
CN102206704B (en) Method and device for assembling genome sequence
Kosugi et al. GMcloser: closing gaps in assemblies accurately with a likelihood-based selection of contig or long-read alignments
Staňková et al. BioNano genome mapping of individual chromosomes supports physical mapping and sequence assembly in complex plant genomes
Wang et al. Expression and diversification analysis reveals transposable elements play important roles in the origin of Lycopersicon‐specific lncRNAs in tomato
Skennerton et al. Crass: identification and reconstruction of CRISPR from unassembled metagenomic data
CN103080333B (en) Methods and systems for detecting genomic structure variations
Nelson et al. Whole-genome validation of high-information-content fingerprinting
CN107784201B (en) Method and system for joint hole filling of second-generation sequence and third-generation single-molecule real-time sequencing sequence
Gao et al. Translational recoding signals between gag and pol in diverse LTR retrotransposons
Huang et al. Palindromic sequence impedes sequencing-by-ligation mechanism
Coombe et al. Assembly of the complete Sitka spruce chloroplast genome using 10X Genomics’ GemCode sequencing data
CN108460245B (en) Method and apparatus for optimizing second generation assembly results using third generation sequences
CN107841542A (en) A kind of generation sequence assemble method of genome contig two and system
CN106939344A (en) The joint being sequenced for two generations
CN111477281A (en) Pan-genome construction method and construction device based on phylogenetic tree
CN107784198B (en) Combined assembly method and system for second-generation sequence and third-generation single-molecule real-time sequencing sequence
CN108660197A (en) A kind of assemble method and system of two generation sequences genome contig
CN102789553B (en) Method and device for assembling genomes by utilizing long transcriptome sequencing result
CN108866173A (en) A kind of verification method of standard sequence, device and its application
Chester et al. Single integration and spread of a Copia-like sequence nested in rDNA intergenic spacers of Allium cernuum (Alliaceae)
CN103270175B (en) Method and system for detecting the insertion sites of transgenic foreign fragments
Young et al. A new strategy for genome assembly using short sequence reads and reduced representation libraries
WO2014005329A1 (en) Method and system for determining integration manner of foreign gene in human genome

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 1250753; Country of ref document: HK)
RJ01 Rejection of invention patent application after publication (Application publication date: 20180327)