WO1998026096A1 - Method for rapid gap closure - Google Patents
- Publication number
- WO1998026096A1 (PCT/US1997/022655)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sequences
- sequence
- overlapping ends
- library
- overlapping
- Prior art date
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6813—Hybridisation assays
- C12Q1/6841—In situ hybridisation
Definitions
- the present invention relates to a simple and cost effective method for the closure of gaps generated during whole genome random sequencing through the use of high-density arrays, or grids, of genomic libraries.
- the method is also useful for the rapid isolation of full length genomic sequences obtained from partial gene sequences. Such genomic sequences will therefore comprise full length coding regions.
- the method of the present invention is also useful for the confirmation of computer generated assemblies. That is, the confirmation of the order or assembly of contiguous sequences. This method provides an alternative to chromosome walking.
- Genomics (1988), 2: 231-239). Ordering of contiguous sequences and completion of gap closures is typically performed by genomic PCR based on primers designed against every combination of physical gap ends. However, this procedure is both very time consuming and labor intensive and can take 5-10 times longer than the random sequencing itself. Accordingly, there exists a need for a more efficient method of ordering of contiguous sequences and completing gap closure in whole genome random sequencing of any organism.
- the invention provides a method for high throughput sequencing and gap closure and contiguous sequence assembly or clone ordering in genome sequencing projects using high density grids of genomic libraries.
- the method involves constructing a series of random genomic libraries for a selected organism and preparing a grid for each library, each grid having a surface on which is immobilized at predefined regions on said surface a plurality of clones derived from the libraries.
- This gap closure provides complete sequence for partial genes or genes not found in the original random sequencing step.
- hybridization probes are generated which correspond to the non-overlapping ends of the known sequence.
- the probes are then hybridized to a gridded library to identify nucleotide sequences which span the non-overlapping ends of the assembled nucleotide sequence.
- genes are a major goal of modern scientific research. By identifying genes, determining their sequences and characterizing their biological function, it is possible to employ recombinant technology to produce large quantities of valuable gene products, e.g., proteins and peptides. Additionally, knowledge of gene sequences can provide a key to diagnosis, prognosis and treatment in a variety of disease states in plants and animals which are characterized by inappropriate expression and/or repression of selected genes or by the influence of external factors, e.g., carcinogens or teratogens, on gene function. Methods now exist for whole random sequencing and assembly of a complete living organism. However, methods required to complete genome sequence gap closures and ordering of contiguous sequences are both time consuming and labor intensive.
- the present invention provides a method for high throughput gap closure and contiguous sequence assembly which is useful in whole genome random sequencing.
- This method uses a plurality of high density grids prepared from genomic libraries of a selected organism to perform sequence reactions, gap closure and contiguous sequence assembly.
- the method of the present invention provides a more rapid and cost effective means to sequence the whole genome of an organism.
- the method also provides rapid means to obtain the full length genomic sequence for genes for which only a partial sequence is obtained through random sequencing.
- gene refers to the genomic nucleotide sequence from which a cDNA sequence is derived.
- gene classically refers to the genomic sequence, which upon processing, can produce different cDNAs, e.g., by splicing events. However, for ease of reading, any full-length counterpart cDNA sequence will also be referred to by shorthand herein as gene.
- isolated means altered “by the hand of man” from its natural state; i.e., that, if it occurs in nature, it has been changed or removed from its original environment, or both.
- a naturally occurring polynucleotide or a polypeptide naturally present in a living animal in its natural state is not “isolated,” but the same polynucleotide or polypeptide separated from the coexisting materials of its natural state is “isolated”, as the term is employed herein.
- isolated means that it is separated from the chromosome and cell in which it naturally occurs.
- organism it is meant to include any living organism such as, but not limited to, bacterium (including both gram negative and gram positive species), viruses, lower eukaryotic cells such as fungi, yeast and molds, simple multicellular organisms (e.g., slime molds) and complex multicellular organisms including man.
- solid support refers to any known substrate which is useful for the immobilization of a plurality of defined materials derived from a genomic library by any available method to enable detectable hybridization of the immobilized polynucleotide sequences with other polynucleotides in the sample.
- solid supports one desirable example is the supports described in International Patent Application No. WO91/07087, published May 30, 1991.
- other useful supports include, but are not limited to, nitrocellulose, nylon, glass, silica and Pall BIODYNE C. It is also anticipated that improvements yet to be made to conventional solid supports may also be employed in this invention.
- grid means any generally two-dimensional structure on a solid support to which the defined materials of a genomic library are attached or immobilized.
- predefined region refers to a localized area on a surface of a solid support on which is immobilized one or multiple copies of a particular clone and which enables hybridization of that clone at the position, if hybridization of that clone to a sample polynucleotide occurs.
- immobilized it is meant to refer to the attachment of the genes to the solid support. Means of immobilization are known and conventional to those of skill in the art, and may depend on the type of support being used.
- the present invention is based upon the use of high density arrays of genomic libraries as a means for high throughput gap closure, including full length genomic sequences and contiguous sequence assembly in genomic sequencing.
- A. Preparation of genomic libraries. For this analysis, a series of random genomic libraries for a selected organism is prepared, each library comprising fractionated and ligated genomic DNA of a selected insert size range. To construct these libraries, genomic DNA from the selected organism is first isolated using standard procedures for molecular biology such as those disclosed by Sambrook et al., MOLECULAR CLONING, A LABORATORY MANUAL, 2nd Ed.; Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1989.
- the isolated DNA is then randomly sheared (e.g., by sonication, partial restriction endonuclease digestion, partial DNase digestion, etc.), modified and ligated into a plasmid or phage vector in accordance with the procedures described by Fleischmann et al., Science, 1995, 269:496-512.
- a small insert library is prepared by fractionating and ligating the genomic DNA into a plasmid or phage based vector so that the average insert size is between 1.0 and 5.0 kb.
- plasmid vectors useful in constructing this small insert library include, but are not limited to, pBluescript, Lambda ZAPII (Stratagene, La Jolla, CA), pUC19 and M13mp18/19 (New England Biolabs, Beverly, MA).
- a large insert library is also constructed in a cosmid vector so that the insert size averages between 10 and 100 kb.
- cosmid vectors useful in constructing this large insert library include, but are not limited to, pLorist, pWE15 (Stratagene, La Jolla, CA), lambda DASH2 (Stratagene) and lambda GEM-12 (Promega, Madison, WI).
- a medium insert library having an insert size range averaging approximately 5.0-10 kb can also be prepared by fractionating the genomic DNA into a cosmid vector.
- the final ligation products are electroporated into a bacterium such as E. coli so that the final number of transformants for each library reaches 4 to 6 fold depth of coverage as predicted by the Lander-Waterman theory (E.S. Lander and M.S.
- Each of the insert libraries is gridded onto solid supports, with the nucleic acid of each transformed host cell (containing the insert and amplified by PCR), or clone, being placed onto a predefined position within a high density array.
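As a minimal sketch of what "a predefined position within a high density array" could look like in software, the following Python assigns each clone a grid address and translates positive hybridization positions back to clone identifiers. The grid dimensions, the row-by-row fill order and the names `GridAddress`, `assign_addresses` and `clones_at` are illustrative assumptions; the source does not specify a layout.

```python
from typing import Dict, List, NamedTuple


class GridAddress(NamedTuple):
    grid: int  # index of the grid (filter) holding the clone, 0-based
    row: int   # row within that grid, 0-based
    col: int   # column within that grid, 0-based


def assign_addresses(clone_ids: List[str], rows: int = 96,
                     cols: int = 96) -> Dict[str, GridAddress]:
    """Assign clones to predefined positions, filling each grid row by row.

    The 96 x 96 default is a hypothetical density, not a value from the source.
    """
    if rows <= 0 or cols <= 0:
        raise ValueError("rows and cols must be positive")
    per_grid = rows * cols
    addresses: Dict[str, GridAddress] = {}
    for index, clone in enumerate(clone_ids):
        grid, offset = divmod(index, per_grid)
        row, col = divmod(offset, cols)
        addresses[clone] = GridAddress(grid, row, col)
    return addresses


def clones_at(addresses: Dict[str, GridAddress],
              positives: List[GridAddress]) -> List[str]:
    """Translate positive hybridization positions back to clone identifiers."""
    by_position = {addr: clone for clone, addr in addresses.items()}
    return [by_position[p] for p in positives if p in by_position]
```

Keeping the address table separate from the hybridization results means the same lookup can be reused for every probe hybridized against a given grid.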
- this method involves forming predefined regions on a surface of a solid support, where the predefined regions are capable of immobilizing the clones.
- the method makes use of binding substances attached to the surface which enable selective activation of the predefined regions. Upon activation, these binding substances become capable of binding and immobilizing the clones derived from the genomic library.
- Any of the known solid substrates suitable for binding nucleotide sequences at predefined regions on the surface thereof for hybridization and methods for attaching nucleotide sequences thereto may be employed by one of skill in the art according to the invention.
- known conventional methods for making hybridization of the immobilized clones detectable, e.g., fluorescence, radioactivity, photoactivation, energy transfer dyes, biotinylation, solid state circuitry, and the like, may be used in this invention.
- the present invention employs the compositions described above in methods for gap closure following high throughput sequencing for confirmation of computer generated assemblies, and to generate full length coding sequences.
- a small insert library can be used for high throughput sequencing.
- sequencing reactions are performed until 2 to 3 fold depth of coverage is obtained by standard sequencing methods (see, e.g., Fleischmann et al., Science (1995), 269:496).
- the resulting sequences are assembled using standard computational programs such as the GELMERGE Assembler available from Genetics Computer Group, Inc. of Madison, WI (UWGCG).
- the Assembler identifies non-overlapping regions, or gaps, between the assemblies. These gaps occur because of the statistical consequence of incomplete sampling and non-randomness in the collection of sequence fragments due to deletion of clones from the library.
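The statistical origin of these gaps can be illustrated with the idealized Lander-Waterman model. The sketch below assumes uniformly random reads of equal length and no minimum overlap requirement, so its numbers are rough guides only; the function names are hypothetical and not part of the source. The same arithmetic gives the number of transformants needed for the 4 to 6 fold library coverage mentioned earlier.

```python
import math


def coverage(num_reads: int, read_len: float, genome_len: float) -> float:
    """Fold depth of coverage c = N * L / G."""
    return num_reads * read_len / genome_len


def fraction_unsequenced(c: float) -> float:
    """Expected fraction of the genome covered by no read: e^(-c)."""
    return math.exp(-c)


def expected_gaps(num_reads: int, c: float) -> float:
    """Expected number of contigs, and so roughly of gaps: N * e^(-c).

    Assumes zero minimum overlap between reads (an idealization).
    """
    return num_reads * math.exp(-c)


def clones_needed(c: float, insert_len: float, genome_len: float) -> int:
    """Clones required to reach fold coverage c at a given mean insert size."""
    return math.ceil(c * genome_len / insert_len)


if __name__ == "__main__":
    # Hypothetical example: 2.5 Mb genome, 500 bp reads, 10,000 reads.
    c = coverage(10_000, 500, 2_500_000)
    print(f"coverage: {c:.1f}x")
    print(f"fraction unsequenced: {fraction_unsequenced(c):.3f}")
    print(f"expected gaps: {expected_gaps(10_000, c):.0f}")
```

Under these assumptions, a substantial number of gaps remains at 2 to 3 fold coverage, which is why a dedicated gap closure step is needed after random sequencing.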
- primer pairs are prepared from the non-overlapping end of each assembled contiguous sequence and used in individual PCR reactions against total genomic DNA isolated from the selected organism.
- Three types of probes can be prepared from the known non-overlapping ends of assembled sequences: (1) a clone which contains the end of an assembled sequence can be labeled (i.e., the clone can be used directly as a probe); (2) a PCR fragment can be amplified using a primer pair from the end of the assembled sequence; and (3) an oligonucleotide from the end of the assembled sequence can be used as a hybridization probe. Products of these reactions are detectably labeled, preferably with a radioisotope or a fluorescent label, and used as probes in separate hybridization reactions against the gridded libraries.
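For probe types (2) and (3), the target regions can be read directly off the assembled sequence. The following is a minimal sketch, assuming contigs are available as plain strings; the 250 bp default echoes the fragment size used in the Staphylococcus aureus example, while the 20-base primer length and all function names are illustrative assumptions. No melting temperature or specificity checks are made, so this is not a substitute for real primer design.

```python
from typing import Tuple

# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def end_probes(contig: str, probe_len: int = 250) -> Tuple[str, str]:
    """Return the (left_end, right_end) regions of a contig to use as probes."""
    if probe_len <= 0:
        raise ValueError("probe_len must be positive")
    n = min(probe_len, len(contig))
    return contig[:n], contig[len(contig) - n:]


def naive_primer_pair(region: str, primer_len: int = 20) -> Tuple[str, str]:
    """Pick a forward/reverse primer pair flanking `region`.

    Forward primer: first `primer_len` bases of the region.
    Reverse primer: reverse complement of the last `primer_len` bases.
    """
    if primer_len <= 0 or len(region) < 2 * primer_len:
        raise ValueError("region too short for the requested primer length")
    return region[:primer_len], reverse_complement(region[-primer_len:])
```

A typical use would call `end_probes` on each assembled contig and then `naive_primer_pair` on each returned region to obtain candidate primers for the PCR-fragment probe.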
- Adjacent contiguous sequences are identified by positively hybridizing probes and confirmed by sequence analysis and computational assembly. In addition, when probes from two separate gap ends hybridize to the same gridded clone, the region spanning a gap is immediately identified. Primer walks are performed until the entire clone is finished or the gap is closed. New probes and hybridizations are performed as needed.
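The rule that two gap-end probes hybridizing to the same gridded clone identify a gap-spanning region can be sketched as a simple set intersection. The input format (a mapping from probe name to the set of positive clone identifiers) and the function name are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple


def spanning_clones(hits: Dict[str, Set[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Map each pair of contig-end probes to the clones hit by both probes.

    `hits` maps a probe name (e.g. "contigA:right") to the set of gridded
    clones that gave a positive hybridization signal with that probe.
    """
    # Invert the mapping: which probes lit up each clone?
    probes_by_clone: Dict[str, Set[str]] = defaultdict(set)
    for probe, clones in hits.items():
        for clone in clones:
            probes_by_clone[clone].add(probe)

    # Any clone hit by two or more probes links the corresponding contig ends.
    links: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for clone, probes in probes_by_clone.items():
        for a, b in combinations(sorted(probes), 2):
            links[(a, b)].add(clone)
    return dict(links)
```

Each key in the result names a candidate pair of adjacent contig ends, and the associated clones are the natural templates for the primer walks described above.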
- arrayed clones can be placed in approximate order prior to sequencing.
- 100 kb cosmid inserts can be hybridized against a small insert library to identify contiguous clones prior to sequencing. More precise ordering of clones is performed by digesting genomic DNA with rare cutting enzymes (such as NotI, PacI and SpeI for E. coli; the choice varies depending on the base composition of the organism, and the enzyme cuts once every 10-100 kb), or by partial digestion with a more frequent cutter (preferably cutting once every 10-100 kb), separating and isolating large DNA fragments resulting from the digestion by pulsed field gel electrophoresis, and hybridizing the isolated fragments against the grids.
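Why the choice of rare cutter depends on base composition can be seen from a back-of-the-envelope estimate. The sketch below assumes a random sequence with independent bases and equal G/C and A/T frequencies, which real genomes only approximate (some short motifs are strongly under-represented in particular organisms), so the output is a rough guide at best. The recognition sequences are the standard ones for NotI, PacI and SpeI; the function name is hypothetical.

```python
# Standard recognition sequences of the enzymes named in the text.
SITES = {"NotI": "GCGGCCGC", "PacI": "TTAATTAA", "SpeI": "ACTAGT"}


def expected_site_spacing(site: str, gc_content: float) -> float:
    """Expected distance in bp between occurrences of `site` in random DNA.

    Assumes independent bases with P(G) = P(C) = gc/2 and P(A) = P(T) = (1-gc)/2.
    """
    if not 0.0 < gc_content < 1.0:
        raise ValueError("gc_content must be strictly between 0 and 1")
    base_prob = {
        "G": gc_content / 2, "C": gc_content / 2,
        "A": (1 - gc_content) / 2, "T": (1 - gc_content) / 2,
    }
    site_prob = 1.0
    for base in site.upper():
        site_prob *= base_prob[base]
    return 1.0 / site_prob


if __name__ == "__main__":
    for gc in (0.3, 0.5, 0.7):  # hypothetical genome GC fractions
        for name, site in SITES.items():
            print(f"GC={gc:.1f} {name}: ~{expected_site_spacing(site, gc):,.0f} bp")
```

Under this model an all-G/C site such as NotI becomes much rarer in AT-rich genomes, and an all-A/T site such as PacI becomes much rarer in GC-rich genomes.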
- first pass sequencing of a large insert library is performed using both universal forward and reverse primers directed against the cosmid vector sequence.
- Universal primers, such as those directed against M13 sequences, are well known in the art.
- a small number of sequencing runs will begin to array these large insert clones into contiguous sequences using standard computational approaches such as the GELMERGE Assembler.
- Other computer-assisted assembly of nucleotide data is known in the art, see, e.g.,
- a genomic library of Staphylococcus aureus was constructed in the vector, lambda ZAP II.
- the average insert size of this library is approximately 5 kb.
- Annotating is a process of identifying regions of partial gene sequences and putative gene assemblies that may cause two unlike sequences to be considered alike or otherwise produce inaccurate results in the grouping or assembly processes. These regions are likely to interfere with the correctness of the subsequent grouping and assembly steps of the method of the invention. The remaining unidentified regions are considered to contain useful information (for the purpose of grouping and assembly) and are used in the subsequent grouping and assembly steps. Regions identified as likely to interfere with subsequent steps are ignored in those steps. Examples of regions which can be identified in the annotating step are sequences from species other than the one of interest and nucleic acids or DNA from cellular structures such as ribosomes and mitochondria.
- annotated partial gene sequences are grouped with other annotated partial gene sequences.
- the step of grouping the annotated partial gene sequences is based on determining association relationships between an annotated partial gene sequence and other existing annotated partial gene sequences, some of which may be components of previously identified putative gene assemblies. This process begins by ignoring the annotated regions from the partial gene sequences and previously identified putative gene assemblies.
- the partial gene sequences, with the annotated regions ignored, are then compared with the consensus sequence of previously identified putative gene assemblies, with the annotated regions ignored.
- the partial gene sequences are also compared with each other, ignoring the annotated regions.
- the partial gene sequences are placed in groups based on the similarities found in these comparisons. Resulting groups thereby contain a collection of partial gene sequences that would appear to belong together, i.e., the grouping step produces a group of partial gene sequences that are thought to assemble together.
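The grouping step can be sketched as follows. The source does not say how similarity is measured, so this example substitutes a simple shared k-mer count for a real sequence comparison and joins sequences with a union-find structure; the thresholds, the interval format for annotated regions and all names are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]  # half-open [start, end) annotated region


def kmers(seq: str, k: int, masked: List[Interval]) -> Set[str]:
    """k-mers of `seq` that do not overlap any annotated (ignored) region."""
    found = set()
    for i in range(len(seq) - k + 1):
        if all(i + k <= start or i >= end for start, end in masked):
            found.add(seq[i:i + k])
    return found


def group_sequences(seqs: Dict[str, str], masks: Dict[str, List[Interval]],
                    k: int = 8, min_shared: int = 3) -> List[List[str]]:
    """Group sequences that share at least `min_shared` unmasked k-mers."""
    ids = list(seqs)
    parent = {i: i for i in ids}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    profiles = {i: kmers(seqs[i], k, masks.get(i, [])) for i in ids}
    for n, a in enumerate(ids):
        for b in ids[n + 1:]:
            if len(profiles[a] & profiles[b]) >= min_shared:
                parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())
```

Because grouping is transitive here, two sequences with no direct similarity can still land in the same group through a third sequence, which is consistent with the later ordering step being allowed to split a group into more than one putative assembly.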
- the positional ordering of the partial gene sequences relative to one another is taken as a group on the assumption that all partial gene sequences belong to the same putative gene assembly.
- One of the consequences of the ordering may be that more than one putative gene assembly may result should the ordering step uncover inconsistencies among the group of partial gene sequences.
- putative gene assemblies are stored in a database.
- Putative gene assemblies may be characterized on the basis of their sequence, structure, biological function or other related characteristics. Once categorized, the database can be expanded with information linked to the putative gene assemblies regarding their potential biological function, structure or other characteristics. For example, one method of characterizing putative gene assemblies is by homology to other known genes. Shared homology of a putative gene assembly with a known gene may indicate a similar biological role or function.
- Another exemplary method of characterizing putative gene assemblies is on the basis of known sequence motifs. Certain sequence patterns are known to code for regions of proteins having specific biological characteristics such as signal sequences, transmembrane domains, SH2 domains, etc.
- PCR primers were designed to amplify a 250 bp fragment for each of these non-overlapping contiguous sequence ends and used in separate PCR reactions against Staphylococcus aureus genomic DNA.
- the PCR products were purified, radiolabeled, and used in separate hybridization reactions against the high density grid.
- the PCR fragment for clone 2AU2142 was found to hybridize against three clones. Restriction digest analysis demonstrated that these three clones possessed overlapping sequences. Sequence analysis confirmed this result and further showed an additional 450 bp extending from the end of the contiguous sequence.
- the 2AU0165-derived fragment was shown to hybridize against two clones. Restriction fragment and sequence analysis also confirmed the overlapping nature of these clones.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP97952359A EP0958381A1 (en) | 1996-12-12 | 1997-12-11 | Method for rapid gap closure |
JP52692898A JP2001505780A (en) | 1996-12-12 | 1997-12-11 | Quick gap closure method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US3255596P | 1996-12-12 | 1996-12-12 | |
US60/032,555 | 1996-12-12 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1998026096A1 (en) | 1998-06-18 |
Family
ID=21865561
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1997/022655 WO1998026096A1 (en) | 1996-12-12 | 1997-12-11 | Method for rapid gap closure |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP0958381A1 (en) |
JP (1) | JP2001505780A (en) |
WO (1) | WO1998026096A1 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2003074698A1 (en) * | 2002-03-06 | 2003-09-12 | Takara Bio Inc. | Method of determining base sequence of nucleic acid |
WO2013078623A1 (en) * | 2011-11-29 | 2013-06-06 | 深圳华大基因科技有限公司 | Method and device for gap closure in nucleotide sequence assembly |
WO2013078619A1 (en) * | 2011-11-29 | 2013-06-06 | 深圳华大基因科技有限公司 | Method and device for identifying extension conflict and determining confidence level of seed read in nucleotide sequence assembly |
US8946129B2 (en) | 2005-09-12 | 2015-02-03 | Electrophoretics Limited | Mass labels |
1997
- 1997-12-11 JP JP52692898A patent/JP2001505780A/en active Pending
- 1997-12-11 WO PCT/US1997/022655 patent/WO1998026096A1/en not_active Application Discontinuation
- 1997-12-11 EP EP97952359A patent/EP0958381A1/en not_active Withdrawn
Non-Patent Citations (4)
Title |
---|
GENOMICS, April 1988, Vol. 2, LANDER et al., "Genomic Mapping by Fingerprinting Random Clones: A Mathematical Analysis", pages 231-239. * |
GENOMICS, December 1991, Vol. 11, SCHLESSINGER et al., "Yeast Artificial Chromosome-Based Genome Mapping: Some Lessons from Xq24-q28", pages 783-793. * |
NUCLEIC ACIDS RESEARCH, 25 April 1993, Vol. 21, No. 8, MOTT et al., "Algorithms and Software Tools for Ordering Clone Libraries: Application to the Mapping of the Genome of Schizosaccharomyces Pombe", pages 1965-1974. * |
PROC. NATL. ACAD. SCI. U.S.A., October 1986, Vol. 83, OLSON et al., "Random-Clone Strategy for Genomic Restriction Mapping in Yeast", pages 7826-7830. * |
Also Published As
Publication number | Publication date |
---|---|
JP2001505780A (en) | 2001-05-08 |
EP0958381A1 (en) | 1999-11-24 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AK | Designated states | Kind code of ref document: A1; Designated state(s): CA JP US |
| | AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
| | WWE | Wipo information: entry into national phase | Ref document number: 09319143; Country of ref document: US |
| | WWE | Wipo information: entry into national phase | Ref document number: 1997952359; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref country code: JP; Ref document number: 1998 526928; Kind code of ref document: A; Format of ref document f/p: F |
| | WWP | Wipo information: published in national office | Ref document number: 1997952359; Country of ref document: EP |
| | WWW | Wipo information: withdrawn in national office | Ref document number: 1997952359; Country of ref document: EP |