EP0958381A1 - Method for rapid gap closure - Google Patents

Method for rapid gap closure

Info

Publication number
EP0958381A1
Authority
EP
European Patent Office
Prior art keywords
sequences
sequence
overlapping ends
library
overlapping
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP97952359A
Other languages
English (en)
French (fr)
Inventor
Jeffrey L. Mooney
Christine Marie Debouck
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SmithKline Beecham Corp
Original Assignee
SmithKline Beecham Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1996-12-12
Filing date
1997-12-11
Publication date
1999-11-24
Application filed by SmithKline Beecham Corp

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813: Hybridisation assays
    • C12Q1/6841: In situ hybridisation

Definitions

  • the present invention relates to a simple and cost effective method for the closure of gaps generated during whole genome random sequencing through the use of high-density arrays, or grids, of genomic libraries.
  • the method is also useful for the rapid isolation of full length genomic sequences obtained from partial gene sequences. Such genomic sequences will therefore comprise full length coding regions.
  • the method of the present invention is also useful for the confirmation of computer generated assemblies. That is, the confirmation of the order or assembly of contiguous sequences. This method provides an alternative to chromosome walking.
  • Genomics (1988), 2:231-239). Ordering of contiguous sequences and completion of gap closures are typically performed by genomic PCR with primers designed against every combination of physical gap ends. However, this procedure is both time-consuming and labor-intensive and can take 5-10 times longer than the random sequencing itself. Accordingly, there is a need for a more efficient method of ordering contiguous sequences and completing gap closure in whole genome random sequencing of any organism.
  • the invention provides a method for high throughput sequencing and gap closure and contiguous sequence assembly or clone ordering in genome sequencing projects using high density grids of genomic libraries.
  • the method involves constructing a series of random genomic libraries for a selected organism and preparing a grid for each library, each grid having a surface on which is immobilized at predefined regions on said surface a plurality of clones derived from the libraries.
  • This gap closure provides complete sequence for partial genes or genes not found in the original random sequencing step.
  • hybridization probes are generated which correspond to the non-overlapping ends of the known sequence.
  • The probes are then hybridized to a gridded library to identify nucleotide sequences which span the non-overlapping ends of the assembled nucleotide sequence.
  • Identifying and sequencing genes is a major goal of modern scientific research. By identifying genes, determining their sequences and characterizing their biological function, it is possible to employ recombinant technology to produce large quantities of valuable gene products, e.g., proteins and peptides. Additionally, knowledge of gene sequences can provide a key to diagnosis, prognosis and treatment in a variety of disease states in plants and animals which are characterized by inappropriate expression and/or repression of selected genes or by the influence of external factors, e.g., carcinogens or teratogens, on gene function. Methods now exist for whole genome random sequencing and assembly for a complete living organism. However, the methods required to complete genome sequence gap closures and ordering of contiguous sequences are both time-consuming and labor-intensive.
  • the present invention provides a method for high throughput gap closure and contiguous sequence assembly which is useful in whole genome random sequencing.
  • This method uses a plurality of high density grids prepared from genomic libraries of a selected organism to perform sequence reactions, gap closure and contiguous sequence assembly.
  • the method of the present invention provides a more rapid and cost effective means to sequence the whole genome of an organism.
  • the method also provides rapid means to obtain the full length genomic sequence for genes for which only a partial sequence is obtained through random sequencing.
  • gene refers to the genomic nucleotide sequence from which a cDNA sequence is derived.
  • gene classically refers to the genomic sequence, which upon processing, can produce different cDNAs, e.g., by splicing events. However, for ease of reading, any full-length counterpart cDNA sequence will also be referred to by shorthand herein as gene.
  • isolated means altered “by the hand of man” from its natural state; i.e., that, if it occurs in nature, it has been changed or removed from its original environment, or both.
  • a naturally occurring polynucleotide or a polypeptide naturally present in a living animal in its natural state is not “isolated,” but the same polynucleotide or polypeptide separated from the coexisting materials of its natural state is “isolated”, as the term is employed herein.
  • isolated means that it is separated from the chromosome and cell in which it naturally occurs.
  • By organism it is meant any living organism such as, but not limited to, bacteria (including both gram negative and gram positive species), viruses, lower eukaryotic cells such as fungi, yeast and molds, simple multicellular organisms (e.g., slime molds) and complex multicellular organisms including man.
  • solid support refers to any known substrate which is useful for the immobilization of a plurality of defined materials derived from a genomic library by any available method to enable detectable hybridization of the immobilized polynucleotide sequences with other polynucleotides in the sample.
  • Among solid supports, one desirable example is the supports described in International Patent Application No. WO91/07087, published May 30, 1991.
  • other useful supports include, but are not limited to, nitrocellulose, nylon, glass, silica and Pall BIODYNE C. It is also anticipated that improvements yet to be made to conventional solid supports may also be employed in this invention.
  • grid means any generally two-dimensional structure on a solid support to which the defined materials of a genomic library are attached or immobilized.
  • predefined region refers to a localized area on a surface of a solid support on which is immobilized one or multiple copies of a particular clone and which enables hybridization of that clone at the position, if hybridization of that clone to a sample polynucleotide occurs.
  • By immobilized it is meant the attachment of the genes to the solid support. Means of immobilization are known and conventional to those of skill in the art, and may depend on the type of support being used.
  • the present invention is based upon the use of high density arrays of genomic libraries as a means for high throughput gap closure, including full length genomic sequences and contiguous sequence assembly in genomic sequencing.
  • A. Preparation of genomic libraries. For this analysis, a series of random genomic libraries for a selected organism is prepared, each library comprising fractionated and ligated genomic DNA of a selected insert size range. To construct these libraries, genomic DNA from the selected organism is first isolated using standard molecular biology procedures such as those disclosed by Sambrook et al., MOLECULAR CLONING, A LABORATORY MANUAL, 2nd Ed.; Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1989.
  • The isolated DNA is then randomly sheared (e.g., by sonication, partial restriction endonuclease digestion, partial DNase digestion, etc.), modified and ligated into a plasmid or phage vector in accordance with the procedures described by Fleischmann et al., Science, 1995, 269:496-512.
  • a small insert library is prepared by fractionating and ligating the genomic DNA into a plasmid or phage based vector so that the average insert size is between 1.0 and 5.0 kb.
  • Plasmid vectors useful in constructing this small insert library include, but are not limited to, pBluescript, Lambda ZAPII (Stratagene, La Jolla, CA), pUC19 and M13mp18/19 (New England BIOLABS, Beverly, MA).
  • a large insert library is also constructed in a cosmid vector so that the insert size averages between 10 and 100 kb.
  • Cosmid vectors useful in constructing this large insert library include, but are not limited to, pLorist, pWE15 (Stratagene, La Jolla, CA), lambda DASH2 (Stratagene) and lambda GEM-12 (Promega, Madison, WI).
  • a medium insert library having an insert size range averaging approximately 5.0-10 kb can also be prepared by fractionating the genomic DNA into a cosmid vector.
  • The final ligation products are electroporated into a bacterium such as E. coli so that the final number of transformants for each library reaches 4 to 6 fold depth of coverage as predicted by the Lander-Waterman theory (E.S. Lander and M.S. Waterman, Genomics (1988), 2:231-239).
  • Each of the insert libraries is gridded onto solid supports, with the nucleic acid of each transformed host cell (containing the insert and amplified by PCR), or clone, being placed onto a predefined position within a high density array.
  • this method involves forming predefined regions on a surface of a solid support, where the predefined regions are capable of immobilizing the clones.
  • the method makes use of binding substrates attached to the surface which enable selective activation of the predefined regions. Upon activation, these binding substances become capable of binding and immobilizing the clones derived from the genomic library.
  • Any of the known solid substrates suitable for binding nucleotide sequences at predefined regions on the surface thereof for hybridization and methods for attaching nucleotide sequences thereto may be employed by one of skill in the art according to the invention.
  • Known conventional methods for making hybridization of the immobilized clones detectable (e.g., fluorescence, radioactivity, photoactivation, energy transfer dyes, biotinylation, solid state circuitry, and the like) may be used in this invention.
  • the present invention employs the compositions described above in methods for gap closure following high throughput sequencing for confirmation of computer generated assemblies, and to generate full length coding sequences.
  • a small insert library can be used for high throughput sequencing.
  • Sequencing reactions are performed until 2 to 3 fold depth of coverage is obtained by standard sequencing methods (see, e.g., Fleischmann et al., Science (1995), 269:496).
  • the resulting sequences are assembled using standard computational programs such as the GELMERGE Assembler available from Genetics Computer Group, Inc. of Madison, WI (UWGCG).
  • The Assembler identifies non-overlapping regions, or gaps, between the assemblies. These gaps are a statistical consequence of incomplete sampling and of non-randomness in the collection of sequence fragments due to deletion of clones from the library.
  • Primer pairs are prepared from the non-overlapping end of each assembled contiguous sequence and used in individual PCR reactions against total genomic DNA isolated from the selected organism.
  • Three types of probes can be prepared from the known non-overlapping ends of assembled sequences: (1) a clone which contains the end of an assembled sequence can be labeled (i.e., the clone is used directly as a probe); (2) a PCR fragment can be amplified using a primer pair from the end of the assembled sequence; and (3) an oligonucleotide from the end of the assembled sequence can be used as a hybridization probe. Products of these reactions are detectably labeled, preferably with a radioisotope or a fluorescent label, and used as probes in separate hybridization reactions against the gridded libraries.
  • Adjacent contiguous sequences are identified by positively hybridizing probes and confirmed by sequence analysis and computational assembly. In addition, when probes from two separate gap ends hybridize to the same gridded clone, the region spanning a gap is immediately identified. Primer walks are performed until the entire clone is finished or the gap is closed. New probes are prepared and hybridizations performed as needed.
  • arrayed clones can be placed in approximate order prior to sequencing.
  • 100 kb cosmid inserts can be hybridized against a small insert library to identify contiguous clones prior to sequencing. More precise ordering of clones is performed by digesting genomic DNA with rare cutting enzymes (such as NotI, PacI and SpeI for E. coli; the choice varies depending on the base composition of the organism, and preferably the enzyme cuts once every 10-100 kb), or by partial digestion with a more frequent cutter (preferably cutting once every 10-100 kb), separating and isolating the large DNA fragments resulting from the digestion by pulsed field gel electrophoresis, and hybridizing the isolated fragments against the grids.
  • first pass sequencing of a large insert library is performed using both universal forward and reverse primers directed against the cosmid vector sequence.
  • Universal primers, such as those directed against M13 sequences, are well known in the art.
  • A small number of sequencing runs will begin to array these large insert clones into contiguous sequences using standard computational approaches such as the GELMERGE Assembler.
  • Other computer-assisted assembly of nucleotide data is known in the art, see, e.g.,
  • a genomic library of Staphylococcus aureus was constructed in the vector, lambda ZAP II.
  • the average insert size of this library is approximately 5 kb.
  • Annotating is a process of identifying regions of partial gene sequences and putative gene assemblies that may cause two unlike sequences to be considered alike or otherwise produce inaccurate results in the grouping or assembly processes. These regions are likely to interfere with the correctness of the subsequent grouping and assembly steps of the method of the invention. The remaining unidentified regions are considered to contain useful information (for the purpose of grouping and assembly) and are used in the subsequent grouping and assembly steps. Regions identified as likely to interfere with subsequent steps are ignored in those steps. Examples of regions which can be identified in the annotating step are sequences from species other than the one of interest and nucleic acids or DNA from cellular structures such as ribosomes and mitochondria.
  • annotated partial gene sequences are grouped with other annotated partial gene sequences.
  • the step of grouping the annotated partial gene sequences is based on determining association relationships between an annotated partial gene sequence and other existing annotated partial gene sequences, some of which may be components of previously identified putative gene assemblies. This process begins by ignoring the annotated regions from the partial gene sequences and previously identified putative gene assemblies.
  • the partial gene sequences, with the annotated regions ignored, are then compared with the consensus sequence of previously identified putative gene assemblies, with the annotated regions ignored.
  • the partial gene sequences are also compared with each other, ignoring the annotated regions.
  • the partial gene sequences are placed in groups based on the similarities found in these comparisons. Resulting groups thereby contain a collection of partial gene sequences that would appear to belong together, i.e., the grouping step produces a group of partial gene sequences that are thought to assemble together.
  • the positional ordering of the partial gene sequences relative to one another is taken as a group on the assumption that all partial gene sequences belong to the same putative gene assembly.
  • One of the consequences of the ordering may be that more than one putative gene assembly may result should the ordering step uncover inconsistencies among the group of partial gene sequences.
  • putative gene assemblies are stored in a database.
  • Putative gene assemblies may be characterized on the basis of their sequence, structure, biological function or other related characteristics. Once categorized, the database can be expanded with information linked to the putative gene assemblies regarding their potential biological function, structure or other characteristics. For example, one method of characterizing putative gene assemblies is by homology to other known genes. Shared homology of a putative gene assembly with a known gene may indicate a similar biological role or function.
  • Another exemplary method of characterizing putative gene assemblies is on the basis of known sequence motifs. Certain sequence patterns are known to code for regions of proteins having specific biological characteristics such as signal sequences, transmembrane domains, SH2 domains, etc.
  • PCR primers were designed to amplify a 250 bp fragment for each of these non-overlapping contiguous sequence ends and used in separate PCR reactions against Staphylococcus aureus genomic DNA.
  • the PCR products were purified, radiolabeled, and used in separate hybridization reactions against the high density grid.
  • the PCR fragment for clone 2AU2142 was found to hybridize against three clones. Restriction digest analysis demonstrated that these three clones possessed overlapping sequences. Sequence analysis confirmed this result and further showed an additional 450 bp extending from the end of the contiguous sequence.
  • The fragment derived from 2AU0165 was shown to hybridize against two clones. Restriction fragment and sequence analysis also confirmed the overlapping nature of these clones.
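
The library sizing step in the description (enough transformants per library to reach 4 to 6 fold depth of coverage under the Lander-Waterman theory) can be illustrated with a short calculation. This is a minimal sketch and not part of the patent: the function names are hypothetical, the 2.8 Mb genome size is an assumed round figure for a small bacterial genome, and the contig estimate uses the simplest form of the theory (no minimum detectable overlap). The 5 kb insert size matches the example library.

```python
import math


def clones_for_coverage(genome_bp, insert_bp, fold_coverage):
    """Smallest clone count N with N * insert_bp / genome_bp >= fold_coverage."""
    return math.ceil(fold_coverage * genome_bp / insert_bp)


def expected_fraction_uncovered(fold_coverage):
    """Poisson approximation: a given base is missed with probability exp(-c)."""
    return math.exp(-fold_coverage)


def expected_contigs(n_clones, fold_coverage):
    """Expected number of contigs (islands), N * exp(-c), assuming any
    overlap between two clones is detectable."""
    return n_clones * math.exp(-fold_coverage)


# Assumed inputs for illustration only.
GENOME_BP = 2_800_000   # assumed genome size
INSERT_BP = 5_000       # average insert size of the example library
COVERAGE = 5            # within the 4 to 6 fold range given in the text

n = clones_for_coverage(GENOME_BP, INSERT_BP, COVERAGE)
print(n, expected_fraction_uncovered(COVERAGE), expected_contigs(n, COVERAGE))
```

Even at 5 fold coverage the model leaves a small uncovered fraction and more than one contig, which is the situation the gap closure method addresses.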
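
The identification of gap-spanning clones described above (probes from two separate gap ends hybridizing to the same gridded clone) amounts to intersecting hit lists. The following is a minimal sketch under stated assumptions: the data layout (a mapping from contig-end probe names to sets of grid positions), the function name `find_spanning_clones`, and all identifiers in the example are hypothetical and do not come from the patent.

```python
from itertools import combinations


def find_spanning_clones(hits):
    """Return clones hit by two different contig-end probes.

    hits: dict mapping a probe name to the set of grid positions (clones)
          that gave a positive hybridization signal for that probe.
    Returns a dict mapping (probe_a, probe_b) to the sorted shared clones.
    """
    spanning = {}
    for probe_a, probe_b in combinations(sorted(hits), 2):
        shared = hits[probe_a] & hits[probe_b]
        if shared:
            spanning[(probe_a, probe_b)] = sorted(shared)
    return spanning


# Hypothetical hybridization results for three contig-end probes.
example_hits = {
    "contig1_right": {"A01", "C07", "F12"},
    "contig2_left": {"C07", "H03"},
    "contig3_left": {"B05"},
}

result = find_spanning_clones(example_hits)
print(result)
```

In this made-up example, clone C07 is a candidate for spanning the gap between contig 1 and contig 2. Any such candidate would still need confirmation by sequence analysis, as the description states.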
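
The remark that the choice of rare cutting enzyme varies with the base composition of the organism can be made concrete with a rough estimate of how often a recognition site is expected to occur. This is a sketch under a simplifying assumption that is not in the patent: bases are treated as independent, with G and C equally frequent and A and T equally frequent. The function names and the GC fraction of 0.33 are illustrative assumptions; the recognition sequences are the standard ones for NotI, PacI and SpeI.

```python
def site_probability(site, gc_fraction):
    """Probability that a random position starts the given site,
    assuming independent bases with the given overall GC fraction."""
    p_gc = gc_fraction / 2.0          # probability of G, and of C
    p_at = (1.0 - gc_fraction) / 2.0  # probability of A, and of T
    prob = 1.0
    for base in site.upper():
        if base in "GC":
            prob *= p_gc
        elif base in "AT":
            prob *= p_at
        else:
            raise ValueError(f"unexpected base: {base}")
    return prob


def expected_spacing_bp(site, gc_fraction):
    """Expected average distance between sites, in base pairs."""
    return 1.0 / site_probability(site, gc_fraction)


SITES = {"NotI": "GCGGCCGC", "PacI": "TTAATTAA", "SpeI": "ACTAGT"}

for gc in (0.50, 0.33):  # 0.33 is an assumed AT-rich genome
    for name, site in SITES.items():
        print(f"GC={gc:.2f} {name}: ~{expected_spacing_bp(site, gc):,.0f} bp")
```

Under this model a GC-rich site such as NotI becomes much rarer in an AT-rich genome while an AT-rich site such as PacI becomes more frequent, so an enzyme that falls in the preferred 10-100 kb range for one organism may not for another.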

EP97952359A 1996-12-12 1997-12-11 Method for rapid gap closure Withdrawn EP0958381A1 (de)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US3255596P 1996-12-12 1996-12-12
US32555P 1996-12-12
PCT/US1997/022655 WO1998026096A1 (en) 1996-12-12 1997-12-11 Method for rapid gap closure

Publications (1)

Publication Number Publication Date
EP0958381A1 (de) 1999-11-24

Family

ID=21865561

Family Applications (1)

Application Number Title Priority Date Filing Date
EP97952359A Withdrawn EP0958381A1 (de) 1996-12-12 1997-12-11 Method for rapid gap closure

Country Status (3)

Country Link
EP (1) EP0958381A1 (de)
JP (1) JP2001505780A (de)
WO (1) WO1998026096A1 (de)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005218301A (ja) * 2002-03-06 2005-08-18 Takara Bio Inc Method for determining the base sequence of a nucleic acid
GB0518585D0 (en) 2005-09-12 2005-10-19 Electrophoretics Ltd Mass labels
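WO2013078623A1 (zh) * 2011-11-29 2013-06-06 深圳华大基因科技有限公司 Gap filling method in nucleic acid sequence assembly and device therefor
WO2013078619A1 (zh) * 2011-11-29 2013-06-06 深圳华大基因科技有限公司 Method and device for identifying extension conflicts and judging the reliability of seed reads in nucleic acid sequence assembly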

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO9826096A1 *

Also Published As

Publication number Publication date
JP2001505780A (ja) 2001-05-08
WO1998026096A1 (en) 1998-06-18

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 19990712

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): BE CH DE DK FR GB IT LI NL

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Withdrawal date: 20020228