WO2011071382A1 - Polymorfphic whole genome profiling - Google Patents


Info

Publication number
WO2011071382A1
WO2011071382A1 PCT/NL2010/050836 NL2010050836W WO2011071382A1 WO 2011071382 A1 WO2011071382 A1 WO 2011071382A1 NL 2010050836 W NL2010050836 W NL 2010050836W WO 2011071382 A1 WO2011071382 A1 WO 2011071382A1
Authority
WO
WIPO (PCT)
Prior art keywords
adapter
sequence
fragment
genome
fragments
Application number
PCT/NL2010/050836
Other languages
French (fr)
Inventor
An Michels
Adriaan Jan Van Oeveren
Original Assignee
Keygene N.V.
Application filed by Keygene N.V.
Publication of WO2011071382A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing

Definitions

  • the present invention relates to the field of molecular biology and biotechnology.
  • the invention relates to the field of nucleic acid detection and identification. More in particular the invention relates to the generation of a physical map of a genome, or part thereof, using high-throughput sequencing technology, combined with the identification of polymorphic markers that are mapped to the physical map.
  • the invention further relates to the development of marker assays of the discovered polymorphic markers and the use of the markers in the generation of a genetic map. More in particular, the invention relates to the identification of sequence tags, from which a physical map is created, and polymorphic sequence tags, from which a genetic map is made, and which can be integrated into a genome wide high density physical map.

Background of the invention

  • Integrated genetic and physical genome maps are extremely valuable for map-based gene isolation, comparative genome analysis and as sources of sequence-ready clones for genome sequencing projects.
  • the effect of the availability of an integrated map of physical and genetic markers of a species for genome research is enormous.
  • Integrated maps allow for precise and rapid gene mapping and precise mapping of microsatellite loci and SNP markers.
  • Various methods have been developed for assembling physical maps of genomes of varying complexity.
  • One of the better characterized approaches uses restriction enzymes to generate large numbers of DNA fragments from genomic subclones (Brenner et al., Proc. Natl. Acad. Sci., (1989), 86, 8902-8906; Gregory et al., Genome Res.
  • A physical map is generated from a combination of pooling of the clones in a library, restriction enzyme digestion, adapter ligation, (selective) amplification, high-throughput sequencing and deconvolution of the resulting sequences into BAC clone-specific sets that can be used to assemble physical maps.
  • The assembly of the clones into contigs is based on the co-presence of terminal nucleotide sequences of the sequenced fragments, which can be used as sequence-based anchor points for additional linkage of sequence data. More in detail, the technology disclosed in WO2008007951 is Whole Genome Profiling (WGP), KeyGene's recently developed proprietary approach for sequence-based physical mapping.
  • a BAC library is constructed from a single homozygous individual and BAC clones are pooled in a multi-dimensional format.
  • BAC pools are characterized by pool specific tags to allow assignment of sequences to individual BAC clones based on the coordinates in the multidimensional pool screening.
  • DNA is extracted from each BAC pool and digested with restriction enzymes, for instance EcoRI and MseI.
  • The EcoRI ends of the restriction fragments are analyzed on a next-generation sequencer such as the Illumina Genome Analyzer, and in this way these relatively short (20-100 base pairs) sequenced fragments, called WGP tags, can be assigned to individual BACs.
  • BACs can be assembled based on overlapping WGP tag patterns using a contig-building software tool such as FPC.
  • the WGP method is unique in providing sequence based anchor points instead of fragment lengths for assembly of BAC contigs. Sequence based anchors are more accurate and provide the basis for assembly of Whole Genome Shotgun data.
  • sequencing refers to determining the order of nucleotides (base sequences) in a nucleic acid sample, e.g. DNA or RNA.
  • Many techniques are available, such as Sanger sequencing and high-throughput sequencing technologies (also known as next-generation sequencing technologies) such as the GS FLX platform offered by Roche Applied Science, which is based on pyrosequencing, and the Genome Analyzer from Illumina, which is based on sequencing by synthesis with reversible terminators.
  • Restriction endonuclease: a restriction endonuclease or restriction enzyme is an enzyme that recognizes a specific nucleotide sequence (target site) in a double-stranded DNA molecule, and will cleave both strands of the DNA molecule at or near every target site, leaving a blunt or a staggered end.
  • Frequent cutters and rare cutters: restriction enzymes typically have recognition sequences that vary in number of nucleotides from 3, 4 (such as MseI) to 6 (EcoRI) and even 8 (NotI).
  • the restriction enzymes used can be frequent and rare cutters. The term 'frequent' in this respect is typically used in relation to the term 'rare'.
  • Frequent cutting endonucleases are restriction endonucleases that have a relatively short recognition sequence. Frequent cutters typically have 3-5 nucleotides that they recognise and subsequently cut. Thus, a frequent cutter on average cuts a DNA sequence every 64-1024 nucleotides.
  • Rare cutters are restriction endonucleases that have a relatively long recognition sequence. Rare cutters typically have 6 or more nucleotides that they recognise and subsequently cut. Thus, a rare 6-cutter on average cuts a DNA sequence every 4096 nucleotides, leading to longer fragments. It is observed again that the definition of frequent and rare is relative to each other, meaning that when a 4 bp restriction enzyme, such as MseI, is used in combination with a 5-cutter such as AvaII, AvaII is seen as the rare cutter and MseI as the frequent cutter.
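
The spacings quoted above (64-1024 nucleotides for 3-5 cutters, 4096 for a 6-cutter) follow from a simple model that is an assumption here rather than a statement from the source: if the four bases occur independently and with equal frequency, a recognition site of n bases is expected once every 4^n nucleotides. A minimal sketch, with an illustrative function name:

```python
def expected_cut_interval(site_length):
    """Average distance (in nucleotides) between recognition sites.

    Assumes independent, equally frequent bases, which real genomes
    only approximate (GC content and repeats shift the true value).
    """
    return 4 ** site_length


# Recognition-site lengths mentioned in the text: 3 to 8 nucleotides.
intervals = {n: expected_cut_interval(n) for n in (3, 4, 5, 6, 8)}
```

Under this model a 4-cutter such as MseI cuts about 16 times more often than a 6-cutter such as EcoRI, which is why the terms frequent and rare are relative.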
  • Restriction fragments: the DNA molecules produced by digestion with a restriction endonuclease are referred to as restriction fragments. Any given genome (or nucleic acid, regardless of its origin) will be digested by a particular restriction endonuclease into a discrete set of restriction fragments.
  • the DNA fragments that result from restriction endonuclease cleavage can be further used in a variety of techniques and can for instance be detected by gel electrophoresis.
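
As an illustration of how a restriction endonuclease turns a sequence into a discrete set of fragments, the following sketch digests a linear sequence in silico. It is a simplification under stated assumptions: only the top strand is modelled, the sequence is linear, and the enzyme table holds just the two enzymes named in the text (EcoRI cuts G^AATTC, MseI cuts T^TAA). The names `ENZYMES` and `digest` are hypothetical.

```python
# Recognition site and the offset within the site after which the
# top strand is cut (EcoRI: G^AATTC, MseI: T^TAA).
ENZYMES = {"EcoRI": ("GAATTC", 1), "MseI": ("TTAA", 1)}


def digest(sequence, enzyme):
    """Return the top-strand fragments of a linear sequence after digestion."""
    site, offset = ENZYMES[enzyme]
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])  # remainder after the last cut
    return fragments
```

Digesting the same sequence with the same enzyme always gives the same fragments, which is the reproducibility that the later complexity-reduction steps rely on.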
  • Ligation: the enzymatic reaction catalyzed by a ligase enzyme in which two double-stranded DNA molecules are covalently joined together is referred to as ligation.
  • In general, both DNA strands are covalently joined together, but it is also possible to prevent the ligation of one of the two strands through chemical or enzymatic modification of one of the ends of the strands. In that case the covalent joining will occur in only one of the two DNA strands.
  • Synthetic oligonucleotide: single-stranded DNA molecules having preferably from about 10 to about 50 bases, which can be synthesized chemically, are referred to as synthetic oligonucleotides.
  • synthetic DNA molecules are designed to have a unique or desired nucleotide sequence, although it is possible to synthesize families of molecules having related sequences and which have different nucleotide compositions at specific positions within the nucleotide sequence.
  • The term synthetic oligonucleotide will be used to refer to DNA molecules having a designed or desired nucleotide sequence.
  • Adapters: short double-stranded DNA molecules with a limited number of base pairs, e.g. about 10 to about 30 base pairs in length, which are designed such that they can be ligated to the ends of restriction fragments.
  • Adapters are generally composed of two synthetic oligonucleotides which have nucleotide sequences which are partially complementary to each other. When mixing the two synthetic oligonucleotides in solution under appropriate conditions, they will anneal to each other forming a double-stranded structure.
  • one end of the adapter molecule is designed such that it is compatible with the end of a restriction fragment and can be ligated thereto; the other end of the adapter can be designed so that it cannot be ligated, but this need not be the case (double ligated adapters).
  • Adapter-ligated restriction fragments: restriction fragments that have been capped by adapters.
  • Primers: in general, the term primers refers to DNA strands which can prime the synthesis of DNA.
  • DNA polymerase cannot synthesize DNA de novo without primers: it can only extend an existing DNA strand in a reaction in which the complementary strand is used as a template to direct the order of nucleotides to be assembled.
  • We will refer to the synthetic oligonucleotide molecules which are used in a polymerase chain reaction (PCR) as primers.
  • DNA amplification: the term DNA amplification or amplification will typically be used to denote the in vitro synthesis of double-stranded DNA molecules using PCR. It is noted that other amplification methods exist and they may be used in the present invention without departing from the gist.
  • Tagging refers to the addition of a sequence tag to a nucleic acid sample in order to be able to distinguish it from a second or further nucleic acid sample.
  • Tagging can e.g. be performed by the addition of a sequence identifier during complexity reduction or by any other means known in the art such as a separate ligation step.
  • a sequence identifier can e.g. be a unique base sequence of varying but defined length uniquely used for identifying a specific nucleic acid sample. Typical examples are ZIP sequences, known in the art as commonly used tags for unique detection by hybridization (Iannone et al. Cytometry 39:131-140, 2000).
  • Using nucleotide-based tags, the origin of a sample, a clone or an amplified product can be determined upon further processing.
  • the different nucleic acid samples should be identified using different tags.
  • Identifier: a short sequence that can be added to an adapter or a primer or included in its sequence or otherwise used as label to provide a unique identifier (aka barcode or index).
  • identifiers can be sample specific, pool specific, clone specific, amplicon specific etc.
  • the different nucleic acid samples are generally identified using different identifiers.
  • Identifiers preferably differ from each other by at least two base pairs and preferably do not contain two identical consecutive bases to prevent misreads.
  • the identifier function can sometimes be combined with other functionalities such as adapters or primers and can be located at any convenient position.
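
The two design preferences stated above (identifiers differ from each other in at least two positions, and no identifier contains two identical consecutive bases) can be checked mechanically. The sketch below assumes, for simplicity, that all identifiers in a set have the same length; the function names are illustrative and not part of the source.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def has_repeated_base(tag):
    """True if the tag contains two identical consecutive bases."""
    return any(x == y for x, y in zip(tag, tag[1:]))


def valid_identifier_set(tags, min_distance=2):
    """Check both preferences for a set of equal-length identifiers."""
    if any(has_repeated_base(t) for t in tags):
        return False
    return all(hamming(a, b) >= min_distance for a, b in combinations(tags, 2))
```

With a minimum distance of two, a single sequencing error in an identifier can never turn it into another valid identifier of the set.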
  • Tagged library refers to a library of tagged nucleic acids.
  • Aligning and alignment: the terms "aligning" and "alignment" mean the comparison of two or more nucleotide sequences based on the presence of short or long stretches of identical or similar nucleotides. Several methods for alignment of nucleotide sequences are known in the art, as will be further explained below.
  • The term contig is used in connection with DNA sequence analysis, and refers to assembled contiguous stretches of DNA derived from two or more DNA fragments having contiguous nucleotide sequences.
  • a contig is a set of overlapping DNA fragments that provides a partial contiguous sequence of a genome.
  • the term 'contig' is also used to indicate a contiguous stretch of, for instance, BACs: a 'BAC contig'.
  • a BAC contig can also be made on marker analysis, i.e. a more indirect way of sequence analysis.
  • a "scaffold" is defined as a series of contigs that are in the correct order, but are not connected in one continuous sequence, i.e. contain gaps.
  • Contig maps also represent the structure of contiguous regions of a genome by specifying overlap relationships among a set of clones.
  • the term "contigs" encompasses a series of cloning vectors which are ordered in such a way as to have each sequence overlap that of its neighbours.
  • the linked clones can then be grouped into contigs, either manually or, preferably, using appropriate computer programs such as FPC, PHRAP, CAP3 etc.
  • Complexity reduction is used to denote a method wherein the complexity of a nucleic acid sample, such as genomic DNA, is reduced by the generation or selection of a subset of the sample.
  • This subset can be representative for the whole (i.e. complex) sample and is preferably a reproducible subset. Reproducible means in this context that when the same sample is reduced in complexity using the same method and experimental conditions, the same, or at least comparable, subset is obtained.
  • the method used for complexity reduction may be any method for complexity reduction known in the art. Examples of methods for complexity reduction include AFLP® (Keygene N.V., the Netherlands; see e.g. EP 0 534 858), the methods described by Dong (see e.g. WO 03/012118, WO 00/24939) and indexed linking (Unrau et al., vide infra).
  • the complexity reduction methods used in the present invention have in common that they are reproducible: when the same sample is reduced in complexity in the same manner, the same subset of the sample is obtained. This is in contrast to more random complexity reduction such as microdissection, random shearing, or the use of mRNA (cDNA), which represents the portion of the genome transcribed in a selected tissue and whose reproducibility depends on the selection of tissue, time of isolation, etc.
  • High-throughput screening is a method for scientific experimentation especially relevant to the fields of biology and chemistry. Through a combination of modern robotics and other specialised laboratory hardware, it allows a researcher to effectively screen large numbers of samples.
  • Artificial clone library: a population of hosts (bacteria, yeast), each of which carries a DNA molecule that was inserted into a cloning vector, such that a representation of the genome of an organism (usually an entire genome) is present.
  • BAC: Bacterial Artificial Chromosome, a DNA construct, usually based on a functional fertility plasmid (or F-plasmid), used for transforming and cloning in bacteria, usually E. coli.
  • the usual insert size is 100-350 kb but can also be 700 kb.
  • BACs are often used to sequence the genomes of organisms whereby a short piece of the organism's DNA is amplified as an insert in BACs and then sequenced. Rearrangement of the sequenced parts in silico provides the genome sequence of the organism.
  • polymorphism refers to the presence of two or more variants of a nucleotide sequence in a population.
  • a polymorphism may comprise one or more base changes, an insertion, a repeat, or a deletion.
  • a polymorphism includes e.g. a simple sequence repeat (SSR) and a single nucleotide polymorphism (SNP), which is a variation occurring when a single nucleotide (adenine (A), thymine (T), cytosine (C) or guanine (G)) is altered.
  • a variation must generally occur in at least 1 % of the population to be considered a SNP.
  • SNPs make up e.g. 90% of all human genetic variations, and occur every 100 to 300 bases along the human genome. Two of every three SNPs substitute cytosine (C) with thymine (T).
  • Variations in the DNA sequences of e.g. humans or plants can affect how they handle diseases, bacteria, viruses, chemicals, drugs, etc.
  • Heterozygous: an organism is heterozygous for a particular gene when different alleles occupy the gene's position on the homologous chromosomes.
  • Homozygous: an organism is homozygous for a particular gene when identical alleles are present on both homologous chromosomes.

Summary of the invention

  • the present inventors found that 'Whole Genome Profiling' or WGP has proven to work well in providing high-quality physical maps. However, from analysis and validation on several data sets, it was observed that sometimes 'gaps' (missing WGP tags) in assembled BACs occur: one overlapping BAC might not show the same set of WGP tags as another BAC covering the same region. These gaps could result from sequence errors or incomplete deconvolution due to inadequate sequencing depth; however, the inventors realised that they could also have a biological background. If there is a SNP or short indel within the WGP tag region, this results in a polymorphic tag: either a present/absent tag or a tag with a SNP or indel. Exactly these latter SNP variants provide a unique opportunity to use them as genetic markers.
  • the present inventors have found that, by using artificial chromosome libraries (particularly BAC libraries) from a heterozygous genome sample or a combination of two or more homozygous genome samples, this observed effect can be used in an efficient way to create a physical map and, based on the same data set, to screen the data for the presence of polymorphisms within the WGP tags. This will significantly reduce the effort in SNP discovery and in performing a large-scale genotyping experiment.
  • the addition of a (rough scale) genotyping and genetic mapping effort provides a link of BAC contigs into linkage groups. This genetic map can then be extended to a high density, high resolution map by adding all SNP tags from their positions as known from the BAC contigs, resulting in an integrated physical and genetic map.
  • FIG. 6 Various adapter-primer combinations containing identifiers to yield tagged fragments.
  • A adapter nucleotide
  • F fragment nucleotide
  • P primer nucleotide
  • 1. adapter ligated fragment; 2. adapter ligated fragment containing identifier, a. amplification with primer directed against identifier, b. amplification with primer directed against adapter; 3. adapter ligated fragment containing degenerate identifier section, amplification with primer directed against degenerate identifier and introducing identifier in amplified fragment. 4. adapter ligated fragment containing no identifier, amplification with primer introducing identifier in amplified fragment.
  • Figure 7 Integration of genetic and physical map.
  • Figure 8 Alignment of BAC clones for contig 293. Indicated by the rectangular dotted box is the likely position of the polymorphic WGP tag SNP1.
  • FIG. 9 Alignment of BAC clones for contig 307. Indicated by the rectangular dotted box is the likely position of the polymorphic WGP tag SNP2.
  • the invention relates to a method for the generation of a physical map of a sample genome and identification of polymorphisms, comprising the steps of:
  • an artificial chromosome e.g. BAC, YAC
  • each artificial chromosome clone contains DNA from a sample genome, wherein the sample genome is selected from the group consisting of
  • step (h) aligning the sequenced fragments based on the determined sequences in step (e);
  • step (i) determining polymorphisms between the aligned sequences of step (h).
  • an artificial clone bank is provided.
  • the library can be a Bacterial Artificial Chromosome library (BAC) or based on yeast (YAC). Other libraries such as based on fosmids, cosmids, PAC, TAC or MAC are also possible.
  • The BAC library is preferably of a high quality and preferably is a high insert size genomic library. This means that the individual BAC contains a relatively large insert of the genomic DNA under investigation (typically > 100 kbp). The size of the preferred large insert is species-dependent.
  • Reference is made herein to BACs as examples of artificial chromosomes. However, the present invention is not limited thereto, and other artificial chromosomes can be used without departing from the gist of the invention.
  • the libraries contain at least five genome equivalents, more preferably at least 7, most preferably at least 8. Particularly preferred is at least 10. The higher the number of genome equivalents in the library, the more reliable the resulting contigs and physical map will be.
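
The number of genome equivalents in a library is the total cloned DNA divided by the genome size. The sketch below makes that arithmetic explicit; the clone count, insert size and genome size in the usage note are hypothetical values chosen for illustration, not figures from the source.

```python
import math


def genome_equivalents(n_clones, mean_insert_bp, genome_size_bp):
    """Library coverage expressed in genome equivalents."""
    return n_clones * mean_insert_bp / genome_size_bp


def clones_needed(target_equivalents, mean_insert_bp, genome_size_bp):
    """Smallest number of clones that reaches the target coverage."""
    return math.ceil(target_equivalents * genome_size_bp / mean_insert_bp)
```

For a hypothetical 600 Mbp genome and 120 kbp inserts, `genome_equivalents(50000, 120000, 600000000)` gives 10.0, i.e. the "particularly preferred" level mentioned above.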
  • The sample genome DNA is thus selected from the group consisting of: a heterozygous sample genome; a combination of two or more homozygous sample genomes; and a combination of at least one heterozygous and at least one homozygous sample genome.
  • the individual clones in the library are pooled to form pools containing a multitude of artificial chromosomes or clones.
  • the pooling may be the simple combination of a number of individual clones into one sample (for example, 100 clones into 10 pools, each containing 10 clones), but also more elaborate pooling strategies may be used.
  • the distribution of the clones over the pools is preferably such that each clone is present in at least two of the pools.
  • the pools contain from 10 to 10000 clones per pool, preferably from 100 to 1000, more preferably from 250 to 750. It is observed that the number of clones per pool can vary widely, and this variation is related to, for instance, the size of the genome under investigation.
  • the maximum size of a pool or a sub-pool is governed by the ability to uniquely identify a clone in a pooling set by a set of identifiers.
  • a typical range for a genome equivalent in a pool set is in the order of 0.2 - 0.3, and this may again vary per genome.
  • the pools are generated based on pooling strategies well known in the art. The skilled man is capable of selecting the optimal pooling strategy based on factors such as genome size etc.
  • the resulting pooling strategy will depend on the circumstances, and examples thereof are plate pooling, N-dimensional pooling such as 2D-pooling, 3D- pooling, 6D-pooling or complex pooling.
  • the pools may, in turn, be combined in super-pools (i.e. super-pools are pools of pools of clones) or divided into sub-pools.
  • Deconvolution is the correct identification of the individual clone in a library by detection of the presence of a known associated indicator (i.e. label or identifier) of the clone in one or more pools or subpools.
  • the pooling strategy is preferably such that every clone in the library is distributed over the pools in such a way that a unique combination of pools is made for every clone. The result thereof is that a certain combination of (sub)pools uniquely identifies a clone.
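
A 3D pooling scheme is one way to obtain a unique combination of pools per clone. The sketch below assumes that every clone has a (plate, row, column) address and is placed in one plate pool, one row pool and one column pool; the pool naming and the function names are illustrative assumptions, not the layout used in the source.

```python
from itertools import product


def pools_for_clone(plate, row, col):
    """The three pools that contain the clone at this address."""
    return (f"P{plate}", f"R{row}", f"C{col}")


def build_pools(n_plates, n_rows, n_cols):
    """Map each pool name to the set of clone addresses it contains."""
    pools = {}
    for address in product(range(n_plates), range(n_rows), range(n_cols)):
        for pool in pools_for_clone(*address):
            pools.setdefault(pool, set()).add(address)
    return pools


pools = build_pools(n_plates=2, n_rows=3, n_cols=4)
# Intersecting one plate, one row and one column pool isolates one clone.
only_clone = pools["P1"] & pools["R2"] & pools["C3"]
```

Here 24 clones are covered by only 2 + 3 + 4 = 9 pools, which is the saving in sample preparation that pooling is meant to deliver.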
  • the pools are digested with restriction endonucleases to yield restriction fragments.
  • Each pool is preferably separately subjected to an endonuclease digest.
  • Each pool is treated with the same (combination of) endonuclease(s).
  • Restriction endonucleases may be frequent cutters (4 or 5 cutters, such as MseI or PstI) or rare cutters (6 and more cutters such as EcoRI, HindIII).
  • restriction endonucleases are selected such that restriction fragments are obtained that are, on average, present in an amount or have a certain length distribution that is adequate for the subsequent steps.
  • two or more restriction endonucleases can be used and in certain embodiments, combinations of rare and frequent cutters can be used. For large genomes, for instance, three or more restriction endonucleases can be used advantageously.
  • adapters are ligated in step (d) to provide for adapter-ligated restriction fragments.
  • adapters are synthetic oligonucleotides as defined herein elsewhere.
  • the adapters used in the present invention preferably contain an identifier section, in essence as defined herein elsewhere, to provide for 'tagged adapters'.
  • the adapter contains a pool-specific identifier, i.e. for each pool an adapter containing a unique identifier is used that unequivocally indicates the pool.
  • the adapter contains a degenerate identifier section which is used in combination with a primer containing a pool-specific identifier.
  • the adapter-ligated restriction fragments can be combined in larger groups, in particular when the adapters contain a pool-specific identifier. This combination in larger groups may aid in reducing the number of parallel amplifications of each set of adapter-ligated restriction fragments obtained from a pool.
  • the adapters that are ligated do not contain an identifier or a degenerate identifier section.
  • the adapter-ligated fragments are subsequently amplified using primers that contain identifiers (tags), for instance at their 5' end. The result is that amplified, tagged adapter-ligated fragments are obtained.
  • the adapters can be the same for a plurality (or all) of pools and the amplification using tagged primers creates the distinction between the pools that can later be used in the deconvolution.
  • the tagged adapter-ligated fragment can be amplified.
  • the amplification may serve to reduce the complexity or to increase the amount of DNA available for analysis.
  • the amplification can be performed using a set of primers that are at least partly complementary to the adapters and/or the tags/identifiers. This amplification may be independent of the amplification described herein above that introduces the tags into the adapters. In certain embodiments, the amplification may serve several purposes at a time, i.e. reduce complexity, increase DNA amount and introduce tags in the adapter-ligated fragments in the pools.
  • the adapter-ligated fragments can be combined in larger groups, in particular when the adapters contain a pool-specific identifier. This combination in larger groups may aid in reducing the number of parallel amplifications of each set of adapter-ligated restriction fragments obtained from a pool.
  • the adapter-ligated fragments can be amplified using a set of primers of which at least one primer amplifies the pool-specific identifier at the position of the pool-specific or degenerate identifier in the adapter.
  • the primer may contain (part of) the identifier, but the primer may also be complementary to a section of the adapter that is located outside the tag, i.e. downstream in the adapter. Amplification then also amplifies the tag. See in this respect Fig 6 for various embodiments.
  • In step (e), part of the sequence of the tagged adapter-ligated fragment is determined.
  • the tagged adapter-ligated fragments are subjected to sequencing, preferably high throughput sequencing as described herein elsewhere. During sequencing, at least part of the nucleotide sequence of the (amplified) tagged adapter-ligated fragment is determined.
  • At least the sequence of the pool-specific identifier and part of the fragment (i.e. derived from the sample genome) of the (amplified) tagged adapter-ligated fragment is determined.
  • a sequence of at least 10 nucleotides of the fragment is determined.
  • at least 15, 20, 25, 30 or 35 nucleotides of the fragment (i.e. derived from the sample genome) are determined.
  • the number of nucleotides that are to be determined minimally will be, again, genome- as well as sequencing platform dependent. For instance, in plants more repetitive sequences are present, hence longer sequences (50-150 nucleotides) may need to be determined for a contig of comparable quality.
  • the sequence library may be sequenced with an average redundancy level (aka oversampling rate) of at least 5.
  • the sequence is determined of at least 5 amplicons obtained from the amplification of one specific adapter-ligated fragment.
  • each fragment is (statistically) sequenced on average at least five times.
  • Increased redundancy is preferred as it improves the fraction of fragments that are sampled in each pool and the accuracy of these sequences, so preferably the redundancy level is at least 7, more preferably at least 10.
  • Increased average sequencing redundancy levels are used to compensate for a phenomenon that is known as 'sampling variation', i.e.
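
The effect of redundancy on the fraction of fragments sampled can be estimated with a Poisson model. This model is an assumption introduced here for illustration (the source does not name one): if reads are spread randomly over fragments, the number of reads per fragment is approximately Poisson-distributed with a mean equal to the average redundancy.

```python
import math


def fraction_sampled(redundancy, min_reads=1):
    """P(X >= min_reads) for X ~ Poisson(redundancy).

    Estimates the fraction of fragments sequenced at least `min_reads`
    times at a given average redundancy, under the Poisson assumption.
    """
    p_fewer = sum(
        math.exp(-redundancy) * redundancy ** k / math.factorial(k)
        for k in range(min_reads)
    )
    return 1.0 - p_fewer
```

Comparing `fraction_sampled(5)`, `fraction_sampled(7)` and `fraction_sampled(10)` shows how the expected share of unsampled fragments shrinks as the redundancy level rises; requiring `min_reads=2` or more, for error checking, makes the benefit of higher redundancy more pronounced.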
  • sequencing is performed using high-throughput sequencing methods, such as the methods disclosed in WO 03/004690, WO 03/054142, WO
  • In step (f), the (partly) sequenced (amplified) tagged adapter-ligated fragments are correlated to the corresponding clone, typically in silico by means of computerized methods.
  • the (amplified) tagged adapter-ligated fragments are selected that contain identical sections of nucleotides in the restriction fragment-derived part.
  • the different pool-specific identifiers (tags) are identified that are present in those (amplified) tagged adapter-ligated fragments.
  • the combination of the different pool-specific identifiers and hence the sequence of the restriction fragment can be uniquely assigned to a specific clone (a process indicated as 'deconvolution').
  • each pool in the library is uniquely addressed by a combination of 3 pool-specific identifiers with the same restriction fragment-derived section.
  • a restriction fragment-derived section originating from a clone will be tagged with 3 different identifiers.
  • Unique restriction fragment-derived sections when observed in combination with the 3 identifiers can be assigned to a single BAC clone. This can be repeated for each (amplified) tagged adapter-ligated fragment that contains other unique sections of nucleotides in the restriction fragment-derived part.
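
The deconvolution described above can be sketched as follows: group reads by their restriction fragment-derived section, collect the pool identifiers seen with each section, and assign the section to the clone whose pool combination matches. The data structures and names are illustrative assumptions. The sketch only assigns sections seen in exactly one clone's pool combination, so sections shared by overlapping clones (which appear in more pools) are left unassigned here.

```python
from collections import defaultdict


def deconvolute(reads, clone_pools):
    """Assign fragment sequences to clones.

    reads: iterable of (pool_identifier, fragment_sequence) pairs.
    clone_pools: dict mapping clone name -> set of pools containing it.
    """
    pools_by_sequence = defaultdict(set)
    for pool, sequence in reads:
        pools_by_sequence[sequence].add(pool)

    clone_by_pools = {frozenset(p): clone for clone, p in clone_pools.items()}

    assignment = {}
    for sequence, pools in pools_by_sequence.items():
        clone = clone_by_pools.get(frozenset(pools))
        if clone is not None:
            assignment[sequence] = clone
    return assignment
```

A section observed in only two of a clone's three pools stays unassigned, which is how insufficient sequencing depth leads to the missing tags discussed in the summary.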
  • the clones are combined and ordered into clone contigs in step (g) of the method.
  • the grouping and ordering can be performed by fingerprint contiging software for this purpose such as FPC software (Soderlund et al (1997) FPC: a system for building contigs from restriction fingerprinted clones. Comput. Appl. Biosci., 13:523-535.) essentially as described herein elsewhere.
  • the alignment of the clones into contigs and the corresponding order of WGP tags generates a physical map of the sample genome.
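
To illustrate the idea of grouping clones by overlapping WGP tag patterns, the sketch below links two clones when they share at least a threshold number of tags and reports the connected groups. This is a deliberately simplified stand-in and not the FPC algorithm: FPC uses a statistical overlap score and also orders clones within a contig, which this sketch does not attempt. The threshold and names are hypothetical.

```python
from itertools import combinations


def build_contigs(tags_by_clone, min_shared=2):
    """Group clones into contigs by shared tags (union-find on overlaps)."""
    parent = {clone: clone for clone in tags_by_clone}

    def find(clone):
        # Follow parent links to the group representative, compressing the path.
        while parent[clone] != clone:
            parent[clone] = parent[parent[clone]]
            clone = parent[clone]
        return clone

    for a, b in combinations(tags_by_clone, 2):
        if len(tags_by_clone[a] & tags_by_clone[b]) >= min_shared:
            parent[find(a)] = find(b)

    groups = {}
    for clone in tags_by_clone:
        groups.setdefault(find(clone), set()).add(clone)
    return sorted(sorted(members) for members in groups.values())
```

Clones that do not overlap directly can still end up in the same contig when an intermediate clone overlaps both, which is how a minimal tiling path across a region emerges.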
  • the above steps can be performed independently for each genome sample. More in particular, in certain embodiments, for each heterozygous and/or homozygous sample genome steps (a) - (g) are performed independently. The steps relating to the screening for polymorphisms in the subsequent steps are taken from the separate steps (e).
  • sequenced fragments are aligned based on the determined sequences in step (e) of the method.
  • sequenced fragments comprise a sample identification tag, identifying the pool from which the BAC was derived (and hence the fragment).
  • the sequence fragments further comprise the remains of the restriction site and further a sequence that is derived from the sample genome (sometimes indicated as the 'sequence tag').
  • step (i) the polymorphisms are determined between the aligned sequences of the WGP tags of step (h).
  • a polymorphism in the WGP tag region will result in two variants, with a SNP (or indel) somewhere in the 65 nt, non-restriction site part of a WGP tag (for a 75 nt sequence read length). Such a tag is termed a 'SNP Tag', i.e. a SNP Tag is a WGP tag that contains a SNP or an indel between its two variants (see Fig. 3). Given equal portions of BACs and pool samples, the two variants ('alleles') of each SNP Tag are expected to be found in a 50/50 distribution.
  • In those cases where a polymorphism is present in the restriction site region, it will cause the WGP tag to be present in only 50% of the BACs from the corresponding region in the genome. These tags are less useful to discriminate between alleles, as it is not certain whether the tags are missing because the region was not covered by the BAC or because of a polymorphism.
  • the low number of SNP tags will still allow the formation of contigs from overlapping BACs, as obtained in a standard WGP approach, albeit with less stringent alignment (FPC) settings.
  • a binning step is executed first to combine allelic variants (and possibly sequence errors) into a single WGP tag, followed by an FPC step as for a homozygous line (i.e. with more stringent settings).
  • an additional analysis will be done to identify "SNP tags" and their position on the contigs or physical map.
  • the discovered polymorphism and SNPs are converted into conventional SNP assays using conventional technology.
  • because sequence information is available, SNP tags can be converted to PCR assays, Invader assays, Golden Gate assays and the like.
  • the developed assay can be validated on the sample genome(s) used to generate the SNPs.
  • the discovered SNP tags are both physical markers (as their position on the physical map is known) and genetic markers, as they differ between sample genomes. This allows genetic and physical maps to be coupled (see Fig. 7).
  • SNPs can be followed in subsequent crosses, for instance in the offspring of crosses between parents of which at least one of the parents was used to create the physical map.
  • the SNPs can serve as genetic markers that are linked to the physical map.
  • the BACs used to generate the physical map can be anchored to the genetic map, resulting in a high resolution map based on a scaffold of genetic markers (SNP tags) and supplemented with WGP tags. It is further possible to also link genetic markers obtained via other ways to the map to further complete the high resolution map.
  • a further aspect of the invention relates to a method for the generation of a linked genetic map of genetic markers (SNP tags) and a physical map (of WGP tags) comprising
  • step (i) providing two parents, wherein at least one of the parents has been used as a sample genome in the generation of the physical map that provided the SNP tags of step (i);
  • the genetic linkage map is constructed using the markers discovered using the method of the invention.
  • the map can be constructed by observing how frequently pairs of SNP Tag markers are inherited together after selfing an F1 population and analyzing the F2 individuals (about 100).
  • the thus obtained genetic map shows the relative locations of these SNPs along the chromosome, thus providing the link with the position of the SNP tags on the BAC contigs of the physical map.
  • This example is to provide an illustration for the polymorphic Whole Genome Profiling (pWGP) concept described herein.
  • the invention is aimed to perform physical mapping and SNP discovery at the same time, by executing a WGP project on a BAC library derived from a heterozygote individual or a combination of two polymorphic individuals.
  • data have been used from both Arabidopsis thaliana Columbia and Landsberg erecta ecotypes.
  • WGP tags were identified which were specific to either the Col or the Ler data.
  • if a single Col tag matches a single Ler tag with only one nucleotide difference, then such a tag is a putative SNP marker.
  • Two of such candidates and corresponding BAC clones are identified. They are presented in Table 3.

Abstract

The invention relates to a method for the generation of a physical map of a sample genome combined with the identification of polymorphisms by providing a physical map of a BAC library of a heterozygous sample genome based on the high throughput sequencing of tagged adapter-ligated restriction fragments of pooled BAC clones and determining polymorphisms between the tagged adapter-ligated restriction fragments.

Description

Title: Polymorphic whole genome profiling
Technical field of the invention
The present invention relates to the field of molecular biology and biotechnology. In particular, the invention relates to the field of nucleic acid detection and identification. More in particular the invention relates to the generation of a physical map of a genome, or part thereof, using high-throughput sequencing technology, combined with the identification of polymorphic markers that are mapped to the physical map. The invention further relates to the development of marker assays of the discovered polymorphic markers and the use of the markers in the generation of a genetic map. More in particular, the invention relates to the identification of sequence tags, from which a physical map is created, and polymorphic sequence tags, from which a genetic map is made, and which can be integrated into a genome wide high density physical map.
Background of the invention
Integrated genetic and physical genome maps are extremely valuable for map-based gene isolation, comparative genome analysis and as sources of sequence-ready clones for genome sequencing projects. The effect of the availability of an integrated map of physical and genetic markers of a species for genome research is enormous. Integrated maps allow for precise and rapid gene mapping and precise mapping of microsatellite loci and SNP markers. Various methods have been developed for assembling physical maps of genomes of varying complexity. One of the better characterized approaches uses restriction enzymes to generate large numbers of DNA fragments from genomic subclones (Brenner et al., Proc. Natl. Acad. Sci. (1989), 86, 8902-8906; Gregory et al., Genome Res. (1997), 7, 1162-1168; Marra et al., Genome Res. (1997), 7, 1072-1084). The fingerprints created from these restriction fragments are compared to identify related clones and to assemble overlapping clones in contigs. The utility of fingerprinting for ordering large insert clones of a complex genome is limited, however, due to variation in DNA migration from gel to gel, the presence of repetitive DNAs, unusual distribution of restriction sites and skewed clone representation. Most high quality physical maps of complex genomes have therefore been constructed using a combination of fingerprinting and PCR-based or hybridisation-based methods. However, one of the disadvantages of the use of fingerprinting technology is that it is based on fragment-pattern matching, which is an indirect method and hence less preferred than so-called direct methods such as sequencing. Recently, methods for high throughput sequencing have been made available that would allow for the determination of BAC clone ordering in a more efficient and cost-effective manner. Efficient methods are for instance disclosed in applicant's own WO2008007951 relating to high throughput physical mapping.
In this method, a physical map is generated from a combination of restriction enzyme digestion of clones in a library, pooling, restriction enzyme digestion, adapter-ligation, (selective) amplification, high-throughput sequencing and deconvolution of the resulting sequences into BAC clone specific sets that can be used to assemble physical maps. The assembly of the clones into contigs is based on the co-presence of terminal nucleotide sequences of the sequenced fragments which can be used as sequence based anchor points for additional linkage of sequence data. More in detail, the technology disclosed in WO2008007951 is Whole Genome Profiling (WGP), KeyGene's recently developed proprietary approach for sequence based physical mapping. Typically, a BAC library is constructed from a single homozygous individual and BAC clones are pooled in a multi-dimensional format. BAC pools are characterized by pool-specific tags to allow assignment of sequences to individual BAC clones based on the coordinates in the multidimensional pool screening. DNA is extracted from each BAC pool and digested with restriction enzymes, for instance EcoRI and MseI. The EcoRI ends of the restriction fragments are analyzed on a next-generation sequencer such as the Illumina Genome Analyzer, and in this way these relatively short (20-100 base pairs) sequenced fragments, called the WGP tags, can be assigned to individual BACs. In a next step, BACs can be assembled based on overlapping WGP tag patterns using a contiging software tool such as FPC (Soderlund et al.). Typically this leads to contigs of assembled BACs, with WGP tags every 2 to 3 kilobases, about 30-40 tags per BAC clone.
Compared to other physical mapping approaches such as SNaPshot mapping, the WGP method is unique in providing sequence based anchor points instead of fragment lengths for assembly of BAC contigs. Sequence based anchors are more accurate and provide the basis for assembly of Whole Genome Shotgun data.
Definitions
Sequencing: The term sequencing refers to determining the order of nucleotides (base sequences) in a nucleic acid sample, e.g. DNA or RNA. Many techniques are available such as Sanger sequencing and high-throughput sequencing technologies (also known as next-generation sequencing technologies) such as the GS FLX platform offered by Roche Applied Science, which is based on pyrosequencing, and the Genome Analyzer from Illumina, which is based on sequencing by synthesis with reversible terminators.
Restriction endonuclease: a restriction endonuclease or restriction enzyme is an enzyme that recognizes a specific nucleotide sequence (target site) in a double-stranded DNA molecule, and will cleave both strands of the DNA molecule at or near every target site, leaving a blunt or a staggered end.
Frequent cutters and rare cutters: Restriction enzymes typically have recognition sequences that vary in number of nucleotides from 3, 4 (such as MseI) to 6 (EcoRI) and even 8 (NotI). The restriction enzymes used can be frequent and rare cutters. The term 'frequent' in this respect is typically used in relation to the term 'rare'. Frequent cutting endonucleases (aka frequent cutters) are restriction endonucleases that have a relatively short recognition sequence. Frequent cutters typically have 3-5 nucleotides that they recognise and subsequently cut. Thus, a frequent cutter on average cuts a DNA sequence every 64-1024 nucleotides. Rare cutters are restriction endonucleases that have a relatively long recognition sequence. Rare cutters typically have 6 or more nucleotides that they recognise and subsequently cut. Thus, a rare 6-cutter on average cuts a DNA sequence every 4096 nucleotides, leading to longer fragments. It is observed again that the definition of frequent and rare is relative to each other, meaning that when a 4 bp restriction enzyme, such as MseI, is used in combination with a 5-cutter such as AvaII, AvaII is seen as the rare cutter and MseI as the frequent cutter.
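By way of non-limiting illustration, the average spacings quoted above follow from the simplifying assumption that the four nucleotides occur independently and with equal frequency, so that a recognition sequence of n bases is expected once every 4^n nucleotides. A minimal sketch of that arithmetic is given below; the function name is illustrative only, and real genomes deviate from this idealised model.

```python
def expected_cut_spacing(site_length: int) -> int:
    """Expected distance (nt) between recognition sites of `site_length` bases,
    assuming equal and independent base frequencies (an idealisation)."""
    if site_length < 1:
        raise ValueError("recognition sequence must contain at least one base")
    return 4 ** site_length


# Recognition-site lengths as stated in the definition above.
for enzyme, n in [("MseI", 4), ("EcoRI", 6), ("NotI", 8)]:
    print(f"{enzyme}: {n}-cutter, about one site per {expected_cut_spacing(n)} nt")
```

The same calculation gives 64 and 1024 nucleotides for the 3- and 5-base recognition sequences that bound the frequent cutter range.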
Restriction fragments: the DNA molecules produced by digestion with a restriction endonuclease are referred to as restriction fragments. Any given genome (or nucleic acid, regardless of its origin) will be digested by a particular restriction endonuclease into a discrete set of restriction fragments. The DNA fragments that result from restriction endonuclease cleavage can be further used in a variety of techniques and can for instance be detected by gel electrophoresis.
Ligation: the enzymatic reaction catalyzed by a ligase enzyme in which two double-stranded DNA molecules are covalently joined together is referred to as ligation. In general, both DNA strands are covalently joined together, but it is also possible to prevent the ligation of one of the two strands through chemical or enzymatic modification of one of the ends of the strands. In that case the covalent joining will occur in only one of the two DNA strands.
Synthetic oligonucleotide: single-stranded DNA molecules having preferably from about 10 to about 50 bases, which can be synthesized chemically are referred to as synthetic oligonucleotides. In general, these synthetic DNA molecules are designed to have a unique or desired nucleotide sequence, although it is possible to synthesize families of molecules having related sequences and which have different nucleotide compositions at specific positions within the nucleotide sequence. The term synthetic oligonucleotide will be used to refer to DNA molecules having a designed or desired nucleotide sequence.
Adapters: short double-stranded DNA molecules with a limited number of base pairs, e.g. about 10 to about 30 base pairs in length, which are designed such that they can be ligated to the ends of restriction fragments. Adapters are generally composed of two synthetic oligonucleotides which have nucleotide sequences which are partially complementary to each other. When mixing the two synthetic oligonucleotides in solution under appropriate conditions, they will anneal to each other forming a double-stranded structure. After annealing, one end of the adapter molecule is designed such that it is compatible with the end of a restriction fragment and can be ligated thereto; the other end of the adapter can be designed so that it cannot be ligated, but this need not be the case (double ligated adapters).
Adapter-ligated restriction fragments: restriction fragments that have been capped by adapters.
Primers: in general, the term primers refers to DNA strands which can prime the synthesis of DNA. DNA polymerase cannot synthesize DNA de novo without primers: it can only extend an existing DNA strand in a reaction in which the complementary strand is used as a template to direct the order of nucleotides to be assembled. We will refer to the synthetic oligonucleotide molecules which are used in a polymerase chain reaction (PCR) as primers.
DNA amplification: the term DNA amplification or amplification will be typically used to denote the in vitro synthesis of double-stranded DNA molecules using PCR. It is noted that other amplification methods exist and they may be used in the present invention without departing from the gist.
Tagging: the term tagging refers to the addition of a sequence tag to a nucleic acid sample in order to be able to distinguish it from a second or further nucleic acid sample. Tagging can e.g. be performed by the addition of a sequence identifier during complexity reduction or by any other means known in the art such as a separate ligation step. Such a sequence identifier can e.g. be a unique base sequence of varying but defined length uniquely used for identifying a specific nucleic acid sample. Typical examples are ZIP sequences, known in the art as commonly used tags for unique detection by hybridization (Iannone et al. Cytometry 39:131-140, 2000). Using nucleotide based tags, the origin of a sample, a clone or an amplified product can be determined upon further processing. In case of combining processed products originating from different nucleic acid samples, the different nucleic acid samples should be identified using different tags.
Identifier: a short sequence that can be added to an adapter or a primer or included in its sequence or otherwise used as label to provide a unique identifier (aka barcode or index). Such a sequence identifier (tag) can be a unique base sequence of varying but defined length, typically from 4-16 bp, used for identifying a specific nucleic acid sample. For instance, 4 bp tags allow 4^4 = 256 different tags. Using such an identifier, the origin of a PCR sample can be determined upon further processing or fragments can be related to a clone. Also clones in a pool can be distinguished from one another using these sequence based identifiers. Thus, identifiers can be sample specific, pool specific, clone specific, amplicon specific etc. In the case of combining processed products originating from different nucleic acid samples, the different nucleic acid samples are generally identified using different identifiers. Identifiers preferably differ from each other by at least two base pairs and preferably do not contain two identical consecutive bases to prevent misreads. The identifier function can sometimes be combined with other functionalities such as adapters or primers and can be located at any convenient position.
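As a hedged illustration of the two preferences stated above (identifiers that differ from each other by at least two bases and that contain no two identical consecutive bases), the following sketch enumerates all identifiers of a given length and keeps a subset that satisfies both constraints. The greedy selection and all function names are assumptions made for this example; they are not part of the described method, and a greedy pass does not necessarily yield the largest possible set.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length identifiers differ."""
    return sum(x != y for x, y in zip(a, b))


def has_repeat(tag: str) -> bool:
    """True if the identifier contains two identical consecutive bases."""
    return any(x == y for x, y in zip(tag, tag[1:]))


def design_identifiers(length: int, min_distance: int = 2) -> list[str]:
    """Greedily pick identifiers without consecutive repeats that differ
    from every previously chosen identifier in at least `min_distance` bases."""
    chosen: list[str] = []
    for bases in product("ACGT", repeat=length):
        tag = "".join(bases)
        if has_repeat(tag):
            continue
        if all(hamming(tag, other) >= min_distance for other in chosen):
            chosen.append(tag)
    return chosen


tags = design_identifiers(4)
print(4 ** 4, "possible 4-nt identifiers;", len(tags), "kept by the greedy filter")
```

Requiring a distance of at least two means that a single misread base cannot turn one valid identifier into another.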
Tagged library: the term tagged library refers to a library of tagged nucleic acids.
Aligning and alignment: the terms "aligning" and "alignment" mean the comparison of two or more nucleotide sequences based on the presence of short or long stretches of identical or similar nucleotides. Several methods for alignment of nucleotide sequences are known in the art, as will be further explained below.
The term "contig" is used in connection with DNA sequence analysis, and refers to assembled contiguous stretches of DNA derived from two or more DNA fragments having contiguous nucleotide sequences. Thus, a contig is a set of overlapping DNA fragments that provides a partial contiguous sequence of a genome. The term "contig" is also used to indicate a contiguous stretch of, for instance, BACs, a "BAC contig". A BAC contig can also be made on marker analysis, i.e. a more indirect way of sequence analysis. A "scaffold" is defined as a series of contigs that are in the correct order, but are not connected in one continuous sequence, i.e. contain gaps. Contig maps also represent the structure of contiguous regions of a genome by specifying overlap relationships among a set of clones. For example, the term "contigs" encompasses a series of cloning vectors which are ordered in such a way as to have each sequence overlap that of its neighbours. The linked clones can then be grouped into contigs, either manually or, preferably, using appropriate computer programs such as FPC, PHRAP, CAP3 etc.
Complexity reduction: the term complexity reduction is used to denote a method wherein the complexity of a nucleic acid sample, such as genomic DNA, is reduced by the generation or selection of a subset of the sample. This subset can be representative for the whole (i.e. complex) sample and is preferably a reproducible subset. Reproducible means in this context that when the same sample is reduced in complexity using the same method and experimental conditions, the same, or at least comparable, subset is obtained. The method used for complexity reduction may be any method for complexity reduction known in the art. Examples of methods for complexity reduction include for example AFLP® (Keygene N.V., the Netherlands; see e.g. EP 0 534 858), the methods described by Dong (see e.g. WO 03/012118, WO 00/24939), indexed linking (Unrau et al., vide infra), etc. The complexity reduction methods used in the present invention have in common that they are reproducible, in the sense that when the same sample is reduced in complexity in the same manner, the same subset of the sample is obtained. This is in contrast to more random complexity reduction such as microdissection, random shearing, or the use of mRNA (cDNA), which represents a portion of the genome transcribed in a selected tissue and whose reproducibility depends on the selection of tissue, time of isolation etc.
High-throughput screening: High-throughput screening, often abbreviated as HTS, is a method for scientific experimentation especially relevant to the fields of biology and chemistry. Through a combination of modern robotics and other specialised laboratory hardware, it allows a researcher to effectively screen large amounts of samples simultaneously.
Artificial clone library: a population of hosts (bacteria, yeast), each of which carries a DNA molecule that was inserted into a cloning vector, such that a representation of the genome of an organism (usually an entire genome) is present in the collection of artificial chromosome clones.
BAC: Bacterial Artificial Chromosome, a DNA construct, usually based on a functional fertility plasmid (or F-plasmid), used for transforming and cloning in bacteria, usually E. coli. The usual insert size is 100-350 kb but can also be 700 kb. BACs are often used to sequence the genomes of organisms, whereby a short piece of the organism's DNA is amplified as an insert in BACs and then sequenced. Rearrangement of the sequenced parts in silico provides the genome sequence of the organism.
YAC: Yeast Artificial Chromosome; the host is yeast and the insert size is larger (100-3000 kb).
Polymorphism: polymorphism refers to the presence of two or more variants of a nucleotide sequence in a population. A polymorphism may comprise one or more base changes, an insertion, a repeat, or a deletion. A polymorphism includes e.g. a simple sequence repeat (SSR) and a single nucleotide polymorphism (SNP), which is a variation occurring when a single nucleotide, adenine (A), thymine (T), cytosine (C) or guanine (G), is altered. A variation must generally occur in at least 1% of the population to be considered a SNP. SNPs make up e.g. 90% of all human genetic variations, and occur every 100 to 300 bases along the human genome. Two of every three SNPs substitute cytosine (C) with thymine (T). Variations in the DNA sequences of e.g. humans or plants can affect how they handle diseases, bacteria, viruses, chemicals, drugs, etc.
Heterozygous: An organism is heterozygous for a particular gene when different alleles occupy the gene's position on the homologous chromosomes.
Homozygous: An organism is homozygous for a particular gene when identical alleles are present on both homologous chromosomes.
Summary of the invention
The present inventors found that 'Whole Genome Profiling' or WGP has proven to work well in providing high-quality physical maps. However, from analysis and validation on several data sets, it was observed that sometimes 'gaps' (missing WGP tags) in assembled BACs occur: one overlapping BAC might not show the same set of WGP tags as another BAC covering the same region. These gaps could result from sequence errors or incomplete deconvolution due to inadequate sequencing depth. However, the inventors realised that they could also have a biological background. If there is a SNP or short indel within the WGP tag region, this results in a polymorphic tag: either a present/absent tag or a tag with a SNP or indel. Exactly these latter SNP variants provide a unique opportunity to use them as genetic markers.
The present inventors have found that, by using artificial chromosome libraries (particularly BAC libraries) from a heterozygous genome sample or a combination of two or more homozygous genome samples, this observed effect can be used in an efficient way to create a physical map and, based on the same data set, screen the data for the presence of polymorphisms within the WGP tags. This will significantly reduce the effort in SNP discovery and in performing a large scale genotyping experiment. The addition of a (rough scale) genotyping and genetic mapping effort provides a link of BAC contigs into linkage groups. This genetic map can then be extended to a high density, high resolution map by adding all SNP tags from their positions as known from the BAC contigs, resulting in an integrated physical and genetic map.
Description of the figures
Figure 1. Polymorphic physical mapping
Figure 2. Whole Genome Profiling approach
Figure 3. Whole Genome Profiling approach for polymorphic physical mapping
Figure 4. WGP Tags
Figure 5. SNP Tags
Figure 6. Various adapter-primer combinations containing identifiers to yield tagged fragments. A = adapter nucleotide, F = fragment nucleotide, P = primer nucleotide, N = degenerate nucleotide, ID = identifier nucleotides, -> = direction of amplification. 1. adapter ligated fragment; 2. adapter ligated fragment containing identifier, a. amplification with primer directed against identifier, b. amplification with primer directed against adapter; 3. adapter ligated fragment containing degenerate identifier section, amplification with primer directed against degenerate identifier and introducing identifier in amplified fragment; 4. adapter ligated fragment containing no identifier, amplification with primer introducing identifier in amplified fragment.
Figure 7. Integration of genetic and physical map.
Figure 8. Alignment of BAC clones for contig 293. Indicated by the rectangular dotted box is the likely position of the polymorphic WGP tag SNP1.
Figure 9. Alignment of BAC clones for contig 307. Indicated by the rectangular dotted box is the likely position of the polymorphic WGP tag SNP2.
Detailed description of the invention
Thus in a first aspect the invention relates to a method for the generation of a physical map of a sample genome and identification of polymorphisms, comprising the steps of:
(a) providing an artificial chromosome (e.g. BAC, YAC) clone bank wherein each artificial chromosome clone contains DNA from a sample genome, wherein the sample genome is selected from the group consisting of
- a heterozygous sample genome,
- a combination of two or more homozygous sample genomes; and
- a combination of at least one heterozygous and at least one homozygous sample genome;
(b) pooling the clones from the artificial chromosome library into pools;
(c) providing a set of fragments for each pool using restriction enzymes;
(d) ligating adapters to the fragments;
(e) determining the sequence of at least part of the adapter and part of the fragment;
(f) assigning the fragments to the corresponding clones;
(g) ordering the clones into clone-contigs thereby generating a physical map of the sample genome;
(h) aligning the sequenced fragments based on the determined sequences in step (e);
(i) determining polymorphisms between the aligned sequences of step (h).
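Steps (h) and (i) can be illustrated, without limitation, by a sketch that compares sequenced tags of equal length and reports every pair differing at exactly one position as a putative SNP tag. The grouping scheme, the function name and the toy sequences below are assumptions made for this example only; indels, sequencing errors and the binning of allelic variants are not handled.

```python
from collections import defaultdict


def find_snp_tag_pairs(tags):
    """Return (tag_a, tag_b, position) for every pair of equal-length tags
    that differ at exactly one (0-based) position."""
    # Tags that become identical after deleting position i can only
    # differ at position i, so they land in the same bucket.
    buckets = defaultdict(set)
    for tag in set(tags):
        for i in range(len(tag)):
            buckets[(i, tag[:i] + tag[i + 1:])].add(tag)

    pairs = []
    for (position, _), members in buckets.items():
        ordered = sorted(members)
        for index, first in enumerate(ordered):
            for second in ordered[index + 1:]:
                pairs.append((first, second, position))
    return sorted(pairs)


# Toy 12-nt tags starting with an EcoRI site remnant (hypothetical sequences).
tags = ["GAATTCACGTAC", "GAATTCACGTTC", "GAATTCGGGTAC"]
for first, second, position in find_snp_tag_pairs(tags):
    print(f"{first} / {second}: putative SNP at position {position}")
```

In this toy input only the first two tags form a candidate pair; the third differs from each of them at more than one position and is therefore treated as a distinct tag.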
In step (a) of the method an artificial clone bank is provided. The library can be a Bacterial Artificial Chromosome library (BAC) or based on yeast (YAC). Other libraries such as based on fosmids, cosmids, PAC, TAC or MAC are also possible. Preferred is a BAC library. The library is preferably of a high quality and preferably is a high insert size genomic library. This means that the individual BAC contains a relatively large insert of the genomic DNA under investigation (typically > 100 kbp). The size of the preferred large insert is species-dependent. Throughout this application, reference can be made to BACs as examples of artificial chromosomes. However, it is noted that the present invention is not limited thereto and that other artificial chromosomes can be used without departing from the gist of the invention. Preferably, the libraries contain at least five genome equivalents, more preferably at least 7, most preferably at least 8. Particularly preferred is at least 10. The higher the number of genome equivalents in the library, the more reliable the resulting contigs and physical map will be.
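For orientation only, the number of genome equivalents in a library can be estimated as the number of clones multiplied by the mean insert size and divided by the genome size. The sketch below assumes this common definition; the function names and the numerical values (a 500 Mbp genome and 125 kbp inserts) are hypothetical and are not taken from the present disclosure.

```python
import math


def genome_equivalents(n_clones: int, mean_insert_bp: int, genome_size_bp: int) -> float:
    """Library coverage expressed in genome equivalents."""
    return n_clones * mean_insert_bp / genome_size_bp


def clones_needed(target_equivalents: float, mean_insert_bp: int, genome_size_bp: int) -> int:
    """Smallest number of clones that reaches the target coverage."""
    return math.ceil(target_equivalents * genome_size_bp / mean_insert_bp)


# Hypothetical library: 500 Mbp genome, 125 kbp mean insert size.
genome_size = 500_000_000
insert_size = 125_000
for target in (5, 7, 8, 10):
    print(target, "genome equivalents:", clones_needed(target, insert_size, genome_size), "clones")
```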
The sample genome DNA is thus selected from the group consisting of a heterozygous sample genome, a combination of two or more homozygous sample genomes; and a combination of at least one heterozygous and at least one homozygous sample genome.
The same WGP approach can be applied to a set of BAC clones from a heterozygous F1 individual. Thus, the BACs would differentiate the two sister chromosomes and polymorphisms will occur between them. It is expected that the majority of the WGP tags obtained from such "allelic" BAC clones will be identical (i.e. not containing a sequence polymorphism) but a certain percentage, 5-10%, depending on the level of polymorphism and the length of the sequence read, will differ.
It is also possible to use two (or more) different homozygous sample genomes. Such an approach may be efficient when for instance a BAC library of one homozygous sample is already available. Also the combination of at least one heterozygous sample genome and at least one homozygous sample genome may yield a similar effect. The use of the latter two options may even result in more polymorphisms as more than two different sister chromosomes are present in the sample genomes.
To increase the likelihood that a WGP tag will contain a sequence polymorphism, it is preferred that longer sequence reads, for instance 76 nt or longer, are used. Obviously, the chance that a WGP tag contains a sequence polymorphism increases with increasing sequence read length. The available sequence read length is typically platform dependent.
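The dependence on read length can be made concrete under a simplifying assumption that is not part of the disclosure: if each informative position of a tag is polymorphic independently with probability p, the fraction of tags containing at least one polymorphism is 1 - (1 - p)^L for L informative positions. The sketch below uses a hypothetical rate of one polymorphic site per 1000 bases and assumes that 10 nt of each read are taken up by restriction site and identifier sequence.

```python
def polymorphic_tag_fraction(per_base_rate: float, informative_length: int) -> float:
    """Expected fraction of tags with at least one polymorphism, assuming
    independent sites with the same per-base polymorphism rate."""
    return 1.0 - (1.0 - per_base_rate) ** informative_length


# Hypothetical per-base rate; the non-informative overhead is an assumption.
rate = 0.001
overhead = 10
for read_length in (36, 76, 100, 150):
    fraction = polymorphic_tag_fraction(rate, read_length - overhead)
    print(f"{read_length} nt read: {fraction:.1%} of tags expected to be polymorphic")
```

Under this model the fraction grows almost linearly with read length as long as the product of rate and length is small.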
In step (b), the individual clones in the library are pooled to form pools containing a multitude of artificial chromosomes or clones. The pooling may be the simple combination of a number of individual clones into one sample (for example, 100 clones into 10 pools, each containing 10 clones), but also more elaborate pooling strategies may be used. The distribution of the clones over the pools is preferably such that each clone is present in at least two or more of the pools. Preferably, the pools contain from 10 to 10000 clones per pool, preferably from 100 to 1000, more preferably from 250 to 750. It is observed that the number of clones per pool can vary widely, and this variation is related to, for instance, the size of the genome under investigation. Typically, the maximum size of a pool or a sub-pool is governed by the ability to uniquely identify a clone in a pooling set by a set of identifiers. A typical range for a genome equivalent in a pool set is in the order of 0.2 - 0.3, and this may again vary per genome. The pools are generated based on pooling strategies well known in the art. The skilled man is capable of selecting the optimal pooling strategy based on factors such as genome size etc. The resulting pooling strategy will depend on the circumstances, and examples thereof are plate pooling, N-dimensional pooling such as 2D-pooling, 3D-pooling, 6D-pooling or complex pooling. To facilitate handling of large numbers of pools, the pools may, in turn, be combined in super-pools (i.e. super-pools are pools of pools of clones) or divided into sub-pools. Other examples of pooling strategies and their deconvolution (i.e. the correct identification of the individual clone in a library by detection of the presence of a known associated indicator (i.e. label or identifier) of the clone in one or more pools or subpools) are for instance described in US6975943 or in Klein et al. in Genome Research, (2000), 10, 798-807. The pooling strategy is preferably such that every clone in the library is distributed over the pools in such a way that a unique combination of pools is made for every clone. The result thereof is that a certain combination of (sub)pools uniquely identifies a clone.
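A hedged sketch of 3D pooling and its deconvolution follows. It assumes, for the purpose of illustration only, that every clone sits at a (plate, row, column) coordinate and is placed in exactly one plate pool, one row pool and one column pool, so that the three pool identifiers together address one clone. The data layout and function names are hypothetical, and tags shared by overlapping clones are simply left unassigned here.

```python
from collections import defaultdict

DIMENSIONS = ("plate", "row", "column")


def pools_for_clone(plate: int, row: int, column: int) -> set:
    """The three pools that contain the clone at the given coordinate."""
    return {("plate", plate), ("row", row), ("column", column)}


def deconvolute(tag_to_pools: dict) -> dict:
    """Assign a tag to a clone coordinate when it was seen in exactly one
    pool per dimension; other tags are ambiguous and are skipped."""
    assignments = {}
    for tag, pools in tag_to_pools.items():
        by_dimension = defaultdict(set)
        for dimension, index in pools:
            by_dimension[dimension].add(index)
        if all(len(by_dimension[d]) == 1 for d in DIMENSIONS):
            assignments[tag] = tuple(next(iter(by_dimension[d])) for d in DIMENSIONS)
    return assignments


# Hypothetical observations: TAG_A occurs in one clone, TAG_B in two clones.
observed = {
    "TAG_A": pools_for_clone(2, 5, 7),
    "TAG_B": pools_for_clone(1, 3, 4) | pools_for_clone(2, 6, 4),
}
print(deconvolute(observed))
```

A tag seen in more than one pool of the same dimension may stem from overlapping clones or from a repeated sequence; resolving such cases would need additional pooling dimensions or further analysis, which this sketch does not attempt.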
In step (c) of the method, the pools are digested with restriction endonucleases to yield restriction fragments. Each pool is preferably separately subjected to an endonuclease digest. Each pool is treated with the same (combination of) endonuclease(s). In principle any restriction endonuclease can be used. Restriction endonucleases may be frequent cutters (4 or 5 cutters, such as Msel or Pstl) or rare cutters (6 and more cutters such as EcoRI, Hindlll). Typically, restriction endonucleases are selected such that restriction fragments are obtained that are, on average, present in an amount or have a certain length distribution that is adequate for the subsequent steps. In certain embodiments, two or more restriction endonucleases can be used and in certain embodiments, combinations of rare and frequent cutters can be used. For large genomes the use of, for instance, three or more restriction endonucleases can be used advantageously. To one or both ends of the restriction fragments, adapters are ligated in step (d) to provide for adapter-ligated restriction fragments. Typically, adapters are synthetic oligonucleotides as defined herein elsewhere. The adapters used in the present invention preferably contain an identifier section, in essence as defined herein elsewhere to provide for 'tagged adapters' . In certain embodiments, the adapter contains a pool-specific identifier, i.e. for each pool, an adapter containing a unique identifier is used that unequivocally indicates the pool. In certain embodiments, the adapter contains a degenerate identifier section which is used in combination with a primer containing a pool-specific identifier.
In certain embodiments, the adapter-ligated restriction fragments can be combined in larger groups, in particular when the adapters contain a pool-specific identifier. This combination in larger groups may aid in reducing the number of parallel amplifications of each set of adapter-ligated restriction fragments obtained from a pool. Alternatively, the adapters that are ligated do not contain an identifier or a degenerate identifier section. The adapter-ligated fragments are subsequently amplified using primers that contain identifiers (tags), for instance at their 5' end. The result is that amplified, tagged adapter-ligated fragments are obtained.
In this embodiment, the adapters can be the same for a plurality (or all) of pools and the amplification using tagged primers creates the distinction between the pools that can later be used in the deconvolution.
Either way, a set of tagged adapter-ligated fragments is obtained that are linked to the pool from which they originate by the presence of the tag.
The tagged adapter-ligated fragment can be amplified. The amplification may serve to reduce the complexity or to increase the amount of DNA available for analysis. The amplification can be performed using a set of primers that are at least partly complementary to the adapters and/or the tags/identifiers. This amplification may be independent of the amplification described herein above that introduces the tags into the adapters. In certain embodiments, the amplification may serve several purposes at a time, i.e. reduce complexity, increase DNA amount and introduce tags in the adapter-ligated fragments in the pools.
In certain embodiments, the adapter-ligated fragments can be combined in larger groups, in particular when the adapters contain a pool-specific identifier. This combination in larger groups may aid in reducing the number of parallel amplifications of each set of adapter-ligated restriction fragments obtained from a pool.
The adapter-ligated fragments can be amplified using a set of primers of which at least one primer amplifies the pool-specific identifier at the position of the pool-specific or degenerate identifier in the adapter. The primer may contain (part of) the identifier, but the primer may also be complementary to a section of the adapter that is located outside the tag, i.e. downstream in the adapter. Amplification then also amplifies the tag. See in this respect Fig 6 for various embodiments.
In step (e) part of the sequence of the tagged adapter-ligated fragment is determined. The tagged adapter-ligated fragments are subjected to sequencing, preferably high throughput sequencing as described herein elsewhere. During sequencing, at least part of the nucleotide sequence of the (amplified) tagged adapter-ligated fragment is determined.
Preferably at least the sequence of the pool-specific identifier and part of the fragment (i.e. derived from the sample genome) of the (amplified) tagged adapter-ligated fragment is determined. Preferably, a sequence of at least 10 nucleotides of the fragment is determined. In certain embodiments, at least 15, 20, 25, 30 or 35 nucleotides of the fragment (i.e. derived from the sample genome) are determined. The minimum number of nucleotides that are to be determined will be, again, genome- as well as sequencing platform-dependent. For instance, in plants more repetitive sequences are present, hence longer sequences (50-150 nucleotides) may need to be determined for a contig of comparable quality. For instance, in silico calculations on the known genome sequence of Arabidopsis have shown that, when including a 6 bp restriction site in the sequencing step, about 20 bp per fragment needs to be determined in order to ensure that the majority of sequences are unique in the genome. It is possible to determine the sequence of the entire fragment, but this is not an absolute necessity for contig building of a BAC clone. In general, in the method of the present invention, there is a preference for longer reads, from 50 nt upward, more preferably 75 nt and more, even more preferably more than 100 nt, to increase the probability of detecting SNPs.
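The kind of in silico calculation mentioned above can be sketched as follows: every occurrence of the restriction site is extended to a tag of a given length, and the fraction of tags that occur only once is reported. This is an illustrative simplification (one strand, one enzyme, a toy sequence instead of a genome); the function name and defaults are assumptions, not the calculation actually performed on Arabidopsis.

```python
import re
from collections import Counter


def unique_tag_fraction(genome, site="GAATTC", tag_length=20):
    """Fraction of site-anchored tags of tag_length that occur exactly once."""
    tags = [
        genome[m.start():m.start() + tag_length]
        for m in re.finditer(f"(?={site})", genome)
        if m.start() + tag_length <= len(genome)  # skip truncated tags
    ]
    if not tags:
        return 0.0
    counts = Counter(tags)
    return sum(1 for tag in tags if counts[tag] == 1) / len(tags)


# Toy "genome" with three EcoRI sites; two tags only differ after 10 nt.
toy_genome = "GAATTCACGTAA" + "GAATTCACGTCC" + "GAATTCTTTTGG"
short = unique_tag_fraction(toy_genome, tag_length=8)
longer = unique_tag_fraction(toy_genome, tag_length=12)
print(short, longer)
```

Longer tags separate sequences that share their first nucleotides, which is why repeat-rich genomes call for longer reads.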
In the sequencing step, to provide for maximum coverage of all fragments and increased accuracy, the sequence library may be sequenced with an average redundancy level (aka oversampling rate) of at least 5. This means that, on average, the sequence is determined of at least 5 amplicons obtained from the amplification of one specific adapter-ligated fragment. In other words: each fragment is (statistically) sequenced on average at least five times. Increased redundancy is preferred as it improves the fraction of fragments that are sampled in each pool and the accuracy of these sequences, so preferably the redundancy level is at least 7, more preferably at least 10. Increased average sequencing redundancy levels are used to compensate for a phenomenon that is known as 'sampling variation', i.e. random statistical fluctuation in sampling subsets from a large "population". In addition, a higher average sequencing redundancy level alleviates possible differences in the abundance of amplified fragments which result from differences in their amplification rates caused by length variation between fragments and differences in sequence composition.
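The effect of the redundancy level on sampling variation can be made concrete under a simple model. The sketch below assumes that the number of reads per fragment follows a Poisson distribution with the average redundancy as its mean; this model is an assumption introduced here for illustration and ignores the amplification bias mentioned above.

```python
import math


def fraction_sampled(redundancy, min_reads=1):
    """Probability that a fragment is read at least min_reads times,
    assuming Poisson-distributed read counts with mean `redundancy`."""
    p_fewer = sum(
        math.exp(-redundancy) * redundancy ** k / math.factorial(k)
        for k in range(min_reads)
    )
    return 1.0 - p_fewer


for level in (5, 7, 10):
    print(level, fraction_sampled(level), fraction_sampled(level, min_reads=3))
```

Under this model the fraction of fragments seen at least once is already high at a redundancy of 5, while the fraction seen several times (useful for accuracy) continues to improve at 7 and 10.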
It is preferred that the sequencing is performed using high-throughput sequencing methods, such as the methods disclosed in WO 03/004690, WO 03/054142, WO 2004/069849, WO 2004/070005, WO 2004/070007, and WO 2005/003375, by Seo et al. (2004) Proc. Natl. Acad. Sci. USA 101:5488-93, and technologies of Helicos, Illumina, US Genomics, etcetera, which are herein incorporated by reference.
In the following step (f), the (partly) sequenced (amplified) tagged adapter-ligated fragments are correlated to the corresponding clone, typically in silico by means of computerized methods. The (amplified) tagged adapter-ligated fragments are selected that contain identical sections of nucleotides in the restriction fragment-derived part. Subsequently the different pool-specific identifiers (tags) are identified that are present in those (amplified) tagged adapter-ligated fragments. The combination of the different pool-specific identifiers and hence the sequence of the restriction fragment can be uniquely assigned to a specific clone (a process indicated as 'deconvolution'). For example, in the case of a 3D pooling strategy (X, Y, Z), each clone in the library is uniquely addressed by a combination of 3 pool-specific identifiers with the same restriction fragment-derived section. In other words: a restriction fragment-derived section originating from a clone will be tagged with 3 different identifiers. Unique restriction fragment-derived sections, when observed in combination with the 3 identifiers, can be assigned to a single BAC clone. This can be repeated for each (amplified) tagged adapter-ligated fragment that contains other unique sections of nucleotides in the restriction fragment-derived part.
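The deconvolution of step (f) can be sketched as a grouping problem: reads with the same restriction fragment-derived section are grouped, the pool identifiers observed for that section are collected, and the resulting combination is looked up in the pooling design. The data structures, pool names and clone names below are illustrative assumptions.

```python
from collections import defaultdict


def deconvolute(reads, pool_to_clone):
    """Assign fragment-derived sequences to clones.

    reads: iterable of (pool_id, fragment_sequence) pairs.
    pool_to_clone: dict mapping a frozenset of pool ids to a clone name.
    """
    pools_per_fragment = defaultdict(set)
    for pool_id, fragment in reads:
        pools_per_fragment[fragment].add(pool_id)

    assignments = {}
    for fragment, pools in pools_per_fragment.items():
        clone = pool_to_clone.get(frozenset(pools))
        if clone is not None:  # combinations matching no clone stay unassigned
            assignments[fragment] = clone
    return assignments


# Hypothetical 3D design: each clone sits in one plate, one row and one column pool.
pool_to_clone = {
    frozenset({"P1", "R3", "C19"}): "BAC_A",
    frozenset({"P1", "R4", "C02"}): "BAC_B",
}
reads = [
    ("P1", "TAG1"), ("R3", "TAG1"), ("C19", "TAG1"), ("C19", "TAG1"),
    ("P1", "TAG2"), ("R4", "TAG2"), ("C02", "TAG2"),
    ("P1", "TAG3"), ("R3", "TAG3"),  # seen in only two pools
]
assignments = deconvolute(reads, pool_to_clone)
print(assignments)
```

Sections observed in an incomplete or ambiguous set of pools (here `TAG3`) are simply left unassigned in this sketch; a real pipeline would need a policy for tags shared by overlapping clones.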
After assigning the fragments to the corresponding clones in step (f), the clones are combined and ordered into clone contigs in step (g) of the method. The grouping and ordering can be performed by fingerprint contiging software for this purpose such as FPC software (Soderlund et al (1997) FPC: a system for building contigs from restriction fingerprinted clones. Comput. Appl. Biosci., 13:523-535.) essentially as described herein elsewhere. The alignment of the clones into contigs and the corresponding order of WGP tags generates a physical map of the sample genome.
In certain embodiments, for instance where a physical map is already available for a homozygous sample genome, the above steps can be performed independently for each genome sample. More in particular, in certain embodiments, for each heterozygous and/or homozygous sample genome steps (a) - (g) are performed independently. The sequences used for the polymorphism screening in the subsequent steps are then taken from the separate steps (e).
In step (h) of the method, the sequenced fragments (the WGP tags) are aligned based on the determined sequences in step (e) of the method. Typically, sequenced fragments comprise a sample identification tag, identifying the pool from which the BAC was derived (and hence the fragment). The sequenced fragments further comprise the remains of the restriction site and further a sequence that is derived from the sample genome (sometimes indicated as the 'sequence tag'). By aligning the sequence tags themselves (see for example Fig 4, pool C19 and pool R3, and Fig 5, pools C19, C13, RA and RH), polymorphisms can be identified between them, in this case allele 'G' for C19 and RA vs. allele 'C' for C13 and RH.
In step (i) the polymorphisms are determined between the aligned sequences of the WGP tags of step (h).
For most situations, a polymorphism in the WGP tag region will result in two variants, with a SNP (or indel) somewhere in the 65 nt non-restriction-site part of a WGP tag (at a 75 nt sequence read length). Such a tag is termed a 'SNP tag', i.e. a SNP tag is a WGP tag that contains a polymorphism or an indel between its two variants (see Fig 3). Given equal portions of BACs and pool samples, the two variants ('alleles') of each SNP tag are expected to be found in a 50/50 distribution. In those cases where a polymorphism is present in the restriction site region, it will cause the WGP tag to be present in only 50% of the BACs from the corresponding region in the genome. These tags are less useful to discriminate between alleles as it is not certain whether the tags are missing because the region was not covered by the BAC, or because of a polymorphism. The low number of SNP tags will still allow the formation of contigs from overlapping BACs, as obtained in a standard WGP approach, albeit with less stringent alignment (FPC) settings. In an alternative embodiment, a binning step is executed first to combine allelic variants (and possibly sequence errors) into a single WGP tag, followed by an FPC step as for a homozygous line (i.e. with more stringent settings). In parallel or subsequent to the contig building step, an additional analysis will be done to identify "SNP tags" and their position on the contigs or physical map.
In a further aspect of the invention, the discovered polymorphisms and SNPs are converted into conventional SNP assays using conventional technology. Because sequence information exists, such SNP tags can be converted to PCR assays, Invader assays, Golden Gate assays and the like. The developed assay can be validated on the sample genome(s) used to generate the SNPs.
The discovered SNP tags (and underlying SNPs) are both physical markers (as their position on the physical map is known) and genetic markers, as they differ between sample genomes. This allows genetic and physical maps to be coupled (see Fig 7).
The thus discovered SNPs can be followed in subsequent crosses, for instance in the offspring of crosses between parents of which at least one of the parents was used to create the physical map. In this manner, the SNPs can serve as genetic markers that are linked to the physical map. At the same time, the BACs used to generate the physical map can be anchored to the genetic map, resulting in a high resolution map based on a scaffold of genetic markers (SNP tags) and supplemented with WGP tags. It is further possible to also link genetic markers obtained via other ways to the map to further complete the high resolution map.
Thus, a further aspect of the invention relates to a method for the generation of a linked genetic map of genetic markers (SNP tags) and a physical map (of WGP tags) comprising
- providing two parents, wherein at least one of the parents has been used as a sample genome in the generation of the physical map that provided the SNP tags of step (i);
- crossing the parents;
- screening the offspring for the presence (or absence) of the SNP tags;
- providing a genetic map from the SNP tags;
- coupling the genetic map tags to the physical map.
The genetic linkage map is constructed using the markers discovered using the method of the invention. The map can be constructed by observing how frequently pairs of SNP tag markers are inherited together after selfing an F1 population and analyzing the F2 individuals (about 100). The thus obtained genetic map shows the relative locations of these SNPs along the chromosome, thus providing the link with the position of the SNP tags on the BAC contigs of the physical map. Typically, it would suffice to use a single SNP assay of 384 validated SNPs (Veracode), derived from about 500 candidates.

Example

This example provides an illustration of the polymorphic Whole Genome Profiling (pWGP) concept described herein. The invention aims to perform physical mapping and SNP discovery at the same time, by executing a WGP project on a BAC library derived from a heterozygous individual or a combination of two polymorphic individuals. In this example, data have been used from both the Arabidopsis thaliana Columbia and Landsberg erecta ecotypes.
Data
Existing WGP Columbia (Col) data were used and combined with data from a new WGP project on Landsberg erecta (Ler). Where the Col data consisted of 16 plates (6,144 BACs), the Ler set consisted of 24 plates (9,216 BACs; see Table 1). The Ler BACs were pooled in a 3D manner using an H I Quad pooling setup with 22 pools (termed 1 SuperPool = SP) per plate. The BAC pools were processed with the regular WGP protocol of EcoRI/MseI restriction ligation, amplification and subsequent sequencing on the Illumina GAII. The 24 SPs were sequenced using only two lanes. Deconvolution and subsequent analysis also conformed to the WGP protocol.
Results
WGP map
Results in terms of numbers of reads and tags are given in Table 1.

Table 1. Deconvolution results of WGP on Arabidopsis Col and Ler

| Arabidopsis | Columbia | Landsberg | Combined |
| --- | --- | --- | --- |
| genome size (Mbp) | 130 | 130 | 130 |
| WGP tag length (incl. RE site) | 26 | 26 | 26 |
| nr BACs tested | 6,144 | 9,216 | 15,360 |
| GE BACs tested (125 kb inserts) | 5.9 | 8.9 | 14.8 |
| nr OK reads generated (M) | 28.2 | 62.9 | 91 |
| nr deconvolutable reads (M) | 12.1 | 28.9 | 45 |
| % deconvolutable reads | 43% | 46% | 50% |
| all deconvolutable tags | 183,366 | 344,955 | 528,321 |
| nr unique tags | 65,734 | 62,423 | 84,710 |
| nr tagged BACs | 4,599 | 7,550 | 12,149 |
| average nr tags/BAC | 40 | 46 | 43 |
| average nr reads/tag | 66 | 84 | 86 |
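The genome equivalents (GE) row of Table 1 follows from the number of BACs, the insert size and the genome size stated in the table. The short check below reproduces that row; the function name is illustrative.

```python
def genome_equivalents(n_bacs, insert_kb=125, genome_mbp=130):
    """Library coverage in genome equivalents: total insert length / genome size."""
    return n_bacs * insert_kb / 1000 / genome_mbp


ge_col = round(genome_equivalents(6144), 1)
ge_ler = round(genome_equivalents(9216), 1)
ge_combined = round(genome_equivalents(15360), 1)
print(ge_col, ge_ler, ge_combined)  # 5.9 8.9 14.8
```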
The combined set of Col and Ler BACs was subjected to mapping using FPC. As a large fraction of the clones clustered into one single contig with low-stringency settings, the cutoff had to be decreased to 1.0×10⁻²⁰. After an additional DQ step (step size = 5), 378 contigs were obtained, which encompassed 10,299 BACs (85% of all deconvoluted BACs). The combined contigs covered an estimated 132 Mbp (Table 2).

Table 2. FPC results for both the original Col WGP map and the combined Col+Ler WGP data.

| FPC | Col | Col+Ler |
| --- | --- | --- |
| Cutoff | 1.0E-06 | 1.0E-20 |
| nr contigs | 273 | 378 |
| nr singletons | 551 | 1,850 |
| nr BACs in contig | 4,048 | 10,299 |
| % BACs in contig (nr BACs in contig/nr BACs tested) | 66% | 67% |
| % BACs in contig (nr BACs in contig/nr tagged BACs) | 88% | 85% |
| average BACs/contig | 14.8 | 27.2 |
| N50 BACs/contig | 25 | 49 |
| average contig size (Mbp) | 0.369 | 0.347 |
| N50 contig size (Mbp) | 0.548 | 0.608 |
| genome coverage (Mbp) | 100.8 | 131.3 |
| % genome coverage | 78% | 101% |
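Several rows of Table 2 are ratios of other values reported in Tables 1 and 2. The sketch below recomputes them from the reported counts as a consistency check; the dictionary layout and key names are illustrative.

```python
# Counts as reported in Tables 1 and 2.
DATA = {
    "Col": {"tested": 6144, "tagged": 4599, "in_contig": 4048,
            "contigs": 273, "coverage_mbp": 100.8},
    "Col+Ler": {"tested": 15360, "tagged": 12149, "in_contig": 10299,
                "contigs": 378, "coverage_mbp": 131.3},
}
GENOME_MBP = 130


def derived_rows(d):
    """Recompute the percentage and average rows of Table 2."""
    return {
        "pct_of_tested": round(100 * d["in_contig"] / d["tested"]),
        "pct_of_tagged": round(100 * d["in_contig"] / d["tagged"]),
        "avg_bacs_per_contig": round(d["in_contig"] / d["contigs"], 1),
        "pct_genome_coverage": round(100 * d["coverage_mbp"] / GENOME_MBP),
    }


derived = {name: derived_rows(d) for name, d in DATA.items()}
for name, rows in derived.items():
    print(name, rows)
```

A genome coverage slightly above 100% for the combined data is consistent with contigs that overlap without having been merged.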
SNP discovery
In the next step, WGP tags were identified which were specific to either the Col or the Ler data. When a single Col tag matches a single Ler tag with only one nucleotide difference, then such a tag is a putative SNP marker. Two such candidates and the corresponding BAC clones were identified. They are presented in Table 3.
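The matching rule described above (a single genotype-specific Col tag matching a single genotype-specific Ler tag with exactly one nucleotide difference) can be sketched as follows. The function names are illustrative, the all-against-all comparison is only suitable for small sets, and the two tag pairs used as input are the ones reported for Col and Ler in this example; the shared tag is a made-up placeholder.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def putative_snp_tags(col_tags, ler_tags):
    """Return (col_tag, ler_tag, 0-based SNP position) for one-to-one matches."""
    col_only = set(col_tags) - set(ler_tags)
    ler_only = set(ler_tags) - set(col_tags)
    pairs = []
    for c in sorted(col_only):
        matches = [l for l in ler_only if len(l) == len(c) and hamming(c, l) == 1]
        if len(matches) != 1:
            continue
        l = matches[0]
        # Reciprocal check: the Ler tag must match only this Col tag.
        back = [x for x in col_only if len(x) == len(l) and hamming(x, l) == 1]
        if len(back) == 1:
            position = next(i for i, (x, y) in enumerate(zip(c, l)) if x != y)
            pairs.append((c, l, position))
    return pairs


shared = "GAATTCTTTTTTTTTTTTTTTTTTTT"  # placeholder tag present in both sets
col = ["GAATTCAGCTGTGAACATAACCGAAA", "GAATTCAGGAGAAATAGAGTGGATAT", shared]
ler = ["GAATTCAGCTGTGAACATAACCGGAA", "GAATTCAGGAGAAATAGAGTGAATAT", shared]
snp_pairs = putative_snp_tags(col, ler)
for col_tag, ler_tag, pos in snp_pairs:
    print(col_tag, ler_tag, pos, col_tag[pos], ler_tag[pos])
```

Requiring a one-to-one match in both directions is one way to exclude repeated sequences, where several near-identical tags would otherwise be paired arbitrarily.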
Table 3. Two examples of SNPs between Col and Ler (bracketed characters indicate the SNP)

| SNP | Contig | Col tag | Ler tag | BAC IDs |
| --- | --- | --- | --- | --- |
| SNP1 | Ctg293 | GAATTCAGCTGTGAACATAACCG[A]AA | GAATTCAGCTGTGAACATAACCG[G]AA | AT.E102.G22, AT.E110.N08, AT.E115.P18, AT.E112.D16, AT.E022.F04, AT.E013.A12, AT.E009.H24, AT.E010.E18, AT.E001.L24, AT.E005.A07, AT.E015.E24, AT.E018.C01, AT.E004.D16, AT.E003.L16 |
| SNP2 | Ctg307 | GAATTCAGGAGAAATAGAGTG[G]ATAT | GAATTCAGGAGAAATAGAGTG[A]ATAT | AT.E110.I20, AT.E102.E22, AT.E113.D18, AT.E101.J24, AT.E021.P20, AT.E003.M03, AT.E015.B21, AT.E004.K13 |
The BAC clones corresponding to a single SNP tag all map to a single region on the contig, as is visualized in Figures 8 and 9 for contig 293 and contig 307, respectively. From the data it is concluded that a physical map can be built from the combined WGP data of two different genotypes and that SNPs can be detected in this combined data set between genotype-specific WGP tags mapping to the same location on the physical map.

Claims

1. Method for the generation of a physical map of a sample genome and identification of polymorphisms, comprising the steps of:
(a) providing an artificial chromosome (e.g. BAC, YAC) clone bank wherein each artificial chromosome clone contains DNA from (part of) a sample genome, wherein the sample genome is selected from the group consisting of a heterozygous sample genome, a combination of two or more homozygous sample genomes or a combination of at least one heterozygous and at least one homozygous sample genome;
(b) pooling the clones from the artificial chromosome library into pools;
(c) providing a set of fragments for each pool using restriction enzymes;
(d) ligating adapters to the fragments;
(e) determining the sequence of at least part of the adapter and part of the fragment;
(f) assigning the fragments to the corresponding clones;
(g) ordering the clones into clone-contigs thereby generating a physical map of the sample genome;
(h) aligning the sequenced fragments based on the determined sequences in step (e);
(i) determining polymorphisms between the aligned sequences of step (h).
2. Method according to claim 1, wherein for each heterozygous and/or homozygous sample genome steps (a) - (g) are performed independently and whereby in step (h) the sequences determined in step (e) are taken from the separate steps (e).
3. Method according to claim 1, wherein the adapter contains an identifier or a degenerate identifier section.
4. Method according to claims 1-3, wherein the adapter-ligated fragments are amplified using
- a primer that amplifies at least the identifier and part of the fragment; or
- a primer that contains a section that is complementary to the degenerate section in the adapter and introduces an identifier in the amplified fragment; or
- a primer that is complementary to at least part of the adapter and provides an identifier in the amplified adapter-ligated fragment.
5. Method according to claim 1, wherein the sequence of at least 50 consecutive nucleotides is determined from the tagged adapter and part of the fragment.
6. Method according to claim 1, wherein the sequence of at least 75 consecutive nucleotides is determined from the tagged adapter and part of the fragment.
7. Method according to claim 1, wherein the sequence of at least 100 consecutive nucleotides is determined from the tagged adapter and part of the fragment.
8. Method according to claim 1, wherein the determined polymorphisms are converted to marker assays.
9. Method for the generation of a linked genetic map and a physical map comprising
- providing two parental lines, or a heterozygous line, wherein at least one of the lines has been used as a sample genome in the generation of the physical map that provided the polymorphisms of step (i) of claim 1;
- crossing the parents or selfing the heterozygote;
- screening the offspring for the presence (or absence) of the polymorphisms and/or markers associated with the polymorphisms;
- providing a genetic map from the polymorphisms;
- coupling the genetic map tags to the physical map.
PCT/NL2010/050836 2009-12-10 2010-12-09 Polymorfphic whole genome profiling WO2011071382A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US28533109P 2009-12-10 2009-12-10
US61/285,331 2009-12-10
NL2003932 2009-12-10
NL2003932 2009-12-10

Publications (1)

Publication Number Publication Date
WO2011071382A1 true WO2011071382A1 (en) 2011-06-16

Family

ID=42272720

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/NL2010/050836 WO2011071382A1 (en) 2009-12-10 2010-12-09 Polymorfphic whole genome profiling

Country Status (1)

Country Link
WO (1) WO2011071382A1 (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0534858A1 (en) 1991-09-24 1993-03-31 Keygene N.V. Selective restriction fragment amplification : a general method for DNA fingerprinting
WO2000024939A1 (en) 1998-10-27 2000-05-04 Affymetrix, Inc. Complexity management and analysis of genomic dna
WO2003004690A2 (en) 2001-07-06 2003-01-16 454$m(3) CORPORATION Method for isolation of independent, parallel chemical micro-reactions using a porous filter
WO2003012118A1 (en) 2001-07-31 2003-02-13 Affymetrix, Inc. Complexity management of genomic dna
WO2003027311A2 (en) * 2001-09-24 2003-04-03 Seqwright, Inc A clone-array pooled shotgun strategy for nucleic acid sequencing
WO2003054142A2 (en) 2001-10-30 2003-07-03 454 Corporation Novel sulfurylase-luciferase fusion proteins and thermostable sulfurylase
WO2004063323A2 (en) * 2003-01-10 2004-07-29 Keygene N.V. Aflp-based method for integrating physical and genetic maps
WO2004069849A2 (en) 2003-01-29 2004-08-19 454 Corporation Bead emulsion nucleic acid amplification
WO2006137734A1 (en) * 2005-06-23 2006-12-28 Keygene N.V. Improved strategies for sequencing complex genomes using high throughput sequencing technologies
WO2008007951A1 (en) 2006-07-12 2008-01-17 Keygene N.V. High throughput physical mapping using aflp

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0969102A2 (en) * 1991-09-24 2000-01-05 Keygene N.V. Selective restriction fragment amplification: a general method for DNA fingerprinting
EP0534858A1 (en) 1991-09-24 1993-03-31 Keygene N.V. Selective restriction fragment amplification : a general method for DNA fingerprinting
WO2000024939A1 (en) 1998-10-27 2000-05-04 Affymetrix, Inc. Complexity management and analysis of genomic dna
WO2003004690A2 (en) 2001-07-06 2003-01-16 454$m(3) CORPORATION Method for isolation of independent, parallel chemical micro-reactions using a porous filter
WO2003012118A1 (en) 2001-07-31 2003-02-13 Affymetrix, Inc. Complexity management of genomic dna
US6975943B2 (en) 2001-09-24 2005-12-13 Seqwright, Inc. Clone-array pooled shotgun strategy for nucleic acid sequencing
WO2003027311A2 (en) * 2001-09-24 2003-04-03 Seqwright, Inc A clone-array pooled shotgun strategy for nucleic acid sequencing
WO2003054142A2 (en) 2001-10-30 2003-07-03 454 Corporation Novel sulfurylase-luciferase fusion proteins and thermostable sulfurylase
WO2004063323A2 (en) * 2003-01-10 2004-07-29 Keygene N.V. Aflp-based method for integrating physical and genetic maps
WO2004069849A2 (en) 2003-01-29 2004-08-19 454 Corporation Bead emulsion nucleic acid amplification
WO2004070005A2 (en) 2003-01-29 2004-08-19 454 Corporation Double ended sequencing
WO2004070007A2 (en) 2003-01-29 2004-08-19 454 Corporation Method for preparing single-stranded dna libraries
WO2005003375A2 (en) 2003-01-29 2005-01-13 454 Corporation Methods of amplifying and sequencing nucleic acids
WO2006137734A1 (en) * 2005-06-23 2006-12-28 Keygene N.V. Improved strategies for sequencing complex genomes using high throughput sequencing technologies
WO2008007951A1 (en) 2006-07-12 2008-01-17 Keygene N.V. High throughput physical mapping using aflp

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
BRENNER ET AL., PROC. NATL. ACAD. SCI., vol. 86, 1989, pages 8902 - 8906
CAI W-W ET AL: "A clone-array pooled shotgun strategy for sequencing large genomics", GENOME RESEARCH, COLD SPRING HARBOR LABORATORY PRESS, WOODBURY, NY, US LNKD- DOI:10.1101/GR.198101, vol. 11, 1 January 2001 (2001-01-01), pages 1619 - 1623, XP002967818, ISSN: 1088-9051 *
GREGORY ET AL., GENOME RES., vol. 7, 1997, pages 1162 - 1168
KLEIN ET AL., GENOME RESEARCH, vol. 10, 2000, pages 798 - 807
KLEIN P E ET AL: "A high-throughput AFLP-based method for constructing integrated genetic and physical maps: Progress toward a sorghum genome map", GENOME RESEARCH, COLD SPRING HARBOR LABORATORY PRESS, WOODBURY, NY, US LNKD- DOI:10.1101/GR.10.6.789, vol. 10, no. 6, 1 June 2000 (2000-06-01), pages 789 - 807, XP002240094, ISSN: 1088-9051 *
LANNONE ET AL., CYTOMETRY, vol. 39, 2000, pages 131 - 140
MARRA ET AL., GENOME RES., vol. 7, 1997, pages 1072 - 1084
SEO ET AL., PROC. NATL. ACAD. SCI. USA, vol. 101, 2004, pages 5488 - 93
SODERLUND ET AL.: "FPC: a system for building contigs from restriction fingerprinted clones", COMPUT. APPL. BIOSCI., vol. 13, 1997, pages 523 - 535


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10796186

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10796186

Country of ref document: EP

Kind code of ref document: A1