WO2004022758A1 - Genome partitioning - Google Patents

Genome partitioning

Info

Publication number
WO2004022758A1
Authority
WO
WIPO (PCT)
Prior art keywords
library
nucleic acid
fragments
sample
size
Prior art date
Application number
PCT/GB2003/003866
Other languages
French (fr)
Inventor
Jiahui Zhu
Original Assignee
Plant Bioscience Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from GB0220649A (GB0220649D0)
Priority claimed from GB0220773A (GB0220773D0)
Application filed by Plant Bioscience Limited
Priority to DE60312875T (DE60312875T2)
Priority to AU2003260790A (AU2003260790A1)
Priority to US10/526,571 (US20060281082A1)
Priority to EP03793900A (EP1546345B1)
Priority to CA002496517A (CA2496517A1)
Publication of WO2004022758A1

Classifications

    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/63Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/66General methods for inserting a gene into a vector to form a recombinant vector using cleavage and ligation; Use of non-functional linkers or adaptors, e.g. linkers containing the sequence for a restriction endonuclease
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/10Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034Isolating an individual clone by screening libraries
    • C12N15/1089Design, preparation, screening or analysis of libraries using computer algorithms
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/10Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034Isolating an individual clone by screening libraries
    • C12N15/1093General methods of preparing gene libraries, not provided for in other subgroups

Definitions

  • This invention relates generally to nucleic library construction, for example for sequence variation discovery and screening. Particularly, it relates to methods and materials for reproducibly cloning a subset of a sample nucleic acid having reduced complexity.
  • Genetic markers are of increasing importance in the genomics and proteomics fields in understanding phenotype, susceptibility to disease, and response to treatments.
  • Single nucleotide polymorphisms are one of the most abundant and useful markers, and are the subject of investigation in numerous different organisms, including within the human genome.
  • Methods which have been used in the art have included shotgun sequencing the whole genome or sequencing PCR products (see e.g. Roth (2001) Nature Biotechnology 19: 209-211).
  • shotgun sequencing of the whole human genome provided a few million SNPs from five different individuals as a by-product 1 to the main initiative.
  • a more routine method is to design a pair of specific primers for each DNA fragment of interest. After PCR amplification, the fragment can be purified and sequenced. Although these are widely used methods, their efficiency and throughput are very limited. Moreover, both of them are very costly.
  • AFLP is one method of achieving this. It has been widely used to study DNA polymorphisms, and AFLP markers have been mapped in many species 2. However, AFLP has not been used for SNP screening because of its technical limits, such as artificial sequence alteration, a high proportion of random fragment loss and the complexity of the procedure.
  • RRS reduced representation shotgun
  • EP 1001037 (Whitehead Biomedical Inst., US) describes such an RRS strategy.
  • a nucleic acid-containing sample to be assessed is treated to fractionate it into fragments selected in a sequence-dependent manner, a subset of which is selected on the basis of size.
  • the present inventors have developed methods to reduce the complexity of a sample of nucleic acid (e.g. genomic or cDNA library) in large, flexible and controllable scales by dividing the genome or a collection of cDNA into smaller subsets.
  • the method uses multiple restriction enzymes to cut the DNA into a collection of restriction fragments. Based on the unique restriction ends of the fragments, they are then divided into different groups or "layers". A layer, or a combination of layers, is then cloned at a specific restriction site such that the resulting library only contains the desired subset or partition of the total sample. This permits the reduction of e.g. a genomic library's complexity more than a thousand-fold.
  • each sample or pooled samples
  • a highly consistent sub-set of corresponding fragments is generated in each case.
  • the method has particular utility for sequence variation discovery or screening through direct sequencing. Additionally it can be utilised within automated systems to provide high-throughput screening.
  • a method for producing a nucleic acid library which library contains a plurality of different nucleic acid fragments, the combination of said fragments being a representative partition of the entirety of a sample nucleic acid, the method comprising:
  • the method provides a reproducible method of reducing the complexity of the sample.
  • a partition with at least 10, 100, or 1000-fold reduced complexity compared to the sample nucleic acid can be generated.
  • the method is performed (including, optionally, purification to remove short sequences e.g. less than 100 bps) such that the sub-set of layers ligated into said vectors provides a library with fragments with a size range of 100-2000 bps.
  • the number of restriction enzymes, the type of restriction enzymes, and the sub-set of layers ligated into said vectors are selected in accordance with the equations set out hereinafter.
  • Nucleic acid for use in the present invention may include cDNA, RNA and genomic DNA. It may be provided in amplified form. RNA may be provided as cDNA.
  • the total size of the cDNA pool will be smaller than a genome. Therefore, fewer enzymes will be used and pilot tests (see below) can be used to optimise the design.
  • the sample may represent all or part of a particular source of origin e.g. may have been enriched.
  • Nucleic acids for use in the present invention may be provided isolated and/or purified from their natural environment, in substantially pure or homogeneous form, or free or substantially free of other nucleic acids of the species of origin. Where used herein, the term “isolated” encompasses all of these possibilities.
  • between 3 and 6 restriction enzymes will be used e.g. equal to, or at least, 3, 4, 5 or 6.
  • the restriction enzymes are selected from four-, six- or eight- base-cutters.
  • one or two six-base-cutters are used as cloning-end-generators to create the cloning ends for the layer(s) which are selected for cloning.
  • the other restriction enzymes are four-base-cutters (which cut relatively more frequently) and which are used, in effect, as fragment-cutters to destroy some or most of the fragments which could otherwise be cloned into the chosen vector. These enzymes therefore serve to reduce the size of the selected layer(s).
  • a combination of four- and six-base cutters as fragment cutters may be useful to 'hone' the size of the partition.
  • Preferred restriction enzymes are selected from any of those given in Table 1.
  • Eight-base cutters include SfiI and NotI. More preferably the enzymes HpaII, AluI, DraI, and PstI are used (PstI being used to generate cloning ends).
  • enzymes may be selected as appropriate to the specific application in hand. For instance, when all or part of a reference sequence for a sample is known, the enzymes will be selected so as to have a target frequency appropriate to the size of the partition which it is wished to generate. Likewise, if it is desired to investigate a particular region of the sample, the enzymes will be selected so as to achieve this.
  • the plurality of enzymes are used simultaneously, and are selected such as to be active under comparable conditions to permit this.
  • Optimum conditions for commercially available restriction enzymes are available from the manufacturers.
  • Restriction by one enzyme may be partial. In such cases it is preferred that the group of fragments in the selected layer have restriction ends created by said partial digestion.
  • the selected sub-set of layers consists of one layer or two layers
  • the present invention may incorporate the performance of a 'pilot test' to confirm the validity of the partition design, and optionally to refine it.
  • a pilot test may be used to measure the size or complexity (number of unique sequences) of a particular partition design. It will also provide information about original genome size and restriction site frequencies. The principle is as follows: when sequencing a library (e.g. a partition) having a given number of colonies, there will be a chance for a particular sequence to be sequenced more than once. This is called sequence redundancy of shotgun sequencing strategy. The more colonies sequenced the more redundancy. The smaller (or less complex) the library, the more redundancy. Thus assessment of sequence redundancy provides information about the size of the partition.
  • n is the total number of good sequences obtained by sequencing; n_i is the number of sequences in the i-th contig.
  • s is the standard error, which represents the statistical error when the sample size is not big enough.
  • 500 colonies may be selected from a partition and sequenced. This should give more than 400 good quality sequences.
  • the complexity of the partition, F can be calculated. Additionally, the deviation constant for restriction enzymes in the genome can be extrapolated from the sequence results permitting a honing of the partition design.
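  • The redundancy-based estimate described above can be sketched in a few lines. This is a minimal illustration, assuming the estimator reads F = n(n-1) / Σ n_i(n_i-1) and ignoring the standard error s; the function name and the way contigs are passed in are hypothetical. The contig counts are those reported for the rice pilot test later in this document.

```python
def estimate_partition_size(contig_sizes):
    """Estimate the number of unique sequences F in a partition.

    contig_sizes holds n_i, the number of reads assembled into each contig
    (a read that matches nothing else is a contig of size 1).
    Assumed estimator: F = n(n-1) / sum_i n_i(n_i-1); the standard error is ignored.
    """
    n = sum(contig_sizes)
    redundant_pairs = sum(n_i * (n_i - 1) for n_i in contig_sizes)
    if redundant_pairs == 0:
        raise ValueError("no redundancy observed; sequence more colonies")
    return n * (n - 1) / redundant_pairs


# Contig histogram from the rice example: {reads per contig: number of contigs}
histogram = {1: 212, 2: 121, 3: 8, 4: 2, 6: 1, 8: 1}
contig_sizes = [size for size, count in histogram.items() for _ in range(count)]

print(sum(contig_sizes))                              # total good reads, n
print(round(estimate_partition_size(contig_sizes)))   # estimated F
```

  The guard against zero redundancy reflects the point made above: if no sequence is seen twice, the sample is too small to say anything about the partition size.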
  • the method may include performing the method of the invention as described above using parameters which are likely to produce an acceptable result for a wide spread of genome sizes from different species, for example by performing a digestion of 5 µg genomic DNA using a 6nt cutter (e.g. PstI) as the cloning site enzyme and three 4nt cutters (e.g. HpaII, AluI and DraI).
  • the partition may be cloned into pZErO at the PstI site in the presence of suitable enhancing linkers (linkers for HpaII, AluI and DraI).
  • $N_{x_1,x_2}$ is the number of fragments with length between x1 and x2 (which is F above).
  • k is fragment length; x1 and x2 are the lower and upper limits of the size range of the fragments in the library (these may be assumed as 100 bp and 2000 bp, as described above, or can be verified by the sequence obtained)
  • Pi is the probability of having a restriction site at any given base for the i-th enzyme
  • Pi is measured or estimated in silico based on a large sample of sequences e.g. from a database.
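  • As an illustration of estimating Pi in silico, the sketch below counts recognition sites in a sequence and divides by the number of positions scanned. The function names, the small look-up dictionary and the toy sequence are hypothetical; the recognition sequences shown for PstI, HpaII and AluI are the standard ones. The deviation helper assumes that a frequency deviation is the ratio of the observed site frequency to the random expectation 4^-L for a site of length L, which is an interpretation rather than something stated in the text.

```python
import re

# Illustrative look-up table: enzyme name -> recognition sequence
RECOGNITION_SITES = {"PstI": "CTGCAG", "HpaII": "CCGG", "AluI": "AGCT"}


def site_probability(sequence, site):
    """Observed probability of a recognition site starting at any given base."""
    sequence = sequence.upper()
    windows = len(sequence) - len(site) + 1
    if windows <= 0:
        return 0.0
    # A zero-width lookahead counts overlapping occurrences as well.
    hits = len(re.findall(f"(?={re.escape(site)})", sequence))
    return hits / windows


def frequency_deviation(observed_probability, site):
    """Observed frequency relative to a random sequence with equal base usage (assumed definition)."""
    return observed_probability / (4.0 ** -len(site))


toy_sequence = "AACCGGTTCCGGAAGCTCTGCAG"  # hypothetical 23-base example
for enzyme, site in RECOGNITION_SITES.items():
    p = site_probability(toy_sequence, site)
    print(enzyme, p, frequency_deviation(p, site))
```

  In practice the input would be a large set of database sequences rather than a toy string; because these sites are palindromic, scanning one strand is sufficient.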
  • a corresponding approach may be used with cDNA from an unknown tissue from an unknown species.
  • the lower complexity suggests that PstI as the cloning site restriction enzyme, and HpaII as the fragment cutter, may be an appropriate starting point.
  • the enzymes to produce a desired partition size are thus selected on the basis of the formula:
  • k is fragment length (and x1 and x2 are lower and upper limits)
  • G is the size of the genome; n is the number of extra 4 nt cutters; m is the number of extra 6 nt cutters
  • a corresponding approach may be used with cDNA from tissues or species in which the complexity is known or can be estimated, either directly or by comparison with other species.
  • sample nucleic acid sequence (inasmuch as it derives from a different source from the reference) is likely to include sequence variation with respect to any reference and indeed this variation between corresponding sequences underlies certain embodiments of the present invention. Nevertheless, since such variations are by definition rare, the reference sequence can be used to calculate restriction site frequency for restriction enzymes which it may be desired to use in the methods described herein.
  • the restriction site frequency of each enzyme can be provided, and the formula:
  • a set of restriction enzymes can be chosen based on the restriction map of the desired genes and other sequences so as to select them in particular, while still having an appropriately sized partition.
  • the fragments are purified at step (ii).
  • fragments may be purified in a conventional manner.
  • the restriction reaction was passed through a column containing resins (QIAQuick PCR purification kit, QiaGen), which can effectively adsorb DNA molecules larger than 100 bp. After washing with 70% ethanol, the DNA fragments were eluted into 30-50 µl water.
  • An alternative second method used the BioRad Clean-A-Gene kit.
  • the third method was to purify the fragments by running a 1% agarose gel and recovering the DNA using the Promega gel recovery kit.
  • extra DNA should be used, for example, 10 micrograms for rice and pearl millet, 20 micrograms for human and wheat.
  • Preferred purification techniques will be such as to remove fragments of less than 100 bases.
  • Enrichment of sample
  • an enrichment strategy may be adopted, so that a particular region or gene may be treated.
  • restriction enzymes may be chosen through a restriction map of the reference sequence(s).
  • a set of oligos (16-60 bases, preferably 20-50 bases) could be designed to enrich the genes e.g. via a hybridization method using magnetic beads with biotin-labelled oligonucleotides attached on them (see e.g.
  • if the sample is enriched, it may be preferred to use pilot tests to confirm the size of the total DNA pool.
  • enhancement linkers are added prior to or during step (iv) such that only the desired sub-set of layers is included in said library.
  • the linkers prevent fragments with compatible restriction ends combining to form artifacts.
  • Such linkers (which may be provided as a pair of oligonucleotides) comprise: (i) a core sequence, which is selected such that it does not contain a restriction site and does not have a high probability of hybridizing to target sequence,
  • the enhancement linkers are not used for the cloning site restriction enzyme(s).
  • Preferred linkers are any of those given in Table 1.
  • a typical protocol can be achieved by exposing a vector restricted with the appropriate enzymes to the selected layers such as to ligate or otherwise incorporate the heterologous nucleic acid fragments into the vector at the appropriate cloning site; exposing the ligation product (recombinant vector) to host cells under conditions whereby the vector is taken up by the cells such as to generate a population of host cells containing the vector; exposing the population of cells to a propagation medium comprising a selection agent whereby transformed host cells which contain vector incorporating the nucleic acid insert are selectively grown or propagated in the medium.
  • one or more pairs of "adaptor" oligonucleotides may be used to bridge the cloning ends of the DNA fragments of interest (i.e. from the layer (s) in the desired sub-set) and the cloning site of the vector(s).
  • the adaptor sequences have appropriate restriction site sequences (fragment and vector) at each end and a core sequence in the middle.
  • An example core sequence is 5'-CGTAGACGATGCGTGAGAC-3'.
  • PCR amplification may optionally be used to enrich the fragments of interest and increase the amount of DNA by using the adaptor sequence as PCR primer. This may be advantageous where the quantity of fragments is relatively low.
  • the method may optionally include the step of ligating adaptor oligonucleotides to all or part (e.g. generally one or both layers, if two layers are selected) of the selected sub-set of fragments in order to facilitate their ligation into vectors adapted to receive them.
  • the adaptor sequences may optionally incorporate extra restriction sites.
  • the sample may comprise corresponding nucleic acid from several (e.g. two or more) different sources. This permits equivalent partitions to be compared e.g. for the discovery of sequence variation.
  • the methods described herein may be used to identify any type of marker e.g. microsatellites, minisatellites etc.
  • the markers are SNPs.
  • the size of the partition sequences will be chosen to be appropriate to the number and nature of markers which it is desired to look for. Thus, for example, if 'S' different SNPs are required, it may be appropriate to ensure that there are at least that many different unique sequences in the partition (more preferably twice that many) representing a total length of S x 1000 bases.
  • the nucleic acid-containing sample can be pooled from individuals who share a particular trait (e.g. an undesirable trait, such as a particular disorder, or a desirable trait, such as resistance to a particular disorder). Sequences can be taken from different species, varieties or populations such as to provide markers for plant-breeding, or phylogenetic studies etc.
  • Preferred target genomes include Human, Arabidopsis, wheat, rice, millet and soybean genomes.
  • the invention provides a method for identifying a limited population of markers in a sample nucleic acid, which method comprises:
  • the nucleic acid from different sources may be pooled. However it may also be analysed on separate occasions since the methods of the invention produce a partition of fixed size and fixed content in a reproducible manner.
  • sequence data is obtained by sequencing the library e.g. to 3-5 times coverage. If desired the actual size of partition can be calculated as described herein.
  • "corresponding", in terms of sequence comparisons herein (whether with a known reference, or between different source nucleic acids in a sample), refers to sequences derived from equivalent loci or genes from two different genomes (e.g. the sequences may be orthologues, homologues, alleles etc.) but which may therefore include differences between them (e.g. by way of mutation, polymorphism, or other sequence variation which gives rise to nucleic acid "markers").
  • Corresponding sequences will generally be at least 80% identical, most preferably at least about 90%, 95%, 96%, 97%, 98% or 99% identical. Identity is established by comparison of the full length of the sequences (or the shorter of the sequences). Thus alignment of different sequencing results, and assessment of the degree of identity between them, can be used to confirm that sequences are indeed corresponding ones, and hence that sequence differences between them represent potential markers. For markers which are candidate single nucleotide polymorphisms, the frequency should preferably not exceed 1% of the total number of bases in the shorter of the two sequences; sequences which meet these criteria may be selected as corresponding. Whether sequences are indeed corresponding sequences showing intergenomic or inter-gene variation, rather than e.g.
  • intergenome or inter-gene-copy variation is generally larger than the allelic variation so that a phylogenetic tree of the sequences in an alignment based on sequence similarity may distinguish the two types of variation.
  • SNP candidates can be validated by genotyping and genetic mapping - if the marker segregates and can be mapped to a chromosomal location, it would normally be recognized as true allelic variation.
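  • The identity and SNP-frequency criteria given above for corresponding sequences can be sketched as a simple check. This is a minimal illustration that assumes the two sequences have already been aligned without gaps and trimmed to the same length; the function names and default thresholds (80% identity, at most 1% candidate SNP positions) are taken from the passage, but the code structure itself is hypothetical. It does not distinguish allelic variation from inter-gene-copy variation, which needs the mapping or phylogenetic evidence described above.

```python
def compare_aligned(seq_a, seq_b):
    """Return (fraction identity, list of mismatch positions) for two aligned sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty, pre-aligned and of equal length")
    mismatches = [
        i for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())) if a != b
    ]
    identity = 1.0 - len(mismatches) / len(seq_a)
    return identity, mismatches


def snp_candidates(seq_a, seq_b, min_identity=0.80, max_snp_fraction=0.01):
    """Return (is_corresponding, candidate SNP positions) under the stated thresholds."""
    identity, mismatches = compare_aligned(seq_a, seq_b)
    corresponding = identity >= min_identity
    within_snp_limit = len(mismatches) / len(seq_a) <= max_snp_fraction
    positions = mismatches if corresponding and within_snp_limit else []
    return corresponding, positions


# Hypothetical 200-base reads differing at a single position
read_a = "ACGT" * 50
read_b = read_a[:10] + "T" + read_a[11:]
print(snp_candidates(read_a, read_b))
```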
  • SNPs are genetic diseases that are associated with the genetic disorder.
  • SNPs are associated with the SNP's map position in the human genome, and (ii) a genotyping assay for scoring the locus in association studies.
  • the assessment of polymorphisms may be carried out on a DNA microchip.
  • a microchip system may involve the synthesis of microarrays of oligonucleotides on a glass support. Fluorescently-labelled PCR products may then be hybridised to the oligonucleotide array and sequence specific hybridisation may be detected by scanning confocal microscopy and analysed automatically (see Marshall & Hodgson (1998) Nature Biotechnology 16: 27-31, for a review).
  • the invention also provides for a method for making a genotyping microchip for use in assaying a limited population of polymorphisms within a sample (see, e.g., U.S. Pat. Nos. 5,861,242 and 5,837,832).
  • the present invention can facilitate efficient genotyping. Once a set of polymorphisms is isolated, probes or primers for detecting those polymorphisms can be incorporated into such a chip. When it is desirable to assay an individual for the polymorphisms in the set, nucleic acid is isolated from that individual, and it can be partitioned with the same methods that were used to isolate the original set of polymorphisms.
  • this invention is more flexible than the other reduced representation approaches because it can greatly and flexibly reduce the size of a partition e.g. to as small as one containing 500 unique fragments.
  • methylation sensitive and non-sensitive restriction enzymes may be used separately so that the methylation distribution patterns could be revealed by comparing the two.
  • the invention provides an automated computer system, comprising a combination of hardware and software, that can rapidly determine optimised partitions based on a reference sequence, a desired size, and optionally desired region within the sequence.
  • these aspects of the invention are implemented in computer programs executing on a programmable computer comprising a processor, a data storage system (including volatile and nonvolatile memory and/or storage elements), at least one input device, and at least one output device.
  • Data input through one or more input devices for temporary or permanent storage in the data storage system includes sequences.
  • Program code is applied to the input data to perform the functions described above and generate output information.
  • the output information is applied to one or more output devices, in known fashion.
  • the program code will include analysis of some or all of the functions described above, and will include the ability to input a reference sequence, and preferences regarding partition size and optionally preferred regions to include in the partition.
  • the program code will also be able to reference (e.g. from a look-up table) restriction site target sequences for different 4 and 6nt cutters.
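  • A very small sketch of such a design program is given below. It is not the system described here, only an illustration under stated assumptions: the look-up table, function names and search strategy are hypothetical; every site is assumed to occur at the random-sequence frequency 4^-L (frequency deviation of 1); and the predicted size assumes both fragment ends are cloning sites and that no enzyme cuts inside a fragment of 100 to 2000 bases.

```python
from itertools import combinations

# Hypothetical look-up table of fragment cutters: enzyme name -> recognition sequence
SITES = {
    "HpaII": "CCGG", "AluI": "AGCT", "HaeIII": "GGCC", "RsaI": "GTAC", "TaqI": "TCGA",
    "EcoRI": "GAATTC", "HindIII": "AAGCTT",
}


def site_probability(site):
    # Random-sequence assumption: every base equally likely, deviation of 1.
    return 4.0 ** -len(site)


def predicted_size(genome_size, cloning_site, cutter_sites, x1=100, x2=2000):
    """Expected number of clonable fragments with length between x1 and x2."""
    p_clone = site_probability(cloning_site)
    survive = 1.0 - p_clone  # chance that a base does not start any site
    for site in cutter_sites:
        survive *= 1.0 - site_probability(site)
    total = sum(survive ** k for k in range(x1, x2 + 1))
    return genome_size * p_clone ** 2 * total


def choose_cutters(genome_size, target_size, cloning_site="CTGCAG", max_cutters=4):
    """Exhaustively pick the cutter combination whose predicted size is closest to target."""
    best = None
    for r in range(1, max_cutters + 1):
        for combo in combinations(SITES, r):
            size = predicted_size(genome_size, cloning_site, [SITES[e] for e in combo])
            if best is None or abs(size - target_size) < abs(best[1] - target_size):
                best = (combo, size)
    return best


combo, size = choose_cutters(genome_size=400e6, target_size=600)
print(combo, round(size))
```

  Under the random-sequence assumption all cutters of the same length are interchangeable, so the search only decides how many 4nt and 6nt cutters to use; replacing `site_probability` with frequencies measured on a reference sequence would let it discriminate between individual enzymes.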
  • the automated system can be implemented through a variety of combinations of computer hardware and software.
  • the computer hardware is a high-speed multiprocessor computer running a well-known operating system, such as UNIX.
  • personal computers using single or multiple microprocessors might also function within the parameters of the present invention.
  • Each such computer program is preferably stored on a storage media or device (e.g., ROM or magnetic diskette) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage media or device is read by the computer to perform the procedures described herein.
  • the inventive system may also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.
  • Example 1 methods for determining size of layers and partitions
  • the DNA fragments can be classified into groups based on the restriction ends produced specifically by the restriction enzymes.
  • Each layer of DNA fragments can be specifically cloned into a cloning vector at the corresponding restriction site.
  • the specificity is determined by the cloning site, which only matches the restriction fragment ends of the chosen layers.
  • any combination of the layers can be cloned into a library.
  • the sub-set or combination of layers cloned is termed a "partition" herein.
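  • The idea of layers can be illustrated with a toy in silico digestion. The sketch below is a simplification under stated assumptions: the function name and example sequence are hypothetical, each cut is placed at the first base of the recognition site rather than at the true cleavage position, and the two terminal pieces of the molecule (which have only one restriction end) are discarded.

```python
import re
from collections import defaultdict


def digest_into_layers(sequence, enzymes):
    """Group restriction fragments by the pair of enzymes that produced their ends.

    enzymes maps enzyme name -> recognition sequence.
    Returns {(enzyme_a, enzyme_b): [fragment, ...]} with each key sorted by name.
    """
    cuts = []
    for name, site in enzymes.items():
        for match in re.finditer(f"(?={re.escape(site)})", sequence):
            cuts.append((match.start(), name))
    cuts.sort()

    layers = defaultdict(list)
    for (start, left), (end, right) in zip(cuts, cuts[1:]):
        layer = tuple(sorted((left, right)))  # end order does not matter
        layers[layer].append(sequence[start:end])
    return dict(layers)


toy = "CTGCAG" + "AAAA" + "CCGG" + "TTTT" + "CTGCAG" + "GGGG" + "CTGCAG"
layers = digest_into_layers(toy, {"PstI": "CTGCAG", "HpaII": "CCGG"})
for layer, fragments in layers.items():
    print(layer, len(fragments))
```

  In this toy case only the ("PstI", "PstI") layer would be clonable at a PstI site; the fragments in the ("HpaII", "PstI") layer have been removed from the partition by the fragment cutter.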
  • the number of possible partitions will be:
  • the size of a layer depends on the number and the types of enzymes used.
  • v is the frequency deviation for each particular enzyme in a particular genome, and may be assumed to be 1 unless known or established to be otherwise.
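  • $N = 4^{-6}\, v\, G \left[ (1 - 1/4^{6})^{x_1} - (1 - 1/4^{6})^{x_2} \right]$.
  • $N' = 4^{-12}\, v'\, G \sum_{k=x_1}^{x_2} \left[ (1 - 1/4^{4})^{2k} (1 - 1/4^{6})^{k} \right]$.
  • $N' = 4^{-12}\, v'\, G \sum_{k=x_1}^{x_2} \left[ (1 - 1/4^{4})^{3k} (1 - 1/4^{6})^{k} \right]$.
  • $N' = 4^{-12}\, v'\, G \sum_{k=x_1}^{x_2} \left[ (1 - 1/4^{4})^{nk} (1 - 1/4^{6})^{k} \right]$.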
  • $N' = 4^{-12}\, v'\, G \sum_{k=x_1}^{x_2} \left[ (1 - 1/4^{6})^{2k} \right]$.
  • $N' = 4^{-12}\, v'\, G \sum_{k=x_1}^{x_2} \left[ (1 - 1/4^{6})^{3k} \right]$.
  • $N' = 4^{-12}\, v'\, G \sum_{k=x_1}^{x_2} \left[ (1 - 1/4^{4})^{nk} (1 - 1/4^{6})^{(1+m)k} \right]$.
  • v' is a combined frequency deviation, so this formula is preferably used only when v' is assumed to be one or when a pilot test is used to verify the partition design.
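  • The general formula can be evaluated numerically. The sketch below assumes it reads N' = 4^-12 · v' · G · Σ (1 - 1/4^4)^(nk) · (1 - 1/4^6)^((1+m)k), summed over fragment lengths k from x1 to x2, with one 6nt cloning enzyme, n extra 4nt cutters and m extra 6nt cutters; the function and parameter names are hypothetical.

```python
def partition_size(genome_size, n_four_cutters, m_six_cutters=0,
                   x1=100, x2=2000, v_combined=1.0):
    """Expected number of clonable fragments N' with length between x1 and x2.

    Assumes one 6nt cutter generates both cloning ends (factor 4**-12) and that
    a fragment survives only if no enzyme cuts at any of its k bases.
    """
    keep_4nt = 1.0 - 4.0 ** -4   # chance a base does not start a given 4nt site
    keep_6nt = 1.0 - 4.0 ** -6   # chance a base does not start a given 6nt site
    per_base = keep_4nt ** n_four_cutters * keep_6nt ** (1 + m_six_cutters)
    total = sum(per_base ** k for k in range(x1, x2 + 1))
    return 4.0 ** -12 * v_combined * genome_size * total


# Rice-sized genome (400 million bp), one 6nt cloning enzyme, three 4nt fragment cutters
print(round(partition_size(400e6, n_four_cutters=3)))
# Adding a fourth 4nt cutter shrinks the partition further
print(round(partition_size(400e6, n_four_cutters=4)))
```

  With these inputs and v' taken as 1, the prediction comes out at roughly 600 fragments, which is of the same order as the 624 unique colonies estimated from the rice pilot test reported later in this document.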
  • the size of a partition with one cloning end is the same as that with combination of two different cloning ends if other restriction enzymes (fragment cutters, the enzymes which do not match the cloning site) are the same.
  • the two restriction enzymes for the cloning site may be counted as one enzyme, with the Pi taken as the mean of that of the two enzymes.
  • most cloned fragments will fall between 100 and 2000 base pairs (and hence x1 and x2 may be assumed as 100 bp and 2000 bp). This is because smaller fragments, which are not informative, may be removed by purification techniques. Additionally, the selected restriction endonuclease(s) will generally cleave the sample nucleic acid molecule at least approximately every 2000 bases. Thus larger fragments will be comparatively rare.
  • the larger the partition the more sequence reactions are needed to get sequence pair-wise comparison. It is therefore preferred to keep the size of the partition to the minimum likely to encompass the number of sequence variations which it is desired to identify.
  • the partition should provide more than five hundred unique sequences (ideally about 1000). Random sequencing should preferably cover the library 3-5 times; more than 10 times should not be necessary.
  • restriction enzymes should be decided based on the formulae described above.
  • the restriction site frequency can be checked and a particular design to cover certain genomic regions or genes can be performed using known or bespoke programs. A sequence enrichment strategy can also be considered at that stage.
  • a pilot test is carried out to confirm the expected size of the partition is valid in respect of that genome.
  • a pilot test may be required in each case to hone the partitioning.
  • For e.g. rice DNA, at least two micrograms is preferred. For the human genome, more than five micrograms of DNA is recommended for normal genome partitioning without gel-based purification.
  • Restriction digestion can be performed in one cocktail. However, if the enzymes are optimal in different conditions, two or even three stages of reaction should be carried out.
  • Partial digestion can be used as a special way to enlarge a partition. Normally, partial digestion is only performed on one enzyme, which generates the cloning ends.
  • Enhancing linkers can be designed to avoid chimeric sequences and the restoration of undesired restriction sites during ligation.
  • each linker consists of two oligos. The core sequences were 5'-TTGGCGTTTAC-3' and 3'-CCGCAAATG-5'.
  • One end of the linker has an overhang 'TT' so that no linkage can be made at this end.
  • the other end has a sticky end with added nucleotides, which matches the restriction sites - this can be linked to the genomic DNA fragments with undesired restriction ends. Because of the competition of these linkers, DNA fragments with the same restriction site as the linkers will not link to each other to create "false" fragments within given layers.
  • each used restriction enzyme except that for cloning site
  • a corresponding enhancing linker should be added into the ligation reaction.
  • the final concentration of each oligo should be 0.1 µM. This is conveniently achieved using a stock solution of each oligo (1 mM), which can be stored for use e.g. at -20°C. Before ligation, a 'cocktail' of these oligos is made to contain each necessary oligo at a concentration of 10 µM, and 1 µl of the cocktail should be added to the 100 µl ligation reaction.
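  • As a check on the linker concentrations, reading the quantities in the passage above as a 1 mM stock, a 10 µM cocktail, and 1 µl of cocktail in a 100 µl ligation, the two dilution steps work out as follows (the symbol names are introduced here only for illustration):

```latex
\[
\frac{C_{\text{stock}}}{C_{\text{cocktail}}}
  = \frac{1\ \text{mM}}{10\ \mu\text{M}} = 100
  \quad\text{(a 100-fold dilution of each stock into the cocktail)}
\]
\[
C_{\text{final}}
  = \frac{C_{\text{cocktail}} \, V_{\text{cocktail}}}{V_{\text{reaction}}}
  = \frac{10\ \mu\text{M} \times 1\ \mu\text{l}}{100\ \mu\text{l}}
  = 0.1\ \mu\text{M}
\]
```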
  • Preferred enhancing linkers are listed in Table 1 hereinafter.
  • the restriction endonuclease in the list is recommended for genome partitioning.
  • Rice is a model plant for cereals. DNA sequences are widely available for rice subspecies, Indica and Japonica. The rice genome is about 400 million base pairs and has been shot-gun sequenced independently by several groups, while at least one other group (Japanese National Rice Genome Project) is using a BAC strategy. Currently, sequences from Huada 4 and RGP 5 are publicly available for Indica and Japonica respectively.
  • the digested DNA was purified using QIAQuick PCR purification kit, QiaGen.
  • the purified DNA was eluted in 20 µl water and subsequently 5 µl of the purified DNA fragments were used in a 10 µl ligation reaction.
  • Six oligos (as three enhancing linkers for HpaII, AluI and DraI) were added into the reaction. They were 5'-TTGGCGTTTAC-3', 5'-CGGTAAACGCC-3', 5'-TTGGCGTTTAC-3', 5'-GTAAACGCC-3', 5'-TTGGCGTTTAC-3', 5'-AATTGTAAACGCC-3' (see Table 1).
  • each oligo was 0.1 µM.
  • One µl of ligase was used and 0.2 µg pZero vector (InvitroGen) digested with PstI was added. The reaction was at 15°C for 30 minutes and then kept at -20°C for subsequent transformation.
  • the one-shot competent cell (InvitroGen) was used for transformation of the E. coli. Kanamycin was used as the selection antibiotic. After overnight culture on an LB medium agar plate, approximately 600 colonies were selected. The colonies were cultured in 1.5 ml LB medium and the plasmid DNA was isolated using the QiaGen miniprep kit. Thirty of the plasmid DNA samples were run on an agarose gel to see the size of the inserts. Across the thirty samples, the insert size ranged from 200 to 3000 bp, with an average of 800 bp. The DNA was sequenced using the fluorescent-capillary method on an ABI 3700 (sequence service was provided by John Innes Centre).
  • the sequences were processed with PreGap4 to cut away the poor sequence and vector sequence.
  • the sequences with good quality (the PreGap4 default threshold was used for quality control) can be assembled into contigs using Gap4.
  • $F = n(n-1) / \sum_i n_i(n_i-1) \pm s$.
  • the size of the partition was estimated as containing 624 unique colonies (the standard error was ignored as being insignificant) (Table 3) .
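  • $F = 500 \times (500-1) / [212 \times 1 \times (1-1) + 121 \times 2 \times (2-1) + 8 \times 3 \times (3-1) + 2 \times 4 \times (4-1) + 1 \times 6 \times (6-1) + 1 \times 8 \times (8-1)] \approx 624$;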
  • the average insert size of the colonies was 800 bp. Since the rice genome is 400 million bp and the size of the library was (624 x 800) bp, the genome partition was about 1/800 of the whole genome. In other words, this genome partitioning design reduced the complexity of the library by 800 times.
  • HpyCH4 III None is needed.
  • HpyCH4 IV 5' -TTGGCGTTTAC-3' 5' -CGGTAAACGCC-3' HpyCH4 V
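The arithmetic of the rice pilot test quoted above can be re-checked with a short script. This is only a minimal sketch and not part of the original disclosure: the function name `partition_size` and the dictionary layout are illustrative assumptions, while the contig counts, the 800 bp average insert and the 400 million bp genome size are the figures given in the example.

```python
def partition_size(contig_counts):
    """Estimate F = n(n-1) / sum(n_i * (n_i - 1)).

    contig_counts maps contig size (reads per contig) -> number of such contigs.
    """
    n = sum(size * count for size, count in contig_counts.items())
    redundancy = sum(count * size * (size - 1) for size, count in contig_counts.items())
    if redundancy == 0:
        raise ValueError("no repeated sequences; F cannot be estimated")
    return n * (n - 1) / redundancy


# Contig-size histogram from the rice example: 212 singletons, 121 pairs, and so on.
rice_contigs = {1: 212, 2: 121, 3: 8, 4: 2, 6: 1, 8: 1}

f_est = partition_size(rice_contigs)        # 623.75, quoted as about 624
library_bp = round(f_est) * 800             # unique inserts x average insert size
fold_reduction = 400_000_000 / library_bp   # genome size / partition size

print(f_est, library_bp, round(fold_reduction))  # prints: 623.75 499200 801
```

The histogram sums to the 500 sequences used in the example, and the fold reduction of about 800 matches the figure stated in the text.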

Abstract

This invention relates to 'genome partitioning' and nucleic acid library construction, for example for sequence variation discovery and screening. The method employs a plurality of restriction enzymes in order to reliably reproduce a representative partition of the entirety of a sample nucleic acid based on the restriction ends of one or more 'layers' of the fragments present. In preferred embodiments there is provided a method for producing a nucleic acid library, which library contains a plurality of different nucleic acid fragments, the method comprising: (i) digesting the sample nucleic acid with a plurality of different restriction enzymes to generate a plurality of different layers of fragments, wherein each layer is a group of fragments having a unique combination of restriction ends, and wherein the combination of layers represents the entirety of the sample nucleic acid, (ii) optionally purifying said fragments, (iii) selecting a desired sub-set of layers according to the unique restriction ends of said layers, (iv) ligating said sub-set of layers into vectors adapted to receive it, (v) transforming host cells with the vectors, (vi) culturing said host cells to provide said library containing said partition of the sample nucleic acid. The invention also provides systems, methods and functions for designing and optimising such libraries, and genotyping 'chips' based on the genome partitioning methods.

Description

GENOME PARTITIONING
TECHNICAL FIELD
This invention relates generally to nucleic acid library construction, for example for sequence variation discovery and screening. Particularly, it relates to methods and materials for reproducibly cloning a subset of a sample nucleic acid having reduced complexity.
BACKGROUND ART
Genetic markers are of increasing importance in the genomics and proteomics fields in understanding phenotype, susceptibility to disease, and response to treatments.
Single nucleotide polymorphisms (SNPs) are one of the most abundant and useful markers, and are the subject of investigation in numerous different organisms, including within the human genome. Methods which have been used in the art have included shotgun sequencing the whole genome or sequencing PCR products (see e.g. Roth (2001) Nature Biotechnology 19: 209-211). Thus shotgun sequencing of the whole human genome provided a few million SNPs from five different individuals as a by-product¹ of the main initiative. A more routine method is to design a pair of specific primers for each DNA fragment of interest. After PCR amplification, the fragment can be purified and sequenced. Although these are widely used methods, their efficiency and throughput are very limited. Moreover, both of them are very costly.
Unfortunately the size of eukaryotic genomes makes it difficult to search or screen for DNA sequence variation between individuals. To address this problem, attempts have been made to reduce the complexity of the genome to a more manageable scale, and thereby facilitate marker discovery.
AFLP is one method of achieving this. It has been widely used to study DNA polymorphisms and AFLP markers have been mapped in many species². However, AFLP has not been used for SNP screening because of its technical limits, such as artificial sequence alteration, a high proportion of random fragment loss and complexity of the procedure.
More recently, a more targeted and collaborative effort has been made to reduce the genome complexity for searching human SNPs.
This technology was called the reduced representation shotgun (RRS) strategy and it was adopted for the global human SNPs consortium project. RRS reduced the complexity of the genome by about sixfold, which increased the efficiency for finding SNPs. For RRS, the DNA is digested with a restriction enzyme. Based on the distribution of the fragments at different sizes, a subset of the fragments can be cut out from an electrophoresis gel so that the subset only contains the fragments within a particular size interval. The isolated fragments are subsequently cloned into a library for random sequencing³ (see Roth (2001) Nature Biotechnology 19: 209-211).
EP 1001037 (Whitehead Biomedical Inst., US) describes such an RRS strategy. A nucleic acid-containing sample to be assessed is treated to fractionate it into fragments selected in a sequence-dependent manner, a subset of which is selected on the basis of size.
The drawback of this method is that it can only reduce the genome complexity by a small scale.
Thus it can be seen that alternative methods of reproducibly reducing the complexity of nucleic acid samples to a controllable scale, e.g. for marker discovery, would provide a contribution to the art.
DISCLOSURE OF THE INVENTION
The present inventors have developed methods to reduce the complexity of a sample of nucleic acid (e.g. genomic or cDNA library) in large, flexible and controllable scales by dividing the genome or a collection of cDNA into smaller subsets. Briefly, the method uses multiple restriction enzymes to cut the DNA into a collection of restriction fragments. Based on the unique restriction ends of the fragments, they are then divided into different groups or "layers". A layer, or a combination of layers, is then cloned at a specific restriction site such that the resulting library only contains the desired subset or partition of the total sample. This permits the reduction of e.g. a genomic library's complexity more than a thousand-fold. By treating each sample (or pooled samples) in this way, a highly consistent sub-set of corresponding fragments is generated in each case. Thus the method has particular utility for sequence variation discovery or screening through direct sequencing. Additionally it can be utilised within automated systems to provide high-throughput screening.
Thus in a first aspect there is provided a method for producing a nucleic acid library, which library contains a plurality of different nucleic acid fragments, the combination of said fragments being a representative partition of the entirety of a sample nucleic acid, the method comprising:
(i) digesting the sample nucleic acid with a plurality of different restriction enzymes to generate a plurality of different layers of fragments, wherein each layer is a group of fragments having a unique combination of restriction ends, and wherein the combination of layers represents the entirety of the sample nucleic acid,
(ii) optionally purifying said fragments,
(iii) selecting a desired sub-set of layers according to the unique restriction ends of said layers,
(iv) ligating said sub-set of layers into vectors adapted to receive it,
(v) transforming host cells with the vectors,
(vi) culturing said host cells to provide said library containing said partition of the sample nucleic acid.
Thus the method provides a reproducible method of reducing the complexity of the sample. By selection of the appropriate numbers of restriction enzymes, the type of restriction enzymes, and the sub-set of layers ligated into said vectors, a partition with at least 10, 100, or 1000-fold reduced complexity compared to the sample nucleic acid can be generated.
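The idea of 'layers' can be illustrated with a toy in-silico digestion. The sketch below is an assumption-laden illustration rather than the disclosed method: it treats the sample as a single linear top-strand string, relies on the recognition sites being palindromic, ignores the two end fragments, and uses hypothetical names (`ENZYMES`, `cut_positions`, `layers`, `toy`). The recognition sequences and cut offsets are the standard ones for the four enzymes named in this description.

```python
from collections import defaultdict

# Recognition sequence and top-strand cut offset for each enzyme.
ENZYMES = {
    "PstI": ("CTGCAG", 5),   # CTGCA^G
    "HpaII": ("CCGG", 1),    # C^CGG
    "AluI": ("AGCT", 2),     # AG^CT
    "DraI": ("TTTAAA", 3),   # TTT^AAA
}


def cut_positions(seq, enzymes):
    """Return sorted (position, enzyme) pairs for every cut in seq."""
    cuts = []
    for name in enzymes:
        site, offset = ENZYMES[name]
        start = seq.find(site)
        while start != -1:
            cuts.append((start + offset, name))
            start = seq.find(site, start + 1)
    return sorted(cuts)


def layers(seq, enzymes):
    """Group internal fragments by the pair of enzymes that created their two ends."""
    cuts = cut_positions(seq, enzymes)
    grouped = defaultdict(list)
    for (left, left_enz), (right, right_enz) in zip(cuts, cuts[1:]):
        if right > left:
            key = tuple(sorted((left_enz, right_enz)))
            grouped[key].append(seq[left:right])
    return dict(grouped)


# Toy sequence with three PstI sites and one HpaII site between the last two.
toy = "AAAACTGCAGGATTACAGATTACACTGCAGTTGACCCCGGATATCTGCAGAAAA"

pst_only = layers(toy, ["PstI"])
multi = layers(toy, ["PstI", "HpaII", "AluI", "DraI"])
print({k: len(v) for k, v in pst_only.items()})
print({k: len(v) for k, v in multi.items()})
```

With PstI alone the toy sequence yields two PstI/PstI fragments; adding the fragment cutters leaves one PstI/PstI fragment and moves the other into a HpaII/PstI layer, which is the mechanism by which the selected layer is made smaller.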
In preferred embodiments, the method is performed (including, optionally, purification to remove short sequences e.g. less than 100 bps) such that the sub-set of layers ligated into said vectors provides a library with fragments with a size range of 100-2000 bps.
The number of restriction enzymes, the type of restriction enzymes, and the sub-set of layers ligated into said vectors are selected in accordance with the equations set out hereinafter.
Choice of nucleic acid sample
Nucleic acid for use in the present invention may include cDNA, RNA and genomic DNA. It may be provided in amplified form. RNA may be provided as cDNA.
Generally speaking, for cDNA samples, the total size of the cDNA pool will be smaller than a genome. Therefore, fewer enzymes will be used and pilot tests (see below) can be used to optimise the design.
The sample may represent all or part of a particular source of origin e.g. may have been enriched.
Nucleic acids for use in the present invention may be provided isolated and/or purified from their natural environment, in substantially pure or homogeneous form, or free or substantially free of other nucleic acids of the species of origin. Where used herein, the term "isolated" encompasses all of these possibilities.
Choice of restriction enzymes
In preferred embodiments, between 3 and 6 restriction enzymes will be used e.g. equal to, or at least, 3, 4, 5 or 6. Preferably, the restriction enzymes are selected from four-, six- or eight- base-cutters.
Preferably, one or two six-base-cutters (which cut relatively rarely) are used as cloning-end-generators to create the cloning ends for the layer(s) which are selected for cloning. The other restriction enzymes are four-base-cutters (which cut relatively more frequently) and which are used, in effect, as fragment-cutters to destroy some or most of the fragments which could otherwise be cloned into the chosen vector. These enzymes therefore serve to reduce the size of the selected layer(s). A combination of four- and six-base cutters as fragment cutters may be useful to 'hone' the size of the partition.
Preferred restriction enzymes are selected from any of those given in Table 1. Eight-base cutters include SfiI and NotI. More preferably the enzymes HpaII, AluI, DraI, and PstI are used (PstI being used to generate cloning ends).
However those skilled in the art will appreciate that other combinations of enzymes may be selected as appropriate to the specific application in hand. For instance, when all or part of a reference sequence for a sample is known, the enzymes will be selected such as to have a target frequency appropriate to the size of the partition which it is wished to generate. Likewise if it is desired to investigate a particular region of the sample, the enzymes will be selected such as to achieve this.
Preferably the plurality of enzymes are used simultaneously, and are selected such as to be active under comparable conditions to permit this. Optimum conditions for commercially available restriction enzymes are available from the manufacturers.
Restriction by one enzyme may be partial. In such cases it is preferred that the group of fragments in the selected layer have restriction ends created by said partial digestion.
Choice of layers
In preferred embodiments, the selected sub-set of layers consists of one layer or two layers.
The following represent various preferred embodiments of the invention:
Design of partitions for samples with unknown sequence and size
In some embodiments it may be required to generate a partition having a desired number of unique fragments where no reference sequence is available in a genome of unknown size. In this case the present invention may incorporate the performance of a 'pilot test' to confirm the validity of the partition design, and optionally to refine it.
A pilot test may be used to measure the size or complexity (number of unique sequences) of a particular partition design. It will also provide information about original genome size and restriction site frequencies. The principle is as follows: when sequencing a library (e.g. a partition) having a given number of colonies, there will be a chance for a particular sequence to be sequenced more than once. This is called sequence redundancy of shotgun sequencing strategy. The more colonies sequenced the more redundancy. The smaller (or less complex) the library, the more redundancy. Thus assessment of sequence redundancy provides information about the size of the partition.
The function is described in this formula: $F = n(n-1) / \sum_i n_i(n_i-1) \pm s$.
Wherein:
F is the size or complexity of the partition,
n is the total number of good sequences obtained by sequencing,
$n_i$ is the number of sequences in the i-th contig,
s is the standard error, which represents the statistical error when the sample size is not big enough.
Thus, for example, 500 colonies may be selected from a partition and sequenced. This should give more than 400 good quality sequences. Using these sequences, the complexity of the partition, F, can be calculated. Additionally, the deviation constant for restriction enzymes in the genome can be extrapolated from the sequence results permitting a honing of the partition design.
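The behaviour of the redundancy estimator can be explored by simulation. The following is a hedged sketch, not the inventors' procedure: it assumes every unique insert is equally likely to be picked, treats identical inserts as one contig, and uses hypothetical names (`estimate_partition_size`, `simulate_pilot`) and an arbitrary random seed.

```python
import random
from collections import Counter


def estimate_partition_size(read_ids):
    """F = n(n-1) / sum(n_i(n_i-1)), where n_i is the number of reads in contig i."""
    n = len(read_ids)
    contig_sizes = Counter(read_ids).values()
    redundancy = sum(k * (k - 1) for k in contig_sizes)
    if redundancy == 0:
        raise ValueError("no redundancy observed; sequence more colonies")
    return n * (n - 1) / redundancy


def simulate_pilot(true_size, n_reads, seed=0):
    """Pick n_reads colonies uniformly at random from true_size unique inserts."""
    rng = random.Random(seed)
    reads = [rng.randrange(true_size) for _ in range(n_reads)]
    return estimate_partition_size(reads)


# More sequenced colonies give more redundancy and a tighter estimate.
for n_reads in (500, 2000):
    print(n_reads, round(simulate_pilot(624, n_reads)))
```

Under these assumptions the estimate scatters around the true partition size, with the scatter shrinking as more colonies are sequenced, which is the qualitative point made in the text about redundancy.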
Thus the method may include performing the method of the invention as described above using parameters which are likely to produce an acceptable result for a wide spread of genome sizes from different species, for example by performing a digestion of 5μg genomic DNA using a 6nt cutter (e.g. PstI) as the cloning site enzyme and three 4nt cutters (e.g. HpaII, AluI and DraI). The partition may be cloned into pZErO at the PstI site in the presence of suitable enhancing linkers (linkers for HpaII, AluI and DraI).
The following steps are then performed:
(vii) sequencing the fragments in a fraction of the colonies (host cells) in said library,
(viii) calculating the size of the library (i.e. partition) using the formula $F = n(n-1) / \sum_i n_i(n_i-1) \pm s$.
If the partition size is appropriate it can be accepted.
If not (for example it is too small or too big) then the following further steps, in any appropriate order, may be performed:
(ix) providing the restriction site frequency ($f_i$) of the enzymes used in the partition, for example based on sequences obtained at step (vii),
(x) calculating the genome size G using the formula:
Figure imgf000008_0001
wherein:
$N_{x_1,x_2}$ is the number of fragments with length between x1 and x2 (which is F above),
k is fragment length,
x1 and x2 are the lower and upper limits of the size range of the fragments in the library (these may be assumed as 100 bp and 2000 bp, as described above, or can be verified by the sequence obtained),
$P_i$ is the probability of having a restriction site at any given base for the i-th enzyme,
(xi) providing a restriction site frequency ($f_i$) for enzymes not used in the partition, for example based on sequences obtained at step (vii) (this can also be expressed as $P_i$),
(xii) selecting further restriction enzymes on the basis of restriction site frequency ($f_i$) to generate a desired size of partition using the formula:
Figure imgf000009_0001
(xiii) producing a further nucleic acid library in accordance with steps (i)-(vi) using at least one of these further restriction enzymes.
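Steps (x) to (xii) can be sketched in code, with one important caveat: the two formulas referred to are available here only as image placeholders. The sketch therefore assumes a general form, N = G · P_c² · Σ_k Π_i (1 − P_i)^k, modelled on the explicit four-/six-base-cutter formula given later in this description; that form, and all function and parameter names, are assumptions for illustration only.

```python
def fragments_per_base(p_cloning, p_all, x1=100, x2=2000):
    """Assumed form: P_c**2 * sum over k of prod_i (1 - P_i)**k, for k in [x1, x2].

    p_all lists the site probability of every enzyme used, cloning enzyme included.
    """
    survive = 1.0
    for p in p_all:
        survive *= (1 - p)
    return p_cloning**2 * sum(survive**k for k in range(x1, x2 + 1))


def genome_size_from_pilot(f_est, p_cloning, p_all):
    """Step (x): invert the assumed count to obtain G from the measured size F."""
    return f_est / fragments_per_base(p_cloning, p_all)


def choose_extra_enzyme(genome_size, target, p_cloning, p_used, candidates):
    """Steps (xi)-(xii): pick the candidate whose predicted partition is nearest target."""
    def predicted(p_extra):
        return genome_size * fragments_per_base(p_cloning, p_used + [p_extra])
    return min(candidates, key=lambda name: abs(predicted(candidates[name]) - target))


p4, p6 = 1 / 4**4, 1 / 4**6          # random-sequence site probabilities (v = 1)
used = [p6, p4, p4, p4]              # one six-base cloning enzyme, three four-base cutters
g_est = genome_size_from_pilot(624, p6, used)
pick = choose_extra_enzyme(g_est, 300, p6, used,
                           {"extra_four_cutter": p4, "extra_six_cutter": p6})
print(round(g_est), pick)
```

In practice the probabilities would come from the pilot sequences rather than from the random-sequence values used here.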
It should be noted that in reality the probability of an enzyme cutting site being present will vary according to the restriction enzyme in question. Preferably, therefore, where a sample sequence is unknown, $P_i$ is measured or estimated in silico based on a large sample of sequences e.g. from a database.
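An in-silico estimate of the per-base site probability can be as simple as counting recognition sites in a collection of sequences. The sketch below is a minimal illustration under stated assumptions: sequences are plain strings, overlapping occurrences are counted, only palindromic sites are considered so a single strand suffices, and the names `site_frequency` and `deviation_constant` are hypothetical. The deviation constant uses the relation P = v/4^L implied by the description (v/256 for a four-base site).

```python
def site_frequency(sequences, site):
    """Fraction of positions at which `site` starts, pooled over all sequences."""
    hits = 0
    positions = 0
    for seq in sequences:
        seq = seq.upper()
        window_count = len(seq) - len(site) + 1
        if window_count <= 0:
            continue  # sequence shorter than the site
        positions += window_count
        hits += sum(1 for i in range(window_count) if seq.startswith(site, i))
    return hits / positions if positions else 0.0


def deviation_constant(sequences, site):
    """v = P * 4**len(site); v = 1 means the site occurs at the random expectation."""
    return site_frequency(sequences, site) * 4 ** len(site)


sample = ["CCGGAACCGG", "AAAA"]
print(site_frequency(sample, "CCGG"), deviation_constant(sample, "CCGG"))
```

The toy sample is deliberately site-rich; real estimates would need a large number of sequences, as the text notes.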
A corresponding approach may be used with cDNA from an unknown tissue from an unknown species. In such a case the lower complexity (compared with a genome) suggests that PstI as the cloning site restriction enzyme, and HpaII as the fragment cutter, may be an appropriate starting point.
Design of partitions for samples of known size and unknown sequence
Where the approximate genome size (G) is known, in choosing the enzymes to be used in step (i), the restriction site frequency may be assumed to be randomly distributed, i.e. v = 1, wherein v is the deviation constant in the formula P = v/256 for a four-base cutter and P = v/4096 for a six-base cutter.
The enzymes to produce a desired partition size are thus selected on the basis of the formula:
Figure imgf000010_0001
More specifically the formula:
$N = 4^{-12}\, v^{2}\, G \sum_{k=x_1}^{x_2} \left[ (1 - 1/4^{4})^{nk}\, (1 - 1/4^{6})^{(1+m)k} \right]$
wherein:
k is fragment length (and x1 and x2 are the lower and upper limits),
G is the size of the genome,
n is the number of extra 4 nt cutters,
m is the number of extra 6 nt cutters,
is used to select an appropriate combination of 4nt and 6nt cutters.
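The explicit formula for a genome of known size lends itself to direct evaluation. The sketch below takes v = 1, as the random-distribution assumption in the text allows, and reads the formula as a sum over fragment lengths k from x1 to x2; the function name `expected_partition_size` and its parameter names are illustrative assumptions.

```python
def expected_partition_size(genome_size, n_four, m_six, x1=100, x2=2000):
    """N = 4**-12 * G * sum_{k=x1}^{x2} (1 - 1/4**4)**(n*k) * (1 - 1/4**6)**((1+m)*k).

    n_four: number of extra four-base cutters; m_six: number of extra six-base cutters.
    Assumes v = 1, i.e. sites occur at the random-sequence frequency.
    """
    p4 = 1 / 4**4   # chance that a given 4-base site starts at any position
    p6 = 1 / 4**6   # chance that a given 6-base site starts at any position
    total = sum(
        (1 - p4) ** (n_four * k) * (1 - p6) ** ((1 + m_six) * k)
        for k in range(x1, x2 + 1)
    )
    return p6**2 * genome_size * total


# A 400 million bp genome, one six-base cloning enzyme, varying numbers of four-base cutters.
for n_four in (2, 3, 4):
    print(n_four, round(expected_partition_size(400_000_000, n_four, 0)))
```

For a 400 million bp genome with three four-base fragment cutters and no extra six-base cutter, this idealised count comes out at roughly 600 fragments, and each additional cutter lowers it further.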
This can be verified as described above in steps (vii)-(xiii) if required.
A corresponding approach may be used with cDNA from tissues or species in which the complexity is known or can be estimated, either directly or by comparison with other species.
Samples with known sequence
One or more reference sequences corresponding to the sample nucleic acid may be known. It will be understood that the sample nucleic acid sequence (inasmuch as it derives from a different source from the reference) is likely to include sequence variation with respect to any reference and indeed this variation between corresponding sequences underlies certain embodiments of the present invention. Nevertheless, since such variations are by definition rare, the reference sequence can be used to calculate restriction site frequency for restriction enzymes which it may be desired to use in the methods described herein.
When the sequence is known, the restriction site frequency of each enzyme can be provided, and the formula:
Figure imgf000011_0001
can be used to select the enzymes to produce a desired partition size.
Where a reference sequence is known, a set of restriction enzymes can be chosen based on the restriction map of the desired genes and other sequences so as to select them in particular, while still having an appropriately sized partition.
Some particular practical aspects of the invention will now be discussed in more detail:
Purification
In preferred embodiments the fragments are purified at step (ii) .
As described in the Examples hereinafter, fragments may be purified in a conventional manner. In examples herein, the restriction reaction was passed through a column containing resins (QIAQuick PCR purification kit, QiaGen), which can effectively adsorb DNA molecules larger than 100 bp. After washing with 70% ethanol, the DNA fragments were eluted into 30-50 μl water. An alternative second method used the BioRad Clean-A-Gene kit. The third method was to purify the fragments by running a 1% agarose gel and recovering the DNA by using the Promega gel recovery kit. For the third method, extra DNA should be used, for example, 10 micrograms for rice and pearl millet, 20 micrograms for human and wheat.
Preferred purification techniques will be such as to remove fragments of less than 100 bases.
Enrichment of sample
Where a corresponding reference sequence is known, an enrichment strategy may be adopted, so that a particular region or gene may be treated. For example, when a particular set of fragments is required to be included, restriction enzymes may be chosen through a restriction map of the reference sequence(s). Moreover, if a particular set of genes needs to be studied, a set of oligos (16-60 bases, preferably 20-50 bases) could be designed from the reference sequence to enrich the genes, e.g. via a hybridization method using magnetic beads with biotin-labelled oligonucleotides attached to them (see e.g. Edwards KJ, Barker JHA, Daly A, Jones C, Karp A (1996) Microsatellite libraries enriched for several microsatellite sequences in plants. BioTechniques 20:758-760). This technique may be particularly useful when dealing with repetitive DNA.
Once the sample is enriched, it may be preferred to use pilot tests to confirm the size of the total DNA pool.
Enhancement linkers
In preferred embodiments, enhancement linkers are added prior to or during step (iv) such that only the desired sub-set of layers is included in said library. The linkers prevent fragments with compatible restriction ends combining to form artifacts.
Such linkers (which may be provided as a pair of oligonucleotides) comprise:
(i) a core sequence, which is selected such that it does not contain a restriction site and does not have a high probability of hybridizing to target sequence,
(ii) a portion that matches the appropriate restricted-end,
(iii) additional sequence to prevent the linkers annealing e.g. an overhang.
The enhancement linkers are not used for the cloning site restriction enzyme(s). Preferred linkers are any of those given in Table 1.
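Two of the linker design criteria can be checked mechanically: that no oligo carries a recognition site, and that the two oligos of a pair share a complementary core. The sketch below applies such checks to the six oligos quoted for the rice example, paired in the order they are listed there; the pairing, the helper names and the restriction of the site list to four enzymes are assumptions made for illustration.

```python
RECOGNITION_SITES = {"PstI": "CTGCAG", "HpaII": "CCGG", "AluI": "AGCT", "DraI": "TTTAAA"}

# Oligo pairs as quoted for the rice example (HpaII, AluI and DraI linkers).
LINKERS = {
    "HpaII": ("TTGGCGTTTAC", "CGGTAAACGCC"),
    "AluI": ("TTGGCGTTTAC", "GTAAACGCC"),
    "DraI": ("TTGGCGTTTAC", "AATTGTAAACGCC"),
}


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def longest_duplex(top, bottom):
    """Length of the longest stretch of `top` whose reverse complement occurs in `bottom`."""
    best = 0
    for i in range(len(top)):
        for j in range(i + 1, len(top) + 1):
            if reverse_complement(top[i:j]) in bottom:
                best = max(best, j - i)
    return best


def contains_site(oligo):
    """True if the oligo or its reverse complement carries any listed recognition site."""
    strands = (oligo, reverse_complement(oligo))
    return any(site in strand for site in RECOGNITION_SITES.values() for strand in strands)


for enzyme, (top, bottom) in LINKERS.items():
    print(enzyme, longest_duplex(top, bottom), contains_site(top), contains_site(bottom))
```

For each quoted pair the shared complementary core is 9 bases long and neither oligo contains any of the four recognition sites, consistent with criterion (i); the two unpaired 5' bases of the common oligo would correspond to the additional sequence of criterion (iii).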
Cloning and ligation
The terms "cloning" and "ligation" and so on are used herein because they will be well understood by those skilled in the art, and can be performed by standard techniques. Those skilled in the art are well able to clone selected fragments into libraries - see, for example, Molecular Cloning: a Laboratory Manual: 2nd edition, Sambrook et al., 1989, Cold Spring Harbor Laboratory Press or Current Protocols in Molecular Biology, Second Edition, Ausubel et al. eds., John Wiley & Sons, 1992 (or later editions of these works), both of which are specifically incorporated herein by reference. Generally speaking a typical protocol can be achieved by exposing a vector restricted with the appropriate enzymes to the selected layers such as to ligate or otherwise incorporate the heterologous nucleic acid fragments into the vector at the appropriate cloning site; exposing the ligation product (recombinant vector) to host cells under conditions whereby the vector is taken up by the cells such as to generate a population of host cells containing the vector; exposing the population of cells to a propagation medium comprising a selection agent whereby transformed host cells which contain vector incorporating the nucleic acid insert are selectively grown or propagated in the medium.
Where desired, one or more pairs of "adaptor" oligonucleotides may be used to bridge the cloning ends of the DNA fragments of interest (i.e. from the layer(s) in the desired sub-set) and the cloning site of the vector(s). The adaptor sequences have appropriate restriction site sequences (fragment and vector) at each end and a core sequence in the middle. An example core sequence is 5'-CGTAGACGATGCGTGAGAC-3'.
In such cases, PCR amplification may optionally be used to enrich the fragments of interest and increase the amount of DNA by using the adaptor sequence as PCR primer. This may be advantageous where the quantity of fragments is relatively low. Thus, prior to step (iv) , the method may optionally include the step of ligating adaptor oligonucleotides to all or part (e.g. generally one or both layers, if two layers are selected) of the selected sub-set of fragments in order to facilitate their ligation into vectors adapted to receive them.
The adaptor sequences may optionally incorporate extra restriction sites.
Use for discovery of sequence variation
As described in more detail below, the sample may comprise corresponding nucleic acid from several (e.g. two or more) different sources. This permits equivalent partitions to be compared e.g. for the discovery of sequence variation.
The methods described herein may be used to identify any type of marker e.g. microsatellites, minisatellites etc. Preferably the markers are SNPs.
The size of the partition sequences will be chosen to be appropriate to the number and nature of markers which it is desired to look for. Thus, for example, if 'S' different SNPs are required, it may be appropriate to ensure that there are at least that many different unique sequences in the partition (more preferably twice that many) representing a total length of S x 1000 bases.
Markers can be investigated which are appropriate to the samples. For example, the nucleic acid-containing sample can be pooled from individuals who share a particular trait (e.g. an undesirable trait, such as a particular disorder, or a desirable trait, such as resistance to a particular disorder). Sequences can be taken from different species, varieties or populations such as to provide markers for plant-breeding, or phylogenetic studies etc. Preferred target genomes (or cDNA sources) include Human, Arabidopsis, wheat, rice, millet and soybean genomes.
Thus the invention provides a method for identifying a limited population of markers in a sample nucleic acid, which method comprises:
(a) providing sample nucleic acid from at least 2 different sources,
(b) providing a representative partition of the sample nucleic acid in accordance with the methods described herein,
(c) identifying differences within corresponding sequences from said different sources contained within the library.
The nucleic acid from different sources may be pooled. However it may also be analysed on separate occasions since the methods of the invention produce a partition of fixed size and fixed content in a reproducible manner.
Generally the corresponding sequences from the different sources within the partition are sequenced to identify the differences.
Such sequence data is obtained by sequencing the library e.g. to 3-5 times coverage. If desired the actual size of partition can be calculated as described herein.
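The sequencing effort implied by a given coverage can be estimated with a few lines of code. This is a rough sketch under assumptions not stated in the text: colonies are taken to be picked uniformly at random, 'coverage' is read as the average number of reads per unique insert, and a Poisson approximation is used for the share of inserts seen at least once. The function names are hypothetical.

```python
import math


def colonies_to_sequence(partition_size, coverage):
    """Number of colonies to sequence for a given average coverage of the partition."""
    return math.ceil(partition_size * coverage)


def expected_fraction_seen(coverage):
    """Poisson approximation: share of unique inserts sequenced at least once."""
    return 1 - math.exp(-coverage)


for coverage in (3, 5):
    print(coverage, colonies_to_sequence(624, coverage),
          round(expected_fraction_seen(coverage), 3))
```

Under these assumptions, three-fold coverage of a 624-insert partition means sequencing 1872 colonies and would be expected to sample about 95% of the unique inserts, while five-fold coverage would sample over 99%.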
The term "corresponding to" in terms of sequence comparisons herein (whether with a known reference, or between different source nucleic acids in a sample) refers to sequences derived from equivalent loci or genes from two different genomes (e.g. the sequences may be orthologues, homologues, alleles etc.) but which may therefore include differences between them (e.g. by way of mutation, polymorphism, or other sequence variation which gives rise to nucleic acid "markers") .
Corresponding sequences will generally be at least 80% identical, most preferably at least about 90%, 95%, 96%, 97%, 98% or 99% identical. Identity is established by comparison of the full length of the sequences (or the shorter of the sequences). Thus alignment of different sequencing results, and assessment of the degree of identity between them, can be used to confirm that sequences are indeed corresponding ones, and hence that sequence differences between them represent potential markers. For markers which are candidate single nucleotide polymorphisms, the frequency should preferably not exceed 1% of the total number of bases in the shorter of the two sequences - sequences which meet these criteria may be selected as corresponding. Whether sequences are indeed corresponding sequences showing intergenomic or inter-gene variation, rather than e.g. multiple copies in a single genome or individual, can be verified if desired by conventional methods familiar to those skilled in the art of SNP identification. For example, intergenome or inter-gene-copy variation is generally larger than the allelic variation so that a phylogenetic tree of the sequences in an alignment based on sequence similarity may distinguish the two types of variation. If required, SNP candidates can be validated by genotyping and genetic mapping - if the marker segregates and can be mapped to a chromosomal location, it would normally be recognized as true allelic variation.
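The identity and SNP-frequency thresholds above can be applied programmatically once two sequences have been aligned. The sketch below is a simplified illustration only: it assumes the two sequences are already aligned, gap-free and of equal length (so the 'shorter sequence' is simply either one), and the function names, labels and default thresholds (80% identity, 1% mismatches) are chosen to mirror the text rather than taken from any disclosed implementation.

```python
def compare_aligned(seq_a, seq_b):
    """Return (identity, mismatch_rate) for two gap-free, equal-length aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    mismatches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)
    rate = mismatches / len(seq_a)
    return 1 - rate, rate


def classify_pair(seq_a, seq_b, min_identity=0.80, max_snp_rate=0.01):
    """Label a pair using the identity and candidate-SNP frequency thresholds."""
    identity, snp_rate = compare_aligned(seq_a, seq_b)
    if identity < min_identity:
        return "not corresponding"
    if snp_rate > max_snp_rate:
        return "corresponding, but too divergent for SNP candidates"
    if snp_rate == 0:
        return "corresponding, identical"
    return "corresponding, candidate SNPs"


reference = "ACGT" * 50                              # 200 bases
one_snp = reference[:10] + "T" + reference[11:]      # single substitution (G -> T)
print(classify_pair(reference, one_snp))             # corresponding, candidate SNPs
```

A real pipeline would take these figures from an alignment program and would still need the validation steps described in the text to separate allelic variation from variation between gene copies.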
Use in genotyping
Many uses of SNPs require: (i) the SNP's map position in the human genome, and (ii) a genotyping assay for scoring the locus in association studies.
Methods for assessment of polymorphisms are reviewed by Schafer and Hawkins, (Nature Biotechnology (1998)16, 33-39, and references referred to therein) and include: allele specific oligonucleotide probing, amplification using PCR, denaturing gradient gel electrophoresis, RNase cleavage, chemical cleavage of mismatch, T4 endonuclease VII cleavage, multiphoton detection, cleavase fragment length polymorphism, E. coli mismatch repair enzymes, denaturing high performance liquid chromatography, (MALDI-TOF) mass spectrometry, analysing the melting characteristics for double stranded DNA fragments as described by Akey et al (2001) Biotechniques 30; 358-367.
The assessment of polymorphisms may be carried out on a DNA microchip. One example of such a microchip system may involve the synthesis of microarrays of oligonucleotides on a glass support. Fluorescently-labelled PCR products may then be hybridised to the oligonucleotide array and sequence specific hybridisation may be detected by scanning confocal microscopy and analysed automatically (see Marshall & Hodgson (1998) Nature Biotechnology 16: 27-31, for a review). Thus the invention also provides for a method for making a genotyping microchip for use in assaying a limited population of polymorphisms within a sample (see, e.g., U.S. Pat. Nos. 5,861,242 and 5,837,832).
As with other reduced representation approaches, the present invention can facilitate efficient genotyping. Once a set of polymorphisms is isolated, probes or primers for detecting those polymorphisms can be incorporated into such a chip. When it is desirable to assay an individual for the polymorphisms in the set, nucleic acid is isolated from that individual, and it can be partitioned with the same methods that were used to isolate the original set of polymorphisms.
However, this invention is more flexible than the other reduced representation approaches because it can greatly and flexibly reduce the size of a partition e.g. to as small as one containing 500 unique fragments.
For example, if one wishes to genotype a new sample for 10,000, or 1000 or 100 SNPs isolated from a specific partition, one could restriction-digest the sample; isolate an appropriate partition; and amplify by PCR using primers complementary to a generic linker. The resulting amplification products could be hybridized to an appropriate 'genotyping array'. Such methods allow the user to concentrate study on only a limited portion of the entire spectrum of the available polymorphisms. By examining only a limited portion of the genome, this method has the added benefit of reducing cross-reactivity between unrelated genetic sites.
Use for investigation of methylation sensitivity
For methylation sensitivity studies, methylation sensitive and non-sensitive restriction enzymes may be used separately so that the methylation distribution patterns could be revealed by comparing the two.
Computer-implemented embodiments
In a further aspect of the present invention, some or all of the steps of the methods described above may be performed by a digital computer, in particular steps in designing appropriate genome partitions based on reference sequence restriction maps and/or equations as described above. Although this could be done using commercially available sequence analysis software and sequence databases, in preferred embodiments a bespoke system directly provides the choice of enzymes to use.
Thus the invention provides an automated computer system, comprising a combination of hardware and software, that can rapidly determine optimised partitions based on a reference sequence, a desired size, and optionally desired region within the sequence.
Preferably, these aspects of the invention are implemented in computer programs executing on a programmable computer comprising a processor, a data storage system (including volatile and nonvolatile memory and/or storage elements) , at least one input device, and at least one output device. Data input through one or more input devices for temporary or permanent storage in the data storage system includes sequences. Program code is applied to the input data to perform the functions described above and generate output information. The output information is applied to one or more output devices, in known fashion.
The program code will include analysis of some or all of the functions described above, and will include the ability to input a reference sequence, and preferences regarding partition size and optionally preferred regions to include in the partition. The program code will also be able to reference (e.g. from a look-up table) restriction site target sequences for different 4nt and 6nt cutters.
The automated system can be implemented through a variety of combinations of computer hardware and software. In one implementation, the computer hardware is a high-speed multiprocessor computer running a well-known operating system, such as UNIX. In other embodiments personal computers using single or multiple microprocessors might also function within the parameters of the present invention.
Each such computer program is preferably stored on a storage media or device (e.g., ROM or magnetic diskette) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage media or device is read by the computer to perform the procedures described herein. The inventive system may also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.
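By way of illustration only, the look-up table and the classification of fragments into layers might be sketched as follows. This is a minimal sketch under stated assumptions, not the bespoke system itself: the names `SITES`, `digest` and `layer_sizes` are hypothetical, the sequence is treated as linear and single-stranded, and only three enzymes with well-known recognition sequences (HpaII C^CGG, AluI AG^CT, PstI CTGCA^G) are tabulated.

```python
import re
from collections import Counter

# Hypothetical look-up table: enzyme -> (recognition sequence, cut offset on the top strand).
SITES = {
    "HpaII": ("CCGG", 1),
    "AluI": ("AGCT", 2),
    "PstI": ("CTGCAG", 5),
}

def digest(seq, enzymes):
    """Return (layer, length) for every fragment bounded by two cut sites.

    A layer is identified by the sorted pair of enzymes that produced the
    two ends of the fragment. The two terminal pieces of the linear
    sequence are ignored because they have only one restriction end.
    """
    cuts = []
    for name in enzymes:
        site, offset = SITES[name]
        # A lookahead is used so that overlapping sites are also found.
        for match in re.finditer(f"(?={site})", seq):
            cuts.append((match.start() + offset, name))
    cuts.sort()
    fragments = []
    for (pos_a, enz_a), (pos_b, enz_b) in zip(cuts, cuts[1:]):
        fragments.append((tuple(sorted((enz_a, enz_b))), pos_b - pos_a))
    return fragments

def layer_sizes(seq, enzymes, x1=100, x2=2000):
    """Count the fragments of length x1..x2 bp in each layer."""
    return Counter(layer for layer, length in digest(seq, enzymes)
                   if x1 <= length <= x2)

# Toy reference sequence: two PstI sites followed by one HpaII site.
toy = "A" * 10 + "CTGCAG" + "A" * 200 + "CTGCAG" + "A" * 150 + "CCGG" + "A" * 10
print(layer_sizes(toy, ["HpaII", "AluI", "PstI"]))
```

A program of this kind could be run over a reference sequence for each candidate set of enzymes, and the set whose chosen layer is closest to the desired partition size selected.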
The invention will now be further described with reference to the following non-limiting Figures and Examples. Other embodiments of the invention will occur to those skilled in the art in the light of these.
Example 1 - methods for determining size of layers and partitions
Relationship between Enzymes and Layers
When DNA is digested with more than one restriction enzyme, the DNA fragments can be classified into groups based on the restriction ends produced specifically by the restriction enzymes.
When N different enzymes are used, the maximum number of groups of DNA fragments generated, which are called "layers" herein, is:
$L = N + (N^2 - N)/2$
Each layer of DNA fragments can be specifically cloned into a cloning vector at the corresponding restriction site. The specificity is determined by the cloning site, which only matches the restriction fragment ends of the chosen layers.
Combinations of Layers
In principle, any combination of the layers can be cloned into a library. The sub-set or combination of layers cloned is termed a "partition" herein. The number of possible partitions will be:
$P = C_L^1 + C_L^2 + \dots + C_L^{L-1}$
For example, when five different enzymes are used, there are up to 15 layers and 32766 partitions. In practice, it is preferred to use only a partition containing one or two layers for library construction. Thus, five enzymes could provide 15 or 225 partitions. Given that more than a hundred restriction enzymes are available on the market, the number of possible partitions of a genome is huge.
Estimating number and size of fragments per layer
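The two counts for five enzymes can be checked numerically. The following is a minimal sketch; the function names are hypothetical, and the partition count is taken here to be the number of non-empty proper sub-sets of the layers, an assumption that reproduces the figure of 32766 given in the text.

```python
from math import comb

def max_layers(n_enzymes):
    # N layers with two identical ends, plus one layer per unordered
    # pair of different enzymes: N + (N^2 - N)/2.
    return n_enzymes + (n_enzymes ** 2 - n_enzymes) // 2

def max_partitions(n_layers):
    # Sum of C(L, k) for k = 1 .. L-1, i.e. every sub-set of layers
    # except the empty set and the complete set.
    return sum(comb(n_layers, k) for k in range(1, n_layers))

layers = max_layers(5)
print(layers, max_partitions(layers))  # 15 32766
```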
The size of a layer depends on the number and the types of enzymes used.
For a given cloning site generated by a 6nt cutter,
Total number of fragments = total number of restriction sites = $vG/4^6$.
(G stands for genome size in base pairs).
(v is the frequency deviation for each particular enzyme in a particular genome, and may be assumed to be 1 unless known or established to be otherwise) .
The probability of a restriction fragment having length ≥ k is $(1-1/4^6)^k$. The probability of obtaining a fragment of length k is $(1-1/4^6)^k - (1-1/4^6)^{k+1}$.
The number of fragments with length between x1 and x2 is $N = 4^{-6} v G [(1-1/4^6)^{x1} - (1-1/4^6)^{x2}]$.
With an extra 4nt cutter, the number of fragments per layer will be reduced because a given fragment could be cut internally, generating fragments with different combinations of restriction ends which are hence no longer within the original layer. Thus the fragments per layer will be reduced to:
$N' = 4^{-12} v' G \sum_{k=x1}^{x2} [(1-1/4^4)^{k} (1-1/4^6)^{k}]$.
With two extra 4nt cutters, $N' = 4^{-12} v' G \sum_{k=x1}^{x2} [(1-1/4^4)^{2k} (1-1/4^6)^{k}]$.
With three extra 4nt cutters, $N' = 4^{-12} v' G \sum_{k=x1}^{x2} [(1-1/4^4)^{3k} (1-1/4^6)^{k}]$.
With n extra 4nt cutters, $N' = 4^{-12} v' G \sum_{k=x1}^{x2} [(1-1/4^4)^{nk} (1-1/4^6)^{k}]$.
With an extra 6nt cutter, the number of fragments will be reduced to $N' = 4^{-12} v' G \sum_{k=x1}^{x2} [(1-1/4^6)^{2k}]$.
With two extra 6nt cutters, $N' = 4^{-12} v' G \sum_{k=x1}^{x2} [(1-1/4^6)^{3k}]$.
If one 6nt cutter is used for the cloning site, and 'n' extra 4nt cutters and 'm' extra 6nt cutters are used, the number of fragments will be
$N' = 4^{-12} v' G \sum_{k=x1}^{x2} [(1-1/4^4)^{nk} (1-1/4^6)^{(1+m)k}]$.
Herein v' is a combined frequency deviation, so that this formula is preferably used only when v' is assumed to be one or when a pilot test is used to verify the partition design.
In general, the number of fragments with length between x1 and x2 (in base pairs) is $N_{x1 \sim x2} = G P_j^2 \sum_{k=x1}^{x2} \prod_{i} (1-P_i)^k$, in which $P_i$ is the probability of having a restriction site at any base pair for the i'th enzyme used and $P_j$ represents that for the enzyme of the cloning site.
It should be noted that when a partition is based on fragments having two different restriction ends, the number of matching fragments remains the same. Although the number of total fragments is doubled with two enzymes, the chance of having two different ends is 50%. Therefore, the size of a partition with one cloning end is the same as that with combination of two different cloning ends if other restriction enzymes (fragment cutters, the enzymes which do not match the cloning site) are the same. Thus for the purposes of calculation, the two restriction enzymes for the cloning site may be counted as one enzyme, with the Pi taken as the mean of that of the two enzymes.
In preferred embodiments, most cloned fragments will fall between 100 and 2000 base pairs (and hence x1 and x2 may be assumed to be 100 bp and 2000 bp). This is because smaller fragments, which are not informative, may be removed by purification techniques. Additionally, the selected restriction endonuclease(s) will generally cleave the sample nucleic acid molecule at least approximately every 2000 bases. Thus larger fragments will be comparatively rare.
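The general formula lends itself to direct numerical evaluation. The following is a minimal sketch under stated assumptions: the function name and argument names are hypothetical, the product over enzymes is taken to include the cloning-site enzyme itself (consistent with the exponent (1+m)k above), restriction sites are assumed to be randomly distributed, and the example call uses a 400 million bp genome (the figure used for rice in Example 3) with one 6nt cloning-site cutter and three extra 4nt cutters.

```python
def expected_fragments(genome_size, p_cloning, p_all, x1=100, x2=2000, v=1.0):
    """Expected number of fragments of length x1..x2 bp in one layer.

    genome_size: G, in base pairs.
    p_cloning:   per-base site probability for the cloning-site enzyme (P_j).
    p_all:       per-base site probabilities for every enzyme used,
                 including the cloning-site enzyme (the P_i).
    v:           frequency deviation, assumed to be 1 unless known otherwise.
    """
    no_site = 1.0
    for p in p_all:
        no_site *= 1.0 - p  # chance that a given base starts no site of any enzyme
    total = sum(no_site ** k for k in range(x1, x2 + 1))
    return v * genome_size * p_cloning ** 2 * total

P4 = 4.0 ** -4  # random-sequence frequency of a 4nt site
P6 = 4.0 ** -6  # random-sequence frequency of a 6nt site

estimate = expected_fragments(400e6, P6, [P6, P4, P4, P4])
print(round(estimate))
```

Under these assumptions the estimate is in the region of 600 fragments. For a single enzyme the summation reduces to the closed form $G P [(1-P)^{x1} - (1-P)^{x2+1}]$, which provides a convenient check on the implementation.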
Testing the number of unique fragments - "pilot testing"
Since the frequency of a given restriction site varies greatly from enzyme to enzyme and from genome to genome, the frequency of the enzymes and the actual size of designed partitions need to be tested unless they are known from a pre-existing sequence.
To evaluate the number of unique fragments in a partition, after the library of the partition is constructed in accordance with the above, randomly pick and sequence 500 well-separated colonies. Assemble the sequences so that identical sequences are piled into alignments. Each alignment of a sequence may be termed a "contig" or "clique". The number of unique fragments in the partition should be $F = n(n-1)/\sum_i n_i(n_i-1) \pm s$, in which n is the total number of sequences and $n_i$ is the number of sequences in the i'th contig. When the number of sequences is large enough, the standard error s can be neglected. (See Appendix I, where the derivation is given.)
Example 2 - Use of a partition to find DNA sequence variation
Partition strategy
Clearly, the larger the partition, the more sequencing reactions are needed to obtain pairwise sequence comparisons. It is therefore preferred to keep the size of the partition to the minimum likely to encompass the number of sequence variations which it is desired to identify.
For example, if five hundred SNPs are required for a population or a panel of varieties, the partition should provide more than five hundred unique sequences (ideally about 1000). Random sequencing should preferably cover the library 3-5 times; more than 10 times should not be necessary.
The number and types of restriction enzymes should be decided based on the formulae described above. When the genome sequence is available, the restriction site frequency can be checked and a particular design to cover certain genomic regions or genes can be performed using known or bespoke programs. A sequence enrichment strategy can also be considered at that stage.
For a new species and a particular set of enzymes, a pilot test is carried out to confirm that the expected size of the partition is valid in respect of that genome. For cDNA, a pilot test may be required in each case to hone the partitioning.
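The preferred coverage of 3-5 times can be related to the chance that a given fragment is sequenced at least twice, which is what a pairwise comparison requires. The following is a rough sketch only: it assumes that every unique fragment is equally represented in the library and uses a Poisson approximation, neither of which is stated in the text, and the function names are hypothetical.

```python
import math

def reads_needed(unique_fragments, coverage):
    """Number of random sequencing reads giving the stated coverage."""
    return math.ceil(unique_fragments * coverage)

def fraction_with_pair(coverage):
    """Poisson estimate of the fraction of fragments read at least twice."""
    return 1.0 - math.exp(-coverage) * (1.0 + coverage)

for coverage in (3, 5, 10):
    print(coverage, reads_needed(1000, coverage),
          round(fraction_with_pair(coverage), 3))
```

On these assumptions, raising the coverage beyond about 5 times adds few further comparable fragments, which is in keeping with the statement that more than 10 times should not be necessary.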
Sample preparation
This can be done in conventional manner. For rice DNA, for example, at least two micrograms is preferred. For the human genome, more than five micrograms of DNA is recommended for normal genome partitioning without gel-based purification.
Restriction digestion
Restriction digestion can be performed in one cocktail. However, if the enzymes are optimal in different conditions, two or even three stages of reaction should be carried out.
Partial digestion can be used as a special way to enlarge a partition. Normally, partial digestion is only performed on one enzyme, which generates the cloning ends.
Use of Enhancing Linkers
For ligation, enhancing linkers can be designed to avoid chimeric sequences and to avoid restoring the undesired restriction site during ligation. In the Examples herein, each linker consists of two oligos. The core sequences were 5'-TTGGCGTTTAC-3' and 3'-CCGCAAATG-5'.
In order to define the core sequence, a set of randomly generated short sequences was BLAST-searched against all sequences from different species in the EMBL database. 5'-GGCGTTTAC-3' was selected on the basis that it had the fewest hits, and it did not contain a restriction site.
One end of the linker has an overhang 'TT' so that no linkage can be made at this end. The other end has a sticky end with added nucleotides, which matches the restriction sites; this can be linked to the genomic DNA fragments with undesired restriction ends. Because of the competition of these linkers, DNA fragments with the same restriction site as the linkers will not link to each other to create "false" fragments within given layers.
Thus for each restriction enzyme used (except that for the cloning site) a corresponding enhancing linker should be added to the ligation reaction. In preferred embodiments the final concentration of each oligo should be 0.1 μM. This is conveniently achieved using a stock solution of each oligo (1 mM), which can be stored for use e.g. at -20°C. Before ligation, a 'cocktail' of these oligos is made containing each necessary oligo at a concentration of 10 μM, and 1 μl of the cocktail is added to the 100 μl ligation reaction.
Preferred enhancing linkers are listed in Table 1 hereinafter. The restriction endonucleases in the list are recommended for genome partitioning.
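The pattern followed by most entries of Table 1 can be expressed as a short routine. This is a sketch only: the construction rule below (the overhang is prepended to the lower oligo for a 5' overhang, appended to the upper oligo for a 3' overhang, and omitted for blunt ends) is inferred from the listed sequences rather than stated in the text, and the function names are hypothetical.

```python
CORE = "GGCGTTTAC"  # core sequence selected in the text

def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def enhancing_linker(overhang="", kind="blunt"):
    """Return the (upper, lower) oligos of a linker, both written 5' to 3'.

    overhang: single-stranded end left by the enzyme, e.g. "CG" for HpaII.
    kind:     "5prime", "3prime" or "blunt".
    """
    upper = "TT" + CORE                # TT overhang blocks ligation at this end
    lower = reverse_complement(CORE)   # pairs with the core of the upper oligo
    if kind == "5prime":
        return upper, overhang + lower
    if kind == "3prime":
        return upper + overhang, lower
    if kind == "blunt":
        return upper, lower
    raise ValueError("kind must be '5prime', '3prime' or 'blunt'")

print(enhancing_linker("CG", "5prime"))    # compare the Hpa II entry
print(enhancing_linker())                  # compare the Alu I entry
print(enhancing_linker("CATG", "3prime"))  # compare the Nla III entry
```

Entries for enzymes with degenerate recognition sites would need the degenerate overhang to be supplied by the user; the routine does not attempt to derive it.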
Cloning
This can be done in conventional manner. Zero Background vector from Invitrogen was used. Ligation, transformation, colony picking, miniprep and sequencing were performed using routine DNA library construction protocols.
Compatibility with two automated systems (Qiagen Robots 3000 and 8000 with QIAprep 96 Turbo BioRobot Kit) was demonstrated, showing the utility of the invention in high-throughput screening.
Example 3 - SNP discovery in rice
Rice is a model plant for cereals. DNA sequences are widely available for the rice subspecies Indica and Japonica. The rice genome is about 400 million base pairs and has been shot-gun sequenced independently by several groups, while at least one other group (Japanese National Rice Genome Project) is using a BAC strategy. Currently, sequences from Huada (ref. 4) and RGP (ref. 5) are publicly available for Indica and Japonica respectively.
Genomic DNA was isolated from 20 rice varieties and equally pooled into one sample (Table 2 below) .
Ten μg of the pooled DNA was digested with 0.5 μl each of HpaII, AluI, DraI and PstI in a cocktail with GIB buffer 8. The total reaction volume was 100 μl and it was incubated at 37°C for 12 hours overnight.
The digested DNA was purified using the QIAquick PCR purification kit (Qiagen). The purified DNA was eluted in 20 μl water and subsequently 5 μl of the purified DNA fragments were used in a 10 μl ligation reaction. Six oligos (as three enhancing linkers for HpaII, AluI and DraI) were added to the reaction. They were 5'-TTGGCGTTTAC-3', 5'-CGGTAAACGCC-3', 5'-TTGGCGTTTAC-3', 5'-GTAAACGCC-3', 5'-TTGGCGTTTAC-3', 5'-AATTGTAAACGCC-3' (see Table 1). The final concentration of each oligo was 0.1 μM. One μl of ligase was used and 0.2 μg pZero vector (Invitrogen) digested with PstI was added. The reaction was carried out at 15°C for 30 minutes and then kept at -20°C for subsequent transformation.
One-shot competent cells (Invitrogen) were used for transformation of E. coli. Kanamycin was used as the selection antibiotic. After overnight culture on LB medium agar plates, approximately 600 colonies were selected. The colonies were cultured in 1.5 ml LB medium and the plasmid DNA was isolated using a Qiagen miniprep kit. Thirty of the plasmid DNA samples were run on an agarose gel to check the size of the inserts. In the thirty samples, the insert size ranged from 200 to 3000 bp, with an average of 800 bp. The DNA was sequenced using the fluorescent-capillary method on an ABI 3700 (sequencing service was provided by John Innes Centre).
The sequences were processed with PreGap4 to cut away the poor sequence and vector sequence. The sequences of good quality (the PreGap4 default threshold was used for quality control) were assembled into contigs using Gap.
About 400 pairwise comparisons were found (Table 3) , from which 278 SNP candidates were identified.
Table 3 Number of sequences and SNP candidates
No. of sequences    No. of     No. of sequences in    No. of SNP
in each contig      contigs    each contig type       candidates
1                   212        212                    -
2                   121        242                    222
3                   8          24                     46
4                   2          8                      6
6                   1          6                      0
8                   1          8                      4
Total               345        500                    278
Using the formula $F = n(n-1)/\sum_i n_i(n_i-1) \pm s$, the size of the partition was estimated as containing 624 unique colonies (the standard error was ignored as being insignificant) (Table 3). In this calculation, F = 500 × (500-1) / [212×1×(1-1) + 121×2×(2-1) + 8×3×(3-1) + 2×4×(4-1) + 1×6×(6-1) + 1×8×(8-1)] ≈ 624.
The average insert size of the colonies was 800 bp. Since the rice genome is 400 million bp and the size of the library was (624 × 800) bp, the genome partition was about 1/800 of the whole genome. In other words, this genome partitioning design reduced the complexity of the library 800-fold.
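The arithmetic of this Example can be reproduced from the contig counts of Table 3. The following is a minimal sketch; the variable names are hypothetical, and the standard error is ignored as in the text.

```python
# Table 3: number of sequences in each contig -> number of such contigs.
contigs = {1: 212, 2: 121, 3: 8, 4: 2, 6: 1, 8: 1}

n = sum(size * count for size, count in contigs.items())            # total sequences
same_pairs = sum(count * size * (size - 1) for size, count in contigs.items())
library_size = n * (n - 1) / same_pairs                              # estimate of F

genome_bp = 400e6        # rice genome size used in the text
mean_insert_bp = 800     # average insert size observed on the gel
fold_reduction = genome_bp / (round(library_size) * mean_insert_bp)

print(n, same_pairs, round(library_size), round(fold_reduction))
```

The denominator of the estimate, 400, corresponds to the "about 400 pairwise comparisons" reported above.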
Example 4 - SNP discovery in Pearl millet
Pearl millet (Table 4) was tested using the procedure set out in Example 3. The total number of sequences was 607 from about 800 colonies. The result showed that a partition containing about 2000 colonies was constructed.
Since the size of the pearl millet genome is not known accurately, the actual reduction in complexity of the genome was not determined, nor has the total number of SNPs been calculated.
Table 4 Pearl millet varieties pooled for genome partitioning experiment
1. Tift238D
2. IP10401
3. IP10402
4. IP8214
5. 81B
6. ICMP451
7. LGD-1
8. ICMP85410
9. Tift23DB
10. 843B
11. (Figure imgf000027_0001)
12. PT732B
13. P1449
14. 841B
15. 863B
16. H77
17. PRLT2
18. ICMP501
19. Tift383
20. 700481-21
References
1. J. Craig Venter, et al. 2001. Science 291:1304-1315.
2. P. Vos, et al. 1995. Nucleic Acids Res 23:4407-4414.
3. D. Altshuler, et al. 2000. Nature 407:513-516.
4. Hua Da rice sequence database: http://210.83.138.53/rice/tools.php
5. Japanese sequence database: http://rgp.dna.affrc.go.jp/
Table 1 Sequences of enhancing linkers
Acc I
5'-TTGGCGTTTAC-3'
5'-ATGTAAACGCC-3'
5'-CGGTAAACGCC-3'
Aci I
5'-TTGGCGTTTAC-3'
5'-CGGTAAACGCC-3'
Afl III
5'-TTGGCGTTTAC-3'
5'-CUYGGTAAACGCC-3'
Alu I
5'-TTGGCGTTTAC-3'
5'-GTAAACGCC-3'
Apo I
5'-TTGGCGTTTAC-3'
5'-AATTGTAAACGCC-3'
Ban I
5'-TTGGCGTTTAC-3'
5'-GYUCGTAAACGCC-3'
Ban II
5'-TTGGCGTTTACUGCY-3'
5'-GTAAACGCC-3'
Bfa I
5'-TTGGCGTTTAC-3'
5'-TAGTAAACGCC-3'
BsaA I
5'-TTGGCGTTTAC-3'
5'-GTAAACGCC-3'
BsaH I
5'-TTGGCGTTTAC-3'
5'-CGGTAAACGCC-3'
BsaJ I
5'-TTGGCGTTTAC-3'
5'-CNNGGTAAACGCC-3'
BsiE I
5'-TTGGCGTTTACUY-3'
5'-GTAAACGCC-3'
BssK I
5'-TTGGCGTTTAC-3'
5'-CCNGGGTAAACGCC-3'
BstN I
None is needed.
BstU I
5'-TTGGCGTTTAC-3'
5'-GTAAACGCC-3'
Btg I
5'-TTGGCGTTTAC-3'
5'-CUYGGTAAACGCC-3'
Cac8 I
5'-TTGGCGTTTAC-3'
5'-GTAAACGCC-3'
Dpn I
5'-TTGGCGTTTAC-3'
5'-GTAAACGCC-3'
Dpn II
5'-TTGGCGTTTAC-3'
5'-GATCGTAAACGCC-3'
Dra I
5'-TTGGCGTTTAC-3'
5'-AATTGTAAACGCC-3'
Eae I
5'-TTGGCGTTTAC-3'
5'-GGCCGTAAACGCC-3'
Fnu4H I
None is needed.
Hae II
5'-TTGGCGTTTACGCGC-3'
5'-GTAAACGCC-3'
Hae III
5'-TTGGCGTTTAC-3'
5'-GTAAACGCC-3'
Hha I
5'-TTGGCGTTTACCG-3'
5'-GTAAACGCC-3'
Hinc II
5'-TTGGCGTTTAC-3'
5'-GTAAACGCC-3'
Hinf I
5'-TTGGCGTTTAC-3'
5'-ANTGTAAACGCC-3'
HinP1 I
5'-TTGGCGTTTAC-3'
5'-CGGTAAACGCC-3'
Hpa II
5'-TTGGCGTTTAC-3'
5'-CGGTAAACGCC-3'
Hpy188 I
None is needed.
HpyCH4 III
None is needed.
HpyCH4 IV
5'-TTGGCGTTTAC-3'
5'-CGGTAAACGCC-3'
HpyCH4 V
5'-TTGGCGTTTAC-3'
5'-GTAAACGCC-3'
Mbo I
5'-TTGGCGTTTAC-3'
5'-GATCGTAAACGCC-3'
Mnl I
None is needed.
Mse I
5'-TTGGCGTTTAC-3'
5'-TAGTAAACGCC-3'
Msl I
None is needed.
Msp I
5'-TTGGCGTTTAC-3'
5'-CGGTAAACGCC-3'
Nla III
5'-TTGGCGTTTACCATG-3'
5'-GTAAACGCC-3'
Nla IV
5'-TTGGCGTTTAC-3'
5'-GTAAACGCC-3'
Nsp I
5'-TTGGCGTTTACCATG-3'
5'-GTAAACGCC-3'
Rsa I
5'-TTGGCGTTTAC-3'
5'-GTAAACGCC-3'
Sau3A I
5'-TTGGCGTTTAC-3'
5'-GATCGTAAACGCC-3'
Sau96 I
5'-TTGGCGTTTAC-3'
5'-GNCGTAAACGCC-3'
ScrF I
None is needed.
Sfc I
5'-TTGGCGTTTAC-3'
5'-TUYAGTAAACGCC-3'
Sml I
5'-TTGGCGTTTAC-3'
5'-TYUAGTAAACGCC-3'
Taq I
5'-TTGGCGTTTAC-3'
5'-CGGTAAACGCC-3'
Tsp509 I
5'-TTGGCGTTTAC-3'
5'-AATTGTAAACGCC-3'
CviJ I
None is needed.
CviT I
None is needed.
Table 2 20 Rice Varieties
Series No.   RC No.   IRGC No.   Name
1            1        25833      AusJhari
2            8        25885      Lakhsnikajal
3            10       25898      Mimidim
4            17       27502      Walanga
5            18       27522      Ashmber
6            21       33118      Hnanwa
7            26       34737      Bawoi
8            27       38697      NPE837
9            28       62154      ASU
10           33       64780      Kalshori
11           36       64792      Narikel Jhupi
12           40       64887      Dagpa Bara
13           48       66513      Guru uthessa
14           50       66529      Podi Niyanwee
15           58       66614      Puteh Kaca
16           81       67423      Aguyod
17           88       67720      Banikat
18           98       71496      Babalatik
19           178      78333      Khau M uong Pieng
20           181      78369      Nep Ngau
Appendix I
Derivation of the formula $F = n(n-1)/\sum_i n_i(n_i-1) \pm s$.
Assume a pool which has F different (unique) sequences, each unique sequence having a very large and equal number of copies. Then the size of this pool, in terms of genome partitioning, is F.
The chance of randomly selecting a pair of sequences that are the same is 1/F, because the pool is so large that taking one sequence out of the pool makes almost no difference to its size.
If P is the total number of pairwise combinations of the same sequences and P' is the total number of all pairwise combinations, the chance of randomly selecting a pair of sequences that are the same is also P/P'. Thus, F = P'/P. If n is the total number of sequences, P' = n(n-1)/2. If $n_i$ is the number of sequences of the i'th unique sequence (or contig), with i running from 1 to F, then $P = [n_1(n_1-1) + n_2(n_2-1) + \dots + n_F(n_F-1)]/2 = \sum_{i=1}^{F} n_i(n_i-1)/2$.
Therefore, $F = n(n-1)/\sum_i n_i(n_i-1)$.
If the number of sequences is small, as we are sampling the pool, there will be a statistical error, which is given as s. As a result, $F = n(n-1)/\sum_i n_i(n_i-1) \pm s$.
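The behaviour of the estimator can be examined by simulation. The following is a sketch under the assumption made in the derivation, namely that every unique sequence is equally abundant; the function name, the pool size of 600 and the sample size of 2000 are arbitrary choices for illustration.

```python
import random
from collections import Counter

def estimate_pool_size(reads):
    """Estimate F from a list of sequence identifiers, ignoring s."""
    n = len(reads)
    same_pairs = sum(c * (c - 1) for c in Counter(reads).values())
    if same_pairs == 0:
        raise ValueError("no repeated sequences: sample too small to estimate F")
    return n * (n - 1) / same_pairs

rng = random.Random(0)          # fixed seed so that the run is repeatable
true_size = 600                 # hypothetical number of unique sequences
reads = [rng.randrange(true_size) for _ in range(2000)]
print(round(estimate_pool_size(reads)))
```

With smaller samples the estimate scatters more widely around the true size, which is the statistical error s referred to above.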

Claims

1 A method for producing a nucleic acid library, which library contains a plurality of different nucleic acid fragments, the combination of said fragments being a representative partition of the entirety of a sample nucleic acid, the method comprising: (i) digesting the sample nucleic acid with a plurality of different restriction enzymes to generate a plurality of different layers of fragments, wherein each layer is a group of fragments having a unique combination of restriction ends, and wherein the combination of layers represents the entirety of the sample nucleic acid, (ii) optionally purifying said fragments,
(iii) selecting a desired sub-set of layers according to the unique restriction ends of said layers,
(iv) ligating said sub-set of layers into vectors adapted to receive it, (v) transforming host cells with the vectors
(vi) culturing said host cells to provide said library containing said partition of the sample nucleic acid.
2 A method as claimed in claim 1 wherein the sample is genomic DNA.
3 A method as claimed in claim 2 wherein the sample consists of an entire genome.
4 A method as claimed in any one of the preceding claims wherein the number of and type of the different restriction enzymes used in step (i) , and the sub-set of layers selected in step (iii) are selected in order to generate a library size with a reduced complexity compared to the sample nucleic acid of at least 10, 100, or 1000-fold.
5 A method as claimed in any one of the preceding claims wherein between 3 and 6 restriction enzymes are used.
6 A method as claimed in any one of the preceding claims wherein the digestion by one restriction enzyme is partial, and the group of fragments in the selected layer have restriction ends created by said partial digestion.
7 A method as claimed in any one of the preceding claims wherein the selected sub-set of layers consists of one layer.
8 A method as claimed in any one of claims 1 to 6 wherein the sub-set of layers consists of two layers.
9 A method as claimed in any one of the preceding claims wherein the fragments are purified at step (ii) .
10 A method as claimed in claim 9 wherein the purification removes fragments of less than 100 bases.
11 A method as claimed in any one of the preceding claims wherein the size range of the fragments in the library is between 100 and 2000 bps.
12 A method as claimed in any one of the preceding claims wherein enhancement linkers are added prior or during step (iv) to prevent undesired sub-sets of layers being included in said library, each of which enhancement linkers comprises: (i) a core sequence,
(ii) a portion that matches the restricted-end of an undesired subset, and (iii) a sequence to inhibit the fragments in the undesired sub-set recombining.
13 A method as claimed in claim 12 wherein the enhancement linkers comprise any of those given in Table 1.
14 A method as claimed in any one of the preceding claims wherein adaptor oligonucleotides are used in step (iv) to facilitate the ligation of the desired sub-set of layers into vectors adapted to receive it.
15 A method as claimed in any one of the preceding claims wherein said sample is derived from one of the following organisms or species: Human, Arabidopsis, wheat, rice, millet, soybean.
16 A method as claimed in any one of the preceding claims wherein libraries are prepared separately using methylation sensitive and non-sensitive restriction enzymes, whereby comparison of the libraries permits methylation distribution patterns in the sample to be revealed.
17 A method as claimed in any one of the preceding claims wherein the sequence of the sample nucleic acid is known, and the number of and type of the different restriction enzymes used in step (i), and the sub-set of layers selected in step (iii) are selected to produce the desired library size in accordance with the restriction site frequency of each enzyme in the sample nucleic acid sequence.
18 A method as claimed in claim 17 wherein the number of and type of the different restriction enzymes used in step (i) , and the sub-set of layers selected in step (iii) , are selected in accordance with the formula:
$N_{x1 \sim x2} = G P_j^2 \sum_{k=x1}^{x2} \prod_{i} (1-P_i)^k$
wherein:
$N_{x1 \sim x2}$ is the number of fragments with length between x1 and x2,
k is fragment length,
x1 and x2 are the lower and upper limits of the size range of the fragments in the library,
$P_i$ is the probability of having a restriction site at any given base for the i'th enzyme.
19 A method as claimed in claim 17 or 18 wherein a representative partition of a particular region is produced in accordance with a restriction map of the sample nucleic acid sequence.
20 A method as claimed in any one of claims 1 to 16 wherein the size of the sample nucleic acid is known, and the number of and type of the different restriction enzymes used in step (i), and the sub-set of layers selected in step (iii) are selected to produce the desired library size in accordance with an assumed restriction site frequency of each enzyme in the sample nucleic acid.
21 A method as claimed in claim 20 wherein the restriction site frequency within the sample is assumed based on sequence information from the sample.
22 A method as claimed in claim 20 wherein the restriction site frequency is assumed to be randomly distributed
23 A method as claimed in any one of claims 20 to 22 wherein the number of and type of the different restriction enzymes used in step (i), and the sub-set of layers selected in step (iii), are selected in accordance with the formula:
$N_{x1 \sim x2} = G P_j^2 \sum_{k=x1}^{x2} \prod_{i} (1-P_i)^k$
wherein:
$N_{x1 \sim x2}$ is the number of fragments with length between x1 and x2,
k is fragment length,
G is the size of the sample,
x1 and x2 are the lower and upper limits of the size range of the fragments in the library,
$P_i$ is the probability of having a restriction site at any given base for the i'th enzyme.
24 A method as claimed in claim 23 wherein the restriction enzymes used in step (i) are 4 and 6nt cutting restriction enzymes, and are selected on the basis of the formula:
$N' = 4^{-12} v' G \sum_{k=x1}^{x2} [(1-1/4^4)^{nk} (1-1/4^6)^{(1+m)k}]$
wherein:
k is fragment length,
G is the size of the sample,
x1 and x2 are the lower and upper limits of the size range of the fragments in the library,
n is the number of extra 4nt cutters,
m is the number of extra 6nt cutters.
25 A method as claimed in any one of claims 20 to 24 wherein the size of the resulting library is estimated by the further steps of:
(vii) sequencing the fragments in a fraction of the host cells in said library,
(viii) estimating the size of the library using the formula:
$F = n(n-1)/\sum_i n_i(n_i-1) \pm s$
wherein:
F is the estimated size of the library,
n is the total number of sequences obtained by sequencing,
$n_i$ is the number of sequences in the i'th contig,
s is the standard error.
26 A method as claimed in claim 25 wherein an optimised library is generated by the further steps of:
(ix) providing a restriction site frequency for enzymes not used in step (i) , optionally using the sequence information obtained at step (vii) ,
(x) selecting further restriction enzymes on the basis of restriction site frequency to generate a desired size of partition using the formula given in claim 23,
(xi) producing an optimised nucleic library in accordance with steps (i)-(vi) using at least one of these further restriction enzymes,
(xii) optionally repeating steps (vii) to (xi) until the desired library size is obtained.
27 A method as claimed in any one of claims 1 to 16 wherein the size of the sample nucleic acid is unknown, and the number of and type of the different restriction enzymes used in step (i), and the sub-set of layers selected in step (iii) are selected to produce the desired library size in accordance with an assumed restriction site frequency of each enzyme in the sample nucleic acid.
28 A method as claimed in claim 27 wherein the restriction site frequency within the sample is assumed based on sequence information from the sample.
29 A method as claimed in claim 28 wherein the restriction site frequency is assumed to be randomly distributed
30 A method as claimed in any one of claims 27 and 29 wherein three 4nt- and one 6nt- cutting restriction enzymes are used in step (i) .
31 A method as claimed in claim 30 wherein HpaII, AluI, DraI, and PstI are used in step (i).
32 A method as claimed in any one of claims 27 to 31 wherein the size of the resulting library is estimated by the further steps of:
(vii) sequencing the fragments in a fraction of the host cells in said library,
(viii) estimating the size of the library using the formula:
$F = n(n-1)/\sum_i n_i(n_i-1) \pm s$
wherein:
F is the estimated size of the library,
n is the total number of sequences obtained by sequencing,
$n_i$ is the number of sequences in the i'th contig,
s is the standard error.
33 A method as claimed in claim 32 wherein the size of the sample is estimated by the further steps of:
(ix) providing the restriction site frequency of the enzymes used in step (i), optionally using the sequence information obtained at step (vii),
(x) calculating the sample size G using the formula:
Figure imgf000038_0001
wherein:
$N_{x1 \sim x2}$ is the number of fragments with length between x1 and x2,
k is fragment length,
x1 and x2 are the lower and upper limits of the size range of the fragments in the library,
$P_i$ is the probability of having a restriction site at any given base for the i'th enzyme.
34 A method as claimed in claim 33 wherein an optimised library is generated by the further steps of:
(xi) providing a restriction site frequency for enzymes not used in step (i) , optionally using the sequence information obtained at step (vii) ,
(xii) selecting further restriction enzymes on the basis of restriction site frequency to generate a desired size of partition using the formula given in claim 33,
(xiii) producing an optimised nucleic library in accordance with steps (i)-(vi) using at least one of these further restriction enzymes, (xiv) optionally repeating steps (vii) to (xiii) until the desired library size is obtained.
35 A method as claimed in any one of the preceding claims wherein the sample nucleic acid comprises nucleic acid from two or more different sources which are pooled to produce a library comprising fragments from each.
36 A method for identifying a limited population of markers in a sample nucleic acid, which method comprises:
(a) providing sample nucleic acid from at least two different sources,
(b) providing a library containing a representative partition of the sample nucleic acid in accordance with any one of claims 1 to 35,
(c) identifying differences within corresponding sequences from said different sources contained within the library
37 A method as claimed in claim 36 wherein the two different nucleic sources are taken from different individuals.
38 A method as claimed in claim 36 wherein the markers are Single Nucleotide Polymorphisms.
39 A method as claimed in any one of claims 1 to 38 wherein the number of and type of the different restriction enzymes used in step (i) , and the sub-set of layers selected in step (iii) are selected in accordance with the output of program code run on a digital computer, which computer comprises a processor, a data storage system, at least one input device, and at least one output device, and which program code operates on the input of one or both of: (i) a reference sequence or restriction map from the sample nucleic acid,
(ii) a preference regarding partition size, and optionally preferred region of the sample to include in the partition.
40 A method as claimed in claim 39 wherein the program code includes a look up table including reference restriction site target sequences for different 4 and 6nt cutting restriction enzymes .
41 A method as claimed in claim 39 wherein the program code performs a function in accordance with a formula described in claim 32 or claim 33.
42 A system for selecting the number of and type of the different restriction enzymes used in step (i) , and the sub-set of layers selected in step (iii) of the method of any one of claims 1 to 38, which system comprises program code run on a digital computer, which computer comprises a processor, a data storage system, at least one input device, and at least one output device, and which program code operates on the input of one or both of: (i) a reference sequence or restriction map from the sample nucleic acid,
(ii) a preference regarding partition size, and optionally preferred region of the sample to include in the partition.
43 A system as claimed in claim 42 wherein the program code includes a look up table including reference restriction site target sequences for different 4 and 6nt cutting restriction enzymes.
44 A system as claimed in claim 43 wherein the program code performs a function in accordance with a formula described in claim 32 or claim 33.
45 A computer program for selecting the number of and type of the different restriction enzymes used in step (i), and the sub-set of layers selected in step (iii) of the method of any one of claims 1 to 41, which computer program code operates on the input of one or both of:
(i) a reference sequence or restriction map from the sample nucleic acid,
(ii) a preference regarding partition size, and optionally preferred region of the sample to include in the partition,
and wherein the program code includes a look up table including reference restriction site target sequences for different 4 and 6nt cutting restriction enzymes, and wherein the program code performs a function in accordance with a formula described in claim 32 or claim 33.
46 A computer program as claimed in claim 45 which is stored on a storage media or device readable by a general or special purpose programmable computer.
47 A process for producing a chip for use in assaying a limited population of polymorphisms within a sample, which process comprises:
(i) providing a population of probe sequences, which probe sequences are derived from a representative partition of sample nucleic acid provided in accordance with any one of claims 1 to 39, and contain the population of polymorphisms,
(ii) incorporating the probe sequences into the chip.
48 A chip obtainable by the method of claim 47.
49 A method of genotyping a nucleic acid sample from an individual, which method comprises:
(i) providing the chip of claim 47 or claim 48,
(ii) isolating a representative partition of sample nucleic acid from the individual in accordance with the method used to provide the representative partition containing the population of polymorphisms contained in the probe sequences,
(iii) contacting the chip with the sample and determining hybridization of the sample nucleic acid thereto.
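The program code of claims 39 to 46 is described only by its inputs (a reference sequence or restriction map, and a partition-size preference) and by its look up table of restriction site target sequences. The following is a minimal illustrative sketch of that idea and is not the applicant's implementation. It assumes that a "layer" can be approximated by a fragment-length window, and it replaces the formulae of claims 32 and 33, which are not reproduced in this section, with a simple "closest fraction" rule. The function names, the six-enzyme table and the selection rule are all hypothetical.

```python
# Illustrative sketch only: names, table contents and selection rule are assumptions.

# Look up table: enzyme -> (recognition sequence 5'->3', cut offset on the top strand).
RESTRICTION_SITES = {
    "EcoRI": ("GAATTC", 1),    # 6 nt cutter, G^AATTC
    "HindIII": ("AAGCTT", 1),  # 6 nt cutter, A^AGCTT
    "BamHI": ("GGATCC", 1),    # 6 nt cutter, G^GATCC
    "MseI": ("TTAA", 1),       # 4 nt cutter, T^TAA
    "TaqI": ("TCGA", 1),       # 4 nt cutter, T^CGA
    "AluI": ("AGCT", 2),       # 4 nt cutter, AG^CT
}


def cut_positions(sequence, enzymes):
    """Return sorted top-strand cut positions for the given enzymes."""
    seq = sequence.upper()
    cuts = set()
    for name in enzymes:
        site, offset = RESTRICTION_SITES[name]
        start = seq.find(site)
        while start != -1:
            cuts.add(start + offset)
            start = seq.find(site, start + 1)  # allow overlapping sites
    return sorted(cuts)


def fragment_lengths(sequence, enzymes):
    """Lengths of the fragments from a complete in-silico digest of a linear sequence."""
    bounds = [0] + cut_positions(sequence, enzymes) + [len(sequence)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def partition_fraction(sequence, enzymes, min_len, max_len):
    """Fraction of the sequence carried by fragments inside the length window."""
    if not sequence:
        return 0.0
    kept = sum(n for n in fragment_lengths(sequence, enzymes)
               if min_len <= n <= max_len)
    return kept / len(sequence)


def choose_enzymes(sequence, candidates, min_len, max_len, target_fraction):
    """Pick the enzyme combination whose partition is closest to the target size."""
    return min(
        candidates,
        key=lambda enz: abs(
            partition_fraction(sequence, enz, min_len, max_len) - target_fraction
        ),
    )


if __name__ == "__main__":
    reference = "AAAAGAATTCAAAATTAAGGGG"  # toy 22 nt reference sequence
    combos = [("EcoRI",), ("MseI",), ("EcoRI", "MseI")]
    print(fragment_lengths(reference, ("EcoRI", "MseI")))
    print(choose_enzymes(reference, combos, 6, 12, 0.5))
```

In practice the reference would be a genome-scale sequence and the candidate list would cover many enzyme combinations; the toy input above only shows how the look up table, the digest and the size preference fit together.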
PCT/GB2003/003866 2002-09-05 2003-09-05 Genome partitioning WO2004022758A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
DE60312875T DE60312875T2 (en) 2002-09-05 2003-09-05 GENOME DIVISION
AU2003260790A AU2003260790A1 (en) 2002-09-05 2003-09-05 Genome partitioning
US10/526,571 US20060281082A1 (en) 2002-09-05 2003-09-05 Genome partitioning
EP03793900A EP1546345B1 (en) 2002-09-05 2003-09-05 Genome partitioning
CA002496517A CA2496517A1 (en) 2002-09-05 2003-09-05 Genome partitioning

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
GB0220649A GB0220649D0 (en) 2002-09-05 2002-09-05 Genome partitioning
GB0220649.8 2002-09-05
GB0220773A GB0220773D0 (en) 2002-09-06 2002-09-06 Genome partitioning
GB0220773.6 2002-09-06

Publications (1)

Publication Number Publication Date
WO2004022758A1 (en) 2004-03-18

Family

ID=31980014

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2003/003866 WO2004022758A1 (en) 2002-09-05 2003-09-05 Genome partitioning

Country Status (7)

Country Link
US (1) US20060281082A1 (en)
EP (1) EP1546345B1 (en)
AT (1) ATE358182T1 (en)
AU (1) AU2003260790A1 (en)
CA (1) CA2496517A1 (en)
DE (1) DE60312875T2 (en)
WO (1) WO2004022758A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7666584B2 (en) * 2005-09-01 2010-02-23 Philadelphia Health & Education Corporation Identification of a pin specific gene and protein (PIN-1) useful as a diagnostic treatment for prostate cancer
WO2011053987A1 (en) * 2009-11-02 2011-05-05 Nugen Technologies, Inc. Compositions and methods for targeted nucleic acid sequence selection and amplification
GB2497838A (en) 2011-10-19 2013-06-26 Nugen Technologies Inc Compositions and methods for directional nucleic acid amplification and sequencing
CN105861487B (en) 2012-01-26 2020-05-05 纽亘技术公司 Compositions and methods for targeted nucleic acid sequence enrichment and efficient library generation
CN104619894B (en) 2012-06-18 2017-06-06 纽亘技术公司 For the composition and method of the Solid phase of unexpected nucleotide sequence
US20150011396A1 (en) 2012-07-09 2015-01-08 Benjamin G. Schroeder Methods for creating directional bisulfite-converted nucleic acid libraries for next generation sequencing
EP2971130A4 (en) 2013-03-15 2016-10-05 Nugen Technologies Inc Sequential sequencing
WO2015073711A1 (en) 2013-11-13 2015-05-21 Nugen Technologies, Inc. Compositions and methods for identification of a duplicate sequencing read
WO2015131107A1 (en) 2014-02-28 2015-09-03 Nugen Technologies, Inc. Reduced representation bisulfite sequencing with diversity adaptors
US11099202B2 (en) 2017-10-20 2021-08-24 Tecan Genomics, Inc. Reagent delivery system
CN116286991B (en) * 2023-02-10 2023-10-13 中国农业科学院农业基因组研究所 Whole genome enhancer screening system, screening method and application

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1995000530A1 (en) * 1993-06-25 1995-01-05 Affymax Technologies N.V. Hybridization and sequencing of nucleic acids
WO1995011995A1 (en) * 1993-10-26 1995-05-04 Affymax Technologies N.V. Arrays of nucleic acid probes on biological chips
WO1998051789A2 (en) * 1997-05-13 1998-11-19 Display Systems Biotech A/S A METHOD TO CLONE mRNAs AND DISPLAY OF DIFFERENTIALLY EXPRESSED TRANSCRIPTS (DODET)
EP1001037A2 (en) * 1998-09-28 2000-05-17 Whitehead Institute For Biomedical Research Pre-selection and isolation of single nucleotide polymorphisms
WO2001000816A1 (en) * 1999-03-18 2001-01-04 Complete Genomics As Methods of cloning and producing fragment chains with readable information content

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020090612A1 (en) * 1999-01-08 2002-07-11 Jonathan M. Rothberg Method of identifying nucleic acids

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ALTSHULER D ET AL: "An SNP map of the human genome generated by reduced representation shotgun sequencing", NATURE, MACMILLAN JOURNALS LTD. LONDON, GB, vol. 407, 28 September 2000 (2000-09-28), pages 513 - 516, XP002955779, ISSN: 0028-0836 *

Cited By (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006137733A1 (en) * 2005-06-23 2006-12-28 Keygene N.V. Strategies for high throughput identification and detection of polymorphisms
US9453256B2 (en) 2005-06-23 2016-09-27 Keygene N.V. Strategies for high throughput identification and detection of polymorphisms
US10235494B2 (en) 2005-06-23 2019-03-19 Keygene N.V. Strategies for high throughput identification and detection of polymorphisms
CN105039313B (en) * 2005-06-23 2018-10-23 科因股份有限公司 For the high throughput identification of polymorphism and the strategy of detection
US8685889B2 (en) 2005-06-23 2014-04-01 Keygene N.V. Strategies for high throughput identification and detection of polymorphisms
US8785353B2 (en) 2005-06-23 2014-07-22 Keygene N.V. Strategies for high throughput identification and detection of polymorphisms
US9023768B2 (en) 2005-06-23 2015-05-05 Keygene N.V. Strategies for high throughput identification and detection of polymorphisms
US10978175B2 (en) 2005-06-23 2021-04-13 Keygene N.V. Strategies for high throughput identification and detection of polymorphisms
US9896721B2 (en) 2005-06-23 2018-02-20 Keygene N.V. Strategies for high throughput identification and detection of polymorphisms
US9898576B2 (en) 2005-06-23 2018-02-20 Keygene N.V. Strategies for high throughput identification and detection of polymorphisms
CN101641449B (en) * 2005-06-23 2014-01-29 科因股份有限公司 Strategies for high throughput identification and detection of polymorphisms
US9376716B2 (en) 2005-06-23 2016-06-28 Keygene N.V. Strategies for high throughput identification and detection of polymorphisms
US9447459B2 (en) 2005-06-23 2016-09-20 Keygene N.V. Strategies for high throughput identification and detection of polymorphisms
US9898577B2 (en) 2005-06-23 2018-02-20 Keygene N.V. Strategies for high throughput identification and detection of polymorphisms
US9493820B2 (en) 2005-06-23 2016-11-15 Keygene N.V. Strategies for high throughput identification and detection of polymorphisms
US10095832B2 (en) 2005-06-23 2018-10-09 Keygene N.V. Strategies for high throughput identification and detection of polymorphisms
EP1907577A1 (en) * 2005-06-30 2008-04-09 Syngenta Participations AG METHODS FOR SCREENING FOR GENE SPECIFIC HYBRIDIZATION POLYMORPHISMS (GSHPs) AND THEIR USE IN GENETIC MAPPING AND MARKER DEVELOPMENT
EP1907577A4 (en) * 2005-06-30 2009-05-13 Syngenta Participations Ag METHODS FOR SCREENING FOR GENE SPECIFIC HYBRIDIZATION POLYMORPHISMS (GSHPs) AND THEIR USE IN GENETIC MAPPING AND MARKER DEVELOPMENT
US10538806B2 (en) 2005-09-29 2020-01-21 Keygene N.V. High throughput screening of populations carrying naturally occurring mutations
US10233494B2 (en) 2005-09-29 2019-03-19 Keygene N.V. High throughput screening of populations carrying naturally occurring mutations
US10316364B2 (en) 2005-09-29 2019-06-11 Keygene N.V. Method for identifying the source of an amplicon
US11649494B2 (en) 2005-09-29 2023-05-16 Keygene N.V. High throughput screening of populations carrying naturally occurring mutations
US9334536B2 (en) 2005-12-22 2016-05-10 Keygene N.V. Method for high-throughput AFLP-based polymorphism detection
US10106850B2 (en) 2005-12-22 2018-10-23 Keygene N.V. Method for high-throughput AFLP-based polymorphism detection
US9062348B1 (en) 2005-12-22 2015-06-23 Keygene N.V. Method for high-throughput AFLP-based polymorphism detection
US11008615B2 (en) 2005-12-22 2021-05-18 Keygene N.V. Method for high-throughput AFLP-based polymorphism detection
US10023907B2 (en) 2006-04-04 2018-07-17 Keygene N.V. High throughput detection of molecular markers based on AFLP and high through-put sequencing
EP2753715A4 (en) * 2011-09-09 2015-05-20 Univ Leland Stanford Junior Methods for obtaining a sequence
US9725765B2 (en) 2011-09-09 2017-08-08 The Board Of Trustees Of The Leland Stanford Junior University Methods for obtaining a sequence
US9249460B2 (en) 2011-09-09 2016-02-02 The Board Of Trustees Of The Leland Stanford Junior University Methods for obtaining a sequence
US10704091B2 (en) 2012-01-13 2020-07-07 Data2Bio Genotyping by next-generation sequencing
US9951384B2 (en) 2012-01-13 2018-04-24 Data2Bio Genotyping by next-generation sequencing
WO2020109412A1 (en) 2018-11-28 2020-06-04 Keygene N.V. Targeted enrichment by endonuclease protection
WO2020169830A1 (en) 2019-02-21 2020-08-27 Keygene N.V. Genotyping of polyploids
WO2021116371A1 (en) 2019-12-12 2021-06-17 Keygene N.V. Semi-solid state nucleic acid manipulation
WO2021123062A1 (en) 2019-12-20 2021-06-24 Keygene N.V. Ngs library preparation using covalently closed nucleic acid molecule ends
WO2022074058A1 (en) 2020-10-06 2022-04-14 Keygene N.V. Targeted sequence addition
WO2022112316A1 (en) 2020-11-24 2022-06-02 Keygene N.V. Targeted enrichment using nanopore selective sequencing
WO2022112394A1 (en) 2020-11-25 2022-06-02 Koninklijke Nederlandse Akademie Van Wetenschappen Ribosomal profiling in single cells

Also Published As

Publication number Publication date
AU2003260790A1 (en) 2004-03-29
EP1546345B1 (en) 2007-03-28
ATE358182T1 (en) 2007-04-15
EP1546345A1 (en) 2005-06-29
US20060281082A1 (en) 2006-12-14
DE60312875D1 (en) 2007-05-10
CA2496517A1 (en) 2004-03-18
DE60312875T2 (en) 2007-12-06

Similar Documents

Publication Publication Date Title
EP1546345B1 (en) Genome partitioning
Grover et al. Development and use of molecular markers: past and present
Powell et al. Polymorphism revealed by simple sequence repeats
JP4588976B2 (en) Polynucleotides, products, and uses thereof as tags and tag complements
AU2006295556B2 (en) High throughput screening of mutagenized populations
EP1747285B1 (en) Method for amplifying specific nucleic acids in parallel
CN113166797A (en) Nuclease-based RNA depletion
JP5801349B2 (en) Method for identifying the clonal source of restriction fragments
JP2004522440A5 (en)
TW201321518A (en) Method of micro-scale nucleic acid library construction and application thereof
Cho et al. Sensitive detection of pre-integration intermediates of long terminal repeat retrotransposons in crop plants
Rustenholz et al. Specific patterns of gene space organisation revealed in wheat by using the combination of barley and wheat genomic resources
Liu et al. DLA-based strategies for cloning insertion mutants: cloning the gl4 locus of maize using Mu transposon tagged alleles
US20060228714A1 (en) Nucleic acid representations utilizing type IIB restriction endonuclease cleavage products
Yang et al. A genome-phenome association study in native microbiomes identifies a mechanism for cytosine modification in DNA and RNA
CN110218811B (en) Method for screening rice mutant
Jordan et al. Transcript profiling of the hypomethylated hog1 mutant of Arabidopsis
Shoemaker et al. Soybean genomics
Singh et al. Next-generation sequencing technologies: approaches and applications for crop improvement
WO2008015975A1 (en) Method for amplification of dna fragment
GB2621392A (en) Methods and uses
CN110144387A (en) A kind of multiple PCR method
WO2023086818A1 (en) Target enrichment and quantification utilizing isothermally linear-amplified probes
CN110144383A (en) Utilize the method for multiplex PCR enrichment target DNA fragments
Liu et al. Genbank accession nos: FJ829343-FJ829363

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
ENP Entry into the national phase

Ref document number: 2496517

Country of ref document: CA

WWE Wipo information: entry into national phase

Ref document number: 2003793900

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2003260790

Country of ref document: AU

WWP Wipo information: published in national office

Ref document number: 2003793900

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2006281082

Country of ref document: US

Ref document number: 10526571

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP

WWP Wipo information: published in national office

Ref document number: 10526571

Country of ref document: US

WWG Wipo information: grant in national office

Ref document number: 2003793900

Country of ref document: EP