CA2570068A1 - Methods for preparation of a library of submegabase resolution tiling pools and uses thereof - Google Patents


Publication number
CA2570068A1
Authority
CA
Canada
Prior art keywords
genomic
clones
smrt
clone
sequence
Prior art date
Legal status
Abandoned
Application number
CA002570068A
Other languages
French (fr)
Inventor
Wan Lam
Calum Macaulay
Spencer Watson
Adrian Ishkanian
Martin Krzywinski
Marco Marra
Current Assignee
British Columbia Cancer Agency BCCA
Original Assignee
Bc Cancer Agency
Wan Lam
Calum Macaulay
Spencer Watson
Adrian Ishkanian
Martin Krzywinski
Marco Marra
Priority date
Filing date
Publication date
Application filed by Bc Cancer Agency, Wan Lam, Calum Macaulay, Spencer Watson, Adrian Ishkanian, Martin Krzywinski, Marco Marra


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries

Abstract

A method of preparing a library of replenishable synthetic nucleic acid fragment pools is provided. Each of the pools represents one clone of a tiling set of genomic clones that provide substantially overlapping coverage of an entire genome, or a portion thereof. The fragment pools generated by this method can be replenished at any time without the need to re-isolate the original genomic clones that were used to create the tiling set. The pools can be used, for example, as a source of probes for applications such as nucleic acid hybridization or FISH. The libraries can be used, for example, to prepare nucleic acid arrays that can span an entire genome, or a portion thereof.

Description

JUMBO APPLICATIONS / PATENTS
THIS SECTION OF THE APPLICATION / PATENT CONTAINS MORE THAN ONE VOLUME.
THIS IS VOLUME 1 OF 2.
NOTE: For additional volumes please contact the Canadian Patent Office.

METHODS FOR PREPARATION OF A LIBRARY OF
SUBMEGABASE RESOLUTION TILING POOLS AND USES
THEREOF
FIELD OF THE INVENTION
The present invention pertains to the field of genome analysis, and particularly to array-based genome analysis and diagnostics.
BACKGROUND
Nucleic acid arrays provide a tool to analyze information about large numbers of genes simultaneously. The majority of commercially available nucleic acid arrays are cDNA arrays. One type of cDNA array comprises short synthetic oligonucleotides, which are either synthesized in situ (e.g. GeneChip® Human Genome U133 from Affymetrix Inc.) or synthesized in vitro and subsequently deposited onto a substrate in a grid-like or arrayed configuration (e.g. MWG-Biotech's Pan® Human array). The utility of oligonucleotide-based arrays is limited by three factors. The first is the length of the DNA oligonucleotides, which affects hybridization characteristics. The second is the distance between the markers selected for the array. The third is the coverage of the array, which is limited to certain parts of the genome sequence. A second type of cDNA array comprises cDNA generated by reverse transcription of RNA, followed by its ordered deposition onto the substrate. This type of cDNA array, however, represents only the transcriptional state of a select tissue type under a narrow range of physiological conditions.
A cDNA array containing over 5,000 cDNAs averaging approximately 1 kb in size has been described, together with its use to map gene amplifications and deletions in breast cancer (Pollack et al. (1999) Nature Genetics 23:41-46).
As cDNA is derived from transcribed regions of the genome only, a cDNA array cannot comprise the entire gene set of a given genome. Furthermore, the content of a cDNA array excludes transcriptional regulatory sequences, introns, intergenic sequences and telomeres. Thus, the majority of commercially available arrays are used for gene expression profiling with very few being designed for detecting genetic alterations such as gene amplifications and deletions at any locus within the genome.
Such genetic alterations are involved in some forms of cancer and other genetic diseases, and in development and differentiation.
Various genome sequencing projects have elucidated the entire genomic sequence for a number of species, including human. In many cases, libraries comprising the full genome are available. Such libraries comprise a series of large fragments (up to 300 kilobases in length) cloned into a plurality of vectors and maintained as viable cell stocks. The human genome library, for example, comprises 400,000 bacterial artificial chromosome (BAC) clones. As the entire human genome can be represented in as few as 10,000 to 20,000 BAC clones, the 400,000 BAC clone library represents the human genome many times over.
The availability of the sequence data and libraries generated by the human and other genome sequencing projects has enabled the preparation of genomic arrays comprising genomic DNA, which can help to eliminate the requirement of handling large quantities of individual clones.
Generation of such genomic arrays, however, is not straightforward, for a number of reasons. First, because the genomic clones are randomly generated and individually sequenced, the clones of any one library are not ordered sequentially along the genomic sequence, and many of the clones are redundant within the library.
Second, the number of clones required to cover the genome is so large that culturing all of the clones separately is logistically prohibitive. Furthermore, multiple commercial-scale array constructions would require reculturing of the host cells harboring BAC plasmids in order to replenish supplies. The use of purified DNA preparations may also introduce host bacterial DNA into the array as a contaminant, because it is not readily separable from the cloned DNA.
A few genomic arrays are available commercially. For example, the Affymetrix Mapping 10K Array provides a representation of the human genome with an average intermarker distance of 210 kb. The Vysis Genosensor™ Array 300 is a genomic array containing 287 probes that include telomeres, micro-deletions, oncogenes and tumor suppressor genes. With fewer than 300 probes representing the 3 billion base pairs of the human genome, the Vysis Genosensor™ Array 300 offers, on average, megabase-scale resolution. Spectral Genomics produces a series of BAC arrays for array-based genome profiling at 1-4 Mb resolution. Human BAC array 1400 (I~TH3-1400) comprises 1400 non-overlapping BAC clones from the RPCI BAC library spotted in duplicate. Its resolution is in the 2-4 Mb range, representing 5% or less coverage of the genome.
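The resolution figures quoted above come from dividing genome length by the number of markers. The Python sketch below reproduces that arithmetic. It assumes a 3 × 10^9 bp genome and uniform probe spacing. Real arrays are not uniformly spaced, so the result is only an approximation. The function name is illustrative.

```python
def average_spacing_bp(genome_bp: float, n_markers: int) -> float:
    """Mean distance between markers, assuming they are spread evenly over the genome."""
    if n_markers <= 0:
        raise ValueError("n_markers must be positive")
    return genome_bp / n_markers


# Vysis Genosensor Array 300: 287 probes over ~3 billion bp (uniform spacing assumed)
spacing = average_spacing_bp(3e9, 287)
print(f"{spacing / 1e6:.1f} Mb between probes")  # 10.5 Mb between probes
```

Under this idealisation, "megabase resolution" for the 287-probe array means a probe roughly every 10 Mb. That is about 50 times coarser than the 210 kb intermarker distance quoted for the Affymetrix Mapping 10K Array.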
One approach used to circumvent the problems encountered in preparing genomic arrays from genomic DNA clones is to convert the clones to synthetic fragments using PCR-based protocols. Two such approaches have been described. Fiegler et al. (Genes, Chromosomes & Cancer 36:361-374, 2003) describe a method of amplifying genomic inserts from BAC and PAC genomic clones using degenerate oligonucleotide primers (DOP). Arrays generated using this approach have been described. However, in assays designed to measure changes in DNA copy number, deletions, and amplifications, these arrays do not perform as well as arrays generated from genomic BAC clones. A second approach, described in International Patent Application PCT/US98/23168 (WO 99/23256), involves preparing representations of genomic DNA that can be used to prepare genomic DNA arrays. The method has three steps. First, the genomic DNA is digested. Second, linkers are ligated to the ends of the resulting genomic DNA fragments. Third, the fragments are amplified in a ligation-mediated PCR reaction using primers complementary to the linkers. The procedure described requires multiple liquid transfer and purification steps using organic extraction. U.S. Patent Application No.

describes a similar method for generating genomic DNA suitable for spotting onto arrays, but with a reduced number of purification steps. The requirement for liquid transfer and purification steps in both these methods can result in the introduction of errors and may necessitate extensive sequence verification and/or clone identification before arrays can be prepared.
Genomic arrays prepared to date include whole-genome arrays as well as chromosome-specific arrays. Snijders et al. (Nature Genetics 29:263-264, 2001) describe a whole-genome array, prepared according to the method described in U.S. Patent Application No. 20030087231, comprising representations of 2460 human genomic DNA BAC and P1 clones. This array is reported to cover the entire genome at a resolution of 1.4 Mb.
However, many gene alterations occur at the submegabase level of resolution, so the resolution provided by this array would not allow such alterations to be identified. A whole-genome array has also been constructed using the DOP-PCR-based method described by Fiegler et al. (ibid.).
This array was generated using a set of approximately 3500 genomic DNA BAC and PAC clones selected to cover the entire genome at a resolution of 1 Mb. As the resolution of this array is not in the submegabase range, it will still be unable to detect gene alterations that occur at the submegabase level of resolution. A high-resolution chromosome-specific array has also been described (Buckley et al. Human Molecular Genetics (2002) 11:3221-3229). It is reported to provide an average resolution of 75 kb, but covers only 34.7 Mb of the long arm of chromosome 22.
This background information is provided for the purpose of making known information believed by the applicant to be of possible relevance to the present invention. No admission is necessarily intended, nor should be construed, that any of the preceding information constitutes prior art against the present invention.
SUMMARY OF THE INVENTION
An object of the present invention is to provide methods for the preparation of a library of submegabase resolution tiling pools and uses thereof. In accordance with an aspect of the present invention, there is provided a method of preparing a submegabase resolution library comprising a collection of synthetic nucleic acid fragment pools, each of said synthetic nucleic acid fragment pools corresponding to a genomic clone, said method comprising the steps of: selecting a set of genomic clones from at least one library of genomic clones, each of said clones comprising a genomic insert, wherein between about 17 bp and about 1,500 bp of the sequence of the genomic insert of at least 95% of the clones in the set overlaps with the sequence of the genomic insert of an adjacent genomic clone; and preparing a synthetic nucleic acid fragment pool from each genomic clone in the set by fragmenting the genomic clone to produce nucleic acid fragments and amplifying said fragments to generate a synthetic nucleic acid pool.
In accordance with another aspect of the present invention, there is provided a submegabase resolution library comprising a collection of synthetic nucleic acid fragment pools, wherein said library is prepared by a method of the invention.
In accordance with another aspect, there is provided an array comprising one or more submegabase resolution libraries of the invention.
In accordance with another aspect of the present invention, there is provided a method of preparing a submegabase resolution tiling set of genomic clones representing at least a portion of a genome, said method comprising selecting a set of genomic clones from at least one library of genomic clones representing said genome, each of said clones containing a genomic insert, wherein between about 17 bp and about 1,500 bp of the sequence of the genomic insert of at least 95% of the clones in the set overlaps with the sequence of the genomic insert of an adjacent genomic clone.
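Once clone insert coordinates are known, for example from a clone-to-assembly map, the overlap criterion in the aspects above can be checked mechanically. The Python sketch below implements one reading of the criterion. It assumes 0-based, half-open insert coordinates. A clone counts as compliant if its overlap with at least one neighbour (in start-coordinate order) lies between 17 bp and 1,500 bp. The clone names and coordinates are hypothetical.

```python
from typing import List, Tuple

# (clone name, insert start, insert end); coordinates are 0-based, half-open
Clone = Tuple[str, int, int]


def overlap_bp(a: Clone, b: Clone) -> int:
    """Number of bases shared by two clone inserts (0 if they do not overlap)."""
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def meets_overlap_criterion(clones: List[Clone], lo: int = 17, hi: int = 1500,
                            min_fraction: float = 0.95) -> bool:
    """True if at least min_fraction of clones overlap an adjacent clone by lo..hi bp."""
    ordered = sorted(clones, key=lambda c: c[1])  # order clones along the genome
    if len(ordered) < 2:
        return False
    compliant = 0
    for i, clone in enumerate(ordered):
        # previous and next clone in genome order (either may be absent at the ends)
        neighbours = ordered[max(0, i - 1):i] + ordered[i + 1:i + 2]
        if any(lo <= overlap_bp(clone, nb) <= hi for nb in neighbours):
            compliant += 1
    return compliant / len(ordered) >= min_fraction


# Hypothetical clones: A/B overlap by 1,000 bp, B/C by 500 bp
tiling = [("A", 0, 150_000), ("B", 149_000, 300_000), ("C", 299_500, 450_000)]
print(meets_overlap_criterion(tiling))                              # True
print(meets_overlap_criterion(tiling + [("D", 600_000, 750_000)]))  # False (3 of 4 = 75%)
```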
In accordance with another aspect of the present invention, there is provided a method of preparing a synthetic nucleic acid fragment pool from a genomic clone comprising:
(a) preparing genomic clone DNA; (b) fragmenting genomic clone DNA to produce DNA fragments; and (c) amplifying said DNA fragments to generate a SMRT pool, wherein step b) or step c) comprises one or more dilution-processing steps.
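When the clone sequence is known, step (b) can be modelled in silico. Figure 4 refers to MseI restriction sites in SMRT pool sequence reads, so the Python sketch below assumes MseI digestion. MseI recognises TTAA and cuts between T and TAA. The sketch predicts fragments on one strand only. It does not model linker ligation, amplification, or the dilution-processing steps. The example sequence is hypothetical.

```python
from typing import List

MSEI_SITE = "TTAA"
MSEI_CUT_OFFSET = 1  # MseI cuts T^TAA, i.e. one base into the recognition site


def msei_fragments(seq: str) -> List[str]:
    """Split a top-strand sequence at every MseI cut position."""
    seq = seq.upper()
    cuts = []
    pos = seq.find(MSEI_SITE)
    while pos != -1:
        cuts.append(pos + MSEI_CUT_OFFSET)
        pos = seq.find(MSEI_SITE, pos + 1)
    fragments, prev = [], 0
    for cut in cuts:
        fragments.append(seq[prev:cut])
        prev = cut
    fragments.append(seq[prev:])  # remainder after the last cut
    return fragments


print(msei_fragments("GGTTAACCTTAAGG"))  # ['GGT', 'TAACCT', 'TAAGG']
```

MseI has a four-base site, so it cuts frequently. A BAC-sized insert would therefore yield many short fragments, which suits PCR amplification.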
In accordance with another aspect of the present invention, there is provided a high throughput method for determining the identity of a genomic clone having a genomic insert, said method comprising the steps of: preparing a solution comprising at least 20 fmol of said genomic clone, a primer labelled with a detectable label and amplification reagents; submitting said solution to between 65 and 100 cycles of thermal amplification to provide an amplified solution; submitting said amplified solution to sequence analysis to determine a sequence of at least 17 base pairs in length of said genomic insert; and comparing said sequence to a reference database in order to determine the identity of said genomic clone.
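The final comparison step can be as simple as looking up the read in a collection of reference sequences. The Python sketch below uses exact substring matching against a hypothetical dictionary of clone sequences. It reports a clone only when the match is unique. It ignores the reverse strand and sequencing errors. A production pipeline would more likely use an alignment tool that tolerates both.

```python
from typing import Dict, Optional


def identify_clone(read: str, reference: Dict[str, str], min_len: int = 17) -> Optional[str]:
    """Return the single clone whose reference sequence contains the read, else None."""
    read = read.upper()
    if len(read) < min_len:
        raise ValueError(f"read must be at least {min_len} bp, got {len(read)}")
    hits = [name for name, seq in reference.items() if read in seq.upper()]
    return hits[0] if len(hits) == 1 else None  # None: no match, or ambiguous match


# Hypothetical reference sequences (real clone sequences would be much longer)
reference = {
    "clone_A": "AAAACCCCGGGGTTTTACGTACGTAA",
    "clone_B": "TTTTGGGGCCCCAAAATGCATGCATT",
}
print(identify_clone("CCCCGGGGTTTTACGTA", reference))  # clone_A
```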
In accordance with another aspect of the present invention, there is provided an array providing a representation of a tiling set of genomic clones, said array comprising a plurality of pools of synthetic nucleic acid fragments deposited on one or more solid supports, wherein each pool is derived from one of said genomic clones and is present at one or more distinct locations on said one or more solid supports, and wherein between 17 bp and 1,500 bp of the sequence of the genomic insert of at least 95% of the clones in said tiling set overlaps with the sequence of the genomic insert of an adjacent genomic clone.
In accordance with another aspect of the present invention, there is provided a method of preparing an array comprising the steps of: selecting a set of genomic clones from at least one library of genomic clones, each of said clones containing a genomic insert, wherein between 17 bp and 1,500 bp of the sequence of the genomic insert of at least 95% of the clones in the set overlaps with the sequence of the genomic insert of an adjacent genomic clone; preparing a synthetic nucleic acid fragment pool from each genomic clone in the set by fragmenting the genomic clone to produce nucleic acid fragments and amplifying said fragments to generate a synthetic nucleic acid pool; and depositing each of said synthetic nucleic acid pools onto a solid support at one or more distinct locations.
In accordance with another aspect, there is provided a use of one or more submegabase resolution libraries of the invention to prepare an array.
In accordance with another aspect, there is provided a use of a submegabase resolution library of the present invention to prepare one or more probes.
In accordance with another aspect, there is provided a use of an array of the invention for comparative genome hybridization analysis.
In accordance with another aspect, there is provided a use of an array of the invention for the diagnosis of disease, determination of predisposition to disease, determination of resistance to treatment, or to enable the selection of a treatment regime.
In accordance with another aspect, there is provided a use of an array of the invention for the analysis of gene expression.
In accordance with another aspect, there is provided a use of an array of the invention for the identification of novel genes.
BRIEF DESCRIPTION OF THE FIGURES
Figure 1 presents a flow diagram of preparation and identification of SubMegabase Resolution Tiling (SMRT) pools in one embodiment of the invention. (A) Multistep process for the conversion of BAC DNA to SMRT pools. (B) Target fragments for specific primer extension for SMRT pool analysis.
Figure 2 depicts the coverage of the sequence assembly provided by the clones in a SMRT set in one embodiment of the invention. For each chromosome, the coverage of adjacent 700 kb regions is plotted according to the legend at the top of the figure.
Regions in the assembly without sequence information appear as black areas.
Distance scale is in Mb.
Figure 3 depicts the coverage resolution provided by the clones in a SMRT set in one embodiment of the invention. For each chromosome, the average clone cover size is coded by colour. Regions in the assembly without sequence information appear as black areas. Distance scale is in Mb.
Figure 4 depicts three SMRT pool sequence products. (A) Sequence read of a SMRT pool derived from BAC RP11-124P12 with an MseI restriction site 260 bp downstream of the T7 primer. (B) Sequence read of a SMRT pool derived from BAC RP11-125E6 with an MseI restriction site 127 bp downstream of the T7 primer. (C) Sequence read of a SMRT pool derived from BAC RP11-124P22 with an MseI restriction site 17 bp downstream of the T7 primer.
Figure 5 shows the probability of identifying a 96-well plate. In a 96-well format, the probability of identifying the plate increases with the number of SMRT pools sequenced. Solid squares denote SP6 primer sequencing. Solid diamonds denote T7 primer sequencing. Solid triangles denote sequencing of the SMRT pools with both SP6 and T7 primers. Vertical bars on all data points represent 95% confidence intervals.
Figure 6 depicts identification of a SMRT pool by Southern analysis. Lane 1: 200 ng RP11-156K13 HindIII digest. Lane 2: 200 ng RP11-104F14 HindIII digest. (A) In silico fingerprint of RP11-156K13 and RP11-104F14. (B) Southern transfer hybridized with radiolabeled SMRT pool from BAC clone RP11-156K13 without Cot-1 DNA blocking. (C) Southern transfer hybridized with radiolabeled SMRT pool from BAC clone RP11-156K13 with 50 µg Cot-1 DNA blocking.
Figure 7 depicts identification of a SMRT pool by FISH analysis. Red represents a random-primed SMRT pool probe generated from clone RP11-328P22 (locus: AL353195) labeled with Cy3-dCTP. Chromosomes are background-stained with DAPI.
Figure 8 depicts (A) whole-genome profile of a TAT-1 lymphoma cell line versus normal male DNA. Vertical lines represent log2 ratios of 0.5 and -0.5, as labelled. Each dot represents one unique LMPCR-amplified BAC on the whole-genome array. (B) Chromosome view of 8q showing MYC amplification between BAC clones RP11-143H8 and RP11-263C20. (C) Chromosome view of 18q showing BCL2 amplification between BAC clones RP11-159K14 and RP11-565D23. Vertical lines are scale bars indicating log2 ratios of +0.5 and -0.5, respectively.
Figure 9 presents the results of a SMRT array comparative genome hybridization experiment using lung cancer cell line H526. (A) Whole-genome view of H526 versus reference male DNA. (B) Enlarged view of the deletion breakpoint at 3p21.1 between BAC clones RP11-632O5 and RP11-594F16, also seen in A. Vertical lines are scale bars indicating log2 ratios of +0.5 and -0.5, respectively. (C) FISH confirmation of the breakpoint in B, showing single-copy loss of BAC clone RP11-594F16 (green) and normal copy number of BAC clone RP11-632O5 (red).
Figure 10 presents an amplification of chromosome 8q24.12-13 in colorectal cancer cell line COLO320. This 1.9-Mb amplification containing MYC is bounded by BAC clones RP11-810D23 and RP11-294P7. Vertical lines are scale bars indicating log2 ratios of +0.5 and -0.5, respectively.

Figure 11 presents a detailed analysis of microamplifications on chromosome arms 13q, 15q, 16p, and 22q in COLO320 cells.
Figure 12 depicts the identification of a new microamplification by SMRT array comparative genome hybridization in the cancer cell line COLO320. (A) 300-kb microamplification on chromosome 13q12.2 containing genes GSH1, CDX2 and IPF1 and bounded by BAC clones RP11-153M24 and RP11-152N3. Vertical lines are scale bars indicating log2 ratios of +0.5 and -0.5, respectively. (B) High copy-number amplification of RP11-153M24 detected by FISH. The amplification was located in a homogeneously staining region.
Figure 13 depicts the identification of microdeletions. (A) Identification of a 1.25-Mb deletion at 9p21.3, containing CDKN2A, in a mantle cell lymphoma cell line. The deletion is bounded by BAC clones RP11-328C2 and RP11-275H17. (B) 240-kb deletion at 7q22.3 in breast cancer cell line BT474, containing PRKAR2B and HBP1 and bounded by BAC clones RP11-258L19 and RP11-262G16. Vertical lines are scale bars indicating log2 ratios of +0.5 and -0.5, respectively.
Figure 14 depicts the SMRT array comparative genome hybridization profile of HCC15, showing a two-fold copy number deletion at 6q24.3-ter.
Figure 15 depicts the results of a SMRT array comparative genome hybridization analysis of reference male versus reference female hybridization.
Figure 16 depicts the results of a SMRT array comparative genome hybridization analysis of oral tumors at 8q21-24. (A)-(D) CGH plots of 4 oral tumors. Shaded areas highlight regions of amplification. (A) Tumor 211T shows no copy number change within 8q21-24. (B) Tumor 199T shows amplification of the entire tiling set (8q21-24). (C)-(D) Tumors 528T and 24T show multiple amplifications at 8q22, in addition to the amplification at 8q24.
Figure 17 depicts the identification of a minimal region of alteration at 8q22. (A) Tiling set of BAC clones, selected from the human RPCI-11 BAC library, spanning part of 8q22. Black boxes indicate clones present on the array. (B) Regions of segmental copy increase observed in four samples (566T, 574T, 166T, 573T) aligned with the BAC tiling set. The 5.3 Mb MRA and the three genes subjected to expression analysis are indicated. (C) BAC array CGH profile of sample 574T. Data are displayed as a normalized signal ratio between tumor and reference DNA for each BAC clone.
Each data point represents the average of three replicate spots on the array and includes the standard deviation. Shading indicates large regions with copy number increase, which contains LRP12 at 8q22 and MYC at 8q24.
Figure 18 depicts the results of a SMRT array comparative genome hybridization analysis of lung CIS samples at 8q21-24.
Figure 19 presents a map of chromosome region 6q16-q22.1 showing known deletions. The relative locations of genes, markers, and BAC, PAC, and YAC clones suitable for use as FISH probes are shown. Shading indicates the minimal region of deletion.
Figure 20 depicts the results of using a SMRT pool product from BAC clone (green) and 619019 (red) in a standard FISH hybridization protocol.
Figure 21 depicts the detection of an imprinted gene in a 150-kb region on 18q21.1. The boxed region shows where Elongin A3, a reported imprinted gene, is hypermethylated in a lymphoblast cell line derived from a normal individual. Vertical lines denote log2 signal ratios from -1 to 1, with hypermethylation to the right of zero and hypomethylation to the left. Each black line segment represents a single BAC clone.
Figure 22 presents a schematic for the analysis of methylation in genomic DNA.
DETAILED DESCRIPTION OF THE INVENTION
The present invention relates to a method of preparing a library of replenishable synthetic nucleic acid fragment pools from a tiling set of genomic clones.
Each pool represents one clone derived from a tiling set of genomic clones, wherein the tiling set is distinguishable from other tiling sets commonly used in the art in terms of its representation of a genome.

The method comprises two steps: preparing a tiling set of genomic clones and preparing a library from the tiling set. A tiling set is prepared by selecting appropriate genomic clones from one or more genome-ordered libraries to optimize size, map coverage and overlap of the genomic inserts. A library of synthetic fragment pools is subsequently generated from the tiling set of clones by high throughput amplification techniques.
In contrast to other methods known in the art, the method of the present invention allows for the amplification of DNA in a tiling set containing more than 4,000 clones. It does so by employing a high-throughput, automatable procedure comprising one or more dilution-processing steps. By eliminating certain purification steps from the amplification procedure, this invention provides the capability of processing large numbers of clones. This in turn enables the inclusion of a large number of clones in the tiling set.
The larger the number of clones with extensive overlap within a tiling set, the greater the precision attainable for mapping a target sequence to a location in a genome. The larger the degree of overlap, the more clones are required to construct a tiling set. One would not contemplate generating a tiling set with large clone numbers without the dilution-based amplification steps, which omit the traditional purification steps. No other method is known for amplifying such a large number of clones in a reasonable period of time. Thus, the dilution-based amplification step enables the design of a tiling set comprising a large number of clones with a high degree of overlap. In its greatest extent, an entire genome can be represented by the tiling set with a high degree of overlap, resulting in a resolution of less than 1 Mb.
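A simple model makes the trade-off between overlap and clone number concrete. Suppose every clone has the same insert size s and every adjacent pair overlaps by o bases. Then n clones span n(s - o) + o bases, so covering a region of length L needs n = ⌈(L - o)/(s - o)⌉ clones. The Python sketch below applies that idealised model. The 150 kb insert size and 3 × 10^9 bp genome are illustrative assumptions. Real tiling paths have variable insert sizes and overlaps.

```python
import math


def clones_needed(region_bp: int, insert_bp: int, overlap_bp: int) -> int:
    """Clones required to tile region_bp with equal inserts and equal overlaps (idealised)."""
    if overlap_bp >= insert_bp:
        raise ValueError("overlap must be smaller than the insert size")
    if region_bp <= insert_bp:
        return 1
    # n clones cover n * (insert - overlap) + overlap bases
    return math.ceil((region_bp - overlap_bp) / (insert_bp - overlap_bp))


# Hypothetical 150 kb inserts across a 3 x 10^9 bp genome, with increasing overlap
for overlap in (0, 1_500, 50_000, 100_000):
    print(overlap, clones_needed(3_000_000_000, 150_000, overlap))
```

Under this model, overlaps of a few hundred to 1,500 bp barely change the clone count. Overlaps approaching the insert size multiply it.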
Moreover, the fragment pools generated by this method can be replenished at any time without the need to re-isolate the original genomic clones that were used to create the tiling set. The pools can be used, for example, as a source of probes for applications such as nucleic acid hybridization or FISH. The library can be used, for example, to prepare a nucleic acid array that can span an entire genome, or a portion of a genome, with a submegabase resolution.

Definitions
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The term "tiling set," as used herein, refers to a finite collection of cloned DNAs, wherein each cloned DNA comprises a fragment of the genomic DNA of an organism.
The collection of cloned DNAs can comprise all or a part of the genomic sequence of the organism. The individual cloned DNAs are ordered sequentially relative to the genome, so that the genomic insert start point of each cloned DNA follows the start point of the preceding cloned DNA and precedes the start point of the following cloned DNA. Typically, the clones are in the form of a cloning vector containing the fragment of genomic DNA as an insert.
The term "SubMegabase Resolution Tiling pool" (SMRT pool), as used herein, refers to a pool of synthetic nucleic acid molecules generated by fragmentation of a clone of a tiling set and amplification of the resultant fragments. A SMRT pool typically comprises vector DNA fragments and genomic insert DNA fragments.
The term "SubMegabase Resolution Tiling library" (SMRT library), as used herein, refers to a library comprising a plurality of SMRT pools derived from a tiling set, or the constituent synthetic fragments of the SMRT pools. A SMRT library of the invention, therefore, can represent an entire genome or a portion of a genome.
As used herein, a "sub-library" refers to a collection of SMRT pools selected from a SMRT library, wherein the number of SMRT pools in the collection is fewer than the number of SMRT pools in the SMRT library.
The term "SubMegabase Resolution Tiling array" (SMRT array), as used herein, refers to an ordered array deposited onto a solid support substrate. The array comprises the SMRT pools of a SMRT library, a sub-library, or a plurality of SMRT libraries, or the constituent synthetic nucleic acid molecules of a SMRT pool.
The term "resolution" as used herein with reference to a SMRT set is defined in terms of "clone covers." In accordance with the present invention, a SMRT set representing a selected genomic sequence comprises a set of genomic clones, the inserts of which have overlapping sequences. The ends of each region of overlap between two clones define the boundaries of a "clone cover." The length of the clone cover will vary according to the extent of the overlap between the clones. The average size of all the clone covers in a SMRT set can be calculated and this value used to define the resolution of the SMRT set. The effective resolution of a SMRT set is calculated by the weighted average of the clone cover size wherein the weights are given by the fraction of the selected genomic sequence represented in clone covers of a given size.
The smaller the weighted average size of the clone cover, the higher the resolution of the SMRT set.
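The Python sketch below implements one reading of these definitions. Clone start and end points delimit the clone covers, and segments not covered by any clone are skipped. Each cover c_i is weighted by its share of the covered sequence, c_i / Σc. The effective resolution therefore reduces to Σc_i² / Σc_i, which is always at least the plain average. The coordinates are hypothetical 0-based, half-open intervals.

```python
from typing import List, Tuple


def clone_covers(inserts: List[Tuple[int, int]]) -> List[int]:
    """Lengths of segments delimited by consecutive clone start/end points.

    Segments not covered by any clone (gaps in the tiling) are skipped.
    """
    points = sorted({p for start, end in inserts for p in (start, end)})
    covers = []
    for a, b in zip(points, points[1:]):
        if any(start <= a and b <= end for start, end in inserts):
            covers.append(b - a)
    return covers


def effective_resolution(covers: List[int]) -> float:
    """Length-weighted mean cover size: sum(c * c) / sum(c)."""
    return sum(c * c for c in covers) / sum(covers)


inserts = [(0, 100), (80, 200), (180, 300)]     # hypothetical clone coordinates
covers = clone_covers(inserts)
print(covers)                                   # [80, 20, 80, 20, 100]
print(sum(covers) / len(covers))                # 60.0 (plain average)
print(round(effective_resolution(covers), 1))   # 78.7 (weighted average)
```

The weighted figure is larger than the plain average because long covers account for more of the sequence. A set with a few long gaps between overlaps therefore scores a lower resolution than its plain average suggests.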
The term "sequence coverage," as used herein, means the percentage of a genome, or selected portion of a genome, that is represented by a SMRT tiling set.
The terms "depth of coverage" or "coverage depth," as used herein, refer to the number of times a given genomic segment is represented in a SMRT set.
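Both quantities can be computed from clone insert coordinates. The Python sketch below splits the sequence at every clone start and end point. It counts the clones spanning each segment (depth of coverage). It then reports the percentage of a target region that has depth of at least one (sequence coverage). Coordinates are hypothetical 0-based, half-open intervals.

```python
from typing import List, Tuple


def depth_profile(inserts: List[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """(segment start, segment end, number of clones spanning the segment)."""
    points = sorted({p for start, end in inserts for p in (start, end)})
    return [(a, b, sum(1 for s, e in inserts if s <= a and b <= e))
            for a, b in zip(points, points[1:])]


def sequence_coverage_pct(inserts: List[Tuple[int, int]],
                          region_start: int, region_end: int) -> float:
    """Percentage of [region_start, region_end) covered by at least one clone."""
    covered = 0
    for a, b, depth in depth_profile(inserts):
        lo, hi = max(a, region_start), min(b, region_end)
        if depth > 0 and hi > lo:
            covered += hi - lo
    return 100.0 * covered / (region_end - region_start)


inserts = [(0, 100), (80, 200), (300, 400)]  # hypothetical clone coordinates
print(depth_profile(inserts))
# [(0, 80, 1), (80, 100, 2), (100, 200, 1), (200, 300, 0), (300, 400, 1)]
print(sequence_coverage_pct(inserts, 0, 500))  # 60.0
```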
The term "array-based comparative genome hybridization" ("array-based CGH"), as used herein, refers to a method of identifying genomic alterations such as gain, loss or rearrangement of chromosomal regions in a genomic test sample through competitive hybridization of differentially-labeled test and reference genomic DNA to an array of probes. The ratio of labeling intensities at each probe array point indicates the copy number of the DNA in the test sample relative to the corresponding copy number in the reference.
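The ratio-to-copy-number step can be sketched as follows. The Python code below computes per-spot log2(test/reference) ratios and median-centres them, a common normalisation that is assumed here rather than taken from the source. It then labels each spot using a ±0.5 threshold. That threshold mirrors the scale bars in the figure captions and is an assumption, not a calling rule stated in this document. The intensities are hypothetical.

```python
import math
import statistics
from typing import List


def log2_ratios(test: List[float], reference: List[float]) -> List[float]:
    """Per-spot log2(test / reference), median-centred so a typical spot sits at 0."""
    raw = [math.log2(t / r) for t, r in zip(test, reference)]
    centre = statistics.median(raw)
    return [x - centre for x in raw]


def call_copy_number(ratio: float, threshold: float = 0.5) -> str:
    """Label a spot as gain, loss, or no change relative to the reference."""
    if ratio >= threshold:
        return "gain"
    if ratio <= -threshold:
        return "loss"
    return "no change"


# Hypothetical background-subtracted intensities for four array spots
test = [200.0, 100.0, 100.0, 50.0]
reference = [100.0, 100.0, 100.0, 100.0]
print([call_copy_number(r) for r in log2_ratios(test, reference)])
# ['gain', 'no change', 'no change', 'loss']
```

A log2 ratio of +1 corresponds to twice the reference signal, and -1 to half of it.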
The term "probe," as used herein, refers to one or more nucleic acid molecules of known sequence used in hybridization studies to interrogate a target nucleic acid sequence. With particular regard to this invention, a SMRT pool of nucleic acid sequences constitutes a probe. Such a probe may be used to determine whether a test sample contains a sequence that maps to a specific location on the parent genome. A probe may be labeled, for example when used in fluorescent in situ hybridisation (FISH), or unlabeled, for example when incorporated into an array.
SUBMEGABASE RESOLUTION TILING (SMRT) LIBRARIES

A library of the present invention is a collection of synthetic nucleic acid pools, wherein each pool represents one clone from a tiling set of genomic clones. As indicated above, the clones in the tiling set are selected from one or more genome-ordered libraries to optimize size, map coverage and overlap of the inserts.
When the tiling set confers submegabase resolution, it is referred to herein as a SubMegabase Resolution Tiling (SMRT) set. Accordingly, a synthetic nucleic acid fragment pool derived from each clone in the SMRT set is referred to as a SMRT pool, and a library comprising the SMRT pools is referred to as a SMRT library.
In accordance with the present invention, a library is prepared by: (i) preparing a tiling set of genomic clones; and (ii) preparing a pool of synthetic nucleic acid fragments from each clone in the tiling set by fragmenting the clone and amplifying the fragments.
While the methods described herein refer to a tiling set that represents a genome, or portion of a genome, with submegabase resolution, it will be readily apparent to one skilled in the art that the methods are equally applicable to a tiling set providing a lower resolution representation of a genome, or a portion of a genome.
1.0 Preparing a SubMegabase Resolution Tiling (SMRT) Set
Preparing a SMRT tiling set entails selecting overlapping genomic clones from at least one library of clones to form a set that covers a whole genome, or a portion of a genome, at high resolution. In one embodiment of the invention, the resolution is less than 1 Mb. The portion of the genome can be one or more chromosomes, or one or more regions of a genome that are relevant to the study of a disease, for example cancer.
Thus, a SMRT set can be selected to span a specific region of interest in a genome, one or more chromosomes, or an entire genome. In one embodiment of the present invention, the SMRT set spans a human genome sequence that is minimally 35 Mb in length. In another embodiment, the SMRT set spans two or more chromosomes. In a further embodiment, the SMRT set covers one or more chromosomes. In another embodiment, the SMRT set covers a region of interest, for example, a region known to be important in the diagnosis of disease.

The SMRT set can also be selected to cover a minimal percentage of a selected genome, which may represent a fragment of the genome or substantially all of the genome. Thus, in one embodiment of the invention, the SMRT set minimally spans about 10% of a selected genome. In another embodiment, the SMRT set minimally spans about 20% of a selected genome. In another embodiment, the SMRT set minimally spans about 30% of a selected genome. In another embodiment, the SMRT set minimally spans about 40% of a selected genome. In another embodiment, the SMRT set minimally spans about 50% of a selected genome. In another embodiment, the SMRT set minimally spans about 60% of a selected genome. In another embodiment, the SMRT set minimally spans about 70% of a selected genome. In another embodiment, the SMRT set minimally spans about 80% of a selected genome. In another embodiment, the SMRT set minimally spans about 90% of a selected genome. In another embodiment, the SMRT set minimally spans 95% of a selected genome. In other embodiments, the SMRT set minimally spans about 96%, about 97%, about 98% and about 99% of a selected genome.
The SMRT set can be contiguous over each chromosome, and constitute a plurality of tiling subsets equal, for example, to the number of different chromosomes in the genome. For example, where the genome is a human genome, the tiling set can comprise 22 somatic and 2 sex chromosome subsets.
Due to the high-throughput amplification procedure used to generate a SMRT library from a SMRT set, a large number of clones can be included in the SMRT set. In one embodiment of the invention, therefore, the SMRT set comprises at least about 4,000 clones. In another embodiment, the SMRT set comprises at least about 6,000 clones. In another embodiment, the SMRT set comprises at least about 8,000 clones. In another embodiment, the SMRT set comprises at least about 10,000 clones. In another embodiment, the SMRT set comprises at least about 15,000 clones. In another embodiment, the SMRT set comprises at least about 20,000 clones. In another embodiment, the SMRT set comprises at least about 25,000 clones. In another embodiment, the SMRT set comprises at least about 30,000 clones. In another embodiment, the SMRT set comprises at least about 35,000 clones.

It will be readily apparent to one skilled in the art that, while the methods provided by the instant invention enable the production of libraries from SMRT sets comprising large numbers of genomic clones, the methods are equally applicable to tiling sets that comprise small numbers of clones. For example, a SMRT set can comprise fewer than 4,000 clones and still cover a particular region of the genome with high resolution when that region is relatively small. Such SMRT sets may be useful in detecting changes in one or more regions of the genome that are related to a particular disease state. In one embodiment, therefore, the SMRT set comprises fewer than 4,000 clones. In another embodiment, the SMRT set comprises fewer than 4,000 clones that cover a portion or portions of a genome, wherein the portion(s) of the genome is greater than 35 Mb in size.
In one embodiment of the present invention, the SMRT set comprises the fewest genomic clones required to span the entire genome, i.e. it constitutes a "minimal SMRT set." In another embodiment, the SMRT set comprises the fewest genomic clones required to span the entire genome, together with additional clones that increase resolution in regions known to be prone to rearrangement events associated with disease states.
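A "minimal SMRT set" can be approximated with the standard greedy interval-cover algorithm, sketched below. The sketch assumes that each clone's insert is described by hypothetical half-open base-pair coordinates (start, end) on one chromosome. It allows abutting clones and does not enforce a minimum overlap, so it illustrates only the "fewest clones" criterion.

```python
def minimal_tiling(clones, region_start, region_end):
    """Greedy minimum set of clones covering [region_start, region_end).

    clones: list of (name, start, end) tuples with hypothetical half-open
    base-pair coordinates. Raises ValueError if a gap cannot be bridged.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    chosen = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        best = None
        # Among clones starting at or before the covered boundary,
        # keep the one whose insert reaches farthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"gap in coverage at position {covered_to}")
        chosen.append(best[0])
        covered_to = best[2]
    return chosen


clones = [("A", 0, 100), ("B", 50, 180), ("C", 90, 150), ("D", 170, 300)]
print(minimal_tiling(clones, 0, 300))  # ['A', 'B', 'D']
```

Clone C is skipped because B, which starts no later than the covered boundary, reaches farther. Additional clones for rearrangement-prone regions would be added on top of this minimal set.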
1.1 Suitable genomes
Suitable genomes for the construction of a SMRT set are those of metazoan organisms which undergo programmed developmental changes, and which are prone to aberrant changes in developmental programming such as cancers and other developmental diseases. The genome may be a plant genome or an animal genome for which essentially the entire sequence is known and/or for which a physical map is available.
Examples of suitable genomes that are currently known in the art include, but are not limited to, those of human, mouse, rat, chimpanzee, chicken, zebrafish, fugu, honeybee, C. elegans, C. briggsae, Drosophila, Arabidopsis, maize, oat, soybean and yeast.
In the rapidly evolving field of genomics, sequencing the genomes from various other metazoan organisms will likely be completed in the future and it will be readily appreciated by those of skill in the art that the methods of the present invention will be equally applicable to such genomes. In one embodiment of the present invention, the genome from which the SMRT set and corresponding library is prepared is the human genome. In another embodiment, the genome is from a mammal other than a human.
In a further embodiment, the genome is from an agriculturally significant plant or animal.
1.2 Maps
A map of a genome of interest is required in order to select clones for the SMRT set. The map provides information about the order of individual genomic clones and enables the selection of genome-ordered clones from the one or more genomic libraries. The map can be a sequence map or a physical map.
In one embodiment, the map is a base pair coordinates map that allows one skilled in the art to select appropriate genome-ordered clones according to their physical base pair position as determined by sequencing.
In another embodiment, the map is a fingerprint map generated by restriction digestion of genomic clones from one or more genomic libraries, producing a set of bands, unique in number and position, that forms the fingerprint of a particular clone.
For example, the restriction enzyme Hind III may be used to generate the fingerprint map.
The patterns of bands from multiple clones are analyzed by computer and aligned to determine the amount of overlap.
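Comparing two fingerprints amounts to counting bands of matching size. The sketch below assumes that each fingerprint is a list of band sizes and uses a hypothetical sizing tolerance. It pairs bands with a two-pointer scan over the sorted lists. Dedicated fingerprint-assembly software uses more elaborate statistics, so this is illustrative only.

```python
def shared_bands(fp_a, fp_b, tolerance=7):
    """Count bands in fp_a matched one-to-one to bands in fp_b within `tolerance`.

    Fingerprints are lists of band sizes; the default tolerance is hypothetical.
    """
    a, b = sorted(fp_a), sorted(fp_b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance:
            shared += 1   # bands match: consume one from each fingerprint
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1        # a[i] is too small to match anything left in b
        else:
            j += 1
    return shared


clone_1 = [500, 1200, 2300, 4100, 6000]
clone_2 = [1195, 2310, 4100, 7000]
print(shared_bands(clone_1, clone_2, tolerance=10))  # 3
```

A higher shared-band count between two clones implies a larger overlap between their inserts.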
1.3 Genomic Libraries
The size of a typical metazoan genome is usually on the order of several gigabases (e.g., 3 gigabases for the human genome). Given this size, a more manageable form of the genome, such as a clone library, is required. The genomic libraries used to select clones must provide clones that overlap in their coverage of the genome. A genomic library may be available in the public domain or may be constructed specifically for the purpose of generating a SMRT library. The genomic sequences provided by a library are usually cloned into a suitable vector, and a number of vectors suitable for cloning large genomic DNA fragments are known in the art. The library can reside in, for example, bacterial artificial chromosome (BAC) vectors, yeast artificial chromosome (YAC) vectors, P1-derived artificial chromosome (PAC) vectors, P1 vectors, or combinations thereof.

Various genomic clone libraries are known in the art and many are commercially available. Examples of clone libraries in the public domain include the Caltech human genomic libraries A through D and the human genomic libraries RPCI-11 and RPCI-13 (Roswell Park). Many methods of preparing clone libraries are also known in the art. The genomic libraries can cover the entire selected genome or a region thereof.
1.4 Clone Selection
Once a source, or sources, of ordered genomic clones has been identified or constructed, a set of overlapping clones is selected from the source(s) to construct a SMRT set. As indicated above, the SMRT set can be selected to span a specific region of interest in the genome, one or more chromosomes, or an entire genome.
The clones for the SMRT set are selected by mapping the location of the clones to a map of the genome. In one embodiment, sequential genomic clones are selected from an ordered set of canonical clones in the genomic library.
Selection of sequential clones for the tiling set can be performed manually, in which case at least the ends of each genomic insert must be sequenced in order to situate the clone on a map of the genome. Alternatively, the ordering can be performed with the aid of a computer, i.e. in silico. When manual ordering is employed, various sequencing methods known in the art can be used to sequence the ends of the genomic inserts of selected clones; alternatively, where a fingerprint map is available for the genome, the clones can be subjected to restriction analysis. When in silico ordering is used, appropriate parameters can be chosen by one skilled in the art to allow an optimum tiling set to be selected.
As indicated above, the clones selected for the SMRT set contain genomic inserts that provide overlapping coverage of the selected region of the genome. The amount of overlap between the clones will be dependent on the type and number of available genomic clones and the extent of coverage of the selected genome provided by these clones. For example, with respect to the human genome, the availability of a large number of different genomic clones, together with information relating to their respective map positions and sequences, allows one skilled in the art to select various tiling sets representing all, or one or more portions of the genome. For other less well-characterised genomes with fewer libraries of genomic clones available, the possible SMRT sets that can be constructed may be more limited.
Thus, in one embodiment of the invention, the minimal sequence overlap between the insert of one clone and that of each of its neighbouring clones is about 17 bp. In another embodiment, the minimal sequence overlap between the insert of one clone and that of its neighbouring clones is about 20 bp. In another embodiment, the minimal sequence overlap between the insert of one clone and that of its neighbouring clones is about 50 bp. In a further embodiment, the minimal sequence overlap between the insert of one clone and that of its neighbouring clones is about 75 bp. In another embodiment, the minimal sequence overlap between the insert of one clone and that of its neighbouring clones is about 100 bp. In other embodiments, the minimal sequence overlap between the insert of one clone and that of its neighbouring clones is about 150 bp, about 175 bp, about 200 bp, about 250 bp and about 500 bp.
In another embodiment, the sequence overlap between the inserts of neighbouring clones ranges from 17 bp to 1,500 bp. In another embodiment, the sequence overlap between the inserts of neighbouring clones ranges from 17 bp to 1,000 bp. In a further embodiment, the sequence overlap between the inserts of neighbouring clones ranges from 17 bp to 750 bp. In another embodiment, the sequence overlap between the inserts of neighbouring clones ranges from 17 bp to 500 bp. In another embodiment, the sequence overlap between the inserts of neighbouring clones ranges from 25 bp to 1,500 bp. In another embodiment, the sequence overlap between the inserts of neighbouring clones ranges from 25 bp to 1,000 bp. In another embodiment, the sequence overlap between the inserts of neighbouring clones ranges from 25 bp to 750 bp. In other embodiments, the sequence overlap between the inserts of neighbouring clones ranges from 50 bp to 1,500 bp, from 75 bp to 1,500 bp, from 100 bp to 1,500 bp and from 250 bp to 1,500 bp.
In an alternate embodiment of the invention, the minimal sequence overlap between the insert of one clone and that of its neighbouring clones is about 10%. In another embodiment, the minimal sequence overlap between the insert of one clone and that of its neighbouring clones is about 30%. In another embodiment, the minimal sequence overlap between the insert of one clone and that of its neighbouring clones is about 40%. In another embodiment, the minimal sequence overlap between the insert of one clone and that of its neighbouring clones is about 50%. In another embodiment, the minimal sequence overlap between the insert of one clone and that of its neighbouring clones is about 60%.
In a further embodiment of the invention, the sequence overlap between the inserts of neighbouring clones ranges from 10% to 90%. In another embodiment, the sequence overlap ranges from 20% to 80%. In another embodiment, the sequence overlap ranges from 30% to 70%. In another embodiment, the sequence overlap ranges from 40% to 60%. In another embodiment, the sequence overlap ranges from 20% to 90%. In another embodiment, the sequence overlap ranges from 30% to 90%. In another embodiment, the sequence overlap ranges from 40% to 90%. In another embodiment, the sequence overlap ranges from 10% to 80%. In another embodiment, the sequence overlap ranges from 10% to 70%. In another embodiment, the sequence overlap ranges from 10% to 60%. In another embodiment, the sequence overlap ranges from 10% to 50%. In another embodiment, the sequence overlap ranges from 10% to 40%. In another embodiment, the sequence overlap ranges from 20% to 40%. In another embodiment, the sequence overlap ranges from 30% to 40%.
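Both overlap measures described above, absolute overlap in base pairs and percentage overlap, can be computed from insert coordinates as sketched here. The document does not state what the percentage overlap is relative to. This sketch assumes it is expressed as a fraction of the shorter of the two inserts, and it uses hypothetical half-open base-pair coordinates.

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Overlap in base pairs between two half-open inserts [start, end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def overlap_percent(a_start, a_end, b_start, b_end):
    """Overlap as a percentage of the shorter insert (an assumed convention)."""
    shorter = min(a_end - a_start, b_end - b_start)
    return 100.0 * overlap_bp(a_start, a_end, b_start, b_end) / shorter


print(overlap_bp(0, 200_000, 150_000, 350_000))       # 50000
print(overlap_percent(0, 200_000, 150_000, 350_000))  # 25.0
```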
A larger overlap between selected clones places more clones over any given sequence of the genome and can therefore provide better validation of a selected region. In one embodiment of the present invention, clones that are completely sequenced are selected over unsequenced clones for inclusion in the SMRT library.
Genomic clones selected for inclusion in the SMRT set can have inserts of varying size.
In one embodiment of the present invention, the SMRT set comprises clones in which the genomic inserts are between about 15 kb and about 300 kb in length. In another embodiment, the SMRT set comprises clones in which the genomic inserts are between about 50 kb and about 200 kb in length. In another embodiment, the SMRT set comprises clones in which the genomic inserts are between about 50 kb and about 300 kb in length. In another embodiment, the SMRT set comprises clones in which the genomic inserts are between about 100 kb and about 300 kb in length. In another embodiment, the SMRT set comprises clones in which the genomic inserts are between about 100 kb and about 200 kb in length.

The genomic insert start points of the selected clones are staggered throughout the genome. In one embodiment, the insert start points are between about 0.07 and Megabase apart. In another embodiment, the insert start points are between about 0.15 and 0.5 Megabase apart. If desired, clone representation can be denser in regions of the genome containing loci that are known or suspected to be prone to rearrangement events, in order to provide higher resolution at these points.
If gaps occur in the genome tiling set when a single source of clones is employed to generate the SMRT set, these can be bridged by selecting genomic clones from alternate libraries in the public domain. Alternatively, if suitable clones do not exist in the public domain or cannot be found in a library constructed for the purpose, then the gaps can be filled in by chromosome walking, or by other methods known to a worker skilled in the art.
An exemplary method of selecting appropriate clones for a SMRT set, when comprehensive information relating to a genome is publicly available and a large number of genomic clones are accessible, is as follows. The base pair positions of all clone inserts relating to the genome can be downloaded from an appropriate source (for example, for the human genome, from the UCSC Genome Browser website or a similar site). A clone at one end of a first chromosome is selected. A second clone is then selected that overlaps the end of the first clone by an appropriate amount (such as between 17 bp and 1,500 bp), as determined by the base pair positions of the two clones. A third clone is then selected that overlaps the end of the second clone by an appropriate amount, as determined by base pair position. This process is continued until sufficient clones have been selected to cover the whole chromosome, genomic region or entire genome. For example, using a UCSC Genome Browser screenshot of a chromosome, a tiling set can be picked manually, or the positional information can be used to arrange the clones into an overlapping clone set.
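The clone-walking procedure described above can be sketched as follows. The sketch assumes a hypothetical list of (name, start, end) base-pair coordinates for one chromosome, such as might be exported from a genome browser, and uses the 17 bp to 1,500 bp overlap window given as an example. Choosing the qualifying clone that extends coverage farthest is an assumed tie-break rule. The walk simply stops where no clone satisfies the overlap window, which is where gap filling from other libraries would be needed.

```python
def walk_tiling(clones, min_overlap=17, max_overlap=1500):
    """Chain clones along one chromosome with a bounded overlap between neighbours.

    clones: list of (name, start, end) tuples, hypothetical half-open bp coordinates.
    Returns the names of the selected clones in order.
    """
    if not clones:
        return []
    # Start from the leftmost clone (the longest one if several share that start).
    ordered = sorted(clones, key=lambda c: (c[1], -c[2]))
    current = ordered[0]
    path = [current[0]]
    while True:
        candidates = [
            c for c in ordered
            if c[2] > current[2]                                 # extends coverage
            and min_overlap <= current[2] - c[1] <= max_overlap  # overlap window
        ]
        if not candidates:
            return path  # no qualifying clone: a gap to fill from other sources
        current = max(candidates, key=lambda c: c[2])  # assumed tie-break rule
        path.append(current[0])


clones = [
    ("A", 0, 150_000), ("B", 149_000, 300_000), ("C", 148_000, 290_000),
    ("D", 299_500, 450_000), ("E", 100_000, 200_000),
]
print(walk_tiling(clones))  # ['A', 'B', 'D']
```

Clone C is rejected because it overlaps A by 2,000 bp, and E is rejected because its overlap with A is far larger than the window. The candidate scan is quadratic, which is acceptable for a sketch but could be replaced by a sorted search for genome-scale clone sets.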
In an exemplary embodiment of the invention, the clones for the SMRT set are selected using a physical map generated by restriction enzyme digestion, such that neighbouring clones share no fewer than 4 restriction enzyme fragments with respect to the fingerprint map (i.e. if, for example, the physical map being used is a Hind III restriction map, then each clone insert should overlap its neighbour by at least 4 Hind III fragments); the clone inserts are between about 15 kb and about 300 kb in length; and none of the inserts in the selected clones of a tiling set share the same 3' or 5' end (i.e. both ends of the insert of each clone are staggered throughout the genome, or portion of the genome). In addition, each of the selected clones contains more than about 20 restriction enzyme sites with respect to the fingerprint map.
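The selection criteria of this exemplary embodiment can be expressed as a check over a proposed tiling path, as sketched below. In this sketch, each clone is a dictionary with hypothetical keys, and the number of fingerprint bands stands in for the number of restriction sites. Shared fragments are counted by exact band-size matches, which simplifies real fingerprint comparison, where a sizing tolerance would be allowed.

```python
def check_tiling_path(path, min_len=15_000, max_len=300_000,
                      min_shared=4, min_bands=21):
    """Return a list of rule violations for an ordered tiling path.

    path: list of dicts with hypothetical keys 'name', 'start', 'end' and
    'bands' (fingerprint band sizes). Band count stands in for the number of
    restriction sites, and shared fragments use exact size matches.
    """
    problems = []
    starts = [c["start"] for c in path]
    ends = [c["end"] for c in path]
    if len(set(starts)) != len(starts) or len(set(ends)) != len(ends):
        problems.append("two or more inserts share an end")
    for c in path:
        length = c["end"] - c["start"]
        if not min_len <= length <= max_len:
            problems.append(f"{c['name']}: insert length {length} out of range")
        if len(c["bands"]) < min_bands:  # "more than about 20" sites
            problems.append(f"{c['name']}: too few restriction fragments")
    for left, right in zip(path, path[1:]):
        if len(set(left["bands"]) & set(right["bands"])) < min_shared:
            problems.append(f"{left['name']}/{right['name']}: fewer than "
                            f"{min_shared} shared fragments")
    return problems


a = {"name": "A", "start": 0, "end": 150_000, "bands": list(range(1000, 1025))}
b = {"name": "B", "start": 140_000, "end": 290_000, "bands": list(range(1020, 1045))}
print(check_tiling_path([a, b]))  # []
```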
1.5 Clone Validation and Replacement
Once a preliminary SMRT set has been generated, it can be validated by fingerprinting or by sequencing to ensure that each genomic clone in the set corresponds to that stored in the genomic reference map. If the genomic clone does not pass the validation step, it is replaced with another clone, or clones, providing equivalent sequence coverage of the genome. In addition, if any gaps in genome sequence coverage are identified, additional genomic clones can be selected to cover the gaps.
If a greater depth of coverage is desired for certain regions of the genome, then additional clones that cover the region of interest can also be added at this stage, provided that they otherwise meet the above criteria. The resulting set of clones constitutes the final SMRT set.
1.6 Characterization of the SMRT Set
Once the final SMRT set is generated, it can be characterized if required according to parameters such as clone location on the map, sequence coverage, resolution, coverage depth, gaps and sequence overlap.
Sequence coverage of the SMRT set can be defined in terms of the percentage of the selected genome or selected genomic region that is represented in the SMRT set. In accordance with one embodiment of the present invention, the sequence coverage of the final SMRT set is at least 90% of the selected genome or genomic region. In another embodiment, the sequence coverage of the final SMRT set is at least 95% of the selected genome or genomic region. In another embodiment, the sequence coverage of the final SMRT set is at least 96% of the selected genome or genomic region. In another embodiment, the sequence coverage of the final SMRT set is at least 97% of the selected genome or genomic region. In another embodiment, the sequence coverage of the final SMRT set is at least 98% of the selected genome or genomic region. In another embodiment, the sequence coverage of the final SMRT set is at least 99% of the selected genome or genomic region.
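Sequence coverage as defined above can be computed by merging overlapping inserts and measuring the merged length that falls inside the region. This is a sketch assuming hypothetical half-open base-pair coordinates. The percentage of the region left in gaps is simply 100 minus the returned value.

```python
def merge_intervals(intervals):
    """Merge overlapping or abutting half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def sequence_coverage(inserts, region_start, region_end):
    """Percentage of [region_start, region_end) covered by at least one insert."""
    covered = 0
    for start, end in merge_intervals(inserts):
        lo, hi = max(start, region_start), min(end, region_end)
        if hi > lo:
            covered += hi - lo
    return 100.0 * covered / (region_end - region_start)


inserts = [(0, 150), (100, 300), (400, 1000)]
print(sequence_coverage(inserts, 0, 1000))  # 90.0, so 10% of the region is in gaps
```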
In accordance with one embodiment of the present invention, the resolution of the final SMRT set is less than 1 Mb. In another embodiment, the resolution of the final SMRT set is less than 0.95 Mb. In a further embodiment, the resolution of the final SMRT set is less than 0.9 Mb. In another embodiment, the resolution of the final SMRT set is less than 0.85 Mb. In another embodiment, the resolution of the final SMRT set is less than 0.8 Mb.
The average depth of coverage of the final SMRT set should be at least 1X. In one embodiment, the average depth of coverage of the final SMRT set is at least 1.2X. In another embodiment, the average depth of coverage of the final SMRT set is at least 1.3X. In another embodiment, the average depth of coverage of the final SMRT set is at least 1.4X. In another embodiment, the average depth of coverage of the final SMRT set is at least 1.5X. In another embodiment, the average depth of coverage of the final SMRT set is at least 1.6X.
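Average depth of coverage can be estimated as the total number of insert bases falling inside the region divided by the region length. This sketch assumes that definition, averaged over the whole region including any gaps, and uses hypothetical half-open coordinates.

```python
def average_depth(inserts, region_start, region_end):
    """Mean number of inserts covering each base of [region_start, region_end)."""
    total = 0
    for start, end in inserts:
        # Count only the part of each insert that lies inside the region.
        lo, hi = max(start, region_start), min(end, region_end)
        if hi > lo:
            total += hi - lo
    return total / (region_end - region_start)


print(average_depth([(0, 600), (400, 1000), (500, 800)], 0, 1000))  # 1.5
```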
In general, the clones in the final SMRT set are selected such that there are minimal gaps in the sequence coverage provided by the SMRT set. In one embodiment of the invention, there are essentially no gaps in the sequence coverage provided by the SMRT set. In another embodiment, there are less than about 0.1% gaps in the sequence coverage provided by the SMRT set. In another embodiment, there are less than about 0.2% gaps in the sequence coverage provided by the SMRT set. In another embodiment, there are less than about 0.5% gaps in the sequence coverage provided by the SMRT set. In another embodiment, there are less than about 1.0% gaps in the sequence coverage provided by the SMRT set. In another embodiment, there are less than about 1.5% gaps in the sequence coverage provided by the SMRT set. In another embodiment, there are less than about 2% gaps in the sequence coverage provided by the SMRT set. In other embodiments, there are less than about 3%, less than about 4% and less than about 5% gaps in the sequence coverage provided by the SMRT set.

As indicated above, the inserts of the clones selected for the SMRT set have overlapping sequences. In accordance with one embodiment of the invention, the sequence overlap between the inserts of neighbouring clones ranges from 17 bp to 1,500 bp. In another embodiment, the sequence overlap between the inserts of neighbouring clones ranges from 17 bp to 1,000 bp. In a further embodiment, the sequence overlap between the inserts of neighbouring clones ranges from 17 bp to 750 bp. In another embodiment, the sequence overlap between the inserts of neighbouring clones ranges from 17 bp to 500 bp. In another embodiment, the sequence overlap between the inserts of neighbouring clones ranges from 25 bp to 1,500 bp. In another embodiment, the sequence overlap between the inserts of neighbouring clones ranges from 25 bp to 1,000 bp. In another embodiment, the sequence overlap between the inserts of neighbouring clones ranges from 25 bp to 750 bp. In other embodiments, the sequence overlap between the inserts of neighbouring clones ranges from 50 bp to 1,500 bp, from 75 bp to 1,500 bp, from 100 bp to 1,500 bp and from 250 bp to 1,500 bp.
In accordance with an alternate embodiment of the present invention, the average sequence overlap between the inserts of neighbouring clones is between about 10% and about 90%. In another embodiment, the average sequence overlap between the inserts of neighbouring clones is between about 20% and about 80%. In another embodiment, the average sequence overlap between the inserts of neighbouring clones is between about 30% and about 70%. In another embodiment, the average sequence overlap between the inserts of neighbouring clones is between about 30% and about 60%. In another embodiment, the average sequence overlap between the inserts of neighbouring clones is between about 30% and about 50%.
One skilled in the art will readily appreciate that if there are gaps in the sequence coverage of the final tiling set, then some of the clones in the set may contain an insert that does not overlap with the insert of an adjacent clone. The tiling set of genomic clones, therefore, provides substantially overlapping coverage of the selected region of the genome. As used herein, "substantially overlapping" means that at least 95% of the clones within the SMRT set contain genomic inserts that overlap. In one embodiment of the invention, at least 96% of the clones within the SMRT set contain genomic inserts that overlap. In another embodiment, at least 97% of the clones within the SMRT set contain genomic inserts that overlap. In another embodiment, at least 98% of the clones within the SMRT set contain genomic inserts that overlap. In a further embodiment, at least 99% of the clones within the SMRT set contain genomic inserts that overlap. In another embodiment, the genomic inserts of all the clones within the SMRT set overlap.
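The "substantially overlapping" criterion can be tested by counting the clones whose inserts overlap at least one neighbour. This sketch assumes that the inserts form a tiling path, so only clones adjacent in start-coordinate order are compared. The coordinates in the example are hypothetical.

```python
def percent_overlapping(inserts):
    """Percentage of inserts that overlap an adjacent insert (inserts sorted by start).

    inserts: list of (start, end) tuples, hypothetical half-open bp coordinates.
    """
    if not inserts:
        raise ValueError("no inserts supplied")
    ordered = sorted(inserts)
    n = len(ordered)
    overlapping = 0
    for i, (start, end) in enumerate(ordered):
        left = i > 0 and ordered[i - 1][1] > start
        right = i < n - 1 and ordered[i + 1][0] < end
        if left or right:
            overlapping += 1
    return 100.0 * overlapping / n


inserts = [(0, 100), (90, 200), (300, 400), (390, 500), (600, 700)]
pct = percent_overlapping(inserts)
print(pct, pct >= 95)  # 80.0 False
```

In this example the last insert has no overlapping neighbour, so only 80% of the clones overlap and the set falls short of the 95% threshold.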
2.0 Preparing the SubMegabase Resolution Tiling (SMRT) Library
The procedure for preparing a SMRT library entails preparing a SMRT pool from each genomic clone in the SMRT set using a protocol suitable for high-throughput, automated generation of SMRT pools. This protocol eliminates certain purification steps through the use of at least one dilution-processing step, and it yields a SMRT pool that is replenishable by further amplification. The SMRT pools subsequently become the source material from which greater quantities of SMRT pools can be produced, allowing further amplification and generation of synthetic genomic DNA fragments without the need to prepare additional amounts of the starting genomic clones.
The protocol entails: a) preparing genomic clone DNA; b) reducing the genomic clone DNA to fragments; and c) amplifying the fragments to generate a SMRT pool.
The procedure comprises one or more dilution-processing steps that can be part of step (b) or (c) or both.
Dilution processing comprises either taking an aliquot of a first reaction and adding it to a subsequent reaction mixture, such that the aliquot is diluted, or taking the entire first reaction, or a portion thereof, diluting it with a suitable diluent, and using all or an aliquot of the diluted first reaction in the subsequent reaction. The preparation of dilutions is known in the art, and variations in the practical steps of making dilutions are contemplated by the method of the invention. Suitable diluents include water and various solutions that do not contain components that would interfere with subsequent procedures in the method. In one embodiment of the invention, the dilution-processing step comprises diluting a reaction, or an aliquot thereof, at a ratio between about 1:2 and about 1:500 in either a diluent or a subsequent reaction mixture. In another embodiment of the invention, the dilution-processing step comprises diluting a reaction, or an aliquot thereof, at a ratio between about 1:2 and about 1:100.
In another embodiment, the dilution-processing step comprises diluting a reaction, or an aliquot thereof, at a ratio between about 1:2 and about 1:50. In another embodiment of the invention, the dilution-processing step comprises diluting a reaction, or an aliquot thereof, at a ratio between about 1:2 and about 1:40. In another embodiment of the invention, the dilution-processing step comprises diluting a reaction, or an aliquot thereof, at a ratio between about 1:5 and about 1:40.
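The volumes for a dilution-processing step follow directly from the dilution ratio, as sketched below. This sketch assumes that "1:N" means one part of the first reaction in N parts total. Some laboratories read 1:N as one part plus N parts diluent, which would change the numbers slightly. The 20 µL reaction volume in the example is hypothetical.

```python
def aliquot_volume(final_volume_ul, dilution_factor):
    """Split a final volume into (aliquot, diluent) for a 1:N dilution.

    Assumes 1:N means 1 part of the first reaction in N parts total. The
    "diluent" share may be water or the subsequent reaction mixture itself.
    """
    if dilution_factor < 1:
        raise ValueError("dilution factor must be at least 1")
    aliquot = final_volume_ul / dilution_factor
    return aliquot, final_volume_ul - aliquot


# Hypothetical 20 uL second reaction seeded at a 1:40 dilution.
print(aliquot_volume(20, 40))  # (0.5, 19.5)
```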
In one embodiment of the invention, dilution-processing is conducted after fragmentation of the genomic clone DNA, i.e. the fragmented clone DNA is diluted prior to amplification. In another embodiment, the dilution-processing is conducted as part of the amplification step. In a further embodiment, the amplification step comprises more than one amplification reaction and the dilution-processing is conducted between amplifications. In another embodiment, the procedure comprises more than one amplification reaction and more than one dilution-processing step.
2.1 Preparation of Genomic Clone DNA
Preparation of DNA from the genomic clones of the SMRT set can be carried out by various methods known in the art (see, for example, Sambrook, et al., Molecular Cloning: A Laboratory Manual, 2d ed. (1989) Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.; Ausubel et al. (eds.), Current Protocols in Molecular Biology, J. Wiley & Sons, New York, NY). A variety of commercial kits are also available for this purpose. Such methods include high-throughput, automatable methods for isolating clone DNA from bacteria. An exemplary method is described in the Examples section.
2.2 Reduction of genomic clone DNA into fragments
Reduction of the genomic clones into fragments can be achieved by a number of techniques known in the art, for example, by digestion with one or more restriction enzymes, by exposure to UV light or gamma radiation, or by physical methods such as sonication or mechanical shearing. Limited DNase I digestion may also be employed to generate fragments of the genomic clones. In one embodiment of the invention, the method of preparing a SMRT library includes the step of reducing the genomic clones into fragments by restriction digestion. Once restriction digestion of the genomic clone DNA is complete, the restriction enzyme can be inactivated, for example, by heating the digest mixture, by precipitating the DNA, or by using a commercial kit.
Reduction of the genomic clones to fragments should yield fragments of a size that is readily amplifiable by conventional techniques and that generates amplified sequences sufficiently long to serve as hybridization probes. In one embodiment of the invention, therefore, the average length of the fragments is between about 50 and about 5000 nucleotides. In another embodiment, the average length of the fragments is between about 50 and about 1500 nucleotides. In another embodiment, the average length of the restriction fragments is between about 100 and about 1000 nucleotides. In yet another embodiment, the average length of the restriction fragments is between about 100 and about 2000 nucleotides. In a further embodiment, the average length of the fragments is between about 100 and about 500 nucleotides.
A worker skilled in the art will appreciate that when the fragments are generated by digestion of the genomic clone DNA with one or more restriction enzymes, the final length of the fragments will be influenced by the choice of restriction enzyme(s). For example, a restriction enzyme with a four-nucleotide recognition sequence will generate DNA fragments with an average length of about 4^4, or 256, nucleotides, whereas a restriction enzyme with a five-nucleotide recognition sequence will generate DNA fragments with an average length of about 4^5, or 1024, nucleotides. In addition, the base composition of a genome, which is specific to a given organism, will also influence the average length of the fragments generated by restriction enzyme digestion. For example, a GC-rich genome will contain more sites for a restriction enzyme whose recognition sequence is GC rich than for one whose recognition sequence is AT rich. Selection of suitable restriction enzyme(s) to generate fragments of appropriate length is considered to be within the ordinary skill of a worker in the art. The selected restriction enzyme(s) may generate 5' overhangs, blunt ends or 3' overhangs. Restriction enzymes suitable for use typically have cleavage sites of at least 4 bases. Enzymes having 5- or 6-base cleavage sites are also suitable.
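The 4^n estimate, and the effect of base composition on it, can be reproduced with a simple model in which each base is drawn independently according to the genome's GC fraction. This is a sketch under that independence assumption. Real genomes deviate from it (for example, CpG dinucleotides are depleted in mammalian DNA), so observed fragment lengths will differ.

```python
def expected_fragment_length(site, gc_fraction=0.5):
    """Expected spacing (bp) between occurrences of `site` in random DNA.

    Assumes independent bases: G and C each with probability gc_fraction / 2,
    A and T each with probability (1 - gc_fraction) / 2.
    """
    p = {"G": gc_fraction / 2, "C": gc_fraction / 2,
         "A": (1 - gc_fraction) / 2, "T": (1 - gc_fraction) / 2}
    prob_site = 1.0
    for base in site.upper():
        prob_site *= p[base]
    return 1.0 / prob_site


print(expected_fragment_length("TTAA"))                 # 256.0 (4^4)
print(expected_fragment_length("GGATC"))                # 1024.0 (4^5, any 5-base site)
print(round(expected_fragment_length("TTAA", 0.4), 1))  # 123.5, AT-rich site, 40% GC
print(round(expected_fragment_length("GGCC", 0.4), 1))  # 625.0, GC-rich site, 40% GC
```

At 40% GC, the AT-rich MseI site (TTAA) is expected about five times as often as the GC-rich HaeIII site (GGCC), which is the base-composition effect described in the text.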
Non-limiting examples of suitable restriction enzymes include the following 4-base cutters:
CviJI, MnlI, AluI, BsuFI, HapII, HpaII, MseI, MspI, AccII, BstUI, BsuEI, FnuDII, ThaI, Bce243I, BsaPI, Bsp67I, BspAI, BspPII, BsrPII, BssGII, BstEIII, BstXII, CpaI, CviAI, DpnII, FnuAII, FnuCI, FnuEI, MboI, MmeII, MnoIII, MosI, MthI, NdeII, NflI, NlaII, NsiAI, NsuI, PfaI, Sau3AI, SinMI, HhaI, HinPI, BsuRI, HaeIII, NgoII, CviQI, RsaI, TaqI, and TthHBI. In one embodiment of the invention, the method of preparing the SMRT library includes the step of digesting the genomic clones with MseI.
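To illustrate the MseI embodiment, the following sketch performs an in silico digest of a sequence at MseI sites (T^TAA). It is a simplified model assumed for illustration only: it returns top-strand fragments, ignores partial digestion and methylation sensitivity, and the example sequence is made up.

```python
def digest(seq: str, site: str = "TTAA", cut_offset: int = 1) -> list[str]:
    """Cut `seq` at every occurrence of `site`, `cut_offset` bases into the site.

    The defaults model MseI (T^TAA). Only the top strand is returned; the 2-nt
    5' overhang left on the complementary strand is not represented.
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)  # keep scanning past this site
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


fragments = digest("AAATTAAGGGTTAACC")
print(fragments, [len(f) for f in fragments])
```

For the example sequence this yields fragments of 4, 7 and 5 nucleotides; applied to a real clone insert, the length distribution could be compared with the ranges given above.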
2.3 Amplification of the DNA fragments to provide SMRT pools
Amplification of the DNA fragments can be achieved using any of a number of amplification techniques known in the art that are suitable for high-throughput generation of amplified fragments. Both strands of the fragment may be amplified, or only one strand. In one embodiment of the invention, amplification of the DNA fragments is achieved using a PCR-based method. Suitable PCR-based methods include, but are not limited to, degenerate oligonucleotide-primed PCR (DOP-PCR, which uses a partially degenerate primer sequence (6 out of 21 positions) and repeated thermocycling; see Telenius et al., Genomics 13(3):718-25, 1992), primer-extension preamplification (PEP; Zhang et al., Proc. Natl. Acad. Sci. USA 89:5847-5851, 1992), random-primer PCR and ligation-mediated PCR (LM-PCR) methods.
One or more rounds of amplification can be conducted. For example, the products of a first amplification reaction can be used as a template for a second round of amplification to increase the yield of the SMRT pools. Further rounds of amplification can be conducted if desired.
In one embodiment of the invention, amplification of the restriction fragments is conducted by an LM-PCR protocol, in which a known sequence (either an adaptor or a synthetic linker) is attached to the ends of the DNA fragments, thus providing primer-binding sites for PCR amplification. The DNA can then be amplified by PCR using primers that are complementary to the sequence of the adaptor or linker.

A variety of methods known in the art for attachment of linker oligonucleotides to the genomic DNA fragments can be used. Typically, the linkers are attached to the genomic DNA fragments using a DNA ligase according to procedures known in the art or provided by the manufacturer of the ligase enzyme.
The linking oligonucleotides that are attached to the ends of each fragment may comprise the same or different sequences and can be ligated to the fragments in separate reactions or simultaneously. If desired, the linking oligonucleotides can contain one or a plurality of restriction enzyme recognition sequences. The linking oligonucleotides may also be modified, such as by the inclusion of a moiety which is a first member of a binding pair, to allow binding of the fragment by a second member of the binding pair coated on the surface of a solid support. The term "binding pair" as used herein means a pair of moieties whose physicochemical properties are known and can be exploited to allow each member to bind specifically to, or interact with, the other member of the pair. Examples of suitable binding pairs for use in the present invention include, but are not restricted to: antigen or hapten with antibody; antibody with anti-antibody; receptor with ligand; enzyme or enzyme fragment with substrate, substrate analogue or ligand; biotin or lectin with avidin or streptavidin; lectin with carbohydrate; digoxin with anti-digoxin; benzamidine with trypsin or other serine proteases; protein A with immunoglobulin; pairs of leucine zipper motifs (see, for example, U.S. Patent No. 5,643,731); bacitracin with undecaprenyl pyrophosphate; and the like.
In one embodiment, PCR is used to amplify the DNA fragments after ligation of two different linker oligonucleotides to the fragments. In a further embodiment, one of the primers for the amplification reaction is modified by inclusion of one member of a binding pair.
When an LM-PCR protocol is employed, the primer oligonucleotides are complementary to one or both of the linker oligonucleotides attached to the ends of each DNA fragment. The primer oligonucleotides may be modified, such as by the inclusion of a moiety which is a first member of a binding pair, to allow binding of the DNA by a second member of the binding pair coated on the surface of a solid support, as described above.
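The logic of LM-PCR with a single linker can be sketched at the sequence level: once the same linker is ligated to both ends of a fragment, one primer whose sequence matches the linker can prime synthesis from both strands. This sketch rests on simplifying assumptions (fully ligated ends, no overhang fill-in, exact primer matching), and the linker sequence is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def ligate_linker(fragment: str, linker: str) -> str:
    """Top strand of a fragment with the same linker ligated to both ends."""
    return linker + fragment + revcomp(linker)


def primer_extends_both_strands(product: str, primer: str) -> bool:
    """True if `primer` occurs at the 5' end of both strands, i.e. it can
    anneal to the 3' end of each complementary strand and be extended."""
    return product.startswith(primer) and revcomp(product).startswith(primer)


LINKER = "ACTGCGTACCATGCAGTTGC"  # hypothetical linker, for illustration only
product = ligate_linker("TAAGGGTCATTA", LINKER)
print(primer_extends_both_strands(product, LINKER))
```

An unligated fragment fails the check, which is why only linker-bearing fragments are amplified in this scheme.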

In accordance with one embodiment of the present invention, LM-PCR is employed to prepare the SMRT pools and a dilution-processing step is included after ligation of the linkers to the fragments and prior to the amplification step. In another embodiment, an additional round of amplification is included and a second dilution-processing step is employed in which the products of the first round of amplification are diluted prior to the second round of amplification.
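The text does not specify dilution factors or cycle numbers, so the following is only a hypothetical, idealized model of two amplification rounds with a dilution step between them. It assumes a constant per-cycle efficiency and a hard plateau at which a reaction saturates. Under those assumptions, diluting the first-round product lets the second round start well below the plateau rather than already saturated. All parameter values are illustrative assumptions.

```python
def amplify(copies: float, cycles: int, efficiency: float = 0.9,
            plateau: float = 1e12) -> float:
    """Idealized PCR: copies * (1 + efficiency) ** cycles, capped at a plateau."""
    return min(copies * (1 + efficiency) ** cycles, plateau)


def two_rounds(start_copies: float, cycles1: int, cycles2: int,
               dilution: float, **kwargs) -> tuple:
    """Round 1, dilute its product 1:`dilution`, then seed round 2 with it."""
    round1 = amplify(start_copies, cycles1, **kwargs)
    seed = round1 / dilution
    round2 = amplify(seed, cycles2, **kwargs)
    return round1, seed, round2


r1, seed, r2 = two_rounds(1e4, 20, 20, dilution=1000)
print(f"round 1: {r1:.2e}  diluted seed: {seed:.2e}  round 2: {r2:.2e}")
```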
If desired, one or more detectable labels may be incorporated into the DNA fragments during or after the amplification reaction. Detectable labels are molecules or moieties having a property or characteristic that can be detected directly or indirectly, and are chosen such that the ability of the nucleic acid molecule to hybridise with its target sequence is not affected. Methods of labelling nucleic acid sequences are well known in the art (see, for example, Ausubel et al. (1997 & updates) Current Protocols in Molecular Biology, Wiley & Sons, New York). Labels contemplated by the present invention include directly detectable labels, such as radioisotopes, fluorophores, chemiluminophores, enzymes, colloidal particles, fluorescent microparticles, intercalating dyes such as SYBR green or ethidium bromide, and the like. One skilled in the art will understand that directly detectable labels may require additional components, such as substrates, triggering reagents, light, and the like, to enable detection of the label. The present invention also contemplates the use of labels that are detected indirectly. Indirectly detectable labels are typically pairs of binding members, one of which is attached or coupled to a directly detectable label.
Non-limiting examples of suitable binding pairs are provided above.
The SMRT pools generated by the method of the present invention comprise a quantity of DNA fragments of varying size, depending on the way the original clone was reduced into fragments. In one embodiment of the invention, the SMRT pools comprise DNA fragments from about 50 bp to about 5000 bp. In another embodiment, the SMRT pools comprise DNA fragments from about 100 bp to about 2000 bp. The quantity of DNA fragments produced by the methods of the invention will depend on the starting concentration of genomic clone and on the number of amplification reactions that are conducted. Typically, the methods described herein provide for the generation of SMRT pools comprising between about 20 µg and about 100 µg of DNA. In one embodiment, the SMRT pool comprises between about 40 µg and about 50 µg of DNA.
Once generated, the SMRT pools of the invention may be stored in a multi-well format to facilitate the preparation of SMRT arrays therefrom. For example, SMRT pools representing different chromosomes can be stored in separate multi-well plates such that certain plates correspond to a particular chromosome. Alternatively, SMRT pools representing regions of the genome involved in a particular disease or condition can be stored together in one or more multi-well plates. The SMRT pools can be prepared for storage by conventional techniques, for example, by refrigeration, by freezing, either directly or after addition of a suitable cryoprotectant such as glycerol or dimethyl sulphoxide (DMSO), by lyophilisation, or by a similar procedure. Storage can be at room temperature, under refrigeration (for example, at 4°C), or under freezing conditions (for example, at -20°C, -70°C or -80°C).
Selection of appropriate storage techniques and conditions is considered to be within the skills of a worker in the art.
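The chromosome-per-plate arrangement can be sketched as a simple layout function. This is a hypothetical illustration: it assumes 96-well plates filled row by row (A1 to A12, then B1, and so on), pools already sorted in map order with each chromosome's pools contiguous, and a made-up plate naming scheme.

```python
from itertools import groupby

ROWS = "ABCDEFGH"
WELLS = [f"{row}{col}" for row in ROWS for col in range(1, 13)]  # A1..A12, B1.., H12


def layout_by_chromosome(pools):
    """Assign (pool_id, chromosome) pairs, sorted in map order, to 96-well plates.

    Each chromosome starts on a fresh plate; returns pool_id -> (plate, well).
    """
    layout = {}
    for chrom, group in groupby(pools, key=lambda pool: pool[1]):
        for i, (pool_id, _) in enumerate(group):
            plate = f"chr{chrom}-{i // 96 + 1:02d}"  # hypothetical naming scheme
            layout[pool_id] = (plate, WELLS[i % 96])
    return layout


# 100 hypothetical chromosome 1 pools spill onto a second plate; chromosome 2 starts fresh.
pools = [(f"p{i}", "1") for i in range(100)] + [(f"q{i}", "2") for i in range(5)]
print(layout_by_chromosome(pools)["q0"])
```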
3.0 Quality Control
The generation of SMRT pools requires multiple steps and generally takes place in a multi-well format to facilitate high throughput. Quality control procedures can therefore be used to confirm sequence identity, detect any plate exchanges or mislabelling, and/or assess any well-to-well contamination.
A number of techniques known in the art can be employed for quality control testing, for example, DNA restriction digest fingerprint analysis, fluorescence in situ hybridization (FISH) mapping and DNA sequencing. Protocols for these methods are known to a skilled worker (see, for example, Sambrook et al., Molecular Cloning: A Laboratory Manual, 2d ed. (1989), Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.; Ausubel et al. (eds.), Current Protocols in Molecular Biology, J. Wiley & Sons, New York, NY).
For example, quality control of SMRT pools using DNA restriction digest fingerprint analysis can be achieved by digesting the genomic clone corresponding to that used to generate the SMRT pool with an appropriate restriction enzyme and running the digest on an agarose gel. The SMRT pool is then labelled with a detectable label and used as a hybridization probe in a Southern hybridization procedure. If the labelled SMRT
pool is able to detect all the digest fragments of the genomic clone by hybridization and does not hybridize substantially to negative control samples, the identity of the SMRT pool is considered to be verified.
Quality control using FISH analysis involves labelling a SMRT pool with a detectable label and subsequent hybridization of the labelled SMRT pool to a metaphase chromosome preparation. This procedure allows the mapping of a SMRT pool to a chromosomal region and, therefore, provides a crude quality control step. FISH
analysis does not, however, provide positive verification that a SMRT pool is derived from a specific genomic clone.
DNA sequencing can be employed for quality control by removing a sample of a SMRT pool and subjecting it to standard DNA sequencing using end primers designed to be complementary to vector sequences flanking the genomic insert in the clone used to generate the SMRT pool. This approach is successful because, within the large collection of amplified products from one SMRT pool, a subset of fragments contains vector sequence followed by a short stretch of unique sequence terminating at the most proximal site of fragmentation. An appropriate primer is selected for the sequencing reaction based on the type of vector contained in the original genomic clones. The primers are selected to anneal to a region of the vector proximal to the insert contained in the clone, so that they can be used to generate sequence data relating to the insert. In one embodiment of the invention, the primer sequences are complementary to a region in close proximity to the multiple cloning site of the vector. Methods of identifying and preparing such primers are known to a worker skilled in the art. Many suitable primers are also available commercially. Examples of suitable commercially available primers include, but are not limited to, T7 primer, SP6 primer, M13 forward primer and M13 reverse primer.
Standard sequencing protocols and kits, which are based on chain termination methods employing radioactive or fluorescent labels, can be employed to carry out the sequencing step. Cycle sequencing, which is a modification of this procedure in which the chain terminators are incorporated using PCR, is also suitable for use as a quality control procedure in accordance with the present invention. Several kits for sequencing DNA, including cycle sequencing kits, are commercially available.
Examples include, but are not limited to, the Applied Biosystems Big Dye cycle sequencing kit. The methods for separation and detection of labelled DNA fragments generated during the sequencing reaction are typically automated and can be performed using commercially available DNA sequencers and analyzers.
These methods generally require purification of the template prior to sequencing in order to remove any nucleotides or primers that may be carried over from prior manipulations of the template.
Once sequence information regarding a SMRT pool has been obtained, it can be analyzed against the appropriate genome sequence, for example, using the tools provided by the National Center for Biotechnology Information (NCBI) such as BLASTN. Scripts can be written for use with BLAST to allow multiple sequences to query BLAST simultaneously.
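As one example of such a script, the sketch below reads BLAST+ tabular output (`-outfmt 6`, whose default columns are listed in `FIELDS`), keeps the highest-scoring hit for each query, and checks that it falls within the genomic span of the clone expected for that pool. Submitting all reads together as one multi-FASTA query file is one way to query BLAST with many sequences at once. Pool names, subject IDs and coordinates here are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text: str) -> dict:
    """Highest-bitscore hit per query from -outfmt 6 text."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(FIELDS, row))
        hit["bitscore"] = float(hit["bitscore"])
        hit["sstart"] = int(hit["sstart"])
        query = hit["qseqid"]
        if query not in best or hit["bitscore"] > best[query]["bitscore"]:
            best[query] = hit
    return best


def verify_pools(best: dict, expected: dict) -> dict:
    """expected: pool -> (subject id, clone start, clone end).

    A pool passes if its top hit lies on the expected subject within the clone span.
    """
    result = {}
    for pool, (subject, start, end) in expected.items():
        hit = best.get(pool)
        result[pool] = bool(hit) and hit["sseqid"] == subject and start <= hit["sstart"] <= end
    return result
```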
Quality control procedures can be used to verify all of the SMRT pools generated in the production of an SMRT library. Alternatively, when the SMRT pools are generated and stored in a multi-well format, only selected SMRT pools need be verified, which still allows detection of plate switches and flips that may occur during handling. This can be achieved, for example, by verifying one sample from each archived row or selected samples from each plate. As demonstrated in the Examples contained herein, quality control can be accomplished by sequencing as few as three samples from each 96-well plate. In one embodiment of the invention, all of the SMRT pools in an SMRT library are verified by quality control procedures. In another embodiment of the invention, selected SMRT pools from each multi-well plate in the library are verified by quality control procedures. In a further embodiment, $ SMRT pools from each multi-well plate in the library are verified by quality control procedures. In yet another embodiment, 3 SMRT pools from each multi-well plate in the library are verified by quality control procedures.
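Detecting a plate that has been turned around during handling, using only a few sequenced wells, can be sketched as follows. The sketch assumes standard 96-well geometry (rows A to H, columns 1 to 12), in which a 180° turn maps well A1 to H12; the function names are hypothetical. A "mismatch" result would then be compared against the records of other plates to look for a plate switch.

```python
ROWS = "ABCDEFGH"


def rotate_180(well: str) -> str:
    """Well whose recorded contents sit at `well` after a 96-well plate is turned 180 degrees."""
    row, col = well[0], int(well[1:])
    return f"{ROWS[7 - ROWS.index(row)]}{13 - col}"


def classify_plate(observed: dict, expected: dict) -> str:
    """Compare sequenced wells (well -> identified clone) with the plate record.

    Returns "ok", "rotated" (plate turned around) or "mismatch".
    """
    if not observed:
        raise ValueError("no wells were sequenced")
    if all(expected.get(w) == clone for w, clone in observed.items()):
        return "ok"
    if all(expected.get(rotate_180(w)) == clone for w, clone in observed.items()):
        return "rotated"
    return "mismatch"
```

Sampling wells far from the plate centre (for example A1, D6 and H12) makes a rotation easy to distinguish from a correct plate, since a turned plate moves every sampled well's contents.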
3.1 High Throughput Insert End Sequence Analysis
In one embodiment of the present invention, in order to facilitate high-throughput screening of large numbers of SMRT pools and to reduce the cost of the quality control procedure, a high-throughput end sequence analysis protocol is employed as a quality control procedure. The protocol uses cycle sequencing and involves end sequence analysis of less than 200 bp of the genomic insert in order to confirm insert identity.
Several factors enable this analysis to be performed in a high-throughput manner. First, less than 200 bp, and in general less than 50 bp, of the end sequence of the insert needs to be analyzed to confirm insert identity, whereas typical sequence reads are between about 200 bp and about 600 bp; the minimum required sequence read is 17 bp. Second, rather than purifying the template to remove interfering excess nucleotides and primers, primer annealing is conducted in the presence of the contaminating amplification primers. Third, a higher amount of template is used. Fourth, a greater number of thermal cycles is employed in the sequencing reaction (for example, about 65 or more cycles versus the standard 25 to 35 cycles).
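A back-of-envelope calculation suggests why a read of about 17 bp can be enough to identify an insert. Assuming, for illustration only, a random genome of G ≈ 3.1 × 10⁹ bp searched on both strands, the expected number of chance matches to a specific k-mer is 2G/4^k, which first drops below one at k = 17 (about 1.4 at 16 bp versus about 0.36 at 17 bp). Repeats make real genomes less favourable than this model, and the source does not state that its 17 bp figure was derived this way.

```python
def expected_random_matches(k: int, genome_size: float) -> float:
    """Expected chance occurrences of one k-mer on both strands of a random genome."""
    return 2 * genome_size / 4 ** k


def shortest_unique_length(genome_size: float) -> int:
    """Smallest k for which fewer than one chance match is expected."""
    k = 1
    while expected_random_matches(k, genome_size) >= 1:
        k += 1
    return k


HUMAN_GENOME = 3.1e9  # approximate haploid size in bp; an assumption for illustration
for k in (15, 16, 17, 18):
    print(k, round(expected_random_matches(k, HUMAN_GENOME), 2))
print(shortest_unique_length(HUMAN_GENOME))
```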
As indicated above, sequence analysis of SMRT pools is enabled by the fact that each SMRT pool contains a subset of amplification products consisting of vector sequence followed by a short stretch of unique sequence corresponding to the genomic insert and terminating at the most proximal site of fragmentation. In contrast to standard sequence identification, it has been determined that as few as 17 bp of this unique sequence are required to confirm the identity of the SMRT pool, such that the targeted sequence read ranges from at least 17 bp to about 200 bp in size. In one embodiment of the invention, the targeted sequence read ranges from 17 bp to about 100 bp in size. In another embodiment, the targeted sequence read ranges from 17 bp to about 50 bp in size.
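Reading the insert-specific sequence out of such a product can be sketched as below: locate the vector sequence flanking the insert in the read and keep what follows, accepting the read only if at least 17 bases of insert sequence remain. The flank sequence is hypothetical, and exact string matching is a simplifying assumption; real base calls may contain errors that call for approximate matching.

```python
from typing import Optional


def insert_tag(read: str, vector_flank: str, min_len: int = 17,
               max_len: int = 200) -> Optional[str]:
    """Return the genomic sequence that follows the vector flank in a read.

    Returns None if the flank is absent or fewer than `min_len` insert bases follow it.
    """
    pos = read.upper().find(vector_flank.upper())
    if pos == -1:
        return None
    tag = read[pos + len(vector_flank):][:max_len]
    return tag if len(tag) >= min_len else None


FLANK = "TTGACCGTAGCATCGA"  # hypothetical vector sequence next to the insert
print(insert_tag("NNNN" + FLANK + "ACGT" * 5, FLANK))
```

The returned tag is what would then be submitted to a BLASTN search against the genome sequence.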
The fact that such a short sequence read suffices to confirm the identity of the SMRT pools means that a low-efficiency sequence analysis can be employed, which in turn facilitates high-throughput analysis of the SMRT pools. Thus, the high-throughput end sequence analysis protocol of the present invention employs unpurified template for the sequencing reaction: the sequence analysis is carried out using an aliquot of the selected SMRT pool without the need to remove excess nucleotides and primers remaining in the pool from the one or more amplification steps. Similarly, if a library of SMRT pools is to be used for generating an array, the SMRT pools can be sequenced after resuspension in spotting solution just prior to spotting on the array, provided that the spotting solution does not contain components that interfere with the sequencing reaction. A SMRT pool can be sequenced after one round of amplification, or after a second or subsequent round of amplification.
The use of unpurified template is also facilitated by modifications made to the cycle sequencing protocol employed in the sequencing reaction. These modifications comprise increasing the amount of template used in the reaction and increasing the number of cycles in the reaction. The number of PCR cycles in the extension reaction is increased substantially over the number of cycles described in protocols found in the art, which typically range from about 25 to about 50. In one embodiment of the invention, the number of cycles used in the sequencing reaction is between about 65 and about 100. In another embodiment, the number of cycles used in the reaction is between about 65 and about 95. In another embodiment, the number of cycles used in the reaction is between about 65 and about 90. In another embodiment, the number of cycles used in the reaction is between about 75 and about 100. In another embodiment, the number of cycles used in the reaction is between about 85 and about 100. In further embodiments, the number of cycles used in the reaction is between about 85 and about 95, and between about 85 and about 90.
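Because cycle sequencing uses a single primer, its extension products accumulate approximately linearly rather than exponentially, so the signal scales roughly with template amount multiplied by cycle number. The sketch below uses that simplified linear model (it ignores depletion of dye terminators and primer) to compare a hypothetical standard reaction with one using twice the template and 85 cycles; the specific values are illustrative assumptions.

```python
def relative_product(template_fmol: float, cycles: int,
                     efficiency: float = 1.0) -> float:
    """Linear model: each cycle extends at most one primer per template molecule."""
    return template_fmol * cycles * efficiency


standard = relative_product(template_fmol=10, cycles=30)
modified = relative_product(template_fmol=20, cycles=85)
print(f"fold increase: {modified / standard:.1f}")
```

Under this model the modified reaction yields about 5.7 times as much extension product, which illustrates how extra template and extra cycles can compensate for a less efficient, unpurified template.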
The amount of DNA template used in the sequencing reaction can vary from 10 fmol to 75 fmol. In one embodiment, the amount of DNA template added to the sequencing reaction is between about 20 fmol and about 50 fmol. In another embodiment, the amount of DNA template added to the sequencing reaction is between about 30 fmol and about 40 fmol. In an alternative embodiment, the amount of DNA template added to the sequencing reaction is between about 1 ng and about 50 ng. In one embodiment, the amount of DNA template added to the sequencing reaction is between about 5 ng and about 40 ng. In another embodiment, the amount of DNA template added to the sequencing reaction is between about 15 ng and about 30 ng.
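Converting between the molar and mass amounts of template given above requires an assumed fragment length. The sketch below uses the common approximation of about 650 g/mol per base pair of double-stranded DNA and treats a SMRT pool, which is a mixture of fragment sizes, as if all fragments had a single average length; both are assumptions.

```python
BP_MASS = 650.0  # g/mol per base pair of double-stranded DNA (common approximation)


def fmol_to_ng(fmol: float, length_bp: float) -> float:
    """Mass in ng of `fmol` femtomoles of dsDNA fragments of `length_bp` bp."""
    return fmol * 1e-15 * length_bp * BP_MASS * 1e9


def ng_to_fmol(ng: float, length_bp: float) -> float:
    """Femtomoles of dsDNA fragments of `length_bp` bp in `ng` nanograms."""
    return ng * 1e-9 / (length_bp * BP_MASS) * 1e15


print(round(fmol_to_ng(20, 500), 2))
```

At an assumed average length of 500 bp, 20 fmol corresponds to about 6.5 ng, and the 10 fmol to 75 fmol range to roughly 3 ng to 24 ng.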

In one embodiment of the invention, the high throughput end sequence analysis employs about 4% of each unpurified SMRT pool as the DNA template in the sequencing reaction and a cycle sequencing kit. The protocol supplied by the manufacturer of the kit is followed but the number of PCR cycles in the extension reaction is increased to about 85. In another embodiment of the invention, the high throughput end sequence analysis employs an amount of each unpurified SMRT
pool that corresponds to approximately 20 fmol of DNA. In a further embodiment, the number of PCR cycles in the extension reaction is increased to about 95.
SUB-LIBRARIES
The present invention also contemplates sub-libraries derived from the SMRT libraries of the invention. A sub-library comprises a subset of the SMRT pools that make up the SMRT library. The selected SMRT pools can be, for example, pools that correspond to a specific region of the genome that is of interest, such as one or more chromosomes, an arm of a chromosome, or particular regions of the genome known to be involved in disease, drug resistance or susceptibility, or the like. If the specific SMRT pools required for the sub-library are known, they can readily be removed from the SMRT library and transferred to an alternative multi-well container or spotted onto a solid support to provide an array (see below). Similarly, if the location of the region(s) or chromosome(s) of interest on the physical map of the genome is known, the appropriate genomic DNA clone(s) in the tiling set can be identified and the SMRT pools derived from these clones can then be selected from the library.
Alternatively, a library can be screened for SMRT pools corresponding to a region of interest, for example, by hybridization techniques, and the appropriate SMRT
pools selected.
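When the map positions of the tiling-set clones are known, choosing the SMRT pools for a sub-library reduces to an interval-overlap query. A minimal sketch, assuming each pool is annotated with the chromosome and inclusive start and end coordinates of its source clone (the `Pool` record and the coordinates below are hypothetical):

```python
from typing import NamedTuple


class Pool(NamedTuple):
    pool_id: str
    chrom: str
    start: int  # clone start on the physical map (bp)
    end: int    # clone end (bp), inclusive


def select_sub_library(pools, chrom, start, end):
    """Pools whose source clone overlaps chrom:start-end (inclusive coordinates)."""
    return [p for p in pools
            if p.chrom == chrom and p.start <= end and p.end >= start]


pools = [Pool("P1", "8", 100_000, 280_000), Pool("P2", "8", 250_000, 420_000),
         Pool("P3", "8", 600_000, 780_000), Pool("P4", "9", 100_000, 300_000)]
print([p.pool_id for p in select_sub_library(pools, "8", 300_000, 500_000)])
```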
USES OF THE LIBRARIES OF SMRT POOLS
Once the library of SMRT pools has been generated, it can be used for the preparation of a SMRT array or for the selection of probes for various applications, for example, Southern hybridization or FISH analysis.

1.0 SMRT arrays
A SMRT array can be prepared by spotting each member of the library, or libraries, of SMRT pools onto one or more solid supports in an arrayed configuration, using standard techniques known in the art, wherein each point of the array corresponds to a SMRT pool. To prepare a SMRT array, the SMRT pools are typically precipitated, for example, with ethanol, and then resuspended directly in a suitable spotting solvent, for example, 20% DMSO, 50% formamide. The SMRT pools can be deposited onto the solid support in random order or in a specific order, for example, according to their map position. Methods of deposition in array construction are known in the art. In general, an array is constructed by binding nucleic acid molecules of the SMRT pools to a solid support in an ordered spatial arrangement so that each SMRT pool is present at a specified location on the support. The solid support can be a membrane, such as a nylon membrane, activated nylon membrane or nitrocellulose membrane, a filter, a chip, a glass slide, or other suitable solid support [see, for example, U.S. Pat. No. 5,837,832; PCT application WO95/11995; Lockhart, D. J., et al. (1996) Nat. Biotech. 14:1675-1680; Schena, M., et al. (1996) Proc. Natl. Acad. Sci. USA 93:10614-10619; U.S. Pat. No. 5,807,522].
As SMRT libraries can span an entire genome, or portion of the genome, the SMRT
arrays can also cover an entire genome or portion of a genome. A subset of pools from a SMRT library (a sub-library) may be used to generate the SMRT array.
The SMRT arrays can contain more than one SMRT library if desired, and thus can cover more than one genome.
Furthermore, the SMRT arrays can comprise combinations of one or more SMRT libraries and other types of nucleic acids, for example, viral DNA, plasmid DNA, and oligonucleotides. For example, a SMRT array can comprise a SMRT library representing a genome or portion of a genome together with a series of oligonucleotides designed to increase the resolution of the array in a particular region of the genome, or a SMRT library representing a genome or portion of a genome together with genomic DNA of a relevant virus, bacterium or organelle(s) associated with a particular disease state.
Alternatively, a SMRT array can comprise a SMRT library in combination with one or more control or reference DNA sequences that allow for identification, orientation, or normalization of the results generated using the array.
The present invention also contemplates genome-wide high resolution arrays, which can be used to analyze specific regions of interest in the genome either by blocking portions of the array from exposure to hybridization solution, or by only analyzing portions of the array that are of interest.
SMRT arrays may be constructed in a low or high density format. The term "high density" as used herein with reference to an array means that the array comprises more than about 60 different SMRT pools per cm². In one embodiment of the invention, a high density SMRT array comprises more than about 100 different SMRT pools per cm². In another embodiment, a high density SMRT array comprises more than about 600 different SMRT pools per cm². In another embodiment, a high density SMRT array comprises more than about 1000 different SMRT pools per cm².
In another embodiment, a high density SMRT array comprises more than about 5,000 different SMRT pools per cm². In another embodiment, a high density SMRT array comprises more than about 10,000 different SMRT pools per cm². A high density array provides for rapid, essentially simultaneous, evaluation of a number of hybridizations in a single test.
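For a square grid of spots, density follows directly from the centre-to-centre spot spacing (pitch): since 1 cm = 10,000 µm, a grid with pitch p µm holds (10,000/p)² spots per cm². The sketch below applies this and also divides by the number of replicate spots per pool, since a pool spotted in duplicate or triplicate occupies two or three spots. Square-grid geometry is an assumption, and the pitches shown are illustrative.

```python
def spots_per_cm2(pitch_um: float) -> float:
    """Spot density of a square grid with centre-to-centre spacing pitch_um (µm)."""
    return (10_000 / pitch_um) ** 2  # 1 cm = 10,000 µm


def pools_per_cm2(pitch_um: float, replicates: int = 1) -> float:
    """Distinct SMRT pools per cm² when each pool is spotted `replicates` times."""
    return spots_per_cm2(pitch_um) / replicates


for pitch in (400, 300, 140, 100):
    print(pitch, round(spots_per_cm2(pitch)), round(pools_per_cm2(pitch, replicates=3)))
```

A 100 µm pitch gives 10,000 spots per cm², but with triplicate spotting only about 3,333 distinct pools per cm².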
High density arrays can be prepared using robotic spotters to deposit DNA samples onto one or more solid supports. Such robotic spotters use high-grade stainless steel pins to pick up samples and then deposit them in the correct locations on the support.
Robotic spotters are commercially available, for example, from Virtek Biotech, or Telechem. Each SMRT library can be spotted on the array once or in multiplicate. In one embodiment the SMRT library is spotted in triplicate. In another embodiment, the SMRT library is spotted in duplicate.
One embodiment of the invention provides for a high-density array comprising one or more SMRT libraries. In another embodiment of the invention, the resolution of the high-density array is between 0.03 and 1 Mb. In another embodiment of the invention, the resolution of the high-density array is between 0.05 and 0.08 Mb. In another embodiment, the resolution of the high-density array is 77 kb (0.077 Mb). In a further embodiment, a sub-megabase resolution SMRT array representing a complete minimal tiling set across the sequenced human genome is provided.
The present invention also provides for a two-stage SMRT array system. The first stage of such a system comprises a low resolution, genome-wide SMRT array and the second stage comprises one or more chromosome-specific or region-specific (for example, disease-specific) high resolution SMRT arrays. The first, low resolution array can be used for the initial localization of genetic alterations by gross mapping of altered regions, and typically comprises between about 50 and about array points per chromosome arm.
The second, high-density (high resolution) array(s) comprise about 500 to 10,000 array points. Once the altered regions have been mapped using the low resolution array, an appropriate high resolution array is selected and used for fine mapping of the altered regions. By using a two-stage system, the number of array points that need to be assayed can be reduced, thus reducing array costs and probe costs.
The present invention further contemplates SMRT arrays that are disease-specific and comprise a tiling subset wherein each point comprises a SMRT pool covering a region of interest. Once such an array has been applied to a specific disease and the relevant spots on the array have been identified, a subsequent SMRT array can be generated comprising the SMRT pools determined to be in a region of interest. This allows the high density array to be employed to discover novel regions or patterns of genetic alteration, and a refined SMRT array then to be generated, which can reduce costs for commercial applications.
SMRT arrays can also be used for preparation of sub-libraries as described above.
Arrays of a sub-library of SMRT pools could be spotted directly onto a solid substrate from the original SMRT library housed in a multi-well container by programming the array printer to select samples only from those wells containing the desired SMRT pools.
1.1 Uses of the SMRT Arrays
The SMRT arrays of the present invention are useful in a variety of clinical and research settings. For example, SMRT arrays can be used to analyze genetic alterations (polymorphisms, chromosomal rearrangements and translocations), DNA copy number changes, epigenetic changes such as changes in methylation, changes in gene expression, and evolutionary genomic changes; they can also be used to discover novel genes relevant to disease and to identify known genes that are related to specific diseases. The SMRT arrays can also be used in combination with chromatin immunoprecipitation to locate chemical modifications of chromatin, to identify chromosomal targets of proteins involved in DNA binding or chromatin remodelling, and to identify the sites throughout the genome at which DNA-binding proteins interact. This type of information can be used to determine patterns of genomic alteration that may be indicative of disease. The SMRT arrays can be used as tools to diagnose diseases, enable the selection of treatment regimens, and predict resistance to particular treatments.
1.1.1 Array-based Comparative Genome Hybridization (CGH)
Array-based CGH detects gain or loss of chromosomal regions through competitive hybridization of test and reference samples to the array. The test and reference samples are distinctly labelled, for example with chromophores or fluorophores of different colours or with different emission spectra, so as to be distinguishable from each other. At each array point, the ratio of the signals from the two labels reflects the relative levels of hybridization of the test and reference samples, and thus indicates an increase or reduction in the copy number of the corresponding test DNA sequence. For example, duplication of a region in the test sample will result in that sample 'competing out' the corresponding reference sample in a competitive hybridization, while deletion of a region in the test sample will result in that sample being 'competed out' by the corresponding reference sample. An increase in the relative signal strength from the test sample at a given array point therefore indicates a duplication in the corresponding region of the sample DNA, and a decrease or absence of signal indicates a deletion. The resolution of the array determines the accuracy with which the rearrangement endpoints can be defined.
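The ratio logic described above can be sketched as follows: compute the log2 ratio of test to reference signal at each array point, call gain or loss against thresholds, and merge consecutive altered clones (in map order) into segments. The ±0.2 thresholds are hypothetical, and real array CGH data also need normalization and replicate handling that are not shown. For a diploid reference, a single-copy gain and loss would ideally give log2 ratios of log2(3/2) ≈ 0.58 and log2(1/2) = -1.

```python
import math
from itertools import groupby


def call_copy_number(points, gain=0.2, loss=-0.2):
    """points: (clone_id, test_signal, reference_signal) tuples in map order.

    Returns (clone_id, log2_ratio, state) tuples; an absent test signal counts as loss.
    Reference signals are assumed to be positive.
    """
    calls = []
    for clone_id, test, ref in points:
        ratio = math.log2(test / ref) if test > 0 else float("-inf")
        if ratio >= gain:
            state = "gain"
        elif ratio <= loss:
            state = "loss"
        else:
            state = "normal"
        calls.append((clone_id, ratio, state))
    return calls


def altered_segments(calls):
    """Merge runs of consecutive clones sharing a gain or loss call."""
    segments = []
    for state, run in groupby(calls, key=lambda call: call[2]):
        ids = [call[0] for call in run]
        if state != "normal":
            segments.append((state, ids[0], ids[-1]))
    return segments


points = [("c1", 100, 100), ("c2", 150, 100), ("c3", 160, 100),
          ("c4", 100, 100), ("c5", 0, 100)]
print(altered_segments(call_copy_number(points)))
```

Each segment's true breakpoint lies somewhere between the last unaltered clone and the first altered clone, which is why array resolution limits how precisely endpoints can be defined.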
As described above, this technique has been used to determine DNA copy number variation in a test genomic DNA sample. This type of information can be used to identify and characterize genes that are related to human disease. The SMRT
arrays of the present invention allow high resolution mapping of amplifications and deletions in the genome, and are thus able to detect microdeletions and microamplifications.
The SMRT arrays can, therefore, be used in CGH-based genomic profiling techniques to diagnose diseases, select treatment regimens, predict the occurrence of drug resistance or radiation resistance, or predict side-effects of treatments.
The SMRT array-based CGH experiments can be used to correlate pharmacogenomic and toxicogenomic studies with an individual's genetic profile, thus allowing clinicians to optimize response to therapies. For example, it has been determined (van 't Veer et al., Nature 415(6871):530-536, 2002) that breast cancer patients with the same disease state responded differently to treatment, depending on their genetic profile.
The SMRT arrays of the present invention can also be used to identify genes that are related to specific diseases thus allowing the selection of specific gene targets for drug discovery.
1.1.2 Epigenetic Modifications to the Genome
The SMRT arrays of the present invention can also be used as a tool for detecting epigenetic changes such as DNA methylation of CpG islands, which may be a separate, causative mechanism for various diseases, including tumor progression. A variety of previously published protocols are capable of distinguishing genes with abnormal methylation within total genomic DNA, for example, methylation-sensitive AP-PCR (Huang et al., 1997, Cancer Res. 57(6):1030-1034; Gonzalgo et al., 1997, Cancer Res. 57(4):594-599; Liang et al., 1998, Genomics 53(3):260-268), Genomic Mismatch Scanning (GMS) and other subtractive techniques (Nelson et al., 1993, Nat. Genet. 4(1):11-18). Other protocols are known in the art, and the use of selected protocols in combination with gene array analysis has been reviewed (see Mantripragada et al. (2004) Trends in Genetics 20:87-94; Albertson et al. (2003) Human Molecular Genetics 12:R145-R152); these protocols are contemplated by the invention.
In methylation-sensitive AP-PCR, probes can be generated from AP-PCR products of disease samples and normal samples and competitively hybridized to a human SMRT array. Ratio differences between these probe populations would indicate regions of altered methylation. In GMS, subtracted, non-mismatched heterohybrids can be competitively hybridized with normal DNA to a SMRT array. In either case, the use of a SMRT array in combination with the techniques described above provides a template from which to comprehensively assess methylation differences in multiple human diseases.
A schematic diagram of an exemplary method of analyzing methylation differences between samples is depicted in Figure 25. Variations of this method known to the skilled worker are contemplated within the scope of the invention.
SMRT arrays can also be used to analyze chemical modifications of chromatin and the sites at which DNA-binding proteins interact within the genome. For example, the sites within the genome at which a DNA-binding protein of interest interacts can be determined as follows. In vivo protein-DNA interactions are preserved by cross-linking, and the chromatin is then fragmented to reduce the genome to DNA fragments of manageable size. Immunoprecipitation is then carried out using an antibody against the protein of interest, which co-immunoprecipitates the DNA cross-linked to that protein. The DNA fragments are isolated and the cross-linking is reversed. The fragments are then amplified, labelled, and hybridized to the SMRT array in order to locate the sites at which the protein of interest binds.
1.1.3 Gene annotation tools
The SMRT arrays of the present invention can be used to analyze transcriptional activity at the genomic level, making it possible to analyze simultaneously the transcription level of all the genes in a genome. In comparative transcriptome analysis, the target RNA transcripts of all the genes in a test sample (collectively called the 'transcriptome') are first reverse transcribed into corresponding cDNAs, which are labelled with a first label. The target cDNA library represents the full transcriptome. A corresponding cDNA library is prepared from a reference transcriptome and labelled with a second, distinguishable label. The test and reference cDNA libraries are then competitively hybridized against the SMRT array. The ratio of labelling intensities at each array spot indicates the transcriptional activity of the corresponding region of genomic DNA in the test sample relative to the reference. In this way, the global transcriptional activity of the sample can be mapped to specific regions of the genome.
Alternatively, for applications such as drug resistance screening and the like, labelled test cDNA can be hybridised to the SMRT array in the absence of a reference cDNA.
1.2 Interpretation of Array Data
Data generated using the arrays of the invention can be analyzed using methods known in the art. For example, a software tool called SeeGH can be used to view and analyze array CGH data. The software gives users the ability to view the data in an overall genomic view as well as to magnify specific chromosomal regions, facilitating the precise localization of genetic alterations. The application translates spot signal ratio data from array CGH experiments into displays of high-resolution chromosome profiles. Data are imported from a simple tab-delimited text file obtained from standard microarray image analysis software. SeeGH processes the signal ratio data and displays them graphically in a conventional CGH karyotype diagram, with the added features of magnification and DNA segment annotation. In this process, SeeGH imports the data into a database, calculates the average ratio and standard deviation for each set of replicate spots, and links them to chromosome regions for graphical display. Once the data are displayed, users have the option of hiding or flagging DNA segments based on user-defined criteria, and of retrieving annotation information such as clone name, NCBI sequence accession number, ratio, base pair position on the chromosome, and standard deviation. Detailed information regarding this software is found in Chi et al. (2004) BMC Bioinformatics 5:13.
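The replicate-averaging step described for SeeGH can be sketched as below. This is not SeeGH's own code: it assumes a tab-delimited file with hypothetical column names (clone, chromosome, bp_position, ratio) and returns one record per clone, ordered along the genome, with the mean and standard deviation of that clone's replicate spots.

```python
import csv
import io
import statistics
from collections import defaultdict

def _chrom_key(chrom):
    # Numbered chromosomes in numeric order, then X, Y and others alphabetically
    return (0, int(chrom), "") if chrom.isdigit() else (1, 0, chrom)

def summarize_replicates(tsv_text):
    """Return (chromosome, bp_position, clone, mean_ratio, sd) for each clone."""
    ratios = defaultdict(list)
    position = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        clone = row["clone"]
        ratios[clone].append(float(row["ratio"]))
        position[clone] = (row["chromosome"], int(row["bp_position"]))
    summary = []
    for clone, values in ratios.items():
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        chrom, pos = position[clone]
        summary.append((chrom, pos, clone, statistics.mean(values), sd))
    # Order along the genome for a karyotype-style display
    summary.sort(key=lambda rec: (_chrom_key(rec[0]), rec[1]))
    return summary

tsv = (
    "clone\tchromosome\tbp_position\tratio\n"
    "cloneA\t2\t500000\t1.0\n"
    "cloneA\t2\t500000\t1.2\n"
    "cloneB\t10\t100000\t0.8\n"
    "cloneC\tX\t2000\t1.1\n"
)
for record in summarize_replicates(tsv):
    print(record)
```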
In one embodiment, the present invention comprises software for the analysis of data, such as signal intensity and array co-ordinate data generated by use of the SMRT array of the present invention.
2.0 Use of SMRT pools for the preparation of probes
In fluorescent in situ hybridization (FISH), probes are used to investigate in situ the presence or absence of complementary sequence on a metaphase chromosome spread, thereby detecting genomic rearrangement events, if present. Methods of preparing probes for and carrying out FISH analysis are known in the art. Because the SMRT libraries of the present invention span an entire genome, or a portion thereof, at high resolution, selected SMRT pools from the SMRT libraries are useful as probes for interrogating a subject's chromosome locations using FISH. Typically, FISH analysis using SMRT pools as probes comprises the steps of 1) labelling selected SMRT pools with a chromophore or fluorophore, for example using the random primer extension method; 2) hybridizing the labelled DNA with a metaphase nucleus preparation; and 3) detecting the presence or absence of probe hybridization to the metaphase nucleus preparation. Hybridization of the probe to an inappropriate chromosome or an inappropriate region of the chromosome indicates a rearrangement event, while the lack of hybridization indicates a deletion event. The FISH technique is routinely used to identify genomic rearrangement events associated with cancers and other genetic diseases.
The SMRT pools can be used to prepare probes for nucleic acid hybridization techniques. Examples of such hybridization techniques include Southern, Northern and Southwestern blot hybridization. Procedures for labelling such probes are known in the art.
KITS
The present invention additionally provides for kits comprising the SMRT sets, SMRT libraries or SMRT pools described herein. Individual components of the kit can be packaged in separate containers. The clones, SMRT libraries or SMRT pools can be provided in individual containers or in a multi-well format. The kits may further comprise reagents for preparing and/or re-amplifying the SMRT pools.
The kits can optionally include amplification reagents, reaction components and/or reaction vessels. One or more of the reagents provided in the kit can incorporate a detectable label, or the kit may include reagents for labelling target sequences. One or more of the components of the kit may be lyophilised and the kit may further comprise reagents suitable for the reconstitution of the lyophilised components.

The present invention further provides for kits comprising one or more SMRT arrays.
These can be provided in a form ready to be applied to a solid substrate or they can be provided as pre-assembled arrays. The kits may additionally contain buffers, labels, and other reagents to facilitate the preparation or use of the arrays, including, for example, buffers and solutions for the preparation of a test sample, extraction of nucleic acids, purification of nucleic acids and the like.
The kits can additionally contain instructions for use, which may be provided in paper form or in computer-readable form, such as a disc, CD, DVD or the like.
The present invention further contemplates that the kits described above may be provided as part of a package that includes computer software to analyse data generated from the use of the kit.
To gain a better understanding of the invention described herein, the following examples are set forth. It should be understood that these examples are for illustrative purposes only. Therefore, they should not limit the scope of this invention in any way.
EXAMPLES
EXAMPLE 1: CLONE SELECTION FOR CONSTRUCTION OF A BAC CLONE SMRT SET
The human BAC fingerprint-based physical map (15 Nov 2001; McPherson, J.D. et al. A physical map of the human genome. Nature 409, 934-41 (2001)) generated at the Washington University Genome Sequencing Center was used for the selection of BAC clones for the tiling set. The fingerprint map is a manually curated and mature data set which covers 96% of the genome at an average depth of 15X. The redundancy of clone coverage was used to specify the desired clone overlap and size characteristics, with the goal of achieving representation of every region in the map.
BAC clones were chosen from each of the 726 contigs to provide maximum coverage of the fingerprint map. Clones not assigned to contigs, as well as non-canonical clones (buried, as described in Soderlund, C., Humphray, S., Dunham, A. & French, L. Genome Res 10, 1772-87 (2000), and Soderlund, C., Longden, I. & Mott, R. Comput Appl Biosci 13, 523-35 (1997)), were excluded from candidacy. Clone selection exercises were restricted to the readily available RPCI-11 and RPCI-13 (Osoegawa, K. et al. A bacterial artificial chromosome library for sequencing the complete human genome. Genome Res 11, 483-96 (2001)) and Caltech D1/D2 (Lander, E.S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860-921 (2001)) libraries. Clones from these libraries make up 98% of the 406,000 clones in the fingerprint map and 97% of the 146,000 canonical map clones assigned to contigs. The algorithm for clone selection was based on a clone-walking methodology, and each fingerprint map contig was treated independently. For each map contig, the starting set of BAC clones eligible for selection consisted of the ordered set of canonical clones. The order of the clones was previously determined by the process of automated map creation and subsequent manual curation (see the Washington University Genome Sequencing Center website).
One of the goals of the selection process was to choose approximately 30,000 clones, thereby balancing resolution, portability and cost of construction. To later assist with positioning the BACs on the genome sequence assembly, clones with informative BAC end sequence (BES) records [Zhao, S. Human BAC ends. Nucleic Acids Res 28, 129-32 (2000)] containing sufficient non-masked content with unambiguous sequence hits to the August 2001 genomic assembly (Kent, W.J. & Haussler, D. Assembly of the working draft of the human genome with GigAssembler. Genome Res 11, 1541-8 (2001)) were selected preferentially.
Clones with BES hit coordinates that were inconsistent with their position in the fingerprint map were avoided in cases where equivalent map coverage could be obtained by selecting another clone. During the selection process, an attempt was also made to enrich the clone set with clones having either existing FISH information (Cheung, V.G. et al. Integration of cytogenetic landmarks into the draft sequence of the human genome. Nature 409, 953-8 (2001)) or sequence data.
The BAC clones were selected as follows. Starting from the left end of each map contig, the first canonical clone from the ordered set was always selected. The next pick was chosen to have as close to, but no fewer than, 4 conserved bands with the previous pick. Conserved bands are defined as bands present in the HindIII fingerprints of two overlapping clones and in the fingerprints of all clones located between them. Conserved bands emanate from the same DNA, and their use minimizes false positives in determining clone overlap, since bands found in multiple adjacent intermediate clones in the ordered map represent the same digest fragment.
Clones smaller than 100 kb or larger than 200 kb, or having fewer than 20 or more than 50 HindIII fragments, were chosen only where map coverage could not be provided by other eligible clones. No clones smaller than 15 kb or with fewer than 5 sanitized HindIII fragments were chosen. Clones with unique end sequence hits to the assembly were preferred, with clones having both ends aligning to the sequence assembly given priority.
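The clone-walking rule above can be sketched as a greedy walk over one ordered map contig. In this sketch each clone's HindIII fingerprint is modelled as a set of hypothetical consensus band labels, so the conserved bands between two picks are the bands shared by both and by every clone ordered between them. Because that count can only shrink as the walk moves right, the clone with the fewest conserved bands that still meets the minimum of 4 is simply the farthest such clone. The size, band-count and BES preferences described above are deliberately omitted, and the fallback to the next clone when no candidate qualifies is an assumption.

```python
def conserved_bands(fingerprints, i, j):
    """Count bands shared by clones i and j and by every clone ordered between them."""
    shared = set(fingerprints[i])
    for k in range(i + 1, j + 1):
        shared &= fingerprints[k]
    return len(shared)

def walk_contig(fingerprints, min_conserved=4):
    """Greedy clone walk over one map contig; returns the indices of picked clones.

    fingerprints: list of band-label sets, in map order (an assumed input format).
    """
    picks = [0]  # the first canonical clone is always selected
    current, last = 0, len(fingerprints) - 1
    while current < last:
        nxt = current + 1  # assumed fallback if no clone shares enough bands
        for j in range(current + 1, last + 1):
            if conserved_bands(fingerprints, current, j) < min_conserved:
                break  # counts only decrease further along the contig
            nxt = j
        picks.append(nxt)
        current = nxt
    return picks

# Toy contig: five clones whose band sets overlap like a tiling path
contig = [set(range(s, s + 10)) for s in (1, 4, 7, 10, 13)]
print(walk_contig(contig))  # [0, 2, 4]
```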
EXAMPLE 2: CLONE VALIDATION AND REPLACEMENT AND ASSEMBLY OF FINAL SMRT SET
The first round of map-based clone selection as described in Example 1 yielded 29,035 clones representing 99% of the map. All clones were digested using HindIII, and fingerprints were generated as described in Marra, M.A. et al. High throughput fingerprint analysis of large-insert clones. Genome Res 7, 1072-84 (1997). All validation fingerprints were compared in an automated fashion to those stored in the physical map. The fingerprints in the map are sanitized: all fragments closer than 7 standard mobility units (this length unit corresponds to a size tolerance of 0.5% at 5 kb, 3% at 20 kb and 5% at 25 kb) have been replaced with a single fragment. This sanitization process, motivated by the historical difficulty in determining band copy number for multiplets, artificially lowers the apparent clone size by up to 30%. The fingerprints generated for validation of the BAC clones selected for the tiling set were analyzed using software technology described in Fuhrmann DR, et al. (Genome Res. 2003 13(5):940-53) to identify bands and obtain sizing data. Use of this software obviated the need for sanitization. The fingerprint comparison was made on the basis of the Sulston score, which corresponds to the probability that two fingerprints share similar fragments by chance. Each matching clone fingerprint was assigned a rank, from 1 to 10, indicating the strength of the match with the corresponding clone fingerprint in the map. Clones with rank n had n-1 map clones which were more similar than the corresponding map clone (i.e., had a lower Sulston score, reflecting a smaller probability of coincidental overlap). Fingerprints of clones with rank ≥3 were visually examined (5,272 clones, including 2,784 clone fingerprints identified by automated analysis as potential mismatches). Fingerprints for 1,978 clones in the set did not match their corresponding fingerprints in the physical map. The discrepancies could be categorized as resulting from clone tracking errors, either during the generation of the fingerprint map or in the generation of the rearrayed clone set (1,143 clones), from cross-well contamination, or from situations in which the fingerprint process failed (835 clones).
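Three pieces of the fingerprint comparison described above can be sketched in Python, under stated assumptions. The sanitize function collapses runs of bands closer than 7 mobility units into a single band; replacing each run by its mean is an assumption of this sketch. The sulston_score function follows the binomial form of the Sulston score as it is commonly implemented in FPC; the gel length of 3,300 mobility units is an illustrative assumption. The match_rank function applies the ranking rule stated above (rank n means n-1 map clones score lower than the expected map clone); the cap at 10 is not modelled.

```python
from math import comb

def sanitize(mobilities, tolerance=7):
    """Collapse runs of bands closer than `tolerance` mobility units into one band."""
    runs = []
    for m in sorted(mobilities):
        if runs and m - runs[-1][-1] < tolerance:
            runs[-1].append(m)  # too close to the previous band: same multiplet
        else:
            runs.append([m])
    return [sum(run) / len(run) for run in runs]  # assumed: run replaced by its mean

def sulston_score(n1, n2, shared, tolerance=7, gel_length=3300):
    """Probability that two fingerprints share at least `shared` bands by chance."""
    n_low, n_high = min(n1, n2), max(n1, n2)
    b = 2 * tolerance / gel_length   # chance one band falls within tolerance of another
    p = (1 - b) ** n_high            # chance a band matches none of the larger fingerprint
    return sum(comb(n_low, j) * (1 - p) ** j * p ** (n_low - j)
               for j in range(shared, n_low + 1))

def match_rank(scores, expected):
    """Rank n means n-1 map clones have a lower Sulston score than `expected`."""
    return 1 + sum(1 for clone, s in scores.items()
                   if clone != expected and s < scores[expected])

print(sanitize([100, 103, 120, 200, 205, 209]))  # the 100/103 and 200/205/209 runs each collapse
print(sulston_score(46, 40, 30) < sulston_score(46, 40, 10))  # True
print(match_rank({"m1": 1e-20, "m2": 1e-12, "m3": 1e-5}, "m2"))  # 2
```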
A second round of clone selection was performed to maintain the coverage represented by the 1,978 failed clones. For each failed clone, neighbouring clones were selected from the map to provide equivalent coverage. In total, 4,531 clones were selected as replacements. These clones were sampled from RPCI-11, RPCI-13 and Caltech-D in roughly the same proportion as for the final set (87%:8%:5%). An additional 1,258 clones were selected to close gaps larger than 10 kb based on the June 2002 UCSC assembly. Approximately 755 of these clones were not in the physical map. A second round of fingerprint verification performed on the replacement clones identified 413 clones that did not match their map fingerprints. These clones were rejected from the set. The final tiling set contained 32,433 clones.
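The clone counts given for the two selection rounds reconcile with the final set size. The short check below assumes that all 1,978 failed first-round clones were dropped and that the 413 second-round rejections came from the newly added replacement and gap-filling clones.

```python
first_round = 29_035          # clones from map-based selection (Example 1)
failed_first_round = 1_978    # fingerprints that did not match the map
replacements = 4_531          # neighbouring clones chosen as replacements
gap_fillers = 1_258           # clones added to close gaps larger than 10 kb
failed_second_round = 413     # added clones rejected on re-verification

final_set = (first_round - failed_first_round + replacements
             + gap_fillers - failed_second_round)
print(final_set)  # 32433
```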
The rearraying of replacement clones from the RPCI-11 and RPCI-13 libraries was performed in two steps to minimize manual operational error, well-to-well cross-contamination and non-growth wells. Clones for each library were first arranged in 96-well format in order of plate, row and column. BACs were inoculated, grown in 96-well deep-well blocks and kept at -80°C until all clones were picked. The BACs were then condensed into 384-well plates using 96-pin tools.
Of the 32,433 clones making up the final SMRT set, 31,678 are in the fingerprint map, with 31,676 localized to contigs. The remaining 755 clones not found in the fingerprint map were selected to provide coverage of the sequence assembly based on the sequence assembly coordinates of their BES matches. The majority of the clones are from RPCI-11 (92%), with the remaining 2% from RPCI-13 and 6% from the Caltech libraries. Based on validation fingerprint data of the tiling set, the average clone size and HindIII fragment count for each library are 189 kb/46 (RPCI-11), 160 kb/37 (RPCI-13) and 146 kb/35 (Caltech-D). The average sizes of the tiling set members based on BES data are 176 kb for RPCI-11 clones and 140 kb for Caltech-D, indicating that the validation fingerprints overestimate the size by 4-7%. This difference is in part due to vector-insert junction fragments present in the fingerprints.
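The 4-7% figure follows directly from the two size estimates for the libraries that have both, taking the BES-based size as the reference; as a quick check:

```python
fingerprint_kb = {"RPCI-11": 189, "Caltech-D": 146}  # validation fingerprint sizes
bes_kb = {"RPCI-11": 176, "Caltech-D": 140}          # sizes from BES coordinates

overestimate_pct = {lib: 100 * (fingerprint_kb[lib] / bes_kb[lib] - 1)
                    for lib in fingerprint_kb}
for lib, pct in overestimate_pct.items():
    print(f"{lib}: {pct:.1f}%")  # prints RPCI-11: 7.4% then Caltech-D: 4.3%
```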
During the selection procedure, an attempt was made to maximize the number of clones in the tiling set which had either sequence accessions or prior FISH data. GenBank (Jan 2003; Benson, D.A. et al. GenBank. Nucleic Acids Res 30, 17-20 (2002)) sequence records were available for 8,018 clones in the set. The records indicated 4,967 clones categorized as finished, 2,069 working draft clones, 365 in-progress clones and 569 low-pass clones. BES coordinates were available for 10,213 of the clones (31% of clones in the set), providing a localization scaffold. A total of 1,134 clones in the set had previously generated FISH data (Cheung, V.G. et al. Integration of cytogenetic landmarks into the draft sequence of the human genome. Nature 409, 953-8 (2001)) available through the Cancer Genome Anatomy Project (CGAP) (Strausberg, R.L., Buetow, K.H., Greenhut, S.F., Grouse, L.H. & Schaefer, C.F. The cancer genome anatomy project: online resources to reveal the molecular signatures of cancer. Cancer Invest 20, 1038-50 (2002)).
EXAMPLE 3: DETERMINATION OF MAP COVERAGE FOR THE SMRT SET
Coverage of the fingerprint map by the clones in the set is more difficult to determine precisely than sequence coverage, because the map scale metric is not locally linear with the sequence scale and because not all restriction fragments are detected by the fingerprinting method, as noted in Marra, M.A. et al. High throughput fingerprint analysis of large-insert clones. Genome Res 7, 1072-84 (1997).
Map coverage was determined by analyzing the overlap distribution between map-adjacent clone picks and the total number and depth of consensus band (cbmap) units that were covered by the rearray set. On average, a canonical map clone selected at random from the map overlaps with its best match in the clone set by 110 kb; 90% of these random clones overlap by more than 80 kb with their best matches. Any map clone can therefore be associated with one or more clones from the clone set which provide equivalent map coverage. Each unit on the cbmap scale corresponds to a single detected fingerprint fragment, and cbmap distances cannot be directly related to a distance in sequence coordinate space. Each clone in a fingerprint map contig is positioned relative to all other clones using cbmap coordinates. Regions in the fingerprint map that do not have representation in the clone tiling set may indicate gaps in coverage.
Out of 31,039 map-adjacent pairs of clones, the average number of conserved bands was 12, corresponding to a conserved overlap of 56 kb. Of the 31,039 pairs, 553 had fewer than 3 conserved bands. These pairs border on map regions in which coverage by RPCI-11, RPCI-13 and Caltech-D clones is low and from which additional clones could not be selected. To determine the overall representation of the map in the clone tiling set, all remaining canonical clones from the map not selected for the rearray were compared to the rearray set. For each map clone, the top 10 hits to the clone tiling set, as ranked by the Sulston score, were extracted, and the top hit in the same contig was identified as the closest match. Fingerprints were compared using a standard mobility tolerance of 7. The number of shared bands, overlap and Sulston score between the map clone and its closest match in the rearray were examined.
Using the cbmap scale as discussed above, it was determined that the clones in the tiling set do not cover 7,700 cbmap units (1.3%) out of a total of 609,000 units in the map. These regions of the fingerprint map consisted solely of clones from libraries other than RPCI-11, RPCI-13 or Caltech-D. A cbmap unit was considered represented in the tiling set if it was spanned by the coordinates of a clone in the tiling set. Using all fingerprint map clones with BES hits, it was determined that the ratio of cbmap units to restriction fragments was 1:1.43. This ratio is not unity because the map was generated using sanitized fingerprints and because fragments are only reliably sized in the range of 0.6-30 kb. Using this ratio, it was estimated that the clone tiling set did not cover approximately 11,000 restriction fragments (1.4%) out of a total of about 800,000 fragments in the sequence. This is likely to be an overestimate of the actual gaps for three reasons: about 20% of fragments are outside the reliable detection range of the fingerprinting method; the fingerprints stored in the map were sanitized, so the insert may extend over more fragments than appear in the fingerprint; and there are artifactual gaps in the fingerprint map where some joins between contigs have been recognized by sequence data but not confirmed by fingerprinted clones.
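The conversion from uncovered cbmap units to restriction fragments is a single multiplication by the 1:1.43 ratio. The sketch below reproduces the percentages above, taking the rounded total of about 800,000 fragments quoted in the text as given.

```python
uncovered_cbmap = 7_700
total_cbmap = 609_000
fragments_per_cbmap = 1.43    # ratio measured from map clones with BES hits
total_fragments = 800_000     # approximate total quoted in the text

uncovered_fragments = uncovered_cbmap * fragments_per_cbmap
print(f"{100 * uncovered_cbmap / total_cbmap:.1f}% of cbmap units")          # 1.3%
print(round(uncovered_fragments))                                          # 11011
print(f"{100 * uncovered_fragments / total_fragments:.1f}% of fragments")  # 1.4%
```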
The depth of clone tiling set coverage in the fingerprint map can be approximated by the cbmap unit coverage, with the assumption that the ratio of cbmap units to sequence digest fragments is relatively constant over large map distances. In the fingerprint map, 40% of cbmap units were covered by only one clone in the clone tiling set and another 44% were covered by the overlapping region of two clones. The remaining 15% were covered by 3 or more clones and 1.3% of the cbmap units were not covered. The average coverage depth based on map unit calculations was determined to be 1.8X. A similar analysis of coverage depth carried out with sequence coordinates corroborates this result (see Example 4).
EXAMPLE 4: DETERMINATION OF SEQUENCE COVERAGE FOR THE SMRT SET
In order to maximize the resolving power of the tiling set, an attempt was made to determine the precise position of every clone in the tiling set on the June 2002 UCSC sequence assembly. While 10,213 of the clones in the set were already positioned on the basis of their BES hits, the remaining 22,220 required placement.
Sequence coverage was calculated by first determining precise sequence coordinates for as many clones as possible. The validation fingerprints of the clones were used to localize them to the genome in the following manner: for each clone, the region of the sequence from which the clone was derived was determined using the clone's map neighbours with BES hits. Five left map neighbours and five right map neighbours were identified and their BES hits were used to demarcate a region of the genome.
Only neighbours whose BES hits landed on the same chromosome as the majority of the clones in the map contig were used. The clone's own BES hits were not used in determining the sequence region in case the clone's map position was incorrect or the BES hit coordinates did not reflect the actual position of the clone or were actually associated with another clone (Zhao, S. et al. Human BAC ends quality assessment and sequence analyses. Genomics 63, 321-32 (2000)). Approximately 250 clones were located in 66 contigs which did not contain BES- or assembly-anchored clones.
The clone neighbourhood was enlarged by 1 Mb in both directions to minimize the effect of local inconsistencies. The neighbourhood assembly was digested in silico, and the fingerprint of a sliding window of 100 fragments, created every 10 fragments, was matched to the clone's fingerprint. For the window position whose in silico fingerprint matched the most fragments, a sliding subwindow having 5 more fragments than the clone was created every 2 fragments. The fingerprint matching was performed with a tolerance of 2% for fragments <15 kb, 3% for fragments 15-25 kb and 4% for fragments >25 kb in size. This tolerance profile approximates a standard mobility tolerance of 7, the cutoff used to generate the fingerprint map. The subwindow which matched the most fragments was used to determine the clone coordinates. The clone was determined to start/end at the first fragment which was part of a 6 matched-fragment run, with no more than 3 unmatched fragments in between. These in silico anchors were accepted only in cases where (i) over 80% of the fragments in the anchor matched the clone's fingerprint, (ii) over 40% of the fragments in the clone matched fragments in the subwindow (in regions of the assembly where sequence is incomplete, only a fraction of the insert may match), (iii) the anchor was at least 10 fragments in size and (iv) the anchor was not larger than the clone by more than 50% of the clone's map length. The average in silico anchor size was 47 ± 11 fragments, with 95 ± 4% matched fragments. The average number of fragments shared between the clone and its anchor was 84 ± 8% (151 ± 26 kb). In total, the fingerprint-based method provided an increase in the precision of localization for 29,539 clones.
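The in silico anchoring steps can be sketched as follows. The Python below digests a sequence at HindIII sites (A^AGCTT), matches fragment sizes using the 2%/3%/4% tolerance profile given above, and scans windows of 100 fragments every 10 fragments for the best match. The greedy one-to-one size matching and the function names are assumptions of this sketch, and the finer subwindow scan and the run-based end rules are omitted. The acceptance test encodes criteria (i) to (iv), assuming all four are measured in fragments and that the clone's size equals its map length.

```python
import re

HINDIII_SITE = "AAGCTT"  # HindIII cuts between the first A and AGCTT

def digest_hindiii(sequence):
    """Fragment lengths (bp) from an in silico HindIII digest."""
    cuts = [m.start() + 1 for m in re.finditer(HINDIII_SITE, sequence.upper())]
    bounds = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]

def size_tolerance(size_bp):
    """Relative tolerance: 2% below 15 kb, 3% for 15-25 kb, 4% above 25 kb."""
    if size_bp < 15_000:
        return 0.02
    if size_bp <= 25_000:
        return 0.03
    return 0.04

def count_matches(clone_sizes, window_sizes):
    """Greedily pair each clone fragment with one unused window fragment within tolerance."""
    available = sorted(window_sizes)
    matched = 0
    for size in sorted(clone_sizes):
        tol = size_tolerance(size) * size
        for idx, candidate in enumerate(available):
            if abs(candidate - size) <= tol:
                matched += 1
                del available[idx]  # each digest fragment may be used once
                break
    return matched

def best_window(clone_sizes, digest_sizes, window=100, step=10):
    """Start index and match count of the best-matching window of digest fragments."""
    best = (0, -1)
    for start in range(0, max(1, len(digest_sizes) - window + 1), step):
        hits = count_matches(clone_sizes, digest_sizes[start:start + window])
        if hits > best[1]:
            best = (start, hits)
    return best

def accept_anchor(anchor_frags, anchor_matched, clone_frags, clone_matched, clone_map_len):
    """Criteria (i)-(iv) from the text, all counted in fragments (an assumption)."""
    return (anchor_matched / anchor_frags > 0.80       # (i)
            and clone_matched / clone_frags > 0.40     # (ii)
            and anchor_frags >= 10                     # (iii)
            and anchor_frags <= 1.5 * clone_map_len)   # (iv)

print(digest_hindiii("GGAAGCTTCCCCAAGCTTGG"))  # [3, 10, 7]
print(best_window([10_000, 20_000], [10_150, 20_500, 5_000], window=2, step=1))  # (0, 2)
```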
In order to validate this method, clones which were already localized in the sequence by their BES hits were also localized using the fingerprint-based approach. The fingerprint-based anchoring was evaluated using the 9,463 clones which had both fingerprint-based and BES-based coordinates. On average, the interval defined by the sequence coordinates derived from the fingerprint-based approach was 11 kb smaller than the interval defined by the BES hits. This difference is due to the conservative nature of the in silico localization algorithm, which tended to yield coordinates that were subsumed by the interval formed by the BES hits. The average differences in left end, middle position and right end between in silico and BES coordinates were 211, 814 and -414 kb. On average, 96 ± 5% of a clone's in silico coordinate interval overlapped with the BES coordinates, and on average 92 ± 8% of the BES coordinate interval overlapped with the in silico interval.
The final sequence coordinates for a clone were taken from the BES hits, where available, in preference to the fingerprint-derived coordinates. Where BES coordinates were not available, fingerprint-derived coordinates were used. Finally, if those could not be determined, Golden Path clone assembly coordinates (Kent, W.J. & Haussler, D. Assembly of the working draft of the human genome with GigAssembler. Genome Res 11, 1541-8 (2001)) were used, if available. Clone assembly coordinates were chosen only if other coordinates were not available, because they may significantly underestimate the size of the clone in cases where the entire insert was not sequenced.
In a similar fashion, but to a lesser extent, the in silico coordinates were generated conservatively and may underestimate the extent of the clone insert.
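The order of preference for a clone's final coordinates (BES, then fingerprint-derived, then Golden Path assembly) amounts to taking the first available source. A minimal sketch, with coordinates represented as hypothetical (start, end) tuples and None meaning "not available":

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]

def final_coordinates(bes: Optional[Interval],
                      fingerprint: Optional[Interval],
                      assembly: Optional[Interval]) -> Optional[Interval]:
    """Return the first available interval in order of preference, or None."""
    for source in (bes, fingerprint, assembly):
        if source is not None:
            return source
    return None  # clone could not be placed on the assembly

print(final_coordinates(None, (1_200_000, 1_350_000), (1_210_000, 1_330_000)))
# (1200000, 1350000): fingerprint-derived coordinates win over assembly coordinates
```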
To assist in determining coverage by the clones in the tiling set in regions covered by clones for which sequence coordinates could not be determined, the fingerprint map was used in the following manner: for every fingerprint map contig, an undirected graph of overlapping clones was created, separating the contig into connected components. Two clones shared an edge if they had at least 4 conserved bands and overlapped by more than 5 cbmap units; clones which did not overlap by more than 5 cbmap units had to share at least 6 conserved bands. For each connected component, the leftmost and rightmost clones with sequence coordinates were used to locate the component within the assembly, and this region was considered covered by the tiling set.
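The component construction above can be sketched with a union-find structure, one standard way (chosen here for brevity) to find the connected components of an undirected graph. The edge rule is the one stated in the text; the input format (a dict mapping clone pairs to conserved bands and overlap in cbmap units) and the function names are assumptions. component_span returns the assembly interval from the leftmost to the rightmost located clone in a component.

```python
def has_edge(conserved, overlap_cbmap):
    """Edge rule from the text: >=4 conserved bands with >5 cbmap units overlap, else >=6."""
    if overlap_cbmap > 5:
        return conserved >= 4
    return conserved >= 6

def components(clones, pairs):
    """Connected components of the clone overlap graph, via union-find."""
    parent = {c: c for c in clones}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving keeps trees shallow
            c = parent[c]
        return c

    for (a, b), (conserved, overlap_cbmap) in pairs.items():
        if has_edge(conserved, overlap_cbmap):
            parent[find(a)] = find(b)
    groups = {}
    for c in clones:
        groups.setdefault(find(c), []).append(c)
    return list(groups.values())

def component_span(component, coords):
    """Assembly interval from the leftmost to the rightmost clone with coordinates."""
    located = [coords[c] for c in component if c in coords]
    if not located:
        return None
    return min(s for s, _ in located), max(e for _, e in located)

pairs = {("a", "b"): (5, 10), ("b", "c"): (5, 3), ("c", "d"): (7, 2)}
comps = components(["a", "b", "c", "d"], pairs)
print(comps)  # [['a', 'b'], ['c', 'd']]
print(component_span(comps[0], {"a": (100, 250), "b": (200, 380)}))  # (100, 380)
```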
In total, BES, in silico or assembly coordinates exist for 30,561 clones, including 20,076 clones which had no BES coordinates. For 1,872 clones in the SMRT set, all found in the fingerprint map, precise sequence coordinates could not be unambiguously determined. These clones had no BES data, did not belong to the assembly, and had fingerprints which failed to significantly match the assembly region expected to contain the clone. It is possible that localization of these clones failed because some genomic regions are not completely represented in the sequence assembly. Contig gaps in the assembly (regions lacking sequence information that separate assembled regions not joined by paired-end reads) were not used in the determination of sequence coverage of the clone set. These contig gaps currently represent 0.230 Gb (8%) of a total assembly size of 3.042 Gb.
Using the 30,561 clones with sequence coordinates, it was determined that 2.788 Gb (>99%) of assembled sequence (June 2002) was represented by the clone set (Figure 2).
Clones not found in the physical map, selected to fill in gaps greater than 10 kb, contribute unique regions totalling 35 Mb to the total coverage. It was estimated, therefore, that the map itself covers 2.753 Gb, or at least 98% of the assembled sequence. This is greater than the previously predicted coverage of 96%, computed using analysis of chromosomes 21 and 22 (McPherson, J.D. et al. A physical map of the human genome. Nature 409, 934-41 (2001)).
In order to determine the spatial resolving power of the clone set, all unique intersections between clones in the set were computed using their sequence coordinates (Figure 3). This process can be visualized as locating both ends of each clone on the sequence coordinate line, performing this step for each clone in the set, and evaluating the distances between closest end positions. This distance between adjacent ends, which we call a clone cover, defines the smallest resolvable region.
There are a total of 57,876 clone covers, with an average cover size of 47 kb and an effective resolution of 77 kb, which is given by a weighted average of the clone cover size, where the weights are given by the fraction of the sequence represented in covers of a given size. This is the highest theoretical resolution achievable with this set in FISH or array-CGH experiments (Pinkel, D. et al. Nat Genet 20, 207-11 (1998); Snijders, A.M. et al. Nat Genet 29, 263-4 (2001); and Fiegler, H. et al. DNA microarrays for comparative genomic hybridization based on DOP-PCR amplification of BAC and PAC clones. Genes Chromosomes Cancer 36, 361-74 (2003)). Of the genome represented by the clone set, 95% can be resolved at a level of 150 kb or better.
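The clone-cover calculation can be sketched as follows: every clone end becomes a breakpoint on the sequence coordinate line, each stretch between adjacent breakpoints that lies inside at least one clone is a cover, and the effective resolution weights each cover size by the fraction of sequence it represents, which works out to the sum of squared cover sizes divided by the total covered length. The quadratic coverage check is chosen for clarity rather than speed, and the toy coordinates are hypothetical.

```python
def clone_covers(intervals):
    """Sizes of the segments between adjacent clone ends that lie inside some clone."""
    breakpoints = sorted({p for start, end in intervals for p in (start, end)})
    covers = []
    for left, right in zip(breakpoints, breakpoints[1:]):
        if any(start <= left and right <= end for start, end in intervals):
            covers.append(right - left)
    return covers

def effective_resolution(covers):
    """Cover sizes weighted by the fraction of sequence they represent: sum(s^2) / sum(s)."""
    return sum(s * s for s in covers) / sum(covers)

# Two hypothetical overlapping clones, coordinates in kb
covers = clone_covers([(0, 150), (100, 250)])
print(covers)                        # [100, 50, 100]
print(sum(covers) / len(covers))     # mean cover size, about 83.3
print(effective_resolution(covers))  # 90.0
```

Because larger covers carry more weight, the effective resolution (77 kb for the SMRT set) exceeds the mean cover size (47 kb).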
The average sequence overlap between neighbouring clones is 83 ± 44 kb (mean ± std. dev.), which corresponds to 50 ± 26% of the length of the clones.
The average coverage depth of the clone set based on sequence coordinates is 1.8X. The ratio of 1X to 2X coverage is approximately 1:1, with 1.077 Gb of the assembly covered by regions spanned by a single clone and 1.141 Gb by regions spanned by two overlapping clones. Coverage at 3X spans 0.350 Gb, and deep coverage of 4X+ spans 0.151 Gb. Coverage at high depth in the tiling set occurs in regions where additional clones were added to the set to replace clones that failed validation, as it was not always possible to find a single clone providing equivalent coverage during the replacement process. In summary, 39% of the genome was covered at 1X, 42% at 2X, 13% at 3X, 4% at 4X, 1% at 5X, and less than 1% at greater than 6X. Less than 1% of the genome was not covered by the tiling set.
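Coverage depth figures like those above can be obtained with a sweep over clone start and end positions. The sketch below, assuming half-open (start, end) clone intervals, counts how many bases lie at each depth and then takes the depth-weighted average over covered bases. Applied to the reported Gb values, with 4X+ counted as 4X (so the result is a lower bound), it gives about 1.84X, consistent with the 1.8X stated above.

```python
from collections import Counter

def depth_histogram(intervals):
    """Bases covered at each depth, from half-open (start, end) clone intervals."""
    events = sorted([(s, 1) for s, _ in intervals] + [(e, -1) for _, e in intervals])
    histogram = Counter()
    depth, prev = 0, None
    for pos, delta in events:
        if prev is not None and pos > prev:
            histogram[depth] += pos - prev  # segment [prev, pos) has the current depth
        depth += delta
        prev = pos
    return histogram

def average_depth(histogram):
    """Depth-weighted mean over covered bases (depth 0, i.e. gaps, excluded)."""
    covered = {d: n for d, n in histogram.items() if d > 0}
    return sum(d * n for d, n in covered.items()) / sum(covered.values())

print(depth_histogram([(0, 150), (100, 250)]))  # Counter({1: 200, 2: 50})
reported_gb = {1: 1.077, 2: 1.141, 3: 0.350, 4: 0.151}  # 4 stands in for "4X+"
print(round(average_depth(reported_gb), 2))  # 1.84
```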
Excluding regions in the assembly for which sequence information is not available, such as contig gaps, there are 729 gaps in sequence coverage in the clone set, totaling 24 Mb. The gaps are formed by regions for which coverage cannot be achieved using RPCI-11, RPCI-13 and Caltech-D clones. Some of these gaps were formed by removing from the set the 413 clones which failed the second round of validation. Replacements for these clones can be added to minimise the gaps in sequence coverage. There are 334 gaps (i.e. 45% of all gaps) smaller than 10 kb, which total 1.1 Mb. The determination of gaps likely represents an overestimate of the actual gaps in the tiling set because the in silico anchors were conservatively calculated, and because assembly coordinates for clones are typically smaller than their insert size.
The representation of telomeric regions was evaluated using 164 unique BAC telomeric markers from RPCI-11/13 and Caltech-D from the Human Telomere Sequencing and Mapping Project (Ning, Y. et al. A complete set of human telomeric probes and their clinical application. National Institutes of Health and Institute of Molecular Medicine collaboration. Nat Genet 14, 86-9 (1996)). The SMRT set contains 45 of these telomeric BACs, and the remaining BACs overlap with their best match from the clone set by an average of 100 kbp (22 shared fingerprint fragments).
Telomeric regions are known in the art to be difficult to isolate and sequence; therefore, precise determination of telomere representation is difficult. As the SMRT set represents full coverage of the fingerprint map, the issue of telomere and centromere coverage may be mitigated, although some genomic regions that are unclonable in BACs or absent from the map may not be represented in this particular SMRT set.
Updated Assessment of Tiling Set Sequence Coverage
An updated assessment of the sequence coverage of the 32,433 clone tiling set was conducted based on the April 2003 version of the UCSC genome browser database containing the RPCI and Caltech BAC clone libraries. Clones were initially matched to their sequence position on the UCSC map to determine their position and percentage overlap. Each BAC in the tiling set was remapped to its current position on the April 2003 version of the database as indicated by BAC end sequencing.
It was determined that less than 1% of the human genome was not covered by the tiling set.
51% of the genome was covered by at least a single clone, and 19% of the genome had three or more clones covering the specific sequence. Clone positions that did not match the fingerprint-defined position were remapped accordingly. Sequence coverage gaps were found to contain less than 1% of the genome.
EXAMPLE 5: HIGH-THROUGHPUT AUTOMATABLE METHOD FOR ISOLATION OF BAC DNA
BAC DNA was isolated from each of the 32,433 clones in the SMRT set (prepared as described in the previous Examples) as follows:
Bacterial cultures containing BAC genomic DNA clones were grown in 96-well blocks according to standard protocols. The following solutions were prepared for each 96-well block. For Solution I (GET/RNaseA, 150 µg/ml), 330 µl of 10 mg/ml RNaseA was mixed with 21.67 ml of cold GET: the GET was measured into a cylinder, the RNaseA was added, and the cylinder was covered with parafilm, inverted several times to mix and stored on ice. For Solution II (1.0% SDS/0.2 N NaOH), 2.2 ml of 10% w/v SDS was added to an appropriately sized bottle, followed by 19.4 ml of water and 0.44 ml of NaOH. The bottle was capped and inverted to mix.
Solution II was stored at room temperature.
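The recipe volumes can be checked against the stated working concentrations with the dilution relation C1·V1 = C2·V2. This is a small arithmetic sketch; the NaOH stock strength is not given in the text, and the 10 N value used below is an assumption chosen because it is the stock that yields about 0.2 N.

```python
def final_concentration(stock_conc, stock_volume, total_volume):
    """Solve C1*V1 = C2*V2 for C2 (volumes in the same units)."""
    return stock_conc * stock_volume / total_volume


# Solution I: 330 ul of 10 mg/ml (10,000 ug/ml) RNaseA plus 21.67 ml cold GET.
sol1_total_ml = 0.330 + 21.67
rnase_ug_per_ml = final_concentration(10_000, 0.330, sol1_total_ml)

# Solution II: 2.2 ml 10% SDS + 19.4 ml water + 0.44 ml NaOH.
sol2_total_ml = 2.2 + 19.4 + 0.44
sds_percent = final_concentration(10.0, 2.2, sol2_total_ml)
# ASSUMPTION: 10 N NaOH stock (not stated in the protocol).
naoh_normality = final_concentration(10.0, 0.44, sol2_total_ml)

print(f"RNaseA {rnase_ug_per_ml:.1f} ug/ml, SDS {sds_percent:.2f}%, NaOH {naoh_normality:.3f} N")
```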

The 96-well blocks containing BAC preparations were thawed for 30 minutes, 200 µl of cold GET/RNaseA was added, and the plates were sealed with Edge Biosystems clear tape.
Pellets were resuspended completely, 200 µl of Solution II was added and the mixture was allowed to stand for minutes. 200 µl of cold 3 M KOAc pH 5.5 was added, and the plates were resealed and shaken at 1100 rpm for 3 minutes. Plates were centrifuged for 45 minutes at 5250g.
400 µl of the lysate supernatant was then aspirated; any pellet material that lodged on the tips was gently wiped off with a Kimwipe. The lysate was then transferred to the pre-filled collection plate containing 300 µl isopropanol and mixed using a Hydra. The labels were transferred and the plates sealed with an Edge Biosystems plastic sealer.
The plates were then centrifuged for 15 minutes at 2830g.
Hydras were pre-cleaned with ddH2O, and the needles were gently scrubbed with a brush to remove the white, gummy residue. The Hydras were washed with 2% bleach, then four times with ddH2O. Beckman blocks were washed by removing the pellet with the block washer, then scrubbing each well three times with a test-tube brush and 2% bleach solution. The blocks were rinsed thoroughly with tap water, then ddH2O.
The isopropanol and lysate mixture was removed from the collection plate by rapidly inverting the plate over a sink, then gently shaking and blotting it on a paper towel.
200 µl of 80% ethanol (freshly made from 95% ethanol stock) was added to the DNA pellet and then removed immediately by gentle shaking. The plate was blotted briefly on a paper towel, the towel was changed, and the plate was allowed to air dry until no drops were present. The plates were spin-dried at medium heat and the pellets resuspended in 60 µl sterile ddH2O. Plates were sealed, incubated for 10 minutes at 37°C and stored at 4°C.
Each of the approximately 32,433 BAC clones was fingerprinted after BAC DNA preparation to verify clone identity and the quality of the preparation, as described in Genome Res. 1997 Nov;7(11):1072-84.
EXAMPLE 6: PREPARATION OF SMRT POOLS FROM CLONES IN THE SMRT SET

The following protocol is summarised in a flow diagram shown in Figure 1A.
All liquid transfer steps used a HydraPP (Matrix Technologies); DNA transfer steps employed a 12-channel pipette. For each transfer, the syringes were washed by pipetting 2% bleach three times, followed by two washes with ddH2O. At each stage of preparation, 3 clones from each 96-well plate were spot-checked by running 2 µl of the sample on a gel to ensure proper DNA concentration and size. In all steps involving heating above room temperature, Microseal B (MJ Research) sealing pads were used to control evaporation.
Approximately 50 ng of each BAC DNA sample (prepared as in Example 5) was transferred to a 96-well plate and digested for approximately eight hours with 5 U of MseI (New England Biolabs) in a 40 µl reaction. The enzyme was inactivated at 65°C for 10 min. Ten percent of the product was transferred to a new plate and ligated to the primer-linkers. The ligation mixture consisted of the digested DNA, 0.2 µM each of the primers MseI long (5'-AGTGGGATTCCGCATGCTAGT-3', SEQ ID NO:1) and MseI short (5'-TAACTAGCATGC-3', SEQ ID NO:2) (Alpha DNA, Quebec), and 80 U of T4 DNA ligase in NEB ligase buffer (New England Biolabs). The primers were allowed to anneal for 5 minutes at room temperature before addition to the ligation mix. The ligation was performed overnight (12-16 h) at 16°C.
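Why these two oligonucleotides form a usable linker can be seen by checking their complementarity: the 3' ten bases of MseI short are the reverse complement of the 3' end of MseI long, which leaves a 5'-TA overhang matching the TA overhang that MseI (T^TAA) leaves on the digested BAC fragments. The sketch below assumes simple Watson-Crick pairing with no mismatches; the helper names are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

MSEI_LONG = "AGTGGGATTCCGCATGCTAGT"   # SEQ ID NO:1
MSEI_SHORT = "TAACTAGCATGC"           # SEQ ID NO:2


def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def duplex_overhang(long_oligo, short_oligo):
    """Find the longest 3' part of short_oligo that pairs with the 3' end of
    long_oligo and return the unpaired 5' overhang of short_oligo."""
    for k in range(len(short_oligo), 0, -1):
        if long_oligo.endswith(revcomp(short_oligo[-k:])):
            return short_oligo[:-k]
    return None


overhang = duplex_overhang(MSEI_LONG, MSEI_SHORT)
print(overhang)  # 5' overhang left on the annealed linker
```

Because TA is its own reverse complement, the same overhang pairs with the 5'-TA end of every MseI fragment, so one linker serves all fragments in a pool.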
Approximately 2.5 µl of the 40 µl ligation mixture was amplified in a 50 µl PCR reaction (PCR1). The reaction mixture contained the linker-ligated DNA template, 8 mM MgCl2, 1 mM each dNTP (Promega), 0.4 µM MseI long primer (modified at the 5' end with an amino group), and 5 U of Taq polymerase (Promega, storage buffer B) in Promega PCR buffer. After a 3-minute denaturation step at 95°C, the PCR was cycled 30 times at 95°C for 1 minute, 55°C for 1 minute and 72°C for 3 minutes. A 10-minute extension at 72°C completed the protocol. 5 µl of PCR1 was transferred to 95 µl of ddH2O (a 1 in 20 dilution). The second round of PCR (PCR2) was initiated using 2.5 µl of the diluted PCR1 product under the same conditions for 35 cycles.
The final PCR product yield was typically 40-50 µg, with products approximately 100-2000 bp in length. The PCR products were precipitated by adding 2.5 volumes of 100% ethanol and incubating for 30 minutes at room temperature; the plates were sealed with Microseal A (MJ Research), mixed by inverting 3 times and pulse spun. The PCR products were then centrifuged at 2750g for 45 minutes at 4°C. The pellets were washed with 70% ethanol for 15 minutes, then centrifuged at 2750g for 15 minutes at 4°C. The pellets were air dried at 55°C for about 30 minutes, resuspended in 25 µl distilled water and incubated overnight at room temperature. The final DNA concentration was quantified using an ND-1000 spectrophotometer (NanoDrop, Delaware). Typical yield for LMPCR was 40-50 µg. All products were subsequently sealed with aluminum foil tape and stored at -20°C or below to inhibit evaporation. After thawing, the aluminum foil tape was resealed to ensure consistent volume and the plates were pulse spun at 800g.
No purification steps were required before round 1 or round 2 of PCR, which avoided unnecessary liquid transfers between 96-well PCR plates. This greatly reduced the risk of contamination and increased the speed of production.
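The composition of a SMRT pool follows from where MseI cuts: every TTAA in the BAC is cut after the first T, and the LMPCR products observed here fall roughly in the 100-2000 bp range. A minimal in silico digest, assuming a linear single-stranded representation and using the observed size range only as an illustrative filter, looks like this:

```python
import re


def msei_fragments(seq):
    """Lengths of fragments from cutting at every MseI site (T^TAA)."""
    seq = seq.upper()
    cuts = [m.start() + 1 for m in re.finditer("TTAA", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def in_size_window(lengths, min_len=100, max_len=2000):
    """Keep fragments inside the size range seen for LMPCR products."""
    return [n for n in lengths if min_len <= n <= max_len]


# Hypothetical toy sequence with two MseI sites.
toy = "A" * 50 + "TTAA" + "C" * 200 + "TTAA" + "G" * 30
frags = msei_fragments(toy)
print(frags, in_size_window(frags))
```

Because TTAA is a short, AT-rich site, it occurs frequently in human DNA, which is why a single BAC yields a large collection of small fragments that together represent the clone.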
EXAMPLE 7: IDENTIFICATION OF SMRT POOLS BY HIGH THROUGHPUT END SEQUENCING
The method is a modification of terminal end sequencing. It is possible because, within the large collection of PCR products in each SMRT pool, a pair of fragments contains BAC vector sequence starting at the MseI cut site and continuing up to the genomic DNA cloning site, followed by a short stretch of unique genomic sequence terminating at the most proximal MseI cut site (see Figure 1B). The SMRT pools were identified by sequencing as follows.
960 clones from the RPCI-11 or RPCI-13 human BAC libraries were randomly selected. After SMRT pool generation as described in Example 6, 4% of each unpurified SMRT pool was sequenced with the BAC vector T7 primer (5'-TAATACGACTCACTATAGG-3', SEQ ID NO:3) using the following protocol.
2 µl from a 50 µl PCR2 reaction was combined with 4 µl BigDye (Applied Biosystems) and 1 µl of primer (3.2 pmol) in a 10 µl final volume. Following denaturation at 95°C for 1 min, thermal cycling was performed for 85 cycles of 95°C for 15 sec, 50°C for 5 sec and 72°C for 4 min. All steps were ramped at 1°C/sec using an MJ Research Peltier thermocycler.
The 10 µl product can be cleaned by standard ethanol precipitation or with a Qiagen purification kit (for example, MinElute PCR purification).
Sequencing reaction products were resolved using an ABI Model 377 or ABI Model 3700 sequencer (Applied Biosystems).
Sequences were analyzed using NCBI BLAST to query the non-redundant (nr) and high-throughput genomic sequences (htgs) databases of GenBank v.2.2.5. The FTP version of BLAST was downloaded and a script was written so that all 960 sequences were queried automatically. Hits were accepted only with an expect value (E value) of 0.001 or lower and a bit score of 30 or higher.
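The automated query step amounts to filtering BLAST hits by the two cut-offs and keeping the best match per read. The sketch below assumes 12-column tabular BLAST output (query, subject, % identity, length, mismatches, gap opens, q.start, q.end, s.start, s.end, E value, bit score), as produced by `-m 8` in legacy blastall or `-outfmt 6` in BLAST+; the function name is hypothetical. A read whose top score is shared by several subjects is returned with all of them, which is one way repetitive matches show up.

```python
def best_hits(tabular_lines, max_evalue=1e-3, min_bitscore=30.0):
    """Return {query: [subjects sharing the top qualifying bit score]}."""
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue or bitscore < min_bitscore:
            continue  # fails the acceptance cut-offs
        top = best.get(query)
        if top is None or bitscore > top[0]:
            best[query] = (bitscore, {subject})
        elif bitscore == top[0]:
            top[1].add(subject)
    return {q: sorted(subjects) for q, (_, subjects) in best.items()}


rows = [
    ["pool1", "AC012345", "99.0", "400", "4", "0", "1", "400", "1000", "1399", "1e-150", "700"],
    ["pool1", "AC099999", "90.0", "50", "5", "0", "1", "50", "10", "59", "0.01", "35"],
    ["pool2", "AC111111", "100", "60", "0", "0", "1", "60", "5", "64", "1e-20", "100"],
    ["pool2", "AC222222", "100", "60", "0", "0", "1", "60", "5", "64", "1e-20", "100"],
    ["pool3", "AC333333", "95", "20", "1", "0", "1", "20", "5", "24", "1e-4", "25"],
]
hits = best_hits("\t".join(r) for r in rows)
print(hits)
```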
Roughly half (468) of the SMRT pools yielded sequences, and 448 of these were matched to specific BAC clone sequences. Twenty matched repetitive sequences represented by multiple GenBank entries.
Because the SMRT pools were generated by a ligation-mediated PCR (LMPCR) protocol using MseI-digested BAC DNA, some of the failed sequence reads may be attributed to an MseI site downstream of the primer sequence that truncates primer extension (Figure 4). To obtain a usable sequence read, the MseI recognition site (TTAA) must lie a significant distance downstream of the sequencing primer, preferably more than 50 nucleotides.
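Whether a given vector end is likely to give a usable read can be estimated by locating the first MseI site after the primer. A minimal sketch, assuming the template is supplied in the same orientation as the primer sequence and using the 50-nucleotide guideline above as the threshold; the template here is hypothetical.

```python
T7 = "TAATACGACTCACTATAGG"  # SEQ ID NO:3


def distance_to_msei(template, primer):
    """Bases between the primer's 3' end and the next TTAA, or None if absent."""
    template, primer = template.upper(), primer.upper()
    p = template.find(primer)
    if p < 0:
        raise ValueError("primer not found in template")
    start = p + len(primer)
    site = template.find("TTAA", start)
    return None if site < 0 else site - start


def likely_usable(distance, min_distance=50):
    """True if no downstream MseI site, or it lies beyond the threshold."""
    return distance is None or distance > min_distance


short_read = "CCC" + T7 + "G" * 30 + "TTAA" + "C" * 100
long_read = "CCC" + T7 + "G" * 80 + "TTAA" + "C" * 100
for template in (short_read, long_read):
    d = distance_to_msei(template, T7)
    print(d, likely_usable(d))
```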
To determine whether the probability of identifying the LMPCR product increased with use of the Sp6 primer (5'-ATTTAGGTGACACTATAG-3', SEQ ID NO:4), 83 SMRT pools were sequenced using the protocol outlined above. Of these 83 SMRT pools, 64 returned usable sequences and 60 were matched to a specific BAC; four matched repetitive sequences represented by multiple GenBank entries.
Combining the results from the Sp6 and T7 sequence reads, 76 of the 83 SMRT pools (91%) were identified.

High-throughput SMRT pool sequencing identifies 91% of the clones in a clone set when both the Sp6 and T7 primers are used. Sequencing three clones from a plate with the T7 primer determines plate identity 85% of the time, compared with 97% using Sp6 and 99.9% using both (Figure 5). For large clone sets, sequencing all SMRT pools is desirable but may be prohibitive because of the significant cost of large-scale sequencing. A cost-effective alternative is to sequence three clones per 96-well plate with both the forward and reverse BAC primers.
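The plate-level figures are consistent with a simple model in which a plate is identified if at least one of three sequenced clones is identified, with each clone succeeding independently at the per-clone rates observed in this Example (448/960 for T7, 60/83 for Sp6, 76/83 for both). This independence model is an assumption used for illustration; it gives about 84.8%, 97.9% and 99.94%, close to the reported 85%, 97% and 99.9%.

```python
def plate_identification_probability(per_clone_rate, clones_sequenced=3):
    """P(at least one of n independently sequenced clones is identified)."""
    return 1.0 - (1.0 - per_clone_rate) ** clones_sequenced


rates = {
    "T7": 448 / 960,
    "Sp6": 60 / 83,
    "T7 + Sp6": 76 / 83,
}
for primer, p in rates.items():
    print(f"{primer}: {plate_identification_probability(p):.4f}")
```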
Direct sequencing of SMRT pools verified all 96-well plates in the test set. Because unpurified PCR products can be sequenced and only 4% of each SMRT pool is consumed, direct end sequencing of SMRT pools is an effective means of verifying array spotting solutions.
EXAMPLE 8: IDENTIFICATION OF SMRT POOLS BY SOUTHERN HYBRIDIZATION
The use of Southern analysis to verify BAC clones for array construction has previously been described (Osoegawa K, Mammoser AG, Wu C, Frengen E, Zeng C, Catanese JJ, de Jong PJ: A bacterial artificial chromosome library for sequencing the complete human genome. Genome Res 2001, 11:483-496). This method was modified, as described below, for the identification of SMRT pools. DNA was prepared from overnight cultures of BAC clones. Two hundred nanograms of HindIII-digested BAC DNA fragments were separated by electrophoresis on a 1% agarose gel.
The separated fragments were transferred to a Hybond-N+ membrane as recommended by the manufacturer (Amersham). One microlitre of SMRT pool DNA (~1 µg) was labeled with [α-32P]dATP using the RadPrime random priming system (Invitrogen). The labeled probes were precipitated in ethanol with (or without) 50 µg Cot-1 DNA (Invitrogen) and redissolved in 15 µl of hybridization solution (50% formamide, 2X SSC, 10% dextran sulfate, 4% SDS). The probe was denatured at 80°C for 10 min and allowed to cool to 37°C for 2 h before addition to the prehybridized membrane. Hybridization was performed at 65°C overnight in the presence of 0.5 µg/µl of sheared herring sperm DNA (Invitrogen). Washes were performed at 65°C with Buffer 1 (5 mg/ml BSA, 0.5 mM EDTA, 40 mM Na2HPO4 (pH 7.2), 5% SDS) followed by Buffer 2 (2 mM EDTA, 80 mM Na2HPO4 (pH 7.2), 2% SDS). Autoradiographs were generated from phosphorimager plates and analyzed using the STORM 860 system (Amersham).
Hybridization of the SMRT pools to the HindIII-digested BAC clones allowed accurate identification (Figure 6). For example, the SMRT pool derived from BAC clone RP11-104F13 hybridized to the correct BAC, detecting all of its HindIII fragments and so showing complete representation, but apart from the common vector bands it did not hybridize to RP11-104F14 (Figure 6C). However, in the absence of Cot-1 DNA, the SMRT pools cross-hybridized to multiple fragments of the wrong clone digest because of the presence of repetitive elements (Figure 6B). Southern analysis therefore requires Cot-1 DNA, which increases the cost of this assay.
EXAMPLE 9: IDENTIFICATION OF SMRT POOLS BY FLUORESCENCE IN SITU HYBRIDIZATION (FISH)
Selected SMRT pools were identified by FISH on metaphase chromosomes. Two microlitres of SMRT pool DNA (~2 µg) was labelled by random priming overnight in the presence of 2 nmol of Cy3-dCTP, Cy5-dCTP (Perkin Elmer), FITC-dUTP or Texas Red-dUTP using the BioPrime kit (Invitrogen) according to the manufacturer's directions.
The labeled probe was purified using a Sephadex G-50 column, combined with 21 µg of Cot-1 DNA and precipitated with ethanol. The labeled probe was then resuspended in 80 µl of hybridization buffer (50% formamide, 2X SSC, 10% dextran sulfate, 0.1% Tween-20, 10 mM Tris pH 7.4) and denatured for 5 min at 100°C. The metaphase slide was dehydrated through a series of 70%, 80% and 100% ethanol washes for min each, denatured in 70% formamide in 0.6X SSC for 2 min at 70°C, processed through the same ethanol series at -20°C and allowed to dry. Thirty-five microlitres of probe was then added to the slide and hybridized overnight at 37°C.
Images were processed with QCapture (Q-Imaging, Vancouver) using a Zeiss Axioscope microscope.
The results of this experiment are shown in Figure 7. Metaphase FISH analysis allowed mapping of the SMRT pools to a chromosomal region but did not provide positive identification. This creates uncertainty when verifying a large clone set, since many SMRT pools will map to the same genomic location within the resolution of FISH on metaphase chromosomes. A further concern is that a BAC containing elements that map to multiple areas of the genome may hybridize to multiple chromosomal regions even when blocked with Cot-1 DNA.
EXAMPLE 10: PREPARATION OF A WHOLE GENOME SMRT ARRAY
A SMRT array was constructed consisting of the 32,433-clone SMRT set prepared as described in Examples 1-4. The SMRT pools to be spotted on the array were prepared from the SMRT set as described in Example 5 and identified as described in Example 6. Each of the final SMRT pools was redissolved in 75 µl of 0.83X MSP printing solution (Telechem), denatured by boiling for 5 minutes in a PCR thermocycler, and rearrayed for robotic printing in triplicate using a VersArray ChipWriter Pro (BioRad). This arrayer used a 12 × 4 array of SMP2.5 Stealth Micro Spotting Pins (Telechem/ArrayIT), depositing DNA spots of 0.8 nl at approximately 1 µg µl⁻¹ at 133-µm spacing. The entire set of 32,433 SMRT pool solutions was spotted in triplicate onto two aldehyde-coated slides.
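The printing parameters imply the scale of each slide. The arithmetic sketch below assumes that spots are split evenly between the two slides and among the 48 pins, and that each pin prints a square sub-grid; these layout choices are illustrative assumptions, not the arrayer's documented behaviour.

```python
import math

clones = 32_433
replicates = 3
slides = 2
pins = 12 * 4            # 12 x 4 print head
pitch_um = 133           # centre-to-centre spot spacing
spot_volume_nl = 0.8
dna_ug_per_ul = 1.0

spots_total = clones * replicates
spots_per_slide = math.ceil(spots_total / slides)
spots_per_pin = math.ceil(spots_per_slide / pins)
# ASSUMPTION: each pin prints a square sub-grid.
side = math.ceil(math.sqrt(spots_per_pin))
subgrid_mm = side * pitch_um / 1000
# 0.8 nl = 0.0008 ul; at 1 ug/ul that is 0.0008 ug = 0.8 ng per spot.
dna_per_spot_ng = spot_volume_nl * 1e-3 * dna_ug_per_ul * 1e3

print(spots_total, spots_per_slide, spots_per_pin, side, round(subgrid_mm, 3), dna_per_spot_ng)
```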
EXAMPLE 11: ARRAY SENSITIVITY OF WHOLE GENOME SMRT ARRAY
To assess the sensitivity of the whole genome SMRT array, the well-characterized EBV-transformed lymphoma cell line TAT-1 (test) was hybridized to normal male genomic DNA (reference). TAT-1 lymphoma cell DNA was prepared as follows.
400 ng of test and reference DNA was labelled separately with Cyanine-3 and Cyanine-5 dCTPs according to a previously described random priming protocol (Garnis, C., Baldwin, C., Zhang, L., Rosin, M.P. & Lam, W.L. Use of complete coverage array CGH to define copy number alterations on chromosome 3p in oral squamous cell carcinomas. Cancer Res. 63, 8582-8585 (2003)). Before hybridization, the DNA probes were combined and purified using ProbeQuant Sephadex G-50 columns (Amersham) to remove unincorporated nucleotides. 200 µg of human Cot-1 DNA (Invitrogen) was added, and the mixture was precipitated and resuspended in 100 µl of DIG Easy hybridization solution (Roche) containing sheared herring sperm DNA (Sigma-Aldrich) and yeast tRNA (Calbiochem). The probe was denatured at 85 °C for min, and repetitive sequences were blocked at 45 °C for 1 h before hybridization.
Prehybridization was carried out in the same buffer. The probe mixture was applied to the slide surface, the coverslips were fixed, and slides were incubated at 42 °C for 36 h. The arrays were washed five times for 5 min each in 0.1× saline sodium citrate, 0.1% SDS at room temperature with agitation. Each array was rinsed repeatedly in 0.1× saline sodium citrate and dried by centrifugation.
Genomic regions containing BCL2 (18q21) and MYC (8q24) in TAT-1 were previously shown by FISH analysis to have a twofold copy-number increase (Denyssevych, T. et al. Establishment and comprehensive analysis of a new human transformed follicular lymphoma B cell line, Tat 1. Leukemia 16, 276-283 (2002)).
These previously reported amplifications were detected at both loci, and their boundaries were delineated (Figure 8). The boundaries of amplification on chromosome 8 lay between BAC clones RP11-143H8 at 8q22.2 and RP11-263C20 at 8q24.13. The boundaries of amplification on chromosome 18 lay between BAC clones RP11-159K14 at 18q21.32 and RP11-565D23 at 18q23. These data illustrate the detection sensitivity of array CGH.
EXAMPLE 12: RESOLUTION OF SMRT ARRAY COMPARED TO CONVENTIONAL CGH
To demonstrate the resolving power of the whole genome SMRT array, the log2 ratio profile of the lung cancer cell line H526 (Levin, N.A. et al. Identification of frequent novel genetic alterations in small cell lung carcinoma. Cancer Res. 54, 5086- (1994); Girard, L. et al. Genome-wide allelotyping of lung cancer identifies new regions of allelic loss, differences between small cell lung cancer and non-small cell lung cancer, and loci clustering. Cancer Res. 60, 4894-4906 (2000)) was compared with previously published conventional chromosomal CGH data (see URLs). Test (H526 cell line genomic) and reference (normal male genomic) DNA were labelled and hybridized to the whole genome tiling resolution array as described in Example 11 (Figure 9a).

The results are shown in Figure 9. All patterns of gains and losses were matched, including large changes (e.g., the amplification of 7q and 8q and the loss of the entire chromosome 10) and complex changes (e.g., the multiple amplifications on chromosome 1 and the multiple deletions on chromosome 4). Notably, conventional chromosomal CGH identified a highly amplified region at the telomeric end of chromosome arm 2p, apparently covering approximately one-fourth of the whole chromosome. However, whole genome tiling resolution array analysis localized this amplification precisely to a 1.3-Mb fragment at 2p24.3, bordered by BAC clones RP11-351F4 and RP11-701O10, which contains the MYCN oncogene.
The resolving power of this whole-genome array enabled breakpoints to be defined to within single BAC clones. For example, the deletion breakpoint on chromosome arm 3p was localized between BAC clones RP11-632O5 and RP11-594F16 at 3p21.1 (Figure 9b). This finding was also confirmed by FISH analysis (Figure 9c).
EXAMPLE 13: COMPARISON OF SMRT ARRAY WITH PREVIOUS

To compare the whole genome tiling resolution array with current array CGH technology, the colorectal cancer cell line COLO320 (Quinn, L.A., Moore, G.E., Morgan, R.T. & Woods, L.K. Cell lines from human colon carcinoma with unusual cell products, double minutes, and homogeneously staining regions. Cancer Res. 39, 4914-4924 (1979)) was profiled. This cell line has been characterized in two previous array CGH studies (Snijders, A.M. et al. Assembly of microarrays for genome-wide measurement of DNA copy number. Nat. Genet. 29, 263-264 (2001); Wessendorf, S. et al. Automated screening for genomic imbalances using matrix-based comparative genomic hybridization. Lab. Invest. 82, 47-60 (2002)). The amplification at 8q24 in the MYC region identified by these studies was also confirmed using the whole genome tiling resolution array. Furthermore, this array system defined this segmental copy-number increase precisely as a 1.9-Mb region bordered by BAC clones RP11-810D23 and RP11-294P7 (Figure 10).
A detailed analysis of the COLO320 profile identified new microamplifications on chromosome arms 13q, 15q, 16p and 22q (Figure 11), which were not detected by the two previous high-resolution CGH studies. For example, a 300-kb microamplification was identified at 13q12.2 containing only three genes (according to the University of California Santa Cruz Genome Browser, April 2003 freeze): caudal type homeobox transcription factor 2 (CDX2), insulin promoter factor 1 (IPF1) and GS homeobox 1 (GSH1; Figure 12a). CDX2 is a transcription factor expressed in the intestine and altered in colorectal cancers (Kim, S. et al. PTEN and TNF-alpha regulation of the intestinal-specific Cdx-2 homeobox gene through a PI3K, PKB/Akt, and NF-kappaB-dependent pathway. Gastroenterology 123, 1163-1178 (2002)).
FISH analysis verified this microamplification and showed that it was within a homogeneously staining region (Figure 12b). These findings illustrate the usefulness of a tiling resolution BAC array for comprehensive assessment of genomic integrity.
EXAMPLE 14: IDENTIFICATION OF MINUTE REGIONS OF ALTERATION BY SMRT ARRAY ANALYSIS
In addition to microamplifications, small deletions were also detected in a number of tumor cell lines. For example, a 1.25-Mb deletion containing the gene CDKN2A (also called p16) was detected at 9p21.3 in the lymphoma cell line Z138C (Figure 13a).
Deletion of CDKN2A occurs in approximately one-half of mantle cell lymphoma tumors as detected by FISH (Dreyling, M.H. et al. Alterations of the cyclin D1/p16-pRB pathway in mantle cell lymphoma. Cancer Res. 57, 4608-4614 (1997)). This deletion is bordered by RP11-328C2 and RP11-275H17 (Figure 13a). Submegabase-sized microdeletions can be accurately mapped in a single whole-genome array CGH experiment. This is made possible by the overlapping clone coverage and the distribution of the clones on the array. A notable example is a 240-kb deletion at 7q22.3 in the breast cancer cell line BT474, containing PRKAR2B, a regulatory subunit of protein kinase A, and HBP1, a G1 inhibitory transcriptional repressor regulated by the p38 MAP kinase pathway (Xiu, M. et al. The transcriptional repressor HBP1 is a target of the p38 mitogen-activated protein kinase pathway in cell cycle regulation. Mol. Cell. Biol. 23, 8890-8901 (2003)) (Figure 13b). Such microdeletions have not been reported previously. Figure 14 shows a further example of a microdeletion detected using the whole genome tiling resolution array.
This figure shows a microdeletion at 6q24.3-qter in the HCC15 adenocarcinoma cell line. The average array log2 ratio for this deletion was -0.85, compared with the ideal ratio of -1.0 expected for loss of one copy from a two-copy (diploid) background. Previous data confirm a deletion at this locus (Girard, L. et al. Genome-wide allelotyping of lung cancer identifies new regions of allelic loss, differences between small cell lung cancer and non-small cell lung cancer, and loci clustering. Cancer Res. 60, 4894-4906 (2000)).
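The ideal ratios follow from log2(test copies / reference copies) against a two-copy reference, and an observed ratio can be converted back to an average copy number. This is a minimal arithmetic sketch that assumes every cell in the sample carries the same change.

```python
import math


def expected_log2_ratio(test_copies, reference_copies=2):
    """Ideal test/reference log2 ratio for a uniform copy-number change."""
    return math.log2(test_copies / reference_copies)


def implied_copy_number(observed_log2, reference_copies=2):
    """Average copy number that would produce the observed log2 ratio."""
    return reference_copies * 2 ** observed_log2


for copies in (1, 3, 4):
    print(copies, round(expected_log2_ratio(copies), 3))
print(round(implied_copy_number(-0.85), 2))  # the HCC15 6q24.3-qter ratio
```

On this scale, the observed -0.85 corresponds to roughly 1.1 copies on average, close to the single remaining copy expected for a one-copy deletion.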
Examples 12-14 show how small, previously unidentified alterations that have the potential to contribute to disease may easily be identified in a single whole genome SMRT array experiment.
EXAMPLE 15: SMRT ARRAY IMAGING AND ANALYSIS
Hybridized slides were imaged using a CCD-based imaging system (Arrayworx eAuto, Applied Precision) and analyzed with SoftWoRx Tracker Spot Analysis software.
The ratios of the triplicate spots were averaged and standard deviations (s.d.) were calculated. All spots with s.d. > 0.075 or signal-to-noise ratios < 20 were removed from the analysis. Custom viewing software (SeeGH) was used to visualize all data as log2 ratio plots, where each dot represents one BAC.
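The spot-filtering rule can be sketched as follows. The text does not say whether the s.d. is computed on log2 or linear ratios, whether it is the sample or population s.d., or whether the signal-to-noise cut-off applies per replicate; this sketch assumes log2 ratios, the sample s.d. (`statistics.stdev`), and rejection of a clone if any replicate falls below the cut-off.

```python
import statistics


def summarize_triplicates(spots, max_sd=0.075, min_snr=20.0):
    """Average replicate log2 ratios per clone, dropping clones that fail QC.

    spots maps a clone name to a list of (log2_ratio, signal_to_noise) tuples.
    """
    kept = {}
    for clone, replicates in spots.items():
        ratios = [ratio for ratio, _ in replicates]
        sd = statistics.stdev(ratios) if len(ratios) > 1 else 0.0
        worst_snr = min(snr for _, snr in replicates)
        if sd > max_sd or worst_snr < min_snr:
            continue  # noisy replicates or weak signal
        kept[clone] = statistics.mean(ratios)
    return kept


# Hypothetical triplicate measurements.
spots = {
    "A": [(0.50, 40), (0.52, 35), (0.48, 50)],
    "B": [(0.10, 40), (0.40, 40), (0.20, 40)],
    "C": [(0.00, 10), (0.01, 30), (0.02, 30)],
}
print(summarize_triplicates(spots))
```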
A reference male versus reference female hybridization detected no unexpected gains or losses, and no random variability of log2 ratios was observed (Figure 15).
Furthermore, owing to overlapping clone coverage, a single clone with an aberrant signal ratio would not be considered an amplification or deletion. Finally, since the clones are not spotted in the order of their map position, clones that are adjacent in the genome are distributed throughout the array.
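The rule that a lone aberrant clone is not called can be expressed as requiring a minimum run of consecutive clones, ordered by map position, beyond a threshold. The ±0.2 log2 threshold and the two-clone minimum are hypothetical parameters chosen for illustration; the source does not specify a calling threshold.

```python
from itertools import groupby


def call_segments(ordered_log2, threshold=0.2, min_clones=2):
    """Return (first_index, last_index, state) for runs of at least
    min_clones map-ordered clones beyond +/- threshold."""
    def state(value):
        if value >= threshold:
            return "gain"
        if value <= -threshold:
            return "loss"
        return None

    calls, index = [], 0
    for s, group in groupby(ordered_log2, key=state):
        n = len(list(group))
        if s is not None and n >= min_clones:
            calls.append((index, index + n - 1, s))
        index += n
    return calls


# Hypothetical map-ordered ratios: clone 2 is a lone outlier and is ignored.
ratios = [0.0, 0.05, 0.6, -0.01, 0.0, -0.5, -0.45, -0.6, 0.02, 0.3, 0.35]
print(call_segments(ratios))
```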
EXAMPLE 16: GENERATION OF A HIGH RESOLUTION SMRT SET, SMRT LIBRARY AND SMRT ARRAY REPRESENTING THE 8q21-24 REGION OF THE HUMAN GENOME
166 BAC clones were selected from the RPCI-11 library. All BAC clones were mapped to the tiling assembly of The International Human Genome Mapping Consortium (2001). In addition, clones were referenced to the Nov 2002 assembly of the UCSC Genome Browser based on their end sequences. The centromeric position was estimated by a BLAST search, against the UCSC sequence assembly, of the termini of the BAC sequences retrieved from GenBank, which may or may not reach the ends of the clones. To construct a BAC array, a contig map of the 8q21-24 region was built using FPC (fingerprinted contigs) software (available from the Sanger Institute website) and the fingerprint data of 400,000 human BAC clones (The International Human Genome Mapping Consortium, 2001). Relative positions of the BACs were confirmed by cross-referencing sequence-based BAC contig assemblies [the Ensembl Genome Browser (Clamp et al., 2003. Nucleic Acids Research 31:38-42), the NCBI Map Viewer (Wheeler D, et al. 2003. Nucleic Acids Research 31:28-33) and the UCSC Genome Browser (Kent et al., 2002. Genome Research 12:996-1006; Karolchik et al., 2003. Nucleic Acids Research 31:51-54)]. The positions of all microsatellite markers were verified by BLAST alignment against the corresponding BACs. A contiguous set of 166 BAC clones spanning a ~52 Mbp region over 8q21-24 was selected from the RPCI-11 library (see Table 1). HindIII fingerprints of each BAC were compared with in silico FPC fingerprints to verify BAC identity.
Table 1: BAC Clones used to Prepare a High Resolution SMRT set Representing 8q21-24 (RP11 Name)
478b1S 23b3 S7Sc14 701e3 6SSf3 276112 678p19 411n11 746017 SSOd23 88j8 388k12 38117 397d14 789c10 SS6m8 70Sf1 638114 336d3 167h19 662e23 678p19 30j11 627b21 499k24 S6S13 772p22 S88i19 789018 804a7 lSj4 674p23 103p22 700g17 706b10 166h20 106c12 464k11 788e17 4SS112 27c8 1S8a13 682f2 108g16 680g2 S87h10 81211 272k10 193j8 2707 61j1 811118 8lbS 4S8n4 S62d1 802k12 63Sj23 6S9a24 367c1S 293h22 76k18 106e2 381b23 67n21 407115 563e23 346h21 1105 30p9 261n12 44n12 97d1 790j24 429m12 I58kI 400j2 41916 122x21 812b1 100dIS 281d17 109c19 419120 72mS 1S6o22 S7Sk20 12k18 696p8 164h21 622011 312m14 72815 24e9 4k16 64SeI0 342n20 lSOp21 238110 77Sb1S 382a18 313116 229k21 11408 22x24 234a3 326m20 3So14 5803 760h22 18k20 I43h8 1c8 45x11 534kI3 26e5 61616 619f16 3x12 S67n24 374b17 321e7 132e3 11x18 SSla23 324f11 1S7g3 171013 S9a1 414d17 S66j8 7S3m19 437b2 621b3 711b6 284h18 403h17 8Sm22 393k19 11Sm9 724b14 S21i2 619019 124cS 78Sh20 6SaS 703fZ0 191h13 9p20 398g24 642x1 486h20 165x15 558b2 94x24 37S114 113d3 19110 616h10 262017 39Sg23 280k14 784f18 619c16 680f23 S36k17 140b16

BAC DNA was isolated using the NucleoBond Plasmid Maxi Kit (BD Biosciences, Palo Alto, CA) and subjected to a linker-mediated PCR procedure as described in Example 6 to produce sufficient material for array construction.
Briefly, MseI-digested BAC DNA was ligated to linkers (5'-AGTGGGATTCCGCATGCTAGT-3' [SEQ ID NO:1] and 5'-TAACTAGCATGC-3' [SEQ ID NO:2]) and amplified by PCR. An aliquot of each PCR product was further amplified to produce sufficient DNA for spotting.
The amplified DNA, dissolved in a 20% DMSO solution, was denatured by boiling for min and rearrayed for robotic printing in triplicate using a VersArray ChipWriter Pro (BioRad, Mississauga, ON, Canada) with Stealth Micro Spotting Pins (Telechem/ArrayIT SMP2.5, Sunnyvale, CA, USA), producing 100 µm features at µm spacing on slides coated with amine substrates. Linker-mediated PCR-amplified male human genomic DNA samples (Novagen, Madison, WI, USA) were spotted on the array to allow normalization of the hybridization signal intensities between dyes. The DNA was covalently bonded to the slides by baking and UV crosslinking, and the slides were washed to remove unbound DNA. 100 ng of test and reference (normal diploid male) genomic DNA were separately labeled with Cyanine 3 and Cyanine 5 dCTPs, respectively, using the BioPrime DNA labeling system (Invitrogen, Burlington, ON). Repetitive sequences were blocked by incubation with 100 µg of human Cot-1 DNA.
Arrays were pre-hybridized at 42°C with DIG Easy hybridization solution (Roche, Mississauga, ON) containing 1% BSA and 2 µg/µL sheared herring sperm DNA.
Denatured probes in hybridization buffer containing 6 µg/µL yeast tRNA were applied to the array and hybridized at 42°C for 36 hours. Arrays were washed repeatedly with 0.1X SSC/0.1% SDS in the dark at room temperature. Hybridized arrays were imaged using a CCD-based imaging system and analyzed using the SoftWoRx array analysis program (Arrayworx eAuto, API, Issaquah, WA). Spot signal data for each channel were normalized by applying a scale factor that balanced the signal intensities of the human genomic DNA control spots. Additionally, 96 randomly selected clones scattered throughout the genome were included as control spots. The average signal ratios and standard deviations for each triplicate spot set were calculated and displayed as a plot of the normalized Cy5/Cy3 log2 signal ratio versus relative tiling path position. A log2 signal ratio of zero at a spot represents equivalent copy number between the sample and reference DNA. An amplicon was defined as a region of clones with a local average signal ratio above the baseline.
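The dye normalization step can be sketched as a single scale factor chosen so that the genomic DNA control spots have equal mean intensity in both channels, after which log2 ratios are taken. Using the mean (rather than, say, the median) of the control intensities, and scaling the Cy3 channel rather than Cy5, are assumptions; the spot names and intensities are hypothetical.

```python
import math
import statistics


def normalize_log2(cy5, cy3, control_ids):
    """Scale Cy3 so control spots balance Cy5, then return Cy5/Cy3 log2 ratios."""
    scale = (statistics.mean(cy5[c] for c in control_ids)
             / statistics.mean(cy3[c] for c in control_ids))
    return {spot: math.log2(cy5[spot] / (cy3[spot] * scale)) for spot in cy5}


cy5 = {"ctrl1": 2000, "ctrl2": 4000, "bacA": 3000, "bacB": 1500}
cy3 = {"ctrl1": 1000, "ctrl2": 2000, "bacA": 750, "bacB": 1500}
print(normalize_log2(cy5, cy3, ["ctrl1", "ctrl2"]))
```

After normalization the control spots sit at a log2 ratio of zero, so the remaining ratios can be read directly against the baseline described above.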
EXAMPLE 17: USE OF THE HIGH RESOLUTION SMRT ARRAY REPRESENTING THE 8q21-24 REGION OF THE HUMAN GENOME TO DETECT AMPLIFICATIONS IN GENOMIC DNA DERIVED FROM ORAL TUMOURS
Archival dysplasia and tumor samples for array CGH and microsatellite analysis (formalin-fixed paraffin-embedded tissue blocks) were obtained from the British Columbia Oral Biopsy Service, and the diagnoses were confirmed by an oral pathologist. Dysplastic, tumor and adjacent stromal cells were microdissected from hematoxylin- and eosin-stained sections, and DNA was isolated as described previously (Zhang et al., (1997). Am. J. Pathol., 151, 323-327). Samples for gene expression analysis were collected within 15 minutes of surgery and frozen in liquid nitrogen.
RNA was extracted from microdissected OSCCs and from a panel of normal oral epithelium samples from individuals without cancer (Chomczynski P and Sacchi N. (1987). Anal. Biochem., 162, 156-159). Test and reference DNA samples were labelled and hybridized to the array, and the slides were analyzed as described in Example 16.
Array CGH experiments detected amplifications within 8q21-24 in oral cancer. At least two separate regions of amplification were observed centromeric to the amplification at 8q24, which contains MYC. Figure 16 illustrates three patterns of alteration: no amplification (Fig. 16B); amplification of the entire tiling set (8q21-24) (Fig. 16C); and multiple amplifications at 8q22, separate from 8q24 (Fig. 16D and E).
Alterations at 8q22 were confirmed by microsatellite analysis.
Of 22 formalin-fixed paraffin-embedded oral squamous cell carcinomas (OSCCs) that were microdissected and analysed using the microarray described above, array CGH analysis detected genetic alteration in 18 cases. No alteration in copy number was observed in four of the cases, while five cases showed an increase in copy number across the entire region (all 166 BACs). The remaining 13 cases showed two distinct regions of amplification, at 8q22 and at 8q24. The amplification at 8q24 contains the MYC oncogene. Alignment of the 8q22 alterations in these 13 cases delineated a minimal region of alteration (MRA) of approximately 5.3 Mbp (Figure 17A and B), with a centromeric boundary at BAC RP11-346H21 and a telomeric boundary at BAC RP11-680F23. Figure 17C shows an array CGH profile of case 574T. Four cancer-related genes reside in the 8q22 region, although none has previously been associated with oral cancer. This example demonstrates the use of high-resolution array CGH to identify a recurrent 5.3 Mbp amplified region at 8q22 in oral tumours.
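Delineating an MRA by aligning the altered regions of several cases amounts to intersecting intervals: the shared region runs from the rightmost left boundary to the leftmost right boundary. A minimal sketch, assuming each case's 8q22 alteration has already been reduced to a single inclusive (start, end) interval in base-pair coordinates on one chromosome; the function name and the example coordinates are hypothetical and are not the data behind Figure 17.

```python
from typing import Optional, Sequence, Tuple

Interval = Tuple[int, int]  # (start_bp, end_bp), inclusive, all on one chromosome


def minimal_region(intervals: Sequence[Interval]) -> Optional[Interval]:
    """Return the interval shared by every case's altered region,
    or None if the regions do not all overlap."""
    if not intervals:
        return None
    start = max(s for s, _ in intervals)  # rightmost left boundary
    end = min(e for _, e in intervals)    # leftmost right boundary
    return (start, end) if start <= end else None


if __name__ == "__main__":
    print(minimal_region([(100, 900), (300, 1200), (250, 800)]))  # (300, 800)
    print(minimal_region([(0, 10), (20, 30)]))                    # None
```

In practice the boundaries of such an interval would be reported as the clones at its ends, as with RP11-346H21 and RP11-680F23 above.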
EXAMPLE 18: USE OF A HIGH RESOLUTION SMRT ARRAY
REPRESENTING THE 8q21-24 REGION OF THE HUMAN GENOME TO

BRONCHIAL CARCINOMA IN SITU LESIONS
Array construction, hybridization and analysis were carried out as described in Example 16. One hundred nanograms of microdissected bronchial carcinoma in situ (CIS) sample and reference (normal diploid male) genomic DNA were separately labeled using cyanine 3 and cyanine 5 dCTPs, respectively, and used to probe the arrays.
Eight formalin-fixed paraffin-embedded archival CIS specimens were microdissected and analyzed. Of the eight CIS samples profiled, one showed no significant copy number changes and one showed amplification of the entire tiling set. The remaining six samples showed segmental copy number alterations. Amplification of only the region containing MYC (8q24) was observed in one of these samples, while the other five showed segmental copy number increases and decreases in addition to the region at 8q24 containing MYC. Figure 18 shows two examples of alterations distinct from MYC. CIS 59 shows two distinct regions, one at 8q21 (centered at BAC clone 575C14) and the other at 8q24 (centered at BAC clone RP11-382A18). CIS 60 shows at least three distinct regions of alteration: two that align with those observed in CIS 59 and a separate region at 8q22 (centered at BAC clone RP11-35014).
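The profile categories used above (no change, change across the entire tiling set, and segmental alteration), together with a count of distinct altered regions, can be expressed over per-clone calls. A minimal sketch, assuming each clone has already been called as gain (+1), no change (0) or loss (-1) in tiling-path order; this encoding and the function names are hypothetical, and adjacent clones are treated as one region only when they carry the same call.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def classify_profile(calls: Sequence[int]) -> str:
    """calls: per-clone calls in tiling-path order (+1 gain, 0 none, -1 loss)."""
    if all(c == 0 for c in calls):
        return "no change"
    if len(set(calls)) == 1:  # every clone carries the same non-zero call
        return "whole-set gain" if calls[0] > 0 else "whole-set loss"
    return "segmental"


def altered_segments(calls: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Return (first, last, call) for each run of identical non-zero calls."""
    segments: List[Tuple[int, int, int]] = []
    pos = 0
    for value, run in groupby(calls):
        length = len(list(run))
        if value != 0:
            segments.append((pos, pos + length - 1, value))
        pos += length
    return segments


if __name__ == "__main__":
    profile = [0, 1, 1, 0, -1, 0, 1]
    print(classify_profile(profile))  # segmental
    print(altered_segments(profile))  # [(1, 2, 1), (4, 4, -1), (6, 6, 1)]
```

The example profile has three distinct altered regions, the kind of count used to describe CIS 60.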
The region observed at 8q22 coincides with that observed as an early event in oral cancer progression (Garnis C, et al. Novel regions of amplification on 8q distinct from the MYC locus and frequently altered in oral dysplasia and cancer. Genes Chromosomes Cancer 2004 Jan;39(1):93-8). Amplification on chromosome 8q is a common event in cancer; however, high-resolution array CGH as described above allows multiple regions of alteration on 8q to be delineated in oral tumors.
EXAMPLE 19: PREPARATION OF A SMRT SET REPRESENTING THE 6q16-q21 REGION OF THE HUMAN GENOME
A comprehensive BAC clone map spanning the region 6q16.2 through 6q21 was constructed, and a minimal tiling set of 43 RP-11 BACs was identified from the fingerprinted contigs database (International Human Genome Mapping Consortium, 2001) (see Table 2). This map was confirmed using the April 2003 version of the sequence-based UCSC genome browser. Markers used in previous studies to identify regions of loss have been mapped onto this resource. A series of 13 BAC clones spaced at intervals across this contig were used as FISH probes to determine the approximate size and location of 6q deletions in the selected follicular lymphoma cases. BAC, PAC (P1-derived artificial chromosome) and YAC (yeast artificial chromosome) clones from additional libraries were also used as FISH probes, for a total of 17 probes. Figure 19 shows the relative locations of all FISH probes. The 43 RP-11 BAC clones can be used as a SMRT set for generating, by the method described in Example 6, SMRT pools that are suitable for use as FISH probes (see Henderson et al., Genes, Chromosomes & Cancer 40:60-65 (2004)).
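Selecting a minimal tiling set from a mapped contig can be framed as an interval-cover problem. The text states that the set was identified from the fingerprinted contigs database; the sketch below does not reproduce that procedure. It is a generic greedy interval cover, assuming clone insert coordinates are known as inclusive (start, end) positions and requiring each chosen clone to overlap its predecessor by at least 1 bp (it does not enforce any particular overlap range). Clone names and coordinates in the example are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Clone = Tuple[str, int, int]  # (name, start_bp, end_bp), inclusive coordinates


def minimal_tiling_path(clones: Sequence[Clone], region_start: int,
                        region_end: int) -> List[str]:
    """Greedy interval cover of [region_start, region_end].

    At each step, among clones starting at or before the end of the region
    covered so far (so each new clone overlaps the previous pick), choose the
    one that extends furthest. Raises ValueError if the clones leave a gap.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path: List[str] = []
    covered_end = region_start - 1  # nothing covered yet
    best: Optional[Clone] = None
    i = 0
    while covered_end < region_end:
        # the first clone must contain region_start; later ones must overlap covered_end
        limit = region_start if not path else covered_end
        while i < len(ordered) and ordered[i][1] <= limit:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_end:
            raise ValueError(f"gap in coverage after position {covered_end}")
        path.append(best[0])
        covered_end = best[2]
    return path


if __name__ == "__main__":
    contig = [("A", 0, 150), ("B", 100, 250), ("C", 120, 200),
              ("D", 240, 400), ("E", 390, 500)]
    print(minimal_tiling_path(contig, 0, 480))  # ['A', 'B', 'D', 'E']
```

Clone C is dropped because B already covers it and reaches further; the greedy choice of the furthest-reaching overlapping clone gives the fewest clones for an interval cover.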
Table 2. BAC Clones used to Prepare a Minimal SMRT Set spanning the Region 6q16.2 to 6q21
N0758C21*   N034026*
N0262H21*   N074I07
N0798N02*   N0559D02
NOl17M4     N0560B22
NO11SF22*   N0438N24
N0061P22*   N0398H08*
NO14E21*    N0685G11*
N0274C15    N0009N15*
N0016I22    N0660M21*
N0423D17    N0431I16*
N0006M9     NO114N01*
NOSSSI23*   N005I020

EXAMPLE 20: USE OF SMRT POOLS FOR FISH ANALYSIS
The SMRT pools generated using the methods of the invention can be used as FISH probes (see Figure 20, which shows the results of a standard FISH hybridization protocol (GibcoBRL Part # Y01393) using SMRT pool products from BAC clones 127C12 (green) and 619019 (red)). Briefly, two microlitres of each SMRT pool (~2 µg) were labelled by random priming overnight in the presence of 2 nmol of Cy3-dCTP, Cy5-dCTP (Perkin Elmer), FITC-dUTP or Texas Red-dUTP using the BioPrime kit (Invitrogen) according to the manufacturer's directions. The labelled probe was purified using a Sephadex G-50 column, combined with 21 µg of Cot-1 DNA and precipitated with ethanol. The labelled probe was then resuspended in 80 µl of a hybridization buffer containing 50% formamide, 2X SSC, 10% dextran sulfate, 0.1% Tween-20 and 10 mM Tris pH 7.4, and denatured for 5 min at 100°C. The metaphase slide was dehydrated through a series of 70%, 80% and 100% ethanol washes for min each, denatured in 70% formamide in 0.6X SSC for 2 min at 70°C, processed through the same ethanol series at -20°C and allowed to dry. Thirty-five microlitres of probe was then added to the slide and hybridized overnight at 37°C.
Images were acquired with a Zeiss Axioscope microscope and processed with Qcapture (Q-imaging, Vancouver).
EXAMPLE 21: USE OF SMRT ARRAYS TO ANALYZE DNA METHYLATION
Changes in the methylation of genomic DNA in a lymphoblast cell line derived from a normal individual were detected using the general procedure depicted in Figure 22.
Methylated-DNA-specific probes were created and labeled with Cy5, and the reference DNA was labeled with Cy3. The methylated and reference DNA samples were hybridized to a whole-genome SMRT array.
The results are shown in Figure 21. The boxed region in this figure shows where Elongin A3, a reported imprinted gene, is hypermethylated in a lymphoblast cell line derived from a normal individual. This result illustrates the usefulness of SMRT arrays in detecting epigenetic changes in the genome, such as methylation.
The disclosures of all patents, publications (including published patent applications) and database entries referenced in this specification are specifically incorporated by reference in their entirety to the same extent as if each such individual patent, publication and database entry were specifically and individually indicated to be incorporated by reference.
The embodiments of the invention being thus described, it will be obvious that the same may be varied in many ways. Such variations are not to be regarded as a departure from the spirit and scope of the invention, and all such modifications as would be obvious to one skilled in the art are intended to be included within the scope of the following claims.

JUMBO APPLICATIONS / PATENTS
THIS SECTION OF THE APPLICATION / PATENT CONTAINS MORE THAN ONE VOLUME.
THIS IS VOLUME 1 OF 2.
NOTE: For additional volumes please contact the Canadian Patent Office.

Claims (30)

1. A method of preparing a submegabase resolution library comprising a collection of synthetic nucleic acid fragment pools, each of said synthetic nucleic acid fragment pools corresponding to a genomic clone, said method comprising the steps of:
a) selecting a set of genomic clones from at least one library of genomic clones, each of said clones comprising a genomic insert, wherein between about 17 bp and about 1,500 bp of the sequence of the genomic insert of at least 95% of the clones in the set overlaps with the sequence of the genomic insert of an adjacent genomic clone; and
b) preparing a synthetic nucleic acid fragment pool from each genomic clone in the set by fragmenting the genomic clone to produce nucleic acid fragments; and amplifying said fragments to generate a synthetic nucleic acid pool.
2. The method according to claim 1, wherein the genomic clones in the set comprise a genomic insert between about 15 kb and about 300 kb in size.
3. A submegabase resolution library comprising a collection of synthetic nucleic acid fragment pools, wherein said library is prepared by the method according to claim 1 or 2.
4. An array comprising one or more submegabase resolution libraries according to claim 3.
5. A method of preparing a submegabase resolution tiling set of genomic clones representing at least a portion of a genome, said method comprising selecting a set of genomic clones from at least one library of genomic clones representing said genome, each of said clones containing a genomic insert, wherein between about 17 bp and about 1,500 bp of the sequence of the genomic insert of at least 95% of the clones in the set overlaps with the sequence of the genomic insert of an adjacent genomic clone.
6. The method according to claim 5, wherein said tiling set comprises greater than 4,000 genomic clones.
7. The method according to claim 5, wherein said tiling set covers more than mB of said genome.
8. A method of preparing a synthetic nucleic acid fragment pool from a genomic clone comprising:
(a) preparing genomic clone DNA;
(b) fragmenting genomic clone DNA to produce DNA fragments; and
(c) amplifying said DNA fragments to generate a SMRT pool, wherein step b) or step c) comprises one or more dilution-processing steps.
9. The method according to claim 8, further comprising:
(d) confirming the identity of each synthetic nucleic acid fragment pool.
10. A high throughput method for determining the identity of a genomic clone having a genomic insert, said method comprising the steps of:
(a) preparing a solution comprising at least 20 fmol of said genomic clone, a primer labelled with a detectable label and amplification reagents;
(b) submitting said solution to between 65 and 100 cycles of thermal amplification to provide an amplified solution;
(c) submitting said amplified solution to sequence analysis to determine a sequence of at least 17 base pairs in length of said genomic insert; and
(d) comparing said sequence to a reference database in order to determine the identity of said genomic clone.
11. The method according to claim 10 further comprising the step of fragmenting said genomic clone to provide a plurality of clone fragments prior to step (a).
12. The method according to claim 11 further comprising the step of amplifying said plurality of clone fragments prior to step (a).
13. An array providing a representation of a tiling set of genomic clones, said array comprising a plurality of pools of synthetic nucleic acid fragments deposited on one or more solid support, wherein each pool is derived from one of said genomic clones and is present at one or more distinct locations on said one or more solid support, and wherein between 17 bp and 1,500 bp of the sequence of the genomic insert of at least 95% of the clones in said tiling set overlaps with the sequence of the genomic insert of an adjacent genomic clone.
14. The array according to claim 13, wherein the genomic clones in the set comprise a genomic insert between about 15 kb and about 300 kb in size.
15. The array according to claim 13 or 14, wherein said tiling set comprises at least 4,000 genomic clones.
16. The array according to any one of claims 13, 14 or 15, wherein said array represents an entire genome.
17. The array according to any one of claims 13, 14 or 15, wherein said array represents more than one genome.
18. The array according to any one of claims 13, 14, 15, 16 or 17, wherein said array further comprises one or more nucleic acids selected from the group of:
viral DNA, plasmid DNA, oligonucleotides and reference DNA.
19. A method of preparing an array comprising the steps of:
(a) selecting a set of genomic clones from at least one library of genomic clones, each of said clones containing a genomic insert, wherein between 17 bp and 1,500 bp of the sequence of the genomic insert of at least 95% of the clones in the set overlaps with the sequence of the genomic insert of an adjacent genomic clone;
(b) preparing a synthetic nucleic acid fragment pool from each genomic clone in the set by fragmenting the genomic clone to produce nucleic acid fragments; and amplifying said fragments to generate a synthetic nucleic acid pool; and
(c) depositing each of said synthetic nucleic acid pools onto a solid support at one or more distinct locations.
20. Use of one or more submegabase resolution libraries according to claim 3 to prepare an array.
21. Use of the submegabase resolution library according to claim 3 to prepare one or more probes.
22. Use of the array according to any one of claims 13, 14, 15, 16, 17 or 18 for comparative genome hybridization analysis.
23. The use according to claim 22, wherein said comparative genome hybridization analysis is used to detect genetic alterations, epigenetic changes or evolutionary genomic changes.
24. The use according to claim 23, wherein said epigenetic changes are genomic DNA methylation.
25. The use according to claim 23, wherein said genetic alteration is selected from the group of segmental polymorphisms, chromosomal rearrangements and translocations.
26. The use according to claim 22 in combination with chromatin immunoprecipitation.
27. The use according to claim 26 for detection of chemical modification of chromosomes, identification of chromosome targets of DNA binding proteins or chromatin remodelling proteins, or for identification of genomic sites at which DNA binding proteins interact.
28. Use of the array according to any one of claims 13, 14, 15, 16, 17 or 18 for the diagnosis of disease, determination of predisposition to disease, determination of resistance to treatment, or to enable the selection of a treatment regime.
29. Use of the array according to any one of claims 13, 14, 15, 16, 17 or 18 for the analysis of gene expression.
30. Use of the array according to any one of claims 13, 14, 15, 16, 17 or 18 for the identification of novel genes.
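The overlap condition recited in claims 1, 5, 13 and 19 can be checked computationally for a candidate tiling set. A minimal sketch, assuming insert coordinates are known as inclusive (start, end) positions on one chromosome and reading "an adjacent genomic clone" as the neighbour on either side in coordinate order; this reading, the function names and the example coordinates are illustrative assumptions rather than a construction of the claims, and the "about" qualifiers in claims 1 and 5 are ignored.

```python
from typing import Sequence, Tuple

Insert = Tuple[int, int]  # (start_bp, end_bp) of a clone's insert, inclusive


def overlap_bp(a: Insert, b: Insert) -> int:
    """Number of base pairs shared by two inclusive intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def meets_overlap_criterion(inserts: Sequence[Insert], lo: int = 17,
                            hi: int = 1500, min_fraction: float = 0.95) -> bool:
    """True if at least min_fraction of the clones overlap a neighbour
    (in coordinate order) by between lo and hi bp, inclusive."""
    ordered = sorted(inserts)
    n = len(ordered)
    if n < 2:
        return False
    qualifying = 0
    for i in range(n):
        neighbours = [ordered[j] for j in (i - 1, i + 1) if 0 <= j < n]
        if any(lo <= overlap_bp(ordered[i], nb) <= hi for nb in neighbours):
            qualifying += 1
    return qualifying / n >= min_fraction


if __name__ == "__main__":
    tiling = [(0, 150000), (149000, 300000), (299500, 450000)]
    print(meets_overlap_criterion(tiling))  # True
```

With these inserts the adjacent overlaps are 1,001 bp and 501 bp, so every clone qualifies; two clones sharing 50,001 bp would not.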
CA002570068A 2003-06-12 2004-06-14 Methods for preparation of a library of submegabase resolution tiling pools and uses thereof Abandoned CA2570068A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US32026603P 2003-06-12 2003-06-12
US60/320,266 2003-06-12
PCT/CA2004/000859 WO2004111267A2 (en) 2003-06-12 2004-06-14 Methods for preparation of a library of submegabase resolution tiling pools and uses thereof

Publications (1)

Publication Number Publication Date
CA2570068A1 (en) 2004-12-23

Family

ID=33551152

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002570068A Abandoned CA2570068A1 (en) 2003-06-12 2004-06-14 Methods for preparation of a library of submegabase resolution tiling pools and uses thereof

Country Status (2)

Country Link
CA (1) CA2570068A1 (en)
WO (1) WO2004111267A2 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2003303396A1 (en) 2002-12-23 2004-07-22 Agilent Technologies, Inc. Comparative genomic hybridization assays using immobilized oligonucleotide features and compositions for practicing the same
US8321138B2 (en) 2005-07-29 2012-11-27 Agilent Technologies, Inc. Method of characterizing quality of hybridized CGH arrays
US20110301862A1 (en) * 2010-06-04 2011-12-08 Anton Petrov System for array-based DNA copy number and loss of heterozygosity analyses and reporting
CN114842911B (en) * 2022-06-21 2022-09-20 深圳市睿法生物科技有限公司 Gene detection process optimization method and device based on precise medical treatment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000053811A1 (en) * 1999-03-11 2000-09-14 Orion Genomics, Llc Genome chips and optical transcript mapping
US20030087231A1 (en) * 2000-05-19 2003-05-08 Albertson Donna G. Methods and compositions for preparation of a polynucleotide array

Also Published As

Publication number Publication date
WO2004111267A2 (en) 2004-12-23
WO2004111267A3 (en) 2005-06-09


Legal Events

Date Code Title Description
EEER Examination request
FZDE Discontinued