WO2009073167A2 - Identification and diagnosis of pulmonary fibrosis using mucin genes, and related methods and compositions - Google Patents

Publication number
WO2009073167A2
Authority
WO
WIPO (PCT)
Prior art keywords
muc5ac
protein
gene
sequence
mucin
Application number
PCT/US2008/013271
Other languages
French (fr)
Other versions
WO2009073167A3 (en)
Inventor
David A. Schwartz
Lauranell Burch
Mark P. Steele
Aretha J. Herron
Kevin Brown
Marvin I. Schwarz
James E. Lloyd
Marcy Speer
Original Assignee
The Government Of The United States Of America As Represented By The Secretary Of The Department Of Health And Human Services
Application filed by The Government Of The United States Of America As Represented By The Secretary Of The Department Of Health And Human Services
Publication of WO2009073167A2
Publication of WO2009073167A3

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/136: Screening for pharmacological compounds
    • C12Q2600/156: Polymorphic or mutational markers
    • C12Q2600/158: Expression markers
    • C12Q2600/172: Haplotypes

Definitions

  • FIELD This disclosure relates to genetic analysis and screening for identification and diagnosis of pulmonary fibrosis.
  • It relates to the use of variation in MUC5AC, a mucin gene, to identify and/or diagnose individuals having or at risk for developing pulmonary fibrosis.
  • IIP idiopathic interstitial pneumonias
  • UPF interstitial pneumonia
  • IPF OMIM 178500
  • Interstitial lung disease also results from environmental exposures such as inhalation of fibrogenic dusts or air-borne organic antigens including exposures such as coal dust, wood or metal dust, mold, silica, and cigarette smoke.
  • Latent herpesvirus infections have been associated with an increased risk of this disease.
  • Smoking has long been considered an important risk factor for the development of IPF, and we have shown that cigarette smoking is also a risk factor in the development of FIP. It is likely that complex interactions between genes and environmental exposures are involved in the development of IIP. Identifying the underlying genetic risks may help to focus studies on environmental exposures, disease pathogenesis, and targeted interventions.
  • mucin is a protective glycoprotein in the airway that can prevent or inhibit the development of pulmonary fibrosis.
  • Genetic variations are identified in the sequences encoding MUC5AC (a primary mucin secreted in the normal human airway) that are associated with risk or development of the disease.
  • Important SNPs in MUC5AC that were associated with either FIP or IPF in our initial resequencing of this gene are displayed in Table 1 below; 7 of the most promising SNPs were subsequently validated in an independent cohort (Table 2 below).
  • Table 1. A comprehensive table of all MUC5AC SNPs identified by resequencing is presented in the claims section of this document. All P values have been corrected for gender.
  • Amino acid position from UniProtKB/Swiss-Prot MUC5AC_HUMAN (P98088). † P < 0.05; Fisher's exact test (two-tailed). ‡ P < 0.01; Fisher's exact test (two-tailed).
  • Re-sequencing study populations: 69 family-independent FIP cases, 96 unrelated IPF cases, and 54 spouse controls.
  • MAF: minor allele frequency for the re-sequencing cohort.
  • Amino acid position from UniProtKB/Swiss-Prot MUC5AC_HUMAN (P98088). † P < 0.05; Fisher's exact test (two-tailed). ‡ P < 0.01; Fisher's exact test (two-tailed).
  • Validation study populations: 88 family-independent FIP cases, 136 unrelated IPF cases, and 54 spouse controls. MAF: minor allele frequency reported for both cohorts combined.
  • Figure 1 shows multipoint LOD scores across the genome (chromosomes 1-22 and X) for all 82 families (Figure 1A) and specifically on chromosomes 10, 11, and 12 for three diagnostic categories (all families-dashed, homogenous families-blue, and heterogeneous families-green) for pulmonary fibrosis (Figure 1B).
  • Figure 2 shows a fine mapping LOD score graph for Chromosome 11, using >200 markers ⁇ 5 cM, in 242 individuals.
  • Figure 3 is a graph showing the SNP Map (LD tagged) of Chromosome 11 in an association study. The study involved 150 individuals with familial pulmonary fibrosis (FPF), 167 individuals with idiopathic pulmonary fibrosis (IPF), and 237 control individuals. Key: FPF vs. Controls ( ⁇ ); IPF vs. Controls ( ⁇ ).
  • Figure 4 is a graphic representation of the MUC5AC gene (solid boxes are exons), showing SNPs and Indels, based on analysis of 69 individuals with FIP, 96 individuals with IPF, and 54 control individuals. Non-synonymous SNPs, intronic Indels, and coding Indels are noted.
  • Figure 5. Generation of Muc5ac deficient mice and their response to bleomycin.
  • Mice were anesthetized, suspended by their upper incisors, and 1, 2, and 3 µm diameter yellow-green fluorescent microspheres (Fluospheres, Invitrogen-Molecular Probes, Carlsbad, CA) were instilled intratracheally using a Microsprayer (Penn-Century, Philadelphia, PA). 100,000 fluospheres of each diameter were instilled in a 25 µl volume microspray. To isolate fluospheres after instillation, mice were euthanized by exsanguination under anesthesia, and the lungs and tracheae were removed and minced into 1-2 mm pieces. These were vortexed in 1 ml PBS, and sequentially extracted over 100 µm and 40 µm nylon mesh filters.
  • Extracted microspheres were then measured by quantitative flow cytometry. For this, 10,000 4 µm red fluospheres were added to 2 ml of filtrate, and the total numbers of yellow-green fluospheres (1-3 µm) were counted during the period in which 8,000 red fluospheres were also counted.
  • In Panel B, particle elimination is calculated as the percent clearance of particles deposited at time zero. Data are presented as means ± SEM from 4-6 mice per group. Data were analyzed by Student's t-test. Asterisk denotes a p-value of 0.05 compared to 15 min clearance data from bleomycin and PBS challenged WT and Muc5ac+/- mice and PBS challenged Muc5ac-/- mice. In Panels C-F, lung sections of Muc5ac-/- and Muc5ac+/- mice were stained for caspase 3, 21 days after bleomycin instillation.
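The percent-clearance calculation mentioned for Panel B can be illustrated with a short Python sketch. This is only a plausible reading of "percent clearance of particles deposited at time zero"; the function name and the particle counts are hypothetical and are not taken from the disclosure.

```python
def percent_clearance(deposited_at_t0: float, remaining: float) -> float:
    """Percent of the particles deposited at time zero that are no longer recovered.

    Assumption: clearance = 100 * (deposited - remaining) / deposited.
    """
    if deposited_at_t0 <= 0:
        raise ValueError("deposited_at_t0 must be positive")
    return 100.0 * (deposited_at_t0 - remaining) / deposited_at_t0


# Hypothetical counts, for illustration only.
example = percent_clearance(deposited_at_t0=100_000, remaining=25_000)
print(f"{example:.1f}% cleared")
```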
  • Nucleic acid and/or amino acid sequences discussed or referenced herein are referred to by way of accession number from a public repository. It is understood that the corresponding sequence is incorporated by reference herein based on the sequence of that accession number in the referenced public database as of the date of filing of this provisional application.
  • DETAILED DESCRIPTION
  • Double-stranded DNA has two strands, a 5' -> 3' strand, referred to as the plus strand, and a 3' -> 5' strand (the reverse complement), referred to as the minus strand. Because RNA polymerase adds nucleic acids in a 5' -> 3' direction, the minus strand of the DNA serves as the template for the RNA during transcription. Thus, the RNA formed will have a sequence complementary to the minus strand and identical to the plus strand (except that U is substituted for T).
  • Antisense molecules are molecules that are specifically hybridizable or specifically complementary to either RNA or the plus strand of DNA.
  • Sense molecules are molecules that are specifically hybridizable or specifically complementary to the minus strand of DNA.
  • Antigene molecules are either antisense or sense molecules directed to a dsDNA target.
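The plus/minus strand relationship described above can be sketched in Python as follows. This is a minimal illustration, not part of the disclosure; the function names and the example sequence are hypothetical.

```python
# Map each DNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(plus_strand: str) -> str:
    """Return the minus strand, written 5' -> 3'."""
    return plus_strand.upper().translate(COMPLEMENT)[::-1]


def transcribe(plus_strand: str) -> str:
    """Return the RNA sequence: identical to the plus strand, with U for T."""
    return plus_strand.upper().replace("T", "U")


plus = "ATGGCCTAA"                 # hypothetical plus-strand sequence
minus = reverse_complement(plus)   # template strand for transcription
rna = transcribe(plus)             # complementary to the minus strand
print(minus, rna)
```

Under these definitions, an antisense molecule would resemble `minus` (complementary to the RNA or the plus strand), and a sense molecule would resemble `plus`.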
  • Array An arrangement of molecules, particularly biological macromolecules (such as polypeptides or nucleic acids) or biological samples (such as tissue sections) in addressable locations on a substrate, usually a flat substrate such as a membrane, plate or slide. The array may be regular (arranged in uniform rows and columns, for instance) or irregular. The number of addressable locations on the array can vary, for example from a few (such as three) to more than 50, 100, 200, 500, 1000, 10,000, or more.
  • a "microarray” is an array that is miniaturized to such an extent that it benefits from microscopic examination for evaluation.
  • each arrayed molecule e.g., oligonucleotide
  • sample more generally, a "feature" of the array
  • each feature is addressable, in that its location can be reliably and consistently determined within at least two dimensions on the array surface.
  • location of each feature is usually assigned to a sample at the time when it is spotted onto or otherwise applied to the array surface, and a key may be provided in order to correlate each location with the appropriate feature.
  • ordered arrays are arranged in a symmetrical grid pattern, but samples could be arranged in other patterns (e.g., in radially distributed lines, spiral lines, or ordered clusters).
  • Arrays are computer readable, in that a computer can be programmed to correlate a particular address on the array with information (such as identification of the arrayed sample and hybridization or binding data, including for instance signal intensity).
  • the individual spots on the array surface will be arranged regularly, for instance in a Cartesian grid pattern, that can be correlated to address information by a computer.
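As an illustration of how a computer might correlate array addresses with feature information, the following Python sketch builds a key for a small Cartesian grid. The class, function, sample names, and intensity value are all hypothetical; no particular array format or software is implied by the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    sample_id: str           # identity of the arrayed sample
    signal_intensity: float  # hybridization or binding signal


def grid_addresses(rows: int, cols: int) -> list[tuple[int, int]]:
    """Enumerate (row, column) addresses of a regular grid, row by row."""
    return [(r, c) for r in range(rows) for c in range(cols)]


# Hypothetical samples spotted onto a 2 x 2 array.
samples = ["probe_A", "probe_B", "probe_C", "probe_D"]
array_key = {
    address: Feature(sample_id=sample, signal_intensity=0.0)
    for address, sample in zip(grid_addresses(2, 2), samples)
}

# Record a (made-up) signal for the feature at row 1, column 0.
array_key[(1, 0)].signal_intensity = 1523.5
print(array_key[(1, 0)])
```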
  • sample application spot on an array may assume many different shapes.
  • spot refers generally to a localized deposit of nucleic acid or other biomolecule, and is not limited to a round or substantially round region.
  • substantially square regions of application can be used with arrays, as can be regions that are substantially rectangular (such as a slot blot-type application), or triangular, oval, irregular, and so forth.
  • shape of the array substrate itself is also immaterial, though it is usually substantially flat and may be rectangular or square in general shape.
  • Binding or interaction An association between two substances or molecules, such as the hybridization of one nucleic acid molecule to another (or itself).
  • Various methods can be used to detect binding of molecules, many of which are known to those of ordinary skill in the art. Specific examples of binding or interaction are described herein.
  • a labeled nucleic acid molecule binds to (interacts with) an immobilized nucleic acid molecule (probe) in one or more features of the array.
  • a labeled target molecule "binds" to a nucleic acid molecule in a spot on an array if, after incubation of the (labeled) target molecule (usually in solution or suspension) with or on the array for a period of time (usually 5 minutes or more, for instance 10 minutes, 20 minutes, 30 minutes, 60 minutes, 90 minutes, 120 minutes or more, for instance over night or even 24 hours), a detectable amount of that molecule associates with a nucleic acid feature of the array to such an extent that it is not removed by being washed with a relatively low stringency buffer (e.g., higher salt (such as 3 x SSC or higher), room temperature washes).
  • Washing can be carried out, for instance, at room temperature, but other temperatures (either higher or lower) also can be used.
  • Targets will bind probe nucleic acid molecules within different features on the array to different extents, based at least on sequence homology, and the term "bind" encompasses both relatively weak and relatively strong interactions. Thus, some binding will persist after the array is washed in a more stringent buffer (e.g., lower salt (such as about 0.5 to about 1.5 x SSC), 55-65° C washes).
  • probe and target molecules are both nucleic acids
  • binding of the test or reference molecule to a feature on the array can be discussed in terms of the specific complementarity between the probe and the target nucleic acids.
  • protein-based arrays where the probe molecules are or comprise proteins, and/or where the target molecules are or comprise proteins, and arrays comprising nucleic acids to which proteins/peptides are bound, or vice versa.
  • Biological Sample This term is intended to include tissues, cells and biological fluids, including biological fluids containing cells, that are isolated from a subject, as well as tissues, cells and fluids present within a subject.
  • cDNA A DNA molecule lacking internal, non-coding segments (e.g., introns) and regulatory sequences that determine transcription. By way of example, cDNA may be synthesized in the laboratory by reverse transcription from messenger RNA extracted from cells.
  • Complementarity and percentage complementarity Molecules with complementary nucleic acids form a stable duplex or triplex when the strands bind,
  • Complementarity is the degree to which bases in one nucleic acid strand base pair with the bases in a second nucleic acid strand. Complementarity is conveniently described by percentage, i.e., the proportion of nucleotides that form base pairs between two strands or within a specific region or domain of two strands.
  • For example, if 10 nucleotides of a 15-nucleotide oligonucleotide form base pairs with a targeted region of a DNA molecule, that oligonucleotide is said to have 66.67% complementarity to the region of DNA targeted.
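The percentage in the example above can be reproduced with a short Python sketch. The helper names are hypothetical, and the pairing check assumes simple Watson-Crick DNA pairs between antiparallel strands of equal length.

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def count_base_pairs(oligo: str, target_region: str) -> int:
    """Count Watson-Crick pairs between an oligo and a target region.

    Both sequences are written 5' -> 3'; the strands pair antiparallel,
    so the target is read in reverse.
    """
    return sum((a, b) in WATSON_CRICK for a, b in zip(oligo, reversed(target_region)))


def percent_complementarity(paired: int, oligo_length: int) -> float:
    """Proportion of oligo nucleotides that form base pairs, as a percentage."""
    if oligo_length <= 0:
        raise ValueError("oligo_length must be positive")
    return 100.0 * paired / oligo_length


# The worked example from the text: 10 of 15 nucleotides paired.
print(round(percent_complementarity(10, 15), 2))
```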
  • DNA deoxyribonucleic acid
  • DNA is a long chain polymer that contains the genetic material of most living organisms (the genes of some viruses are made of ribonucleic acid (RNA)).
  • the repeating units in DNA polymers are four different nucleotides, each of which includes one of the four bases (adenine, guanine, cytosine and thymine) bound to a deoxyribose sugar to which a phosphate group is attached.
  • Triplets of nucleotides (referred to as codons) code for each amino acid in a polypeptide, or for a stop signal.
  • the term "codon” is also used for the corresponding (and complementary) sequences of three nucleotides in the mRNA into which the DNA sequence is transcribed.
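To make the codon concept concrete, here is a minimal Python sketch that splits a coding sequence into triplets and translates it. The codon table is deliberately partial (only a few entries from the standard genetic code), and the function names and example sequence are hypothetical.

```python
# Partial standard codon table, for illustration only; "*" marks a stop signal.
CODON_TABLE = {
    "ATG": "M",  # methionine (start)
    "GCC": "A",  # alanine
    "AAA": "K",  # lysine
    "TGG": "W",  # tryptophan
    "TAA": "*", "TAG": "*", "TGA": "*",
}


def split_codons(dna: str) -> list[str]:
    """Split a sequence into consecutive triplets, dropping any incomplete tail."""
    dna = dna.upper()
    return [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]


def translate(dna: str) -> str:
    """Translate until the first stop codon; codons outside the table become 'X'."""
    protein = []
    for codon in split_codons(dna):
        residue = CODON_TABLE.get(codon, "X")
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


print(split_codons("ATGGCCTAA"), translate("ATGGCCTAA"))
```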
  • enriched means that the concentration of a material is at least about 2, 5, 10, 100, or 1000 times its natural concentration (for example), advantageously at least 0.01% by weight. Enriched preparations of about 0.5%, 1%, 5%, 10%, and 20% by weight are also contemplated.
  • EST Expressed Sequence Tag: A partial DNA or cDNA sequence, typically of between 200 and 2000 sequential nucleotides, obtained from a genomic or cDNA library, prepared from a selected cell, cell type, tissue or tissue type, organ or organism, which corresponds to an mRNA of a gene found in that library. An EST is generally a DNA molecule sequenced from and shorter than the cDNA from which it is obtained.
  • Fibrosis Formation or development of excess fibrous connective tissue in an organ or tissue as a reparative or reactive process, as opposed to formation of fibrous tissue as a normal constituent of an organ or tissue.
  • Fibrosis-related diseases include, but are not limited to: cystic fibrosis of the pancreas and lungs; endomyocardial fibrosis, idiopathic myocardiopathy; idiopathic interstitial pneumonia; idiopathic pulmonary fibrosis; cryptogenic organizing pneumonia; non specific interstitial pneumonia; acute interstitial pneumonia; hypersensitivity pneumonitis; familial interstitial pneumonia; respiratory bronchiolitis interstitial lung disease; desquamative interstitial lung disease; and diffuse parenchymal lung disease.
  • Fluorophore A chemical compound, which when excited by exposure to a particular wavelength of light, emits light (i.e., fluoresces), for example at a different wavelength. Fluorophores can be described in terms of their emission profile, or "color." Green fluorophores, for example Cy3, FITC, and Oregon Green, are characterized by their emission at wavelengths generally in the range of 515-540 nm. Red fluorophores, for example Texas Red, Cy5 and tetramethylrhodamine, are characterized by their emission at wavelengths generally in the range of 590-690 nm.
  • Examples of fluorophores include, for instance: 4-acetamido-4'-isothiocyanatostilbene-2,2'-disulfonic acid, acridine and derivatives such as acridine and acridine isothiocyanate, 5-(2'-aminoethyl)aminonaphthalene-1-sulfonic acid (EDANS), 4-amino-N-[3-(vinylsulfonyl)phenyl]naphthalimide-3,5 disulfonate (Lucifer Yellow VS), N-(4-anilino-1-naphthyl)maleimide, anthranilamide, Brilliant Yellow, coumarin and derivatives such as coumarin, 7-amino-4-methylcoumarin (AMC, Coumarin 120), 7-amino-4-trifluoromethylcoumarin (Coumaran
  • rhodamine and derivatives such as 6-carboxy-X-rhodamine (ROX), 6-carboxyrhodamine (R6G), lissamine rhodamine B sulfonyl chloride, rhodamine (Rhod), rhodamine B, rhodamine 123, rhodamine X isothiocyanate, sulforhodamine B, sulforhodamine 101 and sulfonyl chloride derivative of sulforhodamine 101 (Texas Red); N,N,N',N'-tetramethyl-6-carboxyrhodamine (TAMRA); tetramethyl rhodamine; tetramethyl rhodamine isothiocyanate (TRITC); riboflavin; rosolic acid and terbium chelate derivatives.
  • fluorophores include GFP (green fluorescent protein), Lissamine™, diethylaminocoumarin, fluorescein chlorotriazinyl, naphthofluorescein, 4,7-dichlororhodamine and xanthene and derivatives thereof. Other fluorophores known to those skilled in the art may also be used.
  • High throughput genomics Application of genomic or genetic data or analysis techniques that use microarrays or other genomic technologies to rapidly identify large numbers of genes or proteins, or distinguish their structure, expression or function from normal or abnormal cells or tissues, or from cells or tissues of subjects with known or unknown phenotype and/or genotype.
  • Human Cells Cells obtained from a member of the species Homo sapiens.
  • the cells can be obtained from any source, for example peripheral blood, urine, saliva, tissue biopsy, surgical specimen, amniocentesis samples and autopsy material. From these cells, genomic DNA, mRNA, cDNA, RNA, and/or protein can be isolated.
  • Hybridization Nucleic acid molecules that are complementary to each other hybridize by hydrogen bonding, which includes Watson-Crick, Hoogsteen or reversed Hoogsteen hydrogen bonding between complementary nucleotide units.
  • adenine and thymine are complementary nucleobases that pair through formation of hydrogen bonds.
  • “Complementary” refers to sequence complementarity between two nucleotide units. For example, if a nucleotide unit at a certain position of an oligonucleotide is capable of hydrogen bonding with a nucleotide unit at the same position of a DNA or RNA molecule, then the oligonucleotides are complementary to each other at that position.
  • oligonucleotide and the DNA or RNA are complementary to each other when a sufficient number of corresponding positions in each molecule are occupied by nucleotide units which can hydrogen bond with each other.
  • “Complementary” is a term that indicates a sufficient degree of complementarity such that stable and specific binding occurs between an oligonucleotide and the DNA or RNA (or PNA) target.
  • An oligonucleotide need not be 100% complementary to its target nucleic acid sequence to be specifically hybridizable.
  • An oligonucleotide is specifically hybridizable when binding of the oligonucleotide to the target DNA or RNA molecule interferes with the normal function of the target DNA or RNA, and there is a sufficient degree of complementarity to avoid non-specific binding of the oligonucleotide to non-target sequences under conditions in which specific binding is desired, for example under physiological conditions in the case of in vivo assays, or under conditions in which the assays are performed.
  • Hybridization conditions resulting in particular degrees of stringency will vary depending upon the nature of the hybridization method of choice and the composition and length of the hybridizing DNA used. Generally, the temperature of hybridization and the ionic strength (especially the Na+ concentration) of the hybridization buffer will determine the stringency of hybridization. Calculations regarding hybridization conditions required for attaining particular degrees of stringency are discussed by Sambrook et al. in Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press (1989), chapters 9 and 11, herein incorporated by reference.
  • In vitro amplification Techniques that increase the number of copies of a nucleic acid molecule in a sample or specimen.
  • An example of in vitro amplification is the polymerase chain reaction, in which a biological sample collected from a subject is contacted with a pair of oligonucleotide primers, under conditions that allow for the hybridization of the primers to nucleic acid template in the sample.
  • the primers are extended under suitable conditions, dissociated from the template, and then re-annealed, extended, and dissociated to amplify the number of copies of the nucleic acid.
  • the product of in vitro amplification may be characterized by electrophoresis, restriction endonuclease cleavage patterns, oligonucleotide hybridization or ligation, and/or nucleic acid sequencing, using standard techniques.
  • in vitro amplification techniques include strand displacement amplification (see U.S. Patent No. 5,744,311); transcription-free isothermal amplification (see U.S. Patent No. 6,033,881); repair chain reaction amplification (see WO 90/01069); ligase chain reaction amplification (see EP-A-320 308); gap filling ligase chain reaction amplification (see U.S. Patent No. 5,427,930); coupled ligase detection and PCR (see U.S. Patent No. 6,027,889); and NASBA™ RNA transcription-free amplification (see U.S. Patent No. 6,025,134).
  • Idiopathic Interstitial Pneumonia A group of lung diseases (including idiopathic pulmonary fibrosis, nonspecific interstitial pneumonia, respiratory bronchiolitis interstitial lung disease, desquamative interstitial pneumonia, and acute interstitial pneumonia), affecting the alveolar epithelium, pulmonary capillary endothelium, basement membrane, perivascular and perilymphatic tissues.
  • the term IIP is used to distinguish these diseases from obstructive airways diseases. Most types of IIP involve fibrosis, but this is not essential; indeed fibrosis is often a later feature. Hence the term pulmonary fibrosis has fallen out of favor.
  • Isolated An "isolated" biological component (such as a nucleic acid molecule, protein or organelle) has been substantially separated or purified away from other biological components in the cell of the organism in which the component naturally occurs, i.e., other chromosomal and extra-chromosomal DNA and RNA, proteins and organelles.
  • Nucleic acids and proteins that have been "isolated” include nucleic acids and proteins purified by standard purification methods. The term also embraces nucleic acids and proteins prepared by recombinant expression in a host cell as well as chemically synthesized nucleic acids.
  • Label Detectable marker or reporter molecules, which can be attached to nucleic acids. Typical labels include fluorophores, radioactive isotopes, ligands, chemiluminescent agents, metal sols and colloids, and enzymes. Methods for labeling and guidance in the choice of labels useful for various purposes are discussed, e.g., in Sambrook et al., in
  • Mucins A family of large, heavily glycosylated proteins. Some mucins are membrane-bound due to the presence of a hydrophobic membrane-spanning domain that favors retention in the plasma membrane, while others are secreted on mucosal surfaces and saliva. Mucin genes encode mucin monomers that are synthesized as rod-shape apomucin cores that are post-translationally modified by heavy glycosylation. The amino- and carboxy- terminal regions of mucins are lightly glycosylated, but rich in cysteines, which are proposed to be involved in establishing disulfide linkages within and between mucin monomers.
  • the central region of mucins is formed of multiple tandem repeats of 10 to 80 residue sequences, in which up to half of the amino acids are serine or threonine.
  • This region of the protein becomes post-translationally modified (glycosylated) with hundreds of O-linked oligosaccharides.
  • N-linked oligosaccharides are also found on mucins, but much less abundantly.
  • the dense glycosylation of mucins gives them considerable water-holding capacity, and makes them resistant to proteolysis. See also Perez-Vilar & Hill ("Mucin Family of Glycoproteins", Encyclopedia of Biological Chemistry (Lennarz & Lane, EDs.) Academic Press/Elsevier, Oxford, 2004, vol. 2, pp 758-764).
  • Mucins are secreted as massive aggregates of proteins with molecular masses of roughly 1 to 10 million Da. Within these aggregates, monomers are linked to one another mostly by non-covalent interactions, although intermolecular disulfide bonds may also play a role in this process. At least 19 human mucin genes have been distinguished by cDNA cloning: MUC1, 2, 3A, 3B, 4, 5AC, 5B, 6-9, 11-13, and 15-19.
  • the major secreted airway mucins are MUC5AC and MUC5B, while MUC2 is secreted mostly in the intestine but also in the airway (Ali et al., Otolaryngol Head Neck Surg. 133(3):423-428, 2005).
  • Mutation Any change of the DNA sequence within a gene or chromosome, including specifically changes in non-coding regions of a chromosome, for instance changes in or near regulatory regions of genes. In some instances, a mutation will alter a characteristic or trait (phenotype), but this is not always the case. Types of mutations include base substitution point mutations (e.g., transitions or transversions), deletions, and insertions. Missense mutations are those that introduce a different amino acid into the sequence of the encoded protein; nonsense mutations are those that introduce a new stop codon.
  • mutations can be in-frame (not changing the frame of the overall sequence) or frame shift mutations, which may result in the misreading of a large number of codons (and often leads to abnormal termination of the encoded product due to the presence of a stop codon in the alternative frame).
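The difference between in-frame and frame shift mutations can be illustrated with a short Python sketch. The sequence and helper names are hypothetical; the rule shown (an insertion or deletion shifts the frame when its length is not a multiple of three) follows from the triplet nature of codons.

```python
def split_codons(dna: str) -> list[str]:
    """Split a sequence into consecutive triplets, dropping any incomplete tail."""
    return [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]


def delete_bases(dna: str, start: int, length: int) -> str:
    """Return the sequence with `length` bases removed beginning at index `start`."""
    return dna[:start] + dna[start + length:]


def is_frameshift(indel_length: int) -> bool:
    """An indel shifts the reading frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


reference = "ATGGCCAAATGG"                     # hypothetical coding sequence
one_base_del = delete_bases(reference, 3, 1)   # frame shift: downstream codons change
three_base_del = delete_bases(reference, 3, 3) # in-frame: one codon is lost

print(split_codons(reference))
print(split_codons(one_base_del))
print(split_codons(three_base_del))
```

After the single-base deletion every codon downstream of the deletion differs from the reference, whereas the three-base deletion removes one codon and leaves the rest intact.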
  • This term specifically encompasses variations that arise through somatic mutation, for instance those that are found only in disease cells, but not constitutionally, in a given individual. Examples of such somatically-acquired variations include the point mutations that frequently result in altered function of various genes that are involved in development of cancers. This term also encompasses DNA alterations that are present constitutionally, that alter the function of the encoded protein in a readily demonstrable manner, and that can be inherited by the children of an affected individual. In this respect, the term overlaps with "polymorphism,” as discussed herein, but generally refers to the subset of constitutional alterations.
  • Nucleic acid A deoxyribonucleotide or ribonucleotide polymer in either single or double stranded form, and unless otherwise limited, encompassing known analogues of natural nucleotides that hybridize to nucleic acids in a manner similar to naturally occurring nucleotides.
  • Nucleic acid array An arrangement of nucleic acids (such as DNA or RNA) in assigned locations on a matrix, such as that found in cDNA arrays, or oligonucleotide arrays.
  • Nucleic acid molecules representing genes Any nucleic acid, for example DNA (intron or exon or both), cDNA or RNA, of any length suitable for use as a probe or other indicator molecule, and that is informative about the corresponding gene.
  • Nucleotide includes, but is not limited to, a monomer that includes a base linked to a sugar, such as a pyrimidine, purine or synthetic analogs thereof, or a base linked to an amino acid, as in a peptide nucleic acid (PNA).
  • a nucleotide is one monomer in a polynucleotide.
  • a nucleotide sequence refers to the sequence of bases in a polynucleotide.
  • Oligonucleotide A linear single-stranded polynucleotide sequence ranging in length from 2 to about 5,000 bases, for example a polynucleotide (such as DNA or RNA) which is at least 6 nucleotides, for example at least 10, 12, 15, 18, 20, 25, 50, 100, 200, 1,000, or even 5,000 nucleotides long. Oligonucleotides are often synthetic but can also be produced from naturally occurring polynucleotides. An oligonucleotide analog refers to moieties that function similarly to oligonucleotides but have non-naturally occurring portions.
  • oligonucleotide analogs can contain non-naturally occurring portions, such as altered sugar moieties or inter-sugar linkages, such as a phosphorothioate oligodeoxynucleotide.
  • Functional analogs of naturally occurring polynucleotides can bind to RNA or DNA, and include peptide nucleic acid (PNA) molecules. Such analog molecules may also bind to or interact with polypeptides or proteins.
  • PNA peptide nucleic acid
  • a first nucleic acid sequence is operably linked with a second nucleic acid sequence when the first nucleic acid sequence is placed in a functional relationship with the second nucleic acid sequence.
  • a promoter is operably linked to a coding sequence if the promoter affects the transcription or expression of the coding sequence.
  • operably linked DNA sequences are contiguous and, where necessary to join two protein-coding regions, in the same reading frame.
  • ORF Open reading frame
  • PNA An oligonucleotide analog with a backbone comprised of monomers coupled by amide (peptide) bonds, such as amino acid monomers joined by peptide bonds.
  • Pharmaceutically acceptable carriers The pharmaceutically acceptable carriers useful with compositions provided herein are conventional. By way of example, Martin, in Remington's Pharmaceutical Sciences, published by Mack Publishing Co., Easton, PA, 19th Edition, 1995, describes compositions and formulations suitable for pharmaceutical delivery of the molecules and agents, including but not limited to nucleotides and proteins, herein disclosed.
  • parenteral formulations usually comprise injectable fluids that include pharmaceutically and physiologically acceptable fluids such as water, physiological saline, balanced salt solutions, aqueous dextrose, glycerol or the like as a vehicle.
  • solid compositions e.g., powder, pill, tablet, or capsule forms
  • conventional non-toxic solid carriers can include, for example, pharmaceutical grades of mannitol, lactose, starch, or magnesium stearate.
  • compositions to be administered can contain minor amounts of non-toxic auxiliary substances, such as wetting or emulsifying agents, preservatives, and pH buffering agents and the like, for example sodium acetate or sorbitan monolaurate.
  • Polymorphism Variant in a sequence of a gene, usually carried from one generation to another in a population. Polymorphisms can be those variations (nucleotide sequence differences) that, while having a different nucleotide sequence, produce functionally equivalent gene products, such as those variations generally found between individuals, different ethnic groups, geographic locations.
  • the term polymorphism also encompasses variations that produce gene products with altered function, i.e., variants in the gene sequence that lead to gene products that are not functionally equivalent. This term also encompasses variations that produce no gene product, an inactive gene product, or a gene product with increased or decreased activity.
  • Polymorphisms can be referred to, for instance, by the nucleotide position at which the variation exists, by the change in amino acid sequence caused by the nucleotide variation, or by a change in some other characteristic of the nucleic acid molecule or protein that is linked to the variation (e.g., an alteration of a secondary structure such as a stem-loop, or an alteration of the binding affinity of the nucleic acid for associated molecules, such as polymerases, RNases, and so forth).
  • Nucleic acid probes and primers can be readily prepared based on the nucleic acid molecules provided as indicators of susceptibility to pulmonary fibrosis or a related disease, condition or disorder. It is also appropriate to generate probes and primers based on fragments or portions of these nucleic acid molecules, particularly in order to distinguish between and among different alleles and haplotypes within a single gene. Also appropriate are probes and primers specific for the reverse complement of these sequences, as well as probes and primers to 5' or 3' regions.
  • a probe comprises an isolated nucleic acid attached to a detectable label or other reporter molecule.
  • Typical labels include radioactive isotopes, enzyme substrates, co-factors, ligands, chemiluminescent or fluorescent agents, haptens, and enzymes. Methods for labeling and guidance in the choice of labels appropriate for various purposes are discussed, e.g., in Sambrook et al. (In Molecular Cloning: A Laboratory Manual, CSHL, New York, 1989) and Ausubel et al. (In Current Protocols in Molecular Biology, John Wiley & Sons, New York, 1998).
  • Primers are short nucleic acid molecules, for instance DNA oligonucleotides 10 nucleotides or more in length. Longer DNA oligonucleotides may be about 15, 20, 25, 30 or 50 nucleotides or more in length. Primers can be annealed to a complementary target DNA strand by nucleic acid hybridization to form a hybrid between the primer and the target DNA strand, and then the primer extended along the target DNA strand by a DNA polymerase enzyme. Primer pairs can be used for amplification of a nucleic acid sequence, e.g., by the polymerase chain reaction (PCR) or other in vitro nucleic-acid amplification methods known in the art.
  • PCR polymerase chain reaction
  • Amplification primer pairs for instance, for use with polymerase chain reaction amplification
  • probes and primers can be selected that comprise at least 20, 23, 25, 30, 35, 40, 45, 50 or more consecutive nucleotides of a gene or sequence discussed herein.
  • isolated nucleic acid molecules that comprise specified lengths of nucleotide sequences, for instance sequences from MUC5AC or another gene, EST or non-coding sequence at 11pter. Such molecules may comprise at least 10, 15, 20, 23, 25, 30, 35, 40, 45 or 50 or more (e.g., at least 100, 150, 200, 250, 300 and so forth) consecutive nucleotides of these sequences or more. These molecules may be obtained from any region of the disclosed sequences (e.g., a specified nucleic acid may be apportioned into halves or quarters based on sequence length, and isolated nucleic acid molecules may be derived from the first or second halves of the molecules, or any of the four quarters, etc.). A cDNA or other encoding sequence also can be divided into smaller regions, e.g., about eighths, sixteenths, twentieths, fiftieths, and so forth, with similar effect.
  • Another mode of division is to divide a protein encoding sequence based on the regions of the sequence that are relatively more or less homologous to equivalent other sequences, such as homologous proteins from other species, or other proteins from a protein family.
  • Nucleic acid molecules may be selected that comprise at least 10, 15, 20, 25, 30, 35, 40, 50, 100, 150, 200, 250, 300 or more consecutive nucleotides of any of these or other portions of a nucleic acid molecule (such as one encoding MUC5AC or another gene, EST, or corresponding cDNA found in the 11pter region described herein) or a specific allele thereof, such as those disclosed herein.
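As an illustration of selecting candidate probes of a fixed length that cover a variant site, the following minimal Python sketch lists every window of a given length that includes a chosen position, together with the reverse complement of each window. The fragment sequence, the variant index, and the function names are hypothetical and are not taken from the Sequence Listing.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def windows_covering(seq: str, position: int, length: int) -> list[str]:
    """List every `length`-nucleotide window of `seq` that covers `position`.

    `position` is a 0-based index of the variant site within `seq`.
    """
    if not 0 <= position < len(seq):
        raise ValueError("position lies outside the sequence")
    first_start = max(0, position - length + 1)
    last_start = min(position, len(seq) - length)
    return [seq[s:s + length] for s in range(first_start, last_start + 1)]


# Hypothetical 16-nt fragment; the variant site is assumed to be index 8.
fragment = "ACGTTGCAAGGCTTAC"
candidates = windows_covering(fragment, position=8, length=10)
antisense = [reverse_complement(c) for c in candidates]
```

In practice a candidate list like this would still be filtered for melting temperature, secondary structure, and uniqueness in the genome; none of that is attempted here.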
  • representative nucleic acid molecules might comprise at least 10 consecutive nucleotides of a nucleic acid sequence shown in any one of the sequences in the gene region of transcripts designated as XM_001714774.1 (MUC5AC) and more particularly any 10 consecutive nucleotides overlapping one of the SNPs illustrated in any of these sequences. More particularly, probes and primers in some embodiments are selected so that they overlap or reside adjacent to at least one of the SNPs indicated in the Sequence Listing or the tables of MUC2 (table 5) and MUC5AC (table 6) SNPs included in this application.
  • Purified: The term purified does not require absolute purity; rather, it is intended as a relative term.
  • a purified nucleic acid preparation is one in which the specified nucleic acid is more enriched than the nucleic acid is in its generative environment, for instance within a cell or in a biochemical reaction chamber.
  • a preparation of substantially pure nucleic acid may be purified such that the desired nucleic acid represents at least 50% of the total nucleic acid content of the preparation.
  • a substantially pure nucleic acid will represent at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, or at least 95% or more of the total nucleic acid content of the preparation.
  • a substantially pure protein or peptide will represent at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, or at least 95% or more of the total protein content of the preparation.
  • a recombinant nucleic acid is one that has a sequence that is not naturally occurring or has a sequence that is made by an artificial combination of two otherwise separated segments of sequence. This artificial combination can be accomplished by chemical synthesis or, more commonly, by the artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques.
  • Regulatory Sequences or Elements refer generally to a class of DNA sequences that influence or control expression of genes. Included in the term are promoters, enhancers, locus control regions, boundary elements/insulators, silencers, matrix attachment regions (also referred to as scaffold attachment regions), repressors, transcriptional terminators, origins of replication, centromeres, and meiotic recombination hotspots. Promoters are sequences of DNA near the 5' end of a gene that act as a binding site for RNA polymerase, and from which transcription is initiated. Enhancers are control elements that elevate the level of transcription from a promoter, usually independently of the enhancer's orientation or distance from the promoter.
  • Locus control regions (LCRs) confer tissue-specific and temporally regulated expression to genes to which they are linked. LCRs function independently of their position in relation to the gene, but are copy-number dependent. It is believed that they function to open the nucleosome structure, so other factors can bind to the DNA. LCRs may also affect replication timing and origin usage. Insulators (also known as boundary elements) are DNA sequences that prevent the activation (or inactivation) of transcription of a gene, by blocking effects of surrounding chromatin. Silencers and repressors are control elements that suppress gene expression; they act on a gene independently of their orientation or distance from the gene. Matrix attachment regions (MARs), also known as scaffold attachment regions, are sequences within DNA that bind to the nuclear scaffold.
  • MARs Matrix attachment regions
  • MARs mediate higher-order, looped structures within chromosomes.
  • Transcriptional terminators are regions in the vicinity of a gene at which RNA polymerase is released from the template. Origins of replication are regions of the genome at which DNA replication begins during the DNA synthesis (replication) phase of cell division. Meiotic recombination hotspots are regions of the genome that recombine more frequently than average during meiosis.
  • RNA A typically linear polymer of ribonucleic acid monomers, linked by phosphodiester bonds. Naturally occurring RNA molecules fall into three classes, messenger (mRNA, which encodes proteins), ribosomal (rRNA, components of ribosomes), and transfer (tRNA, molecules responsible for transferring amino acid monomers to the ribosome during protein synthesis).
  • Total RNA refers to a heterogeneous mixture of all three types of RNA molecules.
  • Sequence identity The similarity between two nucleic acid sequences, or two amino acid sequences, is expressed in terms of the similarity between the sequences, otherwise referred to as sequence identity. Sequence identity is frequently measured in terms of percentage identity (or similarity or homology); the higher the percentage, the more similar the two sequences are. Homologs or orthologs of nucleic acid or amino acid sequences will possess a relatively high degree of sequence identity when aligned using standard methods. This homology will be more significant when the orthologous proteins or nucleic acids are derived from species which are more closely related (e.g., human and chimpanzee sequences), compared to species more distantly related (e.g., human and C. elegans sequences).
  • orthologs are at least 50% identical at the nucleotide level and at least 50% identical at the amino acid level when comparing human orthologous sequences.
  • Methods of alignment of sequences for comparison are well known.
  • Various programs and alignment algorithms are described in: Smith & Waterman, Adv. Appl. Math. 2:482, 1981; Needleman & Wunsch, J. Mol. Biol. 48:443, 1970; Pearson & Lipman, Proc. Natl. Acad. Sci. USA 85:2444, 1988; Higgins & Sharp, Gene 73:237-44, 1988; Higgins & Sharp, CABIOS 5:151-3, 1989; Corpet et al., Nuc. Acids Res.
  • NCBI Basic Local Alignment Search Tool (BLAST) (Altschul et al., J. Mol. Biol. 215:403-10, 1990) is available from several sources, including the National Center for Biotechnology Information (NCBI, Bethesda, MD) and on the Internet, for use in connection with the sequence analysis programs blastp, blastn, blastx, tblastn and tblastx. Each of these sources also provides a description of how to determine sequence identity using this program.
  • Homologous sequences are typically characterized by possession of at least 60%, 70%, 75%, 80%, 90%, 95% or at least 98% sequence identity counted over the full length alignment with a sequence using the NCBI Blast 2.0, gapped blastp set to default parameters. Queries searched with the blastn program are filtered with DUST (Hancock and Armstrong, Comput. Appl. Biosci. 10:67-70, 1994). It will be appreciated that these sequence identity ranges are provided for guidance only; it is entirely possible that strongly significant homologs could be obtained that fall outside of the ranges provided.
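To make "percentage identity counted over the full length alignment" concrete, here is a minimal Python sketch that scores two sequences that have already been aligned. It is not BLAST and does not perform the alignment itself. The function name, the gap character, and the choice to count every non-empty alignment column in the denominator are assumptions made for illustration, and BLAST reports may use a different denominator.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the full length of a pairwise alignment.

    Both inputs must be the same length (already aligned, gaps as `gap`).
    Columns where both sequences carry a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    # A column counts as a match only if both residues are identical non-gaps.
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Hypothetical 10-column alignment with one mismatch and one gap.
identity = percent_identity("ACGTACGT-A", "ACGTTCGTCA")
```

With these hypothetical inputs, 8 of 10 columns match, so the pair would sit at the 80% level among the identity thresholds listed above.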
  • nucleic acid sequences that do not show a high degree of identity may nevertheless encode similar amino acid sequences, due to the degeneracy of the genetic code. It is understood that changes in nucleic acid sequence can be made using this degeneracy to produce multiple nucleic acid sequences that all encode substantially the same protein.
  • An alternative indication that two nucleic acid molecules are closely related is that the two molecules hybridize to each other under stringent conditions, as described under "specific hybridization."
  • SNP Single Nucleotide Polymorphism
  • Representative SNPs in MUC5AC are displayed in the tables 1 and 2 above.
  • Specific binding agent An agent that binds substantially only to a defined target.
  • a protein-specific binding agent binds substantially only the specified protein.
  • the term "X-protein specific binding agent" includes anti-X protein antibodies (and functional fragments thereof) and other agents (such as soluble receptors) that bind substantially only to the X protein (where "X" is a specified protein, or in some embodiments a specified domain or form of a protein, such as a particular allelic form of a protein).
  • Anti-X protein antibodies may be produced using standard procedures described in a number of texts, including Harlow and Lane (Antibodies, A Laboratory Manual, CSHL, New York, 1988). The determination that a particular agent binds substantially only to the specified protein may readily be made by using or adapting routine procedures.
  • One suitable in vitro assay makes use of the Western blotting procedure (described in many standard texts, including Harlow and Lane (Antibodies, A Laboratory Manual, CSHL, New York, 1988)).
  • Western blotting may be used to determine that a given protein binding agent, such as an anti-X protein monoclonal antibody, binds substantially only to the X protein. Shorter fragments of antibodies can also serve as specific binding agents.
  • Fabs, Fvs, and single-chain Fvs that bind to a specified protein would be specific binding agents.
  • These antibody fragments are defined as follows: (1) Fab, the fragment which contains a monovalent antigen-binding fragment of an antibody molecule produced by digestion of whole antibody with the enzyme papain to yield an intact light chain and a portion of one heavy chain; (2) Fab', the fragment of an antibody molecule obtained by treating whole antibody with pepsin, followed by reduction, to yield an intact light chain and a portion of the heavy chain; two Fab' fragments are obtained per antibody molecule; (3) (Fab')2, the fragment of the antibody obtained by treating whole antibody with the enzyme pepsin without subsequent reduction; (4) F(ab')2, a dimer of two Fab' fragments held together by two disulfide bonds; (5) Fv, a genetically engineered fragment containing the variable region of the light chain and the variable region of the heavy chain expressed as two chains; and (6) single chain antibody ("SCA")
  • Specific hybridization refers to the binding, duplexing, or hybridizing of a molecule only or substantially only to a particular nucleotide sequence when that sequence is present in a complex mixture (e.g. total cellular DNA or RNA). Specific hybridization may also occur under conditions of varying stringency.
  • Hybridization conditions resulting in particular degrees of stringency will vary depending upon the nature of the hybridization method of choice and the composition and length of the hybridizing DNA used. Generally, the temperature of hybridization and the ionic strength (especially the Na+ concentration) of the hybridization buffer will determine the stringency of hybridization. Calculations regarding hybridization conditions required for attaining particular degrees of stringency are discussed by Sambrook et al. (In: Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, New York, 1989 ch. 9 and 11). By way of illustration only, a hybridization experiment may be performed by hybridization of a DNA molecule to a target DNA molecule which has been electrophoresed in an agarose gel and transferred to a nitrocellulose membrane by Southern blotting (Southern, J. Mol. Biol.
  • Traditional hybridization with a target nucleic acid molecule labeled with [32P]-dCTP is generally carried out in a solution of high ionic strength such as 6x SSC at a temperature that is 20-25°C below the melting temperature, Tm, described below.
  • hybridization is typically carried out for 6-8 hours using 1-2 ng/ml radiolabeled probe (of specific activity equal to 10^9 CPM/μg or greater).
  • the nitrocellulose filter is washed to remove background hybridization. The washing conditions should be as stringent as possible to remove background hybridization but to retain a specific hybridization signal.
  • Tm represents the temperature (under defined ionic strength, pH and nucleic acid concentration) at which 50% of the probes complementary to the target sequence hybridize to the target sequence at equilibrium. Because the target sequences are generally present in excess, at Tm 50% of the probes are occupied at equilibrium.
  • the Tm of such a hybrid molecule may be estimated from the following equation (Bolton and McCarthy, Proc. Natl. Acad. Sci. USA 48:1390, 1962):
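The equation itself is not reproduced in this passage. As a hedged illustration only, the Python sketch below assumes the form commonly cited in laboratory manuals for long DNA duplexes, Tm = 81.5 + 16.6 log10[Na+] + 0.41(%G+C) - 0.63(% formamide) - 600/L, where L is the hybrid length in base pairs. The coefficients, the function names, and the example inputs are assumptions and should be checked against the cited reference before use. The second function applies the 20-25°C offset below Tm mentioned above for traditional hybridization.

```python
import math


def estimate_tm_celsius(na_molar: float, gc_percent: float, length_bp: int,
                        formamide_percent: float = 0.0) -> float:
    """Estimate Tm (deg C) of a long DNA duplex from the assumed empirical form."""
    if na_molar <= 0 or length_bp <= 0:
        raise ValueError("Na+ concentration and length must be positive")
    return (81.5
            + 16.6 * math.log10(na_molar)   # ionic strength term
            + 0.41 * gc_percent             # base composition term
            - 0.63 * formamide_percent      # destabilizing agent term
            - 600.0 / length_bp)            # duplex length term


def hybridization_window(tm_celsius: float) -> tuple[float, float]:
    """Temperature range 20-25 deg C below Tm, returned as (low, high)."""
    return tm_celsius - 25.0, tm_celsius - 20.0


# Hypothetical inputs: 1.0 M Na+, 50% G+C, 600 bp hybrid, no formamide.
tm = estimate_tm_celsius(na_molar=1.0, gc_percent=50.0, length_bp=600)
low, high = hybridization_window(tm)
```

Under this assumed form, lowering the salt concentration or adding formamide lowers the estimate, which matches the qualitative statements about stringency in this section.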
  • Stringent conditions may be defined as those under which DNA molecules with more than 25%, 15%, 10%, 6% or 2% sequence variation (also termed "mismatch") will not hybridize. Stringent conditions are sequence dependent and are different in different circumstances. Longer sequences hybridize specifically at higher temperatures. Generally, stringent conditions are selected to be about 5°C lower than the thermal melting point Tm for the specific sequence at a defined ionic strength and pH. An example of stringent conditions is a salt concentration of at least about 0.01 to 1.0 M Na ion concentration (or other salts) at pH 7.0 to 8.3 and a temperature of at least about 30°C for short probes (e.g., 10 to 50 nucleotides). Stringent conditions can also be achieved with the addition of destabilizing agents such as formamide. For example, conditions of 5x SSPE (750 mM NaCl, 50 mM Na Phosphate, 5 mM EDTA, pH 7.4) and a temperature of 25-30°C are suitable for allele-specific probe hybridizations.
  • Hybridization 5x SSC at 65°C for 16 hours Wash twice: 2x SSC at room temperature (RT) for 15 minutes each
  • Hybridization 5x-6x SSC at 65°C-70°C for 16-20 hours Wash twice: 2x SSC at RT for 5-20 minutes each
  • Hybridization 6x SSC at RT to 55°C for 16-20 hours Wash at least twice: 2x-3x SSC at RT to 55°C for 20-30 minutes each.
  • a perfectly matched probe has a sequence perfectly complementary to a particular target sequence.
  • the test probe is typically perfectly complementary to a portion (subsequence) of the target sequence.
  • the term "mismatch probe" refers to probes whose sequence is deliberately selected not to be perfectly complementary to a particular target sequence.
  • Transcription levels can be quantitated absolutely or relatively. Absolute quantitation can be accomplished by inclusion of known concentrations of one or more target nucleic acids (for example control nucleic acids or a known amount of the target nucleic acids themselves) and referencing the hybridization intensity of unknowns with the known target nucleic acids (for example by generation of a standard curve).
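Absolute quantitation against a standard curve can be sketched as follows. This minimal Python example fits a straight line to hybridization intensities measured for known concentrations and then reads an unknown off that line. The linear model, the function names, and all numbers are assumptions for illustration; real assays may need a log transform or a non-linear fit, and unknowns should fall within the range of the standards.

```python
def fit_standard_curve(concentrations, intensities):
    """Ordinary least-squares line: intensity = slope * concentration + intercept."""
    n = len(concentrations)
    if n < 2 or n != len(intensities):
        raise ValueError("need at least two paired standards")
    mean_x = sum(concentrations) / n
    mean_y = sum(intensities) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    if sxx == 0:
        raise ValueError("standards must span more than one concentration")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations, intensities))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_intensity(intensity, slope, intercept):
    """Invert the fitted line to estimate the concentration of an unknown."""
    if slope == 0:
        raise ValueError("flat standard curve cannot be inverted")
    return (intensity - intercept) / slope


# Hypothetical spiked-in control standards (arbitrary units).
known_conc = [1.0, 2.0, 4.0, 8.0]
known_signal = [110.0, 210.0, 410.0, 810.0]
slope, intercept = fit_standard_curve(known_conc, known_signal)
unknown_conc = concentration_from_intensity(510.0, slope, intercept)
```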
  • Subject Living, multicellular vertebrate organisms, a category that includes both human and veterinary subjects for example, mammals, birds and primates.
  • a transformed cell is a cell into which has been introduced a nucleic acid molecule by molecular biology techniques.
  • transformation encompasses all techniques by which a nucleic acid molecule might be introduced into such a cell, including transfection with viral vectors, transformation with plasmid vectors, and introduction of naked DNA by electroporation, lipofection, and particle gun acceleration.
  • a nucleic acid molecule as introduced into a host cell, thereby producing a transformed host cell.
  • a vector may include nucleic acid sequences that permit it to replicate in a host cell, such as an origin of replication.
  • a vector may also include one or more selectable marker genes and other genetic elements known in the art.
  • SNPs Single nucleotide polymorphisms within the coding region and introns of MUC5AC are demonstrated to be linked to increased likelihood of IIP.
  • haplotypes near or in MUC5AC that partially predict susceptibility to IIP.
  • Sections of non-coding nucleic acid identified herein, particularly those identified herein as including a variant can be tested for functionality or changes in functionality between two or more alleles.
  • segments of DNA can be amplified separately from individuals homozygous for risk alleles and from individuals homozygous for non-risk alleles.
  • Each segment is cloned upstream of a reporter gene (such as luciferase), the resulting constructs transfected into various cell lines, such as lung cells and other cells, and the relative amount of luciferase reporter expression compared. If there is a significant difference between the levels of luciferase expression between the constructs, this indicates that the SNP(s) in that segment likely affect expression of the corresponding mucin or another linked or associated gene.
  • Additional possible susceptibility SNPs in the region defined herein also can be identified. By way of example, this can be done by surveying public databases of SNPs, and by sequencing DNA from subjects affected with IIP (or another fibroproliferative condition or disease involving fibrosis of the small airways) and from controls. These SNPs can then be tested for evidence of association with IIP disease status and intermediate quantitative traits by genotyping cases and controls, for instance using methods like those described herein. SNPs that show the strongest evidence for association may be better candidates for the causative SNP. This genotype data can also be used to test haplotypes for evidence of association with disease, to help determine whether as yet unidentified SNPs may be more strongly associated.
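One simple way to test a single SNP for association with disease status is an allelic chi-square test on a 2x2 table of allele counts in cases and controls. The Python sketch below is an illustration under stated assumptions: it uses the uncorrected Pearson statistic with one degree of freedom, it treats alleles as independent observations (which ignores relatedness within families), and the counts are invented. It is not presented as the analysis used in this application.

```python
import math


def allelic_association(case_risk, case_other, control_risk, control_other):
    """Pearson chi-square (1 df, no continuity correction), p-value, odds ratio."""
    a, b, c, d = case_risk, case_other, control_risk, control_other
    n = a + b + c + d
    row_col_product = (a + b) * (c + d) * (a + c) * (b + d)
    if row_col_product == 0 or b * c == 0:
        raise ValueError("table has an empty margin or a zero cell")
    chi2 = n * (a * d - b * c) ** 2 / row_col_product
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio


# Hypothetical allele counts: 100 case alleles and 100 control alleles.
chi2, p_value, odds_ratio = allelic_association(60, 40, 40, 60)
```

When many SNPs or haplotypes are tested, the resulting p-values would need correction for multiple comparisons before any SNP is ranked as a candidate.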
  • the findings reported herein can be further corroborated by collecting and testing additional case-control samples for evidence of association of the identified SNPs and haplotype(s) with IIP and other conditions or diseases involving fibrosis of the small airways.
  • the locations of all the identified SNPs can be compared to segments of DNA conserved across species, because SNPs located in these segments are believed to be more likely to affect gene expression or function.
  • SNPs found to be linked to susceptibility to IIP affect the ability of protein(s) to bind to the surrounding segment of DNA.
  • Methods for determining binding are well known to those of ordinary skill in the art, including but not limited to methods described herein.
  • variant elements are useful as markers, for instance to identify genetic material as being derived from a particular individual or in making assessments regarding the propensity of an individual to develop a particular disorder or condition (e.g., IPF, etc.), the ability of an individual to respond to a certain course of treatment, or in other diagnostic or prognostic and other methods described in more detail herein.
  • Genetic material nucleic acids such as genomic DNA, RNA, and cDNA
  • suitable for use in such methods can be generated or derived from a variety of sources.
  • nucleic acid molecules preferably genomic DNA
  • Cells can be obtained from biological samples, for instance from tissue samples or from bodily fluid samples that include cells (e.g., blood, urine, semen, exudates, or saliva). Detection methods of the disclosure can be used to detect variant elements in DNA in a biological sample in intact cells (for instance, using in situ hybridization) or in extracted DNA (for instance, using Southern blot hybridization).
  • the variants (including individual SNPs and haplotypes) described herein are useful as markers or indicators in a variety of different methods. They can be used, for instance, in diagnostic and prognostic assays, and in monitoring clinical trials for the purposes of predicting outcomes of developing or ongoing therapeutic or treatment regimens. The results of such methods can be used to develop or recommend a course of prophylactic treatment for an individual who is identified as having a specific SNP or combination of SNPs (or a haplotype), to prescribe or develop a course of therapy after identification that a subject has or suffers from a disease or disorder, or to alter or adapt an ongoing therapeutic regimen.
  • Certain embodiments therefore include diagnostic methods for detecting one or more SNPs or a haplotype in a biological sample to thereby determine whether a subject is at risk of developing a disorder or disease or condition linked to one or more of the SNPs or the haplotypes described herein, or whether the subject is afflicted with the disease, condition or disorder.
  • the subject methods also can be used to determine whether a subject is at risk for passing on the susceptibility to develop a disease, condition or disorder to their offspring.
  • fibrosis in particular fibrosis of the lung, such as particularly fibrosis of the small airways (including asthma and chronic obstructive pulmonary disease), including for instance IIP such as familial or idiopathic pulmonary fibrosis.
  • SNP sequences or haplotypes can be assayed in a biological sample from a subject.
  • Such assays can be used for prognostic, diagnostic, or predictive purposes to prophylactically or therapeutically treat an individual prior to or after the onset of a disorder, disease or condition (such as IIP) associated with one or more of the SNPs/haplotypes described herein, specifically those located at 11pter.
  • nucleotide variants including individual SNPs and haplotypes
  • the nucleotide variants also can be used for generating polynucleotide reagents. Methods are also provided for identifying or screening for compounds useful for treating or influencing or preventing a disease, disorder or condition associated with a SNP or haplotype located at 11pter.
  • EXAMPLE 1 Familial Interstitial Pneumonia is linked to Chromosomes 11, 10, and 12
  • This example describes the identification of regions of interest in the genome relevant to FIP and/or IIP, and evaluation of smoking as a covariate.
  • IPF idiopathic pulmonary fibrosis
  • a maximum multipoint LOD score of 3.34 was identified at D11S1318, incorporating an 8.8 cM region bounded by D11S4046 and D11S1760.
  • a second linkage peak spans approximately 15 cM and is bounded by D10S1751 and D10S1664 (maximum multipoint LOD score of 2.07 at D10S1649).
  • Regions on chromosomes 10, 11, and 12 are identified as containing genes contributing to FIP. Moreover, linkage on chromosome 11 is influenced by cigarette smoking, and linkage on chromosome 12 is influenced by disease phenotype.
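For readers unfamiliar with LOD scores, the following Python sketch shows the simplest case: a two-point LOD score for phase-known meioses, defined as log10 of the likelihood at recombination fraction theta divided by the likelihood under no linkage (theta = 0.5). The scores reported above are multipoint scores computed across pedigrees, which is a considerably more involved calculation. The function names and the counts in the example are hypothetical.

```python
import math


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """LOD = log10[ theta^k * (1 - theta)^(n - k) / 0.5^n ] for phase-known data."""
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    likelihood = theta ** recombinants * (1.0 - theta) ** (meioses - recombinants)
    if likelihood == 0.0:
        return float("-inf")  # data are impossible at this theta
    return math.log10(likelihood) - meioses * math.log10(0.5)


def lod_to_odds(lod: float) -> float:
    """Convert a LOD score to odds in favor of linkage."""
    return 10.0 ** lod


# Hypothetical data: 2 recombinants observed in 10 informative meioses.
lod_at_estimate = two_point_lod(recombinants=2, meioses=10, theta=0.2)
odds_for_peak = lod_to_odds(3.34)
```

Read this way, a LOD score of 3.34 corresponds to odds of roughly 2,200 to 1 in favor of linkage at that position.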
  • IIP interstitial pneumonias
  • UPF interstitial pneumonia
  • IPF OMIM 178500
  • Interstitial lung disease has been associated with a variety of genetic diseases with a known inheritance pattern such as Hermansky-Pudlak syndrome (DePinho & Kaplan, Medicine (Baltimore) 64:192-202, 1985), neurofibromatosis (Riccardi, N. Engl. J. Med. 305:1617-1627, 1981), tuberous sclerosis (Makle et al., Chest 538-540,
  • Interstitial lung disease also results from environmental exposures such as inhalation of fibrogenic dusts or air-borne organic antigens including exposures such as coal dust, wood or metal dust, mold, silica, and cigarette smoke (Marshall et al., Thorax 55:143-146, 2000; Mullen et al., J. Occup. Environ. Med. 40:363-367, 1998; Baumgartner et al., Am. J. Epidemiol. 152:307-315, 2000). Latent herpesvirus infections have been associated with an increased risk of this disease (Turner-Warwick, Thorax 53 Suppl 2:S3-S9, 1998; Ferri et al., Br. J. Rheumatol.
  • FIP familial interstitial pneumonia
  • a toll-free number (877-487-4411) was established to facilitate subject participation.
  • Family, Ascertainment, and Phenotyping Three sites in the United States (National Jewish Medical and Research Center, Denver, CO; Vanderbilt University, Nashville, TN; and Duke University Medical Center, Durham, NC) were established to identify subjects with FIP, and to enroll and phenotype probands and family members.
  • a diagnosis of FIP required the presence of 2 or more cases of probable or definite IIP in individuals related within three degrees.
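  • The family-level criterion above can be sketched as a simple count. This is only an illustration: the function name and the category strings are hypothetical, and the sketch assumes that every listed member is already known to be related within three degrees, which it does not check.

```python
# Diagnostic categories that count toward a FIP diagnosis (labels are illustrative).
QUALIFYING = {"probable", "definite"}


def meets_fip_definition(member_categories):
    """Return True if at least 2 listed relatives are probable or definite IIP cases.

    Assumption: all members passed in are related within three degrees.
    """
    cases = sum(1 for category in member_categories if category in QUALIFYING)
    return cases >= 2


if __name__ == "__main__":
    print(meets_fip_definition(["definite", "possible", "probable", "unaffected"]))
```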
  • ATS American Thoracic Society
  • ERS European Respiratory Society
  • Diagnostic categories were unaffected, possible affected, probable affected, and definite affected. Unaffected was defined as no evidence of interstitial lung disease on chest radiograph, DLCO > 80% predicted, and a dyspnea level of 0 or 1 using the ATS dyspnea scale.
  • affected was defined as either surgical lung biopsy or autopsy evidence of an IIP with an appropriate clinical history. Lung biopsy samples were classified by one of us (TAS) according to revised criteria for the diagnosis of IIPs (Travis et al., Pneumonias. Amer J Respir Crit Care Med 2:277-304, 2002). Probable affected was defined as bilateral reticular abnormalities associated with honeycombing on HRCT. If honeycombing was absent, bibasilar reticular abnormalities, with or without ground glass opacities in the absence of other explanations for interstitial abnormalities on HRCT, plus either dyspnea of grade 2 or greater or a DLCO < 80% also met the definition.
  • Mendelian pedigree inconsistencies were identified using PEDCHECK (O'Connell & Weeks, Am.J.Hum.Genet. 63:259-266, 1998) and checked by laboratory technicians who were blinded to the pedigree structure. Further verification of inter- and intra-familial genetic relationships was performed using RELPAIR (Epstein et al., Am.J.Hum.Genet. 67:1219-1231, 2000; Boehnke & Cox, Am.J.Hum.Genet. 61:423-429, 1997) at the beginning of the study using the first 50 genotyped markers and then later using all 887 genotyped markers. Linkage Analysis
  • Linkage analysis was performed in a series of 82 multiplex families. Eighty of the 111 families described in our clinical description (Steele et al., Am J Respir Crit Care Med 172:1146-1152, 2006) were included in the genomic screen; the remainder of the 111 families were excluded from the genomic screen because of lack of DNA or lack of informativeness for linkage analysis. Two newly ascertained families, identified using the identical ascertainment strategies as the first series of families, were also included in this linkage analysis.
  • Map order was verified using Map-O-Mat (Matise et al, Am.J.Hum.Genet. 73:271-284, 2003). Marker allele frequencies were estimated from the data using all individuals (Broman, Genet Epidemiol. 20:307-315, 2001). Multipoint LOD scores > 2.0 were considered to be interesting. Approximate 95% confidence intervals were determined using the one-LOD-score-down method. To evaluate our families for genetic heterogeneity, an ordered subset analysis (OSA) (Hauser et al, Genet Epidemiol. 27:53-63, 2004) was applied to the 82 families in the genomic screen.
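  • The one-LOD-score-down rule can be illustrated with a minimal sketch. It assumes the multipoint LOD scores are available on a grid of map positions; the function name and the example values are hypothetical, and the bounds are taken at grid points without interpolation.

```python
def one_lod_down_interval(positions_cm, lod_scores):
    """Return (peak_position, peak_lod, left_bound, right_bound).

    The interval is the contiguous run of grid points around the peak whose
    LOD score is within 1.0 of the maximum (no interpolation between points).
    """
    if len(positions_cm) != len(lod_scores) or not positions_cm:
        raise ValueError("positions and LOD scores must be non-empty and equal in length")
    peak_idx = max(range(len(lod_scores)), key=lod_scores.__getitem__)
    threshold = lod_scores[peak_idx] - 1.0
    left = peak_idx
    while left > 0 and lod_scores[left - 1] >= threshold:
        left -= 1
    right = peak_idx
    while right < len(lod_scores) - 1 and lod_scores[right + 1] >= threshold:
        right += 1
    return positions_cm[peak_idx], lod_scores[peak_idx], positions_cm[left], positions_cm[right]


if __name__ == "__main__":
    # Made-up grid of positions (cM) and LOD scores, for illustration only.
    positions = [0, 5, 10, 15, 20, 25]
    lods = [0.5, 1.9, 2.6, 3.3, 2.5, 1.0]
    peak_pos, peak_lod, lo, hi = one_lod_down_interval(positions, lods)
    print(peak_pos, peak_lod, lo, hi, "interesting" if peak_lod > 2.0 else "not flagged")
```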
  • OSA ordered subset analysis
  • the OSA approach uses a family-specific continuous covariate to rank families according to the covariate value.
  • This approach may identify subsets of the data that are more homogeneous than others, thereby potentially identifying regions of linkage evidence previously unrecognized.
  • the evidence for an increased linkage signal in the subset of families is assessed statistically using permutation.
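  • The ranking and permutation steps can be sketched as follows. This is a simplified illustration and not the published OSA program: it assumes a single map position, additive family-specific LOD scores, and ascending covariate order, and all names and values are hypothetical.

```python
import random


def best_subset_lod(family_lods, covariates):
    """Add families in ascending covariate order; return (max cumulative LOD, subset size)."""
    order = sorted(range(len(family_lods)), key=lambda i: covariates[i])
    best, best_n, running = float("-inf"), 0, 0.0
    for n, i in enumerate(order, start=1):
        running += family_lods[i]
        if running > best:
            best, best_n = running, n
    return best, best_n


def osa_permutation_p(family_lods, covariates, n_perm=1000, seed=0):
    """Empirical p-value: how often a random family ordering does at least as well."""
    rng = random.Random(seed)
    observed, _ = best_subset_lod(family_lods, covariates)
    shuffled = list(covariates)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between covariate and family
        permuted_best, _ = best_subset_lod(family_lods, shuffled)
        if permuted_best >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    lods = [1.0, 0.5, -0.5, -1.0]   # made-up family-specific LOD scores
    smoking = [0.0, 0.2, 0.8, 1.0]  # made-up covariate (proportion of smokers)
    print(best_subset_lod(lods, smoking), osa_permutation_p(lods, smoking, n_perm=200))
```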
  • Non-parametric multipoint family-specific LOD scores calculated in the genomic screen of the full set of 82 pedigrees were used as input to the computer program. The potential contributions of disease age-at-onset, and smoking exposure to disease risk were evaluated.
  • disease age-at-onset was defined as an average patient-reported age of first recognition of breathlessness or, when not available, age-at-first diagnostic CT scan was used as a surrogate.
  • family-specific variable was defined as the proportion of affected individuals within a family who were current or former smokers among those who had smoking history data available.
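  • As a minimal sketch of this family-specific covariate, the proportion can be computed as below. The encoding (True for current or former smoker, False for never smoker, None for missing smoking history) and the function name are assumptions made for illustration.

```python
def smoking_covariate(affected_smoking_status):
    """Proportion of ever-smokers among affected members with smoking data.

    Each entry is True (current or former smoker), False (never smoker),
    or None (no smoking history available). Returns None if no data exist.
    """
    known = [status for status in affected_smoking_status if status is not None]
    if not known:
        return None
    return sum(known) / len(known)


if __name__ == "__main__":
    print(smoking_covariate([True, False, None, True]))
```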
  • Age-at-diagnosis is defined as self-report of age-at-first-recognized breathlessness; or, when not available, age at first diagnostic HRCT scan. 3p-value for homogeneous vs. heterogeneous ⁇ 0.03
  • genotyped markers Genetic analysis of the first 50 genotyped markers identified 2 individuals with incorrect gender and 2 individuals who were genetically inconsistent with reported pedigree structure. These four individuals were eliminated from the genomic screen. Additionally, three individuals formerly reported as siblings were identified as half-siblings. The 82 families included 559 genotyped individuals (202 affected). Error checking of the remaining genotyping demonstrated > 99.5% accuracy in genotypes when compared against internal controls.
  • a second region of interest was identified on chromosome 10, where the maximum multipoint LOD score of 2.1 occurred at D10S1649.
  • the region of interest on chromosome 10 has an approximate 95% confidence interval that spans 15 cM and is bounded by
  • genomic screens in complex diseases depend on large datasets of multiplex pedigrees; however, genomic screens of as few as 31-80 pedigrees have identified regions of interest and ultimately genes in many complex diseases (Namjou et al, Arthritis Rheum. AfrlVhl-lWi, 2002; Winn et al, Science 308:1801-1804, 2005; Rampersaud et al, J.Med.Genet. 42(12):940-946, 2005; Wang et al, Science 302:1578-1581, 2003; Ashley-Koch et al, Neurosci.Lett. 379:199-204, 2005; Blanton et al, J.Med.Genet.
  • the initial genomic screen in late-onset Alzheimer disease included only 31 multiplex pedigrees (Pericak-Vance et al, Am. J. Hum. Genet. 48:1034-1050, 1991) and identified a region of interest on chromosome 19 that was subsequently found to harbor APOE, a major susceptibility gene for Alzheimer disease.
  • small sample sizes in genomic screens can identify regions of interest when the genetic effect is strong.
  • FIP susceptibility gene that is imprinted may represent one explanation for the apparent autosomal dominant inheritance with reduced penetrance seen in FIP in some pedigrees, although many of our pedigrees are inconsistent with an imprinting hypothesis.
  • genes in this region are either known or putative growth related genes including the insulin precursor, insulin like growth factor 2, tumor suppressor genes TSSC4, cyclin-dependent kinase inhibitor 1C (p57), and FGF receptor activating protein 1.
  • the Ro/SSA sicca syndrome antigen is in the region, and Sjogren's syndrome is known to be associated with fibrosing interstitial lung disease.
  • Matrix metalloproteinase 26, MMP26 is also in the interval.
  • MMP7 Matrix metalloproteinases are involved in tissue remodeling and wound healing and are, therefore, attractive biological candidates.
  • MMP7 has been identified as a differentially expressed gene in pulmonary fibrosis, and targeted deletion of MMP7 is protective against bleomycin-induced lung injury and pulmonary fibrosis (Marzuola, Nature 417:679, 2002).
  • identification of a gene or genes influencing the development of IIP could be of critical importance in developing novel approaches to prevention and treatment. For instance, identification of a gene predisposing to IIP may enable pre-symptomatic genetic counseling, earlier disease recognition and treatment, and gene-targeted interventions. Our finding that affected individuals with less smoking tend to have a stronger contribution to genetic linkage at 11pter - underscores the importance of considering environmental risk factors in FIP and other genetic diseases.
  • EXAMPLE 2 Identification of SNPs in Mucin 5AC (MUC5AC) associated with pulmonary fibrosis.
  • rs7944723 is a synonymous SNP (Pro4205Pro) in the mucin 2 gene, MUC2, that maps to a region harboring 4 mucin genes (telomere to centromere: MUC6, MUC2, MUC5AC, and MUC5B). While there are recombination hotspots located between MUC6 and MUC2, and within the proximal portion of MUC5B (Rousseau, K., et al. Allelic association and recombination hotspots in the mucin gene (MUC) complex on chromosome 11p15.5.
  • SNPs of MUC5AC resulted in P values < 0.05 in either FIP or IPF cases compared to controls. While only 1 of the 7 SNPs in MUC2 was significant in both FIP and IPF, 11 of the 32 SNPs in MUC5AC were significant in both FIP and IPF.
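  • One common way to obtain a case-control P value for a single SNP is a 1-degree-of-freedom chi-square test on allele counts. The sketch below illustrates that generic test; the text does not state which test was used, so the choice of test, the function name, and the counts are all assumptions for illustration.

```python
import math


def allelic_chi_square(case_minor, case_major, ctrl_minor, ctrl_major):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts.

    Returns (chi2, p_value). Every row and column total must be non-zero.
    """
    table = [[case_minor, case_major], [ctrl_minor, ctrl_major]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Made-up allele counts: 30/70 minor/major in cases, 15/85 in controls.
    chi2, p = allelic_chi_square(30, 70, 15, 85)
    print(round(chi2, 3), round(p, 4), "significant" if p < 0.05 else "not significant")
```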
  • Table 5 MUC2 variants identified by re-sequencing.
  • Muc5ac deficient mice showed more connective tissue deposition by Masson-trichrome staining (Figure 5F) and significantly more lung collagen by Sircol assay (Figure 5G) compared to either of the wild type founder strains or the Muc5ac heterozygous littermates. Furthermore, mucous metaplasia, as determined by Muc5ac protein expression, was detected by immunohistochemistry in bronchial cells of C57BL/6 mice following bleomycin instillation (Figure IH). In contrast, there was negligible Muc5ac staining in bronchial or alveolar tissue of saline-treated C57BL/6 mice (Figure 5H).
  • Muc5ac deficient mice had significantly more apoptotic bronchial epithelial cells and apoptotic cells in fibrotic regions of the lung than heterozygotes (Muc5ac-/+) (Figure 6C-F).
  • MUC5AC is a major mucin gene in human airway epithelia (Rose, M. C. & Voynow, J. A. Respiratory tract mucin genes and mucin glycoproteins in health and disease. Physiol Rev 86, 245-278, 2006), we used an shRNA targeted to MUC5AC in a human airway epithelial cell line (NCI-H292) to examine the effect of MUC5AC silencing on WNT and TGF-β/BMP signaling pathways, as both pathways have previously been shown to be involved in lung fibroproliferation. Treatment of cells with MUC5AC shRNA reduced expression of MUC5AC mRNA by approximately 75%.
  • PCR arrays specific for the WNT and TGF-β/BMP signaling pathways showed enhanced expression of several profibrotic mediators over cells transfected with control plasmid.
  • fibroblast growth factor 4 FGF4
  • FOXNl transcription factor 4
  • SFRP4 frizzled-related protein
  • TGF-β1 related genes were also enhanced significantly, including a six-fold increase in both BMP binding endothelial regulator (BMPER), a molecule that regulates signaling of the TGF-β superfamily, and cyclin-dependent kinase inhibitor 2B (CDKN2B), a SMAD pathway target gene.
  • BMPER BMP binding endothelial regulator
  • CDKN2B cyclin-dependent kinase inhibitor 2B
  • GSC goosecoid
  • Foxh1 recruits Gsc to negatively regulate Mixl1 expression during early mouse development. EMBO J 26, 3132-3143, 2007) and migration (Niehrs, C, Keller, R., Cho, K. W.
  • MUC5AC glycoprotein in FIP usual interstitial pneumonia (UIP) pathology
  • IPF/UIP fibrotic nonspecific interstitial pneumonia
  • COP cryptogenic organizing pneumonia
  • Fibrotic interstitial processes including FIP/UIP, IPF/UIP, and fibrotic NSIP, show collections of MUC5AC-positive extracellular mucus within fibrotic airspaces, despite their peripheral distribution. While some of the MUC5AC-positive mucus may have been secreted from proximal airways, there is also MUC5AC-positive staining in patches of metaplastic epithelium lining honeycomb cysts. The airspace MUC5AC-containing mucus permeates the regions of fibrosis and has physical contact with the pneumocyte epithelium, denuded regions secondary to injury, and the underlying stromal tissue. We also stained fibrotic and non-fibrotic human lungs for MUC2, but did not identify significant staining.
  • MUC5AC is regulated by TGF-β, EGF, IL-13, STAT6, HIF-1, and the p42/44 MAPK pathway
  • our findings indicate that MUC5AC expression affects expression of genes in the WNT and TGF-β/BMP signaling pathways involved in lung fibroproliferation, suggesting that MUC5AC may specifically regulate critical genes involved in the development of FIP and fibrotic forms of IIP.
  • the specific mechanism(s) by which polymorphisms in a secreted gel mucin, such as MUC5AC, may enhance the fibroproliferative response remain unclear.
  • MUC5AC farnesoid fibrotic lung disease
  • fibrosing IIPs there is a loss of type I alveolar epithelia, and although type II alveolar epithelia proliferate, the type II alveolar epithelia do not re-epithelialize the alveolus. While this may be caused by aberrant WNT signaling with inhibition of β-catenin, it is also possible that failure of re-epithelialization is a result of accelerated apoptosis in the distal lung.
  • telomeres are known to enhance apoptosis.
  • reduced expression of telomerase or rare loss of function mutations in telomerase genes result in shortened telomeres that markedly diminish the half-life of stem cells (Hao, L. Y., et al. Short telomeres, even in the presence of telomerase, limit tissue renewal capacity. Cell 123, 1121-1131, 2005) and are reported to be associated with fibrosing IIP.
  • MUC5AC regulates expression of numerous genes in the WNT and TGF-β/BMP signaling pathways in human airway epithelial cells, and found that MUC5AC is expressed at the site of fibroproliferation in humans with fibrotic forms of either FIP or IIP.
  • a suitable genomic DNA-containing sample from a subject is obtained and the DNA extracted using conventional techniques.
  • a blood sample, a buccal swab, a hair follicle preparation, or a nasal aspirate is used as a source of cells to provide the DNA sample; similarly, a surgical specimen, biopsy, or other biological sample containing genomic DNA could be used.
  • tumor biopsies or tumor DNA found in plasma or other blood products can serve as a source.
  • the extracted DNA is then subjected to amplification, for example according to standard procedures.
  • the allele of the single base-pair mutation is determined by conventional methods including manual and automated fluorescent DNA sequencing, primer extension methods (Nikiforov, et al., Nucl. Acids Res. 22:4167-4175, 1994), oligonucleotide ligation assay (OLA) (Nickerson et al., Proc. Natl. Acad. Sci. USA 87:8923-8927, 1990), allele-specific PCR methods (Rust et al., Nucl. Acids Res.
  • Combinations of the SNPs identified in MUC5AC could be used to identify individuals at risk for pulmonary fibrosis either in families with at least one case of pulmonary fibrosis or even in the general population.
  • other variations and mutations of these genes can be detected that may be associated with variable predisposition to development of pulmonary fibrosis or likelihood of having pulmonary fibrosis, and used in combination with the disclosed mucin SNPs, to predict the probability that a subject will develop pulmonary fibrosis or another disease involving fibrosis of the lung parenchyma or small airways.
  • SNPs in MUC5AC could also be used for purposes of early diagnosis. Patients with IIP are often diagnosed very late in the course of their disease. In fact, most patients with IIP are diagnosed 3-5 years prior to their death from this crippling disease. However, respiratory symptoms arise much earlier and in these subjects, SNPs in MUC5AC could be used to identify individuals that need more aggressive testing and follow up because of their higher risk of IIP.
  • the SNPs of the present disclosure can ultimately be utilized for the development of personalized treatment for this disease.
  • the value of identifying individuals who carry a susceptible allele of mucin (i.e., individuals who are heterozygous or homozygous for an allele that contains the MUC5AC polymorphisms listed above; or any combination thereof; or another sequence variation in one or proximal to one of the variable regions indicated herein) is that these individuals could then initiate customized therapies (such as specific drug therapies that replace or supplement the function of the variant mucin), or undergo more aggressive treatment of the condition, and thereby beneficially alter its course.
  • Sequences surrounding and overlapping single base-pair mutations and deletions and insertions in a mucin gene can be useful for a number of gene mapping, targeting, and detection procedures.
  • genetic probes can be readily prepared for hybridization and detection of the SNPs identified in MUC5AC.
  • probe sequences may be about 8 or more nucleotides in length and possess sufficient complementarity to distinguish between the variant sequence and the reference, for instance, between the A (at position chr11:1144294 in the MUC5AC susceptible allele) and corresponding G (in the reference allele).
  • sequences surrounding and overlapping any of the specifically disclosed SNPs can be utilized in allele specific hybridization procedures.
  • a similar approach can be adopted to detect other mucin sequence variations. Sequences surrounding and overlapping a mucin variation, or any portion or subset thereof that allows one to identify the variant, are highly useful.
  • another embodiment provides a genetic marker predictive of one or more of AADDOl 112371.1 1 1838, rs34474233, and rs34815853 of MUC5AC, comprising a partial sequence of the human genome including at least about 10 contiguous nucleotide residues such as those shown and discussed herein, and sequences complementary therewith.
  • Single nucleotide alterations can be detected by a variety of techniques in addition to merely sequencing the target sequence.
  • Constitutional single nucleotide alterations can arise either from new germline mutations, or can be inherited from a parent who possesses a SNP or mutation in their own germline DNA.
  • the techniques used in evaluating either somatic or germline single nucleotide alterations include hybridization using allele specific oligonucleotides (ASOs) (Wallace et al, CSHL Symp. Quant. Biol. 51:257-261, 1986; Stoneking et al., Am. J. Hum. Genet.
  • Allele-specific oligonucleotide hybridization involves hybridization of probes to the sequence, stringent washing, and signal detection.
  • Other new methods include techniques that incorporate more robust scoring of hybridization. Examples of these procedures include the ligation chain reaction (ASOH plus selective ligation and amplification), as disclosed in Wu and Wallace (Genomics 4:560-569, 1989); mini-sequencing (ASOH plus a single base extension) as discussed in Syvanen (Meth. Mol. Biol. 98:291-298, 1998); and the use of DNA chips (miniaturized ASOH with multiple oligonucleotide arrays) as disclosed in Lipshutz et al. (BioTechniques 19:442-447, 1995).
  • ASOH with single- or dual-labeled probes can be merged with PCR, as in the 5'-exonuclease assay (Heid et al, Genome Res. 6:986-994, 1996), or with molecular beacons (as in Tyagi and Kramer, Nat. Biotechnol. 14:303-308, 1996).
  • DASH dynamic allele-specific hybridization
  • a target sequence is amplified by PCR in which one primer is biotinylated.
  • the biotinylated product strand is bound to a streptavidin-coated microtiter plate well, and the non-biotinylated strand is rinsed away with alkali wash solution.
  • An oligonucleotide probe, specific for one allele, is hybridized to the target at low temperature. This probe forms a duplex DNA region that interacts with a double strand-specific intercalating dye.
  • the dye When subsequently excited, the dye emits fluorescence proportional to the amount of double-stranded DNA (probe-target duplex) present.
  • the sample is then steadily heated while fluorescence is continually monitored. A rapid fall in fluorescence indicates the denaturing temperature of the probe-target duplex.
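  • The melting step described above can be sketched as locating the steepest fall in the fluorescence curve. This is a minimal illustration with made-up readings; the function name is hypothetical, and real DASH software may smooth the curve or fit a model instead.

```python
def estimate_tm(temperatures, fluorescence):
    """Return the midpoint temperature of the steepest fluorescence drop.

    Uses finite differences between consecutive readings. Returns None if
    the fluorescence never decreases.
    """
    if len(temperatures) != len(fluorescence) or len(temperatures) < 2:
        raise ValueError("need at least two paired readings")
    steepest, tm = 0.0, None
    for k in range(1, len(temperatures)):
        slope = (fluorescence[k] - fluorescence[k - 1]) / (temperatures[k] - temperatures[k - 1])
        if slope < steepest:
            steepest = slope
            tm = (temperatures[k] + temperatures[k - 1]) / 2
    return tm


if __name__ == "__main__":
    temps = [50, 55, 60, 65, 70, 75]   # degrees C, made-up
    signal = [100, 98, 95, 40, 10, 8]  # arbitrary fluorescence units, made-up
    print(estimate_tm(temps, signal))
```

  • Comparing the estimate obtained for a sample against those of known matched and mismatched probe-target duplexes is one way such a value could be used to assign an allele.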
  • T m melting temperature
  • Oligonucleotides specific to normal or allelic sequences can be chemically synthesized using commercially available machines. These oligonucleotides can then be labeled radioactively with isotopes (such as 32P) or non-radioactively, with tags such as biotin (Ward and Langer et al, Proc. Natl. Acad. Sci. USA 78:6633-6657, 1981), and hybridized to individual DNA samples immobilized on membranes or other solid supports by dot-blot or transfer from gels after electrophoresis.
  • MUC5 AC can influence the pulmonary fibrosis susceptibility of a subject
  • the oligonucleotide ligation assay (OLA), as described at Nickerson et al. (Proc. Natl. Acad. Sci. USA 87:8923-8927, 1990), allows the differentiation between individuals who are homozygous versus heterozygous for alleles indicated herein.
  • OLA oligonucleotide ligation assay
  • This feature allows one to rapidly and easily determine whether an individual is homozygous for at least one tyrosine kinase activating mutation, which condition is linked to a relatively high predisposition to developing neoplastic disease and/or an increased likelihood of having a tumor.
  • OLA can be used to determine whether a subject is homozygous for either of these mutations.
  • one well is used for the determination of the presence of the major allele in the MUC5AC gene that contains a G at nucleotide position chr11:1144294 (numbering from Human Genome Build 36) and a second well is used for the determination of the presence of the minor allele in the same gene that contains an A at that nucleotide position in the alternate allele sequence.
  • the results for an individual who is heterozygous for the mutation will show a signal in each of the G and A wells.
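  • The two-well readout can be sketched as a simple genotype call. The signal threshold, the function name, and the example readings are hypothetical; an actual assay would set its cutoff from controls.

```python
def call_genotype(signal_g, signal_a, threshold):
    """Call a genotype from the G-allele well and the A-allele well.

    A well is scored positive when its signal reaches the threshold.
    """
    g_positive = signal_g >= threshold
    a_positive = signal_a >= threshold
    if g_positive and a_positive:
        return "G/A"  # heterozygous: signal in both wells
    if g_positive:
        return "G/G"  # homozygous for the major allele
    if a_positive:
        return "A/A"  # homozygous for the minor allele
    return "no call"


if __name__ == "__main__":
    print(call_genotype(signal_g=0.9, signal_a=0.8, threshold=0.5))
```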
  • An alternative method of diagnosing mucin variation, gene amplification, or deletion as well as abnormal mucin (e.g., MUC5AC) expression is to quantitate the level of mucin protein in an individual.
  • Such evaluations can be performed, for example, in lysates prepared from cells, in fresh or frozen cells, in cells that have been smeared or touched on glass slides and then either fixed and/or dried, or in cells that have been fixed, embedded (e.g., in paraffin), and then prepared as histological sections on glass slides.
  • mucins including particularly MUC5AC
  • mucus membranes including but not limited to fluids of the oropharyngeal tract, such as sputum.
  • samples may be taken from, for instance, bronchoalveolar lavage (BAL), sputum, and induced sputum samples.
  • BAL bronchoalveolar lavage
  • Oropharyngeal tract fluids can be acquired through conventional techniques, including sputum induction, bronchoalveolar lavage (BAL), and oral washing. Obtaining a sample from oral washing involves having the subject gargle with an amount of normal saline for about 10-30 seconds and then expectorate the wash into a sample cup.
  • This diagnostic tool would also be useful for detecting reduced levels of the mucin protein that result from, for example, mutations in the promoter regions of the MUC5AC gene or mutations within the coding region of the gene that produced truncated, non-functional or unstable polypeptides, as well as from deletions of a portion of or the entire respective mucin gene.
  • amplification of a mucin-encoding sequence may be detected as an increase in the expression level of mucin protein.
  • Such an increase in protein expression may also be a result of an up-regulating mutation in the promoter region or other regulatory or coding sequence within the mucin gene, or by virtue of a point mutation within the coding sequence, which protects the mucin protein from degradation.
  • Localization and/or coordination of MUC5AC expression can also be examined using known techniques, such as isolation and comparison of mucin from collected fractions, including specific mucus membranes, or from specific cell or tissue types, or at specific time points after an experimental manipulation.
  • Demonstration of reduced or increased mucin protein levels in comparison to such expression in a control cell (e.g., normal, as in taken from a subject not suffering from a fibrotic disease, such as pulmonary fibrosis), would be an alternative or supplemental approach to the direct determination of mucin gene deletion, amplification or mutation status by the methods outlined above and equivalents.
  • Any standard immunoassay format (e.g., ELISA, western blot, or RIA assay)
  • Altered mucin (e.g., MUC5AC) polypeptide expression may be indicative of an abnormal biological condition related to fibrosis, in particular pulmonary fibrosis, and/or a predilection to development of pulmonary fibrosis.
  • Immunohistochemical techniques may also be utilized for mucin polypeptide or protein detection.
  • a tissue sample or swab or swipe may be obtained from a subject, and a section or portion thereof stained for the presence of mucin (or a particular mucin) using a specific binding agent (e.g., anti-mucin antibody) and any standard detection system (e.g., one which includes a secondary antibody conjugated to horseradish peroxidase).
  • a biological sample of the subject for instance a mouse or a human
  • biological samples may be obtained from sputum, bronchoalveolar lavage fluid, a lung biopsy specimen, exhaled breath, possibly glycosylated products of mucin that might be present in the serum, and so forth.
  • Quantitation of mucin protein can be achieved by immunoassay and compared to levels of the protein found in control cells (e.g., healthy, non-neoplastic cells of the same lineage or type as those under evaluation, or from a patient known not to have a neoplastic disease).
  • a significant (e.g., 10% or greater) reduction in the amount of a mucin protein in the cells or mucus sample of a subject compared to the amount of that mucin protein found in a comparative normal sample could be taken as an indication that the subject may have deletions or mutations in the respective mucin gene, whereas a significant (e.g., 10% or greater) increase would indicate that a duplication (amplification), or mutation that increases the stability of the mucin protein or mRNA, may have occurred.
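  • The 10% comparison against a control sample can be sketched as below. The function name, the labels, and the example values are hypothetical; the cutoff simply mirrors the "e.g., 10% or greater" figure in the text and would need to be set per assay.

```python
def classify_mucin_level(sample_level, control_level, cutoff=0.10):
    """Compare a subject's mucin protein level with a control level.

    Returns "reduced", "increased", or "unchanged" based on the relative
    difference from the control.
    """
    if control_level <= 0:
        raise ValueError("control level must be positive")
    relative_change = (sample_level - control_level) / control_level
    if relative_change <= -cutoff:
        return "reduced"    # may point to a deletion or destabilizing mutation
    if relative_change >= cutoff:
        return "increased"  # may point to amplification or a stabilizing mutation
    return "unchanged"


if __name__ == "__main__":
    print(classify_mucin_level(sample_level=85.0, control_level=100.0))
```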
  • Deletion, mutation, and/or amplification within a mucin encoding sequence, and substantial under- or over-expression of mucin protein may be indicative of fibrotic disease (such as pulmonary fibrosis) and/or a predilection to develop fibrosis.
  • EXAMPLE 8 Expression of MUC5AC or Other Protein Variant Polypeptides, or a Reporter Polypeptide under Control of a Variant Regulatory Sequence
  • proteins such as a mucin variant protein
  • purification of proteins can be performed using standard laboratory techniques, though techniques are preferentially adapted to be fitted to express the mucin proteins. Examples of such method adaptations are discussed or referenced herein.
  • purified protein may be used for functional analyses, antibody production, diagnostics, and patient therapy.
  • DNA sequences of the mucin variant cDNAs and regulatory regions, or gene or EST sequences contained within the genomic region described herein can be manipulated in studies to understand the expression of the gene and the function of its product.
  • Variant or allelic forms of a human MUC5AC gene may be isolated based upon information contained herein, and may be studied in order to detect alteration in expression patterns in terms of relative quantities, tissue specificity and functional properties of the encoded mucin variant protein (e.g., influence on mucus production, formation or resistance to pulmonary fibrosis, and so forth).
  • Partial or full-length cDNA sequences, which encode the subject protein, may be ligated into bacterial expression vectors. Methods for expressing large amounts of protein from a cloned gene introduced into Escherichia coli (E. coli) or, more preferably, baculovirus/Sf9 cells may be utilized for the purification, localization and functional analysis of proteins.
  • fusion proteins consisting of amino terminal peptides encoded by a portion of a gene native to the cell in which the protein is expressed (e.g., an E. coli lacZ or trpE gene for bacterial expression) linked to a variant protein may be used to prepare polyclonal and monoclonal antibodies against these proteins. Thereafter, these antibodies may be used to purify proteins by immunoaffinity chromatography, in diagnostic assays to quantitate the levels of protein and to localize proteins in tissues and individual cells by immunofluorescence.
  • Intact native protein may also be produced in large amounts for functional studies. Methods and plasmid vectors for producing fusion proteins and intact native proteins in culture are well known in the art, and specific methods are described in Sambrook et al. (In Molecular Cloning: A Laboratory Manual, Ch. 17, CSHL, New York, 1989). Such fusion proteins may be made in large amounts, are easy to purify, and can be used to elicit antibody response. Native proteins can be produced in bacteria by placing a strong, regulated promoter and an efficient ribosome-binding site upstream of the cloned gene. If low levels of protein are produced, additional steps may be taken to increase protein production; if high levels of protein are produced, purification is relatively easy.
  • Vectors suitable for the production of intact native proteins include pKC30 (Shimatake and Rosenberg, Nature 292:128, 1981), pKK177-3 (Amann and Brosius, Gene 40:183, 1985) and pET-3 (Studier and Moffatt, J. Mol. Biol. 189:113, 1986).
  • Fusion proteins may be isolated from protein gels, lyophilized, ground into a powder and used as an antigen.
  • the DNA sequence can also be transferred from its existing context to other cloning vehicles, such as other plasmids, bacteriophages, cosmids, animal viruses and yeast artificial chromosomes (YACs) (Burke et al, Science 236:806-812, 1987).
  • YACs yeast artificial chromosomes
  • vectors may then be introduced into a variety of hosts including somatic cells, and simple or complex organisms, such as bacteria, fungi (Timberlake and Marshall, Science 244:1313-1317, 1989), invertebrates, plants (Gasser and Fraley, Science 244:1293, 1989), and animals (Pursel et al, Science 244:1281-1288, 1989), which cell or organisms are rendered transgenic by the introduction of the heterologous cDNA.
  • the cDNA sequence may be ligated to heterologous promoters, such as the simian virus (SV) 40 promoter in the pSV2 vector (Mulligan and Berg, Proc. Natl. Acad. Sci. USA 78:2072-2076, 1981), and introduced into cells, such as monkey COS-1 cells (Gluzman, Cell 23:175-182, 1981), to achieve transient or long-term expression.
  • SV simian virus
  • the stable integration of the chimeric gene construct may be maintained in mammalian cells by biochemical selection, such as neomycin (Southern and Berg, J. Mol. Appl. Genet. 1:327-341, 1982) and mycophenolic acid (Mulligan and Berg, Proc. Natl. Acad. Sci. USA 78:2072-2076, 1981).
  • DNA sequences can be manipulated with standard procedures such as restriction enzyme digestion, fill-in with DNA polymerase, deletion by exonuclease, extension by terminal deoxynucleotide transferase, ligation of synthetic or cloned DNA sequences, site-directed sequence-alteration via single-stranded bacteriophage intermediate or with the use of specific oligonucleotides in combination with PCR or other in vitro amplification.
  • the cDNA sequence (or portions derived from it) or a mini gene (a cDNA with an intron and its own promoter) may be introduced into eukaryotic expression vectors by conventional techniques. These vectors are designed to permit the transcription of the cDNA in eukaryotic cells by providing regulatory sequences that initiate and enhance the transcription of the cDNA and ensure its proper splicing and polyadenylation. Vectors containing the promoter and enhancer regions of the SV40 or long terminal repeat (LTR) of the Rous Sarcoma virus and polyadenylation and splicing signal from SV40 are readily available (Mulligan et al, Proc. Natl. Acad. Sci.
  • the level of expression of the cDNA can be manipulated with this type of vector, either by using promoters that have different activities (for example, the baculovirus pAC373 can express cDNAs at high levels in S. frugiperda cells (Summers and Smith, In Genetically Altered Viruses and the
  • some vectors contain selectable markers such as the gpt (Mulligan and Berg, Proc. Natl. Acad. Sci. USA 78:2072-2076, 1981) or neo (Southern and Berg, J. Mol. Appl. Genet. 1:327-341, 1982) bacterial genes. These selectable markers permit selection of transfected cells that exhibit stable, long-term expression of the vectors (and therefore the cDNA).
  • the vectors can be maintained in the cells as episomal, freely replicating entities by using regulatory elements of viruses such as papilloma (Sarver et al., Mol. Cell Biol. 1:486, 1981) or Epstein-Barr (Sugden et al., Mol. Cell Biol.
  • the vectors are introduced into the recipient cells as pure DNA (transfection) by, for example, precipitation with calcium phosphate (Graham and van der Eb, Virology 52:466, 1973) or strontium phosphate (Brash et al., Mol. Cell Biol. 7:2013, 1987), electroporation (Neumann et al., EMBO J. 1:841, 1982), lipofection (Felgner et al., Proc. Natl. Acad. Sci. USA 84:7413, 1987), DEAE dextran (McCutchan et al., J. Natl. Cancer Inst.
  • the cDNA, or fragments thereof can be introduced by infection with virus vectors.
  • Systems are developed that use, for example, retroviruses (Bernstein et al, Gen. Engr'g 7:235, 1985), adenoviruses (Ahmad et al, J. Virol. 57:267, 1986), or Herpes virus (Spaete et al, Cell 30:295, 1982).
  • Protein encoding sequences can also be delivered to target cells in vitro via non-infectious systems, for instance liposomes.
  • the expression vectors containing MUC5AC sequence or cDNA can be introduced into human cells, mammalian cells from other species or non-mammalian cells as desired.
  • the choice of cell is determined by the purpose of the treatment.
  • For example, monkey COS cells (Gluzman, Cell 23:175-182, 1981), Chinese hamster ovary (CHO) cells, mouse NIH 3T3 fibroblasts, or human fibroblasts or lymphoblasts may be used.
  • the present disclosure thus encompasses recombinant vectors that comprise all or part of MUC5AC variant gene or cDNA sequences, or a regulatory sequence thereof, for expression in a suitable host.
  • the DNA is operatively linked in the vector to an expression control sequence in the recombinant DNA molecule so that a polypeptide can be expressed, or the regulatory sequence is operatively linked to a reporter gene.
  • the expression control sequence may be selected from the group consisting of sequences that control the expression of genes of prokaryotic or eukaryotic cells and their viruses and combinations thereof.
  • the expression control sequence may be specifically selected from the group consisting of the lac system, the trp system, the tac system, the trc system, major operator and promoter regions of phage lambda, the control region of fd coat protein, the early and late promoters of SV40, promoters derived from polyoma, adenovirus, retrovirus, baculovirus and simian virus, the promoter for 3-phosphoglycerate kinase, the promoters of yeast acid phosphatase, the promoter of the yeast alpha-mating factors and combinations thereof.
  • the host cell, which may be transfected with the vector of this disclosure, may be selected from the group consisting of E. coli, Pseudomonas, Bacillus subtilis, Bacillus stearothermophilus or other bacilli; other bacteria; yeast; fungi; insect; mouse or other animal; or plant hosts; or human tissue cells. It is appreciated that for mutant or variant MUC5AC DNA sequences, similar systems are employed to express and produce the mutant product.
  • fragments of these proteins can be expressed essentially as detailed above. Such fragments include individual mucin protein domains or sub-domains, as well as shorter fragments such as peptides. Protein fragments having therapeutic properties may be expressed in this manner also, including for instance substantially soluble fragments.
  • Monoclonal or polyclonal antibodies may be produced to either a wildtype or reference protein or specific allelic forms of these proteins, for instance particular portions that contain a differential amino acid encoded by a SNP and therefore may provide a distinguishing epitope, for instance antibodies produced to a mucin protein or peptide.
  • antibodies raised (generated) against these proteins or peptides would specifically detect the protein or peptide with which the antibodies are generated. That is, an antibody generated to a specified target protein or a fragment thereof would recognize and bind that protein and would not substantially recognize or bind to other proteins found in target cells, for instance human cells.
  • an antibody is specific for (or measurably preferentially binds to) an epitope in a variant protein (e.g., an allele of MUC5AC as described herein) versus the reference protein, or vice versa.
  • the determination that an antibody specifically detects a target protein or form of the target protein is made by any one of a number of standard immunoassay methods; for instance, the western blotting technique (Sambrook et al., In Molecular Cloning: A Laboratory Manual, CSHL, New York, 1989).
  • To determine that a given antibody preparation (such as one produced in a mouse) specifically detects the target protein by western blotting, total cellular protein is extracted from human cells (for example, lymphocytes) and electrophoresed on a sodium dodecyl sulfate-polyacrylamide gel.
  • The proteins are then transferred to a membrane (for example, nitrocellulose) by western blotting, and the antibody preparation is incubated with the membrane. After washing the membrane to remove non-specifically bound antibodies, the presence of specifically bound antibodies is detected by the use of an anti-mouse antibody conjugated to an enzyme such as alkaline phosphatase.
  • Application of the alkaline phosphatase substrate 5-bromo-4-chloro-3-indolyl phosphate/nitro blue tetrazolium results in the production of a dense blue compound by immunolocalized alkaline phosphatase.
  • Antibodies that specifically detect the target protein will, by this technique, be shown to bind to the target protein band (which will be localized at a given position on the gel determined by its molecular weight). Non-specific binding of the antibody to other proteins may occur and may be detectable as a weak signal on the Western blot. The non-specific nature of this binding will be recognized by one skilled in the art by the weak signal obtained on the Western blot relative to the strong primary signal arising from the specific antibody-target protein binding.
  • Substantially pure mucin protein or protein fragment (peptide) suitable for use as an immunogen may be isolated from the transfected or transformed cells as described above, or using equivalent well known techniques. Concentration of protein or peptide in the final preparation is adjusted, for example, by concentration on an Amicon filter device, to the level of a few micrograms per milliliter. Monoclonal or polyclonal antibody to the protein can then be prepared as follows:
  • Monoclonal antibody to epitopes of the target protein identified and isolated as described can be prepared from murine hybridomas according to the classical method of Kohler and Milstein (Nature 256:495-497, 1975) or derivative methods thereof. Briefly, a mouse is repetitively inoculated with a few micrograms of the selected protein over a period of a few weeks. The mouse is then sacrificed, and the antibody-producing cells of the spleen isolated. The spleen cells are fused by means of polyethylene glycol with mouse myeloma cells, and the excess unfused cells destroyed by growth of the system on selective media comprising aminopterin (HAT media).
  • the successfully fused cells are diluted and aliquots of the dilution placed in wells of a microtiter plate where growth of the culture is continued.
  • Antibody-producing clones are identified by detection of antibody in the supernatant fluid of the wells by immunoassay procedures, such as ELISA, as originally described by Engvall (Meth. Enzymol. 70:419-439, 1980) and derivative methods thereof. Selected positive clones can be expanded and their monoclonal antibody product harvested for use. Detailed procedures for monoclonal antibody production are described in Harlow and Lane (Antibodies, A Laboratory Manual, CSHL, New York, 1988).
  • Polyclonal antiserum containing antibodies to heterogeneous epitopes of a single protein can be prepared by immunizing suitable animals with the expressed protein, which can be unmodified or modified to enhance immunogenicity. Effective polyclonal antibody production is affected by many factors related both to the antigen and the host species. For example, small molecules tend to be less immunogenic than others and may require the use of carriers and adjuvant. Also, host animals vary in response to site of inoculations and dose, with either inadequate or excessive doses of antigen resulting in low titer antisera. Small doses (ng level) of antigen administered at multiple intradermal sites appear to be most reliable.
  • Booster injections can be given at regular intervals, and antiserum harvested when antibody titer thereof, as determined semi-quantitatively, for example, by double immunodiffusion in agar against known concentrations of the antigen, begins to fall. See, for example, Ouchterlony et al. (In Handbook of Experimental Immunology, Wier, D. (ed.) chapter 19. Blackwell, 1973). Plateau concentration of antibody is usually in the range of about 0.1 to 0.2 mg/ml of serum (about 12 μM). Affinity of the antisera for the antigen is determined by preparing competitive binding curves, as described, for example, by Fisher (Manual of Clinical Immunology, Ch. 42, 1980).
  • a third approach to raising antibodies against a specific protein or peptide is to use one or more synthetic peptides synthesized on a commercially available peptide synthesizer based upon the predicted amino acid sequence of the protein or peptide.
  • Polyclonal antibodies can be generated by injecting these peptides into, for instance, rabbits or mice.
  • Antibodies may be raised against proteins and peptides by subcutaneous injection of a DNA vector that expresses the desired protein or peptide, or a fragment thereof, into laboratory animals, such as mice. Delivery of the recombinant vector into the animals may be achieved using a hand-held form of the Biolistic system (Sanford et al., Particulate Sci. Technol. 5:27-37, 1987) as described by Tang et al. (Nature 356:152-154, 1992).
  • Expression vectors suitable for this purpose may include those that express a protein-encoding sequence (for instance, a sequence encoding a mucin, such as MUC5AC) under the transcriptional control of either the human β-actin promoter or the cytomegalovirus (CMV) promoter.
  • Antibody preparations prepared according to these protocols are useful in quantitative immunoassays which determine concentrations of antigen-bearing substances in biological samples; they are also used semi-quantitatively or qualitatively to identify the presence of antigen in a biological sample; or for immunolocalization of the specified protein.
  • antibodies (e.g., mucin-specific monoclonal antibodies)
  • Antibodies with a desired binding specificity can be commercially humanized (Scotgene, Scotland, UK; Oxford Molecular, Palo Alto, CA).
  • Antibodies can be produced that specifically recognize protein variants (and peptides derived therefrom).
  • production of antibodies (and fragments and engineered versions thereof) that recognize at least one variant protein with a higher affinity than they recognize a corresponding protein is beneficial, as the resultant antibodies can be used in analysis, diagnosis and treatment (e.g., inhibition or enhancement of protein action, such as for instance inhibition or enhancement of a biological activity of MUC5AC), as well as in study and examination of the proteins themselves.
  • such regions include any peptide (usually four or more amino acids in length) that overlaps with one or more of SNP-encoded variants in a coding sequence described herein. Longer peptides also can be used, and in some instances will produce a stronger or more reliable immunogenic response. Thus, it is contemplated in some embodiments that more than four amino acids are used to elicit the immune response, for instance, at least 5, at least 6, at least 8, at least 10, at least 12, at least 15, at least 18, at least 20, at least 25, or more, such as 30, 40, 50, or even longer peptides. Also, it will be understood by those of ordinary skill that it is beneficial in some instances to include adjuvants and other immune response enhancers, including passenger peptides or proteins, when using peptides to induce an immune response for production of antibodies.
  • Embodiments are not limited to antibodies that recognize epitopes containing the actual mutation identified in each variant. Instead, it is contemplated that variant-specific antibodies also may each recognize an epitope located anywhere throughout the specified variant molecule, which epitopes are changed in conformation and/or availability because of the mutation. Antibodies directed to any of these variant-specific epitopes are also encompassed herein.
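  • As an illustration of the peptide selection described above (peptides of four or more amino acids that overlap a SNP-encoded variant), the following minimal Python sketch enumerates every peptide window of a chosen length that contains the variant residue. The function name, the 0-based position convention, and the example sequence are assumptions made for illustration only; they are not taken from the disclosure.

```python
def variant_peptides(protein: str, variant_pos: int, length: int = 8) -> list[str]:
    """Return all peptides of `length` residues that contain the variant residue.

    `variant_pos` is a 0-based index into `protein` (an illustrative convention).
    """
    if length < 4:
        raise ValueError("peptides are usually four or more amino acids long")
    if not 0 <= variant_pos < len(protein):
        raise ValueError("variant position lies outside the protein")
    # Earliest and latest window starts that still cover the variant residue.
    first = max(0, variant_pos - length + 1)
    last = min(variant_pos, len(protein) - length)
    return [protein[s:s + length] for s in range(first, last + 1)]


# Hypothetical 14-residue sequence with the variant residue at index 5 ("G").
example = variant_peptides("ACDEFGHIKLMNPQ", variant_pos=5, length=4)
```

  • Longer windows (for instance 10, 15, or 20 residues) can be requested through the `length` argument; the choice among the resulting candidates would still depend on experimental immunogenicity.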
  • Kits which contain the necessary reagents for determining the presence or absence of variation(s) in a mucin-encoding sequence, such as probes or primers specific for the MUC5AC gene or a variable region therein, such as those regions indicated by the gene region of transcripts designated as XM_001714774.1 (MUC5AC).
  • Such kits can be used with the methods described herein to determine whether a subject is predisposed to pulmonary fibrosis or development of fibrosis of the small airways, or whether the subject is expected to respond to one or another therapy, such as a mucin supplement or replacement therapy.
  • the provided kits may also include written instructions. The instructions can provide calibration curves or charts to compare with the determined (e.g., experimentally measured) values.
  • Oligonucleotide probes and primers can be supplied in the form of a kit for use in detection of a predisposition to pulmonary fibrosis in a subject.
  • In such a kit, an appropriate amount of one or more of the oligonucleotide primers is provided in one or more containers.
  • the oligonucleotide primers may be provided suspended in an aqueous solution or as a freeze-dried or lyophilized powder, for instance.
  • the container(s) in which the oligonucleotide(s) are supplied can be any conventional container that is capable of holding the supplied form, for instance, microfuge tubes, ampoules, or bottles.
  • pairs of primers may be provided in pre-measured single use amounts in individual, typically disposable, tubes or equivalent containers.
  • the sample to be tested for the presence of a mucin variation (e.g., a SNP in MUC5AC) can be added to the individual tubes and amplification carried out directly.
  • The amount of each oligonucleotide primer supplied in the kit can be any appropriate amount, depending for instance on the market to which the product is directed. For instance, if the kit is adapted for research or clinical use, the amount of each oligonucleotide primer provided would likely be an amount sufficient to prime several PCR amplification reactions. Those of ordinary skill in the art know the amount of oligonucleotide primer that is appropriate for use in a single amplification reaction. General guidelines may for instance be found in Innis et al. (PCR Protocols, A Guide to Methods and Applications, Academic Press, Inc., San Diego, CA, 1990), Sambrook et al. (In Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, New York, 1989), and Ausubel et al. (In Current Protocols in Molecular Biology, Greene Publ. Assoc. and Wiley-Intersciences, 1992).
  • a kit may include more than two primers, in order to facilitate the in vitro amplification of mucin sequences, for instance the MUC5AC gene or the 5' or 3' flanking region thereof.
  • kits may also include the reagents necessary to carry out nucleotide amplification reactions, including, for instance, DNA sample preparation reagents, appropriate buffers (e.g., polymerase buffer), salts (e.g., magnesium chloride), and deoxyribonucleotides (dNTPs). Kits may in addition include either labeled or unlabeled oligonucleotide probes for use in detection of mucin variant sequence(s). In certain embodiments, these probes will be specific for a potential mutation that may be present in the target amplified sequences.
  • sequences for such a probe will be any sequence that includes one or more of the identified polymorphic sites, particularly nucleotide positions that overlap with the variants shown herein, such that the sequence to which the probe is complementary includes a polymorphic site and the surrounding mucin-encoding sequence.
  • It may also be advantageous to provide in the kit one or more control sequences for use in the amplification reactions.
  • the design of appropriate positive control sequences is well known to one of ordinary skill in the appropriate art.
  • Kits similar to those disclosed above for the detection of mucin sequence variations directly can be used to detect mucin mRNA expression, such as over- or under-expression.
  • Such kits include an appropriate amount of one or more oligonucleotide primers for use in, for instance, reverse transcription PCR reactions, similarly to those provided above with art-obvious modifications for use with RNA amplification.
  • kits for detection of altered expression of MUC5AC mRNA may also include some or all of the reagents necessary to carry out RT-PCR in vitro amplification reactions, including, for instance, RNA sample preparation reagents (including e.g., an RNase inhibitor), appropriate buffers (e.g., polymerase buffer), salts (e.g., magnesium chloride), and deoxyribonucleotides (dNTPs).
  • kits may in addition include either labeled or unlabeled oligonucleotide probes for use in detection of the in vitro amplified target sequences.
  • the appropriate sequences for such a probe will be any sequence that falls between the annealing sites of the two provided oligonucleotide primers, such that the sequence the probe is complementary to is amplified during the PCR reaction.
  • these probes will be specific for a potential mutation that may be present in the target amplified sequences, for instance specific for the single nucleotide polymorphism AADD01112371.1_11838, rs34474233, or rs34815853 (all in MUC5AC). Additional SNPs are described herein. It may also be advantageous to provide in the kit one or more control sequences for use in the RT-PCR reactions. The design of appropriate positive control sequences is well known to one of ordinary skill in the appropriate art.
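  • The probe constraint described above (a probe target that falls between the annealing sites of the two primers and covers a polymorphic site) can be sketched as follows. This is a minimal illustration under stated assumptions: the template is given as the sense strand, the reverse primer is written 5' to 3' as it would be ordered, and all sequences, names, and the 15-nucleotide probe length are hypothetical rather than taken from the disclosure.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def probe_target(template: str, forward: str, reverse: str,
                 snp_index: int, probe_len: int = 15) -> str:
    """Return the sense-strand window a probe must be complementary to.

    The window lies strictly between the two primer annealing sites and
    contains the polymorphic position `snp_index` (0-based, illustrative).
    """
    fwd_start = template.find(forward)
    rev_start = template.find(reverse_complement(reverse))
    if fwd_start < 0 or rev_start < 0:
        raise ValueError("a primer does not anneal to the template")
    inner_start = fwd_start + len(forward)   # first base after forward primer
    inner_end = rev_start                    # first base of reverse primer site
    if inner_end - inner_start < probe_len:
        raise ValueError("amplicon interior is shorter than the probe")
    if not inner_start <= snp_index < inner_end:
        raise ValueError("polymorphic site is not between the primers")
    # Centre the window on the SNP, then clamp it inside the interior.
    start = snp_index - probe_len // 2
    start = max(inner_start, min(start, inner_end - probe_len))
    return template[start:start + probe_len]


# Hypothetical 36-nt template: forward site, 20-nt interior, reverse site.
template = "ATGCGTAC" + "GGATCCTTAGCAGTCAAGCT" + "TTGACCGA"
target = probe_target(template, "ATGCGTAC", "TCGGTCAA", snp_index=18)
```

  • An allele-specific probe set would typically use one such window per allele, differing only at the polymorphic position.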
  • kits may be provided with the necessary reagents to carry out quantitative or semi-quantitative Northern analysis of mucin mRNA.
  • kits include, for instance, at least one mucin-specific oligonucleotide for use as a probe.
  • This oligonucleotide may be labeled in any conventional way, including with a selected radioactive isotope, enzyme substrate, co-factor, ligand, chemiluminescent or fluorescent agent, hapten, or enzyme.
  • probes will be specific for a potential mutation that may be present in the target amplified sequence, such as the mutations disclosed herein.
  • Kits for the detection of mucin protein expression are also encompassed.
  • Such kits may include at least one target protein specific binding agent (e.g., a polyclonal or monoclonal antibody or antibody fragment that specifically recognizes the mucin protein) and may include at least one control (such as a determined amount of mucin protein, or a sample containing a determined amount of mucin protein).
  • the target protein specific binding agent and control may be contained in separate containers.
  • the mucin protein expression detection kits may also include a means for detecting mucin:binding agent complexes, for instance the agent may be detectably labeled. If the detectable agent is not labeled, it may be detected by second antibodies or protein A for example, which may also be provided in some kits in one or more separate containers. Such techniques are well known.
  • kits may include instructions for carrying out the assay. Instructions will allow the tester to determine whether MUC5AC expression levels are elevated. Reaction vessels and auxiliary reagents such as chromogens, buffers, enzymes, etc. may also be included in the kits.
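  • As a hedged illustration of how a kit control containing determined amounts of mucin protein and a calibration curve might be used to decide whether expression is elevated, the sketch below fits a straight line to standards and converts a sample signal to a concentration. The linear response, the units, the example values, and the two-fold threshold are assumptions chosen for illustration; real immunoassays often need a non-linear fit and an empirically validated cut-off.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("standards must span more than one concentration")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_signal(signal: float, slope: float, intercept: float) -> float:
    """Invert the calibration line to estimate concentration from a signal."""
    if slope == 0:
        raise ValueError("calibration line is flat")
    return (signal - intercept) / slope


def is_elevated(sample: float, reference: float, fold: float = 2.0) -> bool:
    """Flag a sample whose level is at least `fold` times the reference level.

    The default two-fold threshold is an illustrative assumption.
    """
    return sample >= fold * reference


# Hypothetical standards: concentration (ng/ml) versus absorbance.
slope, intercept = fit_line([0.0, 10.0, 20.0, 40.0], [0.05, 0.25, 0.45, 0.85])
sample_conc = concentration_from_signal(0.65, slope, intercept)
```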
  • Kits for Detection of Homozygous versus Heterozygous Allelism: Also provided are kits that allow differentiation between individuals who are homozygous versus heterozygous for the AADD01112371.1_11838, rs34474233, or rs34815853 SNP (all in MUC5AC), or any combination thereof. Additional SNPs are described herein.
  • kits provide the materials necessary to perform oligonucleotide ligation assays (OLA), as described in Nickerson et al. (Proc. Natl. Acad. Sci. USA 87:8923-8927, 1990).
  • these kits contain one or more microtiter plate assays, designed to detect mutation(s) in the mucin sequence(s) of a subject, as described herein.
  • kits may include instructions for carrying out the assay. Instructions will allow the tester to determine whether a mucin allele is homozygous or heterozygous. Reaction vessels and auxiliary reagents such as chromogens, buffers, enzymes, etc. may also be included in the kits. It may also be advantageous to provide in the kit one or more control sequences for use in the OLA reactions. The design of appropriate positive control sequences is well known to one of ordinary skill in the appropriate art.
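  • One way the kit instructions might let a tester distinguish homozygous from heterozygous samples is to compare the signals from two allele-specific reactions in a microtiter plate. The sketch below is illustrative only: the background level and the allele-fraction cut-offs are assumed values, not parameters stated in the disclosure or in the cited OLA method.

```python
def call_genotype(signal_ref: float, signal_alt: float,
                  background: float = 0.1,
                  het_low: float = 0.3, het_high: float = 0.7) -> str:
    """Call a genotype from reference- and variant-allele signals.

    All thresholds are illustrative assumptions and would need to be set
    from control wells in a real assay.
    """
    if signal_ref < 0 or signal_alt < 0:
        raise ValueError("signals must be non-negative")
    total = signal_ref + signal_alt
    if total <= 2 * background:
        return "no call"                      # neither reaction above background
    variant_fraction = signal_alt / total
    if variant_fraction < het_low:
        return "homozygous reference"
    if variant_fraction > het_high:
        return "homozygous variant"
    return "heterozygous"
```

  • Positive control wells of known genotype, such as the control sequences mentioned above, would be the natural way to calibrate these cut-offs.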
  • EXAMPLE 11 Screening Assays for Compounds that Modulate Expression or Activity of a Target (such as MUC5AC)
  • the following assays are designed to identify compounds that interact with (e.g., bind to) a variant form of MUC5AC, compounds that interact with (e.g., bind to) intracellular proteins that interact with such a variant form, compounds that interfere with the interaction of MUC5AC with transmembrane or intracellular proteins involved in signal transduction, and compounds that modulate the activity of MUC5AC (i.e., modulate the level of gene expression) or modulate the level of activity of a variant form of MUC5AC. Assays may additionally be utilized which identify compounds which bind to MUC5AC regulatory sequences (e.g., promoter sequences) and which may modulate gene expression. See, e.g., Platt, J. Biol. Chem. 269:28558-28562, 1994.
  • Such assays can also be used to identify compounds that interact in any of the ways listed above with another gene, regulatory sequence, gene corresponding with an EST, or protein encoded thereby, from the region of 11pter described herein as being linked to susceptibility to pulmonary fibrosis.
  • the compounds which may be screened in accordance with the disclosure include, but are not limited to, peptides, antibodies and fragments thereof, and other organic compounds (e.g., peptidomimetics, small molecules) that bind to one or more variant sequences (including variant regulatory sequences or encoding sequences) as described herein and either mimic the activity triggered by the natural ligand (i.e., agonists) or inhibit the activity triggered by the natural ligand (i.e., antagonists); as well as peptides, antibodies or fragments thereof, and other organic compounds that mimic a variant (or a portion thereof) and bind to and "neutralize" natural ligand.
  • Such compounds may include, but are not limited to, peptides such as, for example, soluble peptides, including but not limited to members of random peptide libraries (see, e.g., Lam et al., Nature 354:82-84, 1991; Houghten et al., Nature 354:84-86, 1991), and combinatorial chemistry-derived molecular library made of D- and/or L-configuration amino acids, phosphopeptides (including, but not limited to, members of random or partially degenerate, directed phosphopeptide libraries; see, e.g., Songyang et al., Cell 72:767-778, 1993), antibodies (including, but not limited to, polyclonal, monoclonal, humanized, anti-idiotypic, chimeric or single chain antibodies, and Fab, F(ab')2 and Fab expression library fragments, and epitope-binding fragments thereof), and small organic or inorganic molecules.
  • Other compounds which can be screened in accordance with the disclosure include but are not limited to small organic molecules that are able to gain entry into an appropriate cell and affect the expression of the MUC5AC gene or some other gene involved in a related signal transduction pathway (e.g., by interacting with the regulatory region or transcription factors involved in gene expression); or such compounds that affect the activity of a variant MUC5AC or the activity of some other intracellular factor involved in the signal transduction pathway.
  • Computer modeling and searching technologies permit identification of compounds, or the improvement of already identified compounds, that can modulate expression or activity of a variant target protein. Having identified such a compound or composition, the active/binding/effector sites or regions are identified.
  • Such active sites typically might be ligand binding sites, such as the interaction domains of a molecule with a variant MUC5AC itself or a sequence encoding the protein or regulating the expression thereof, or the interaction domains of a molecule with a specific allelic variant in comparison to the interaction domains of that molecule with another variant of the protein.
  • the active site can be identified using methods known in the art including, for example, from the amino acid sequences of peptides, from the nucleotide sequences of nucleic acids, or from study of complexes of the relevant compound or composition with its natural ligand. In the latter case, chemical methods can be used to find the active site by finding where on the factor the complexed ligand is found. Next, the three dimensional geometric structure of the active site is determined. This can be done by known methods that can determine a complete molecular structure. On the other hand, solid or liquid phase NMR can be used to determine certain intra-molecular distances. Any other experimental method of structure determination can be used to obtain partial or complete geometric structures, such as high resolution electron microscopy.
  • the geometric structures may be measured with a complexed ligand, natural or artificial, which may increase the accuracy of the active site structure determined.
  • the structure of the specified target protein is compared to that of a "variant" of the specified protein and, rather than solve the entire structure, the structure is solved for the protein domains that are changed.
  • the methods of computer based numerical modeling can be used to complete the structure or improve its accuracy.
  • Any recognized modeling method may be used, including parameterized models specific to particular biopolymers such as proteins or nucleic acids, molecular dynamics models based on computing molecular motions, statistical mechanics models based on thermal ensembles, or combined models.
  • standard molecular force fields, representing the forces between constituent atoms and groups, are necessary and can be selected from force fields known in physical chemistry.
  • the incomplete or less accurate experimental structures can serve as constraints on the complete and more accurate structures computed by these modeling methods.
  • candidate modulating compounds can be identified by searching databases containing compounds along with information on their molecular structure. Such a search seeks compounds having structures that match the determined active site structure and that interact with the groups defining the active site. Such a search can be manual, but is preferably computer assisted. These compounds found from this search are potential MUC5AC modulating compounds. Alternatively, these methods can be used to identify improved modulating compounds from an already known modulating compound or ligand. The composition of the known compound can be modified and the structural effects of modification can be determined using the experimental and computer modeling methods described above applied to the new composition. The altered structure is then compared to the active site structure of the compound to determine if an improved fit or interaction results. In this manner systematic variations in composition, such as by varying side groups, can be quickly evaluated to obtain modified modulating compounds or ligands of improved specificity or activity.
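  • The database search described above can be pictured, in very simplified form, as filtering compounds whose chemical features can reproduce the inter-feature distances that define the active site. The sketch below is a toy illustration under stated assumptions: the feature labels ("donor", "acceptor"), coordinates, distance constraint, and tolerance are all hypothetical, and real structure-based searches score far more than pairwise distances.

```python
import math
from itertools import product

Point = tuple[float, float, float]


def matches_site(features: dict[str, list[Point]],
                 required: list[tuple[str, str, float]],
                 tol: float = 0.5) -> bool:
    """True if every required feature pair occurs at the required distance.

    `required` holds (feature_a, feature_b, distance) constraints derived
    from the active site; distances and tolerance share the same unit.
    """
    for kind_a, kind_b, distance in required:
        pairs = product(features.get(kind_a, []), features.get(kind_b, []))
        if not any(abs(math.dist(p, q) - distance) <= tol for p, q in pairs):
            return False
    return True


def search(database: dict[str, dict[str, list[Point]]],
           required: list[tuple[str, str, float]],
           tol: float = 0.5) -> list[str]:
    """Return names of compounds that satisfy all active-site constraints."""
    return [name for name, feats in database.items()
            if matches_site(feats, required, tol)]


# Hypothetical two-compound database and a single distance constraint.
database = {
    "compound_A": {"donor": [(0.0, 0.0, 0.0)], "acceptor": [(3.0, 4.0, 0.0)]},
    "compound_B": {"donor": [(0.0, 0.0, 0.0)], "acceptor": [(1.0, 0.0, 0.0)]},
}
hits = search(database, [("donor", "acceptor", 5.2)])
```

  • Compounds returned by such a filter would only be candidates; the systematic side-group variations mentioned above would be evaluated by re-running the same comparison on each modified structure.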
  • the structure of a specified protein or nucleic acid sequence, such as a regulatory sequence, is compared to that of a variant protein or sequence (encoded by a different allele of the same protein, or a variant non-coding nucleic acid sequence such as a regulatory sequence containing one or more SNPs). Then, potential inhibitors (or enhancers) are designed that bring about a structural change in the reference form so that it resembles the variant form. Or, potential mimics are designed that bring about a structural change in the variant form so that it resembles another variant form, or the form of the reference receptor.
  • the inhibitors, enhancers, or mimics may influence the binding of one or more other proteins to the nucleic acid sequence, for instance in a way that affects the transcription of an encoding sequence that is operably linked to that nucleic acid sequence.
  • Further experimental and computer modeling methods useful to identify modulating compounds based upon identification of the active sites of compounds, various variants of MUC5AC, regulatory regions thereof, and other sequences or proteins encoded for in the region of 11pter described herein, and related transduction and transcription factors will be apparent to those of skill in the art.
  • Examples of molecular modeling systems are the CHARMM and QUANTA programs (Polygen Corporation, Waltham, Mass.). CHARMM performs the energy minimization and molecular dynamics functions.
  • QUANTA performs the construction, graphic modeling and analysis of molecular structure. QUANTA allows interactive construction, modification, visualization, and analysis of the behavior of molecules with each other.
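  • To make the term "energy minimization" concrete, the following toy sketch minimizes a Lennard-Jones pair energy by fixed-step steepest descent. It is not CHARMM and does not use any CHARMM or QUANTA interface; the potential, reduced units, step size, and starting separation are assumptions chosen only to show the idea of moving a structure downhill on a force-field energy surface.

```python
def lj_energy(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Lennard-Jones pair energy 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_gradient(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Derivative of the Lennard-Jones energy with respect to r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def minimize_separation(r0: float, step: float = 0.01,
                        tol: float = 1e-10, max_iter: int = 10000) -> float:
    """Steepest descent on the pair separation, starting from r0."""
    r = r0
    for _ in range(max_iter):
        gradient = lj_gradient(r)
        if abs(gradient) < tol:
            break
        r -= step * gradient   # move downhill along the energy gradient
    return r


# Start beyond the minimum; the analytic minimum is at 2**(1/6) * sigma.
r_min = minimize_separation(1.5)
```

  • Production force-field programs use many interaction terms and more robust minimizers; a fixed step is adequate here only because the starting point lies on the gently sloping side of the well.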
  • Compounds identified via assays such as those described herein may be useful, for example, in elaborating the biological function of a variant MUC5AC gene product, and for designing therapeutic molecules useful in the diagnosis and/or treatment of pulmonary fibrosis.
  • EXAMPLE 12 In vitro Screening Assays for Compounds that Bind to a Nucleotide Variant
  • In vitro systems may be used to identify compounds capable of interacting with (e.g., binding to) a variant protein or nucleic acid sequence including one or more of the SNPs described herein.
  • Compounds identified using such systems may be useful, for example, in modulating the activity of "wild type" (reference) and/or "variant" gene products (such as MUC5AC); may be useful in elaborating the biological function of such proteins; may be utilized in screens for identifying compounds that disrupt normal protein-protein or protein-nucleic acid interactions; may in themselves disrupt such interactions; or may be used to study or characterize the regulation of gene expression, for instance expression of MUC5AC or a reporter protein linked to a regulatory sequence from MUC5AC or another gene or EST from 11pter.
  • One type of assay that can be used to identify compounds that bind to a variant molecule involves preparing a reaction mixture of a variant molecule and a test compound under conditions and for a time sufficient to allow the two components to interact and bind, thus forming a complex which can be removed and/or detected in the reaction mixture.
  • the molecular species used can vary depending upon the goal of the screening assay.
  • where compounds that interact with a variant protein are sought, the full length protein (e.g., MUC5AC), a soluble truncated portion thereof, or a fusion protein containing a variant peptide fused to a protein or polypeptide that affords advantages in the assay system (e.g., labeling, isolation of the resulting complex, etc.) can be used.
  • where compounds that interact with a nucleic acid sequence (such as a regulatory or putative regulatory sequence) are sought, oligonucleotides corresponding to a variant sequence containing at least one SNP position as discussed herein, or fusion nucleic acid molecules containing a variant sequence, can be used.
  • the screening assays can be conducted in a variety of ways.
  • one method to conduct such an assay involves anchoring a variant molecule (such as a protein, polypeptide, peptide or fusion protein, or nucleic acid) or the test substance(s), onto a solid phase and detecting variant molecule/test compound complexes anchored on the solid phase at the end of the reaction.
  • the variant molecule(s) may be anchored onto a solid surface, and the test compound(s), which is not anchored, may be labeled, either directly or indirectly.
  • microtiter plates may conveniently be utilized as the solid phase.
  • the anchored component may be immobilized by non-covalent or covalent attachments.
  • Non-covalent attachment may be accomplished by simply coating the solid surface (or a portion thereof) with a solution containing the protein (or nucleic acid) and drying.
  • an immobilized specific binding agent such as an antibody, preferably a monoclonal antibody, specific for the protein to be immobilized may be used to anchor the protein to the solid surface.
  • the surfaces may be prepared in advance and stored.
  • the nonimmobilized component is added to the coated surface containing the anchored component. After the reaction is complete, unreacted components are removed (e.g., by washing) under conditions such that any complexes formed will remain immobilized on the solid surface.
  • the detection of complexes anchored on the solid surface can be accomplished in a number of ways. Where the previously nonimmobilized component is pre-labeled, the detection of label immobilized on the surface indicates that complexes were formed.
  • an indirect label can be used to detect complexes anchored on the surface; e.g., using a labeled antibody specific for the previously nonimmobilized component (the antibody, in turn, may be directly labeled or indirectly labeled with a labeled anti-Ig antibody).
  • a reaction can be conducted in a liquid phase, the reaction products separated from unreacted components, and complexes detected.
  • detection can involve using an immobilized binding agent specific for the variant molecule (such as an antibody or other binding agent specific for a variant protein, polypeptide, peptide or fusion protein (for instance, MUC5AC)) or specific for the test compound, to anchor or capture any complexes formed in solution, and a labeled antibody (or other binding agent) specific for the other component of the possible complex to detect anchored complexes.
  • cell-based assays can be used to identify compounds that interact with a variant molecule.
  • cell lines that express a variant molecule such as a variant MUC5AC encoding sequence or a regulatory sequence variant or other non-coding sequence variant (or combination of two or more variants) or cell lines (e.g., COS cells, CHO cells, HEK293 cells, etc.) that have been genetically engineered to express a variant (e.g., by transfection or transduction of protein encoding DNA) can be used.
  • Interaction of the test compound with, for example, a variant protein (e.g., MUC5AC) expressed by the host cell, or a variant nucleic acid sequence present in the host cell can be determined by comparison or competition with a host cell not treated with the compound, or treated with another compound, or by examining one or more biological characteristics linked to the variant (such as pulmonary fibrosis).
  • variant molecules such as a variant nucleic acid or polypeptide (such as those described herein) may be employed in a screening process for compounds which bind the variant molecule and which activate (agonists) or inhibit activation (antagonists) of the molecule or one linked thereto.
  • variant molecules described herein also may be used to assess the binding of small molecule substrates and ligands in, for example, cells, cell-free preparations, chemical libraries, and natural product mixtures. These substrates and ligands may be natural substrates and ligands or may be structural or functional mimetics. See Coligan et al. Current Protocols in Immunology 1 (2): Chapter 5, 1991.
  • such screening procedures involve providing appropriate cells that express a polypeptide of the present disclosure, or a reporter polypeptide operably linked to a non-coding variant nucleic acid found at 11pter.
  • Such cells include cells from mammals, insects, yeast, and bacteria.
  • a polynucleotide regulatory sequence or polynucleotide encoding the polypeptide is employed to transfect cells to thereby express a variant molecule.
  • the cell expressing the variant polypeptide or variant nucleic acid is then contacted with a test compound to observe binding, stimulation or inhibition of a functional response.
  • the technique may also be employed for screening of compounds which activate a molecule of the present disclosure by contacting such cells with compounds to be screened and determining whether such compound generates a signal, i.e., activates the polypeptide or reporter polypeptide.
  • Another method involves screening for compounds which are antagonists, and thus inhibit activation of a molecule of the present disclosure by determining inhibition of binding of labeled ligand, such as a factor that binds to a nucleic acid of the disclosure, to cells expressing the variant molecule or a reporter gene operably linked to a non-coding nucleic acid (such as a regulatory region).
  • Such a method involves transfecting a eukaryotic cell with a DNA encoding a variant molecule such that the cell expresses the molecule (or expresses a reporter gene under the control of a non-coding region containing a variant SNP or haplotype as described herein).
  • the cell is then contacted with a potential antagonist in the presence of a labeled form of a ligand or binding factor.
  • the ligand/factor can be labeled, e.g., with radioactivity.
  • the amount of labeled ligand/factor bound to the variant molecule is measured, e.g., by measuring radioactivity associated with transfected cells or with membrane or another fraction from these cells. If the compound binds to the variant molecule, the binding of labeled ligand/factor to the variant is inhibited, as determined by a reduction in the amount of labeled ligand/factor that binds.
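By way of illustration only, the reduction in bound labeled ligand/factor described above can be expressed as a percent inhibition. The sketch below is not taken from the disclosure: the function name, the optional subtraction of nonspecific (background) signal, and the counts-per-minute values are assumptions introduced for the example.

```python
def percent_inhibition(bound_with_compound, bound_without_compound, nonspecific=0.0):
    """Percent reduction in specific binding of labeled ligand caused by a test compound.

    All three arguments are measured signals in the same units (e.g., cpm).
    `nonspecific` is an assumed background term subtracted from both readings.
    """
    specific_control = bound_without_compound - nonspecific
    if specific_control <= 0:
        raise ValueError("control binding must exceed nonspecific binding")
    specific_test = bound_with_compound - nonspecific
    return 100.0 * (1.0 - specific_test / specific_control)


# Hypothetical readings (counts per minute), chosen only to show the arithmetic.
control_cpm = 12000.0     # labeled ligand alone
test_cpm = 4500.0         # labeled ligand plus candidate antagonist
background_cpm = 2000.0   # assumed nonspecific binding

print(percent_inhibition(test_cpm, control_cpm, background_cpm))  # 75.0
```

A value near zero would indicate that the test compound did not displace the labeled ligand/factor under the assumed conditions.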
  • EXAMPLE 13 Pharmaceutical Preparations and Methods of Administration
  • Therapeutic compounds and agents can be administered directly to the mammalian subject for modulation of MUC5AC activity or expression, or the activity or expression of another gene, EST, or protein encoded by a gene or EST found in the 11pter region as described herein. Administration is by any of the routes normally used for introducing a modulator compound into ultimate contact with the tissue to be treated.
  • the compounds or agents, alone or accompanied by one or more additional therapeutic agents, are administered in any suitable manner, optionally with pharmaceutically acceptable carrier(s). Suitable methods of administering such compounds/agents are available and well known to those of ordinary skill in the art, and, although more than one route can be used to administer a particular composition, a particular route can often provide a more immediate and more effective reaction than another route.
  • compositions of the present invention are determined in part by the particular composition being administered, as well as by the particular method used to administer the composition. Accordingly, there is a wide variety of suitable formulations of pharmaceutical compositions of the present invention (see, e.g., Remington's Pharmaceutical Sciences, 17th ed. 1985).
  • Formulations suitable for administration include aqueous and non-aqueous solutions, isotonic sterile solutions, which can contain antioxidants, buffers, bacteriostats, and solutes that render the formulation isotonic, and aqueous and non-aqueous sterile suspensions that can include suspending agents, solubilizers, thickening agents, stabilizers, and preservatives.
  • Compositions can be administered, for example, orally, parenterally, intrathecally, and so forth.
  • compositions of compounds can be presented in unit-dose or multi-dose sealed containers, such as ampoules and vials. Solutions and suspensions can be prepared from sterile powders, granules, and tablets of the kind previously described. The compounds/agents also can be optionally administered as part of a prepared food or drug.
  • the dose administered to a subject should be sufficient to effect a beneficial response in the subject over time.
  • the dose will be determined by the efficacy of the particular compound/agent employed and the condition of the subject, as well as the body weight or surface area of the area to be treated, and whether the subject is being treated prophylactically or after the identification and diagnosis of a specific disease, condition, or disorder.
  • the size of the dose also may be influenced by the existence, nature, and extent of any adverse side-effects that accompany the administration of a particular compound in a particular subject.
  • a physician may evaluate circulating plasma levels of the modulator, modulator toxicities, and the production of anti-modulator antibodies.
  • the dose equivalent of a modulator is from about 1 ng/kg to 10 mg/kg for a typical subject.
  • therapeutic compounds of the present disclosure can be administered at a rate determined by the LD50 of the modulator, and the side effects of the inhibitor at various concentrations, as applied to the mass and overall health of the subject. Administration can be accomplished via single or divided doses.
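By way of illustration only, the arithmetic implied by the stated range of about 1 ng/kg to 10 mg/kg can be sketched as follows. The function name and the 70 kg body weight are assumptions introduced for this example and do not appear in the disclosure; the sketch only converts the stated per-kilogram bounds into total amounts.

```python
NG_PER_MG = 1_000_000  # 1 mg = 1,000,000 ng


def dose_bounds_mg(body_weight_kg, low_ng_per_kg=1.0, high_mg_per_kg=10.0):
    """Return (low, high) total amounts in mg for the stated per-kg range."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    low_mg = low_ng_per_kg * body_weight_kg / NG_PER_MG
    high_mg = high_mg_per_kg * body_weight_kg
    return low_mg, high_mg


# Hypothetical 70 kg subject (an assumed value, not from the disclosure).
low, high = dose_bounds_mg(70.0)
print(f"{low:.5f} mg to {high:.0f} mg")  # 0.00007 mg to 700 mg
```

The two bounds differ by seven orders of magnitude, which is why the surrounding text leaves the selection within the range to the factors it lists.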
  • Retroviruses have been considered a preferred vector for experiments in gene therapy, with a high efficiency of infection and stable integration and expression (Orkin et al, Prog. Med. Genet. 7: 130-142, 1988).
  • the full-length MUC5AC gene or cDNA can be cloned into a retroviral vector and driven from either its endogenous promoter or from the retroviral LTR (long terminal repeat).
  • viral transfection systems may also be utilized for this type of approach, including adenovirus, adeno-associated virus (AAV) (McLaughlin et al., J. Virol. 62:1963-1973, 1988), Vaccinia virus (Moss et al., Annu. Rev. Immunol. 5:305-324, 1987), Bovine Papilloma virus (Rasmussen et al., Methods Enzymol. 139:642-654, 1987) or members of the herpesvirus group such as Epstein-Barr virus (Margolskee et al., Mol. Cell. Biol. 8:2837-2847, 1988).
  • RNA-DNA hybrid oligonucleotides as described by Cole-Strauss et al. (Science 273:1386-1389, 1996). This technique may allow for site-specific integration of cloned sequences, thereby permitting accurately targeted gene replacement.
  • lipidic and liposome-mediated gene delivery has recently been used successfully for transfection with various genes (for reviews, see Templeton and Lasic, Mol. Biotechnol. 11:175-180, 1999; Lee and Huang, Crit. Rev. Ther. Drug Carrier Syst. 14:173-206; and Cooper, Semin. Oncol. 23:172-187, 1996).
  • cationic liposomes have been analyzed for their ability to transfect monocytic leukemia cells, and shown to be a viable alternative to using viral vectors (de Lima et al., Mol. Membr. Biol.
  • Such cationic liposomes can also be targeted to specific cells through the inclusion of, for instance, monoclonal antibodies or other appropriate targeting ligands (Kao et al., Cancer Gene Ther. 3:250-256, 1996).
  • gene therapy can be carried out using antisense or other suppressive constructs, the construction of which is discussed above and are well known in the art.
  • Mutant organisms that under-express or over-express a specific mucin protein (or more than one mucin) are useful for research. Such mutants allow insight into the physiological and/or pathological role of particular mucins in a healthy and/or pathological organism.
  • These mutants are "genetically engineered," meaning that information in the form of nucleotides has been transferred into the mutant's genome at a location, or in a combination, in which it would not normally exist. Nucleotides transferred in this way are said to be "non-native." For example, a non-mucin promoter inserted upstream of a native mucin encoding sequence would be non-native.
  • Mutants may be, for example, produced from mammals, such as mice, that either over-express MUC5AC or under-express MUC5AC, or that do not express MUC5AC at all, or any combination thereof.
  • Over-expression mutants are made by increasing the number of MUC5AC genes in the organism, or by introducing a MUC5AC gene into the organism under the control of a constitutive or inducible or viral promoter such as the mouse mammary tumor virus (MMTV) promoter or the whey acidic protein (WAP) promoter or the metallothionein promoter.
  • Mutants that under-express MUC5AC may be made by using an inducible or repressible promoter, or by deleting the MUC5AC gene, or by destroying or limiting the function of the MUC5AC gene, for instance by disrupting the gene by transposon insertion.
  • Antisense genes or siRNAs may be engineered or introduced into the organism, under a constitutive or inducible promoter, to decrease or prevent MUC5AC expression.
  • a gene is "functionally deleted" when genetic engineering has been used to negate or reduce gene expression to negligible levels.
  • where a mutant is referred to in this application as having the mucin gene altered or functionally deleted, this refers to the mucin gene and to any ortholog of this gene.
  • where a mutant is referred to as having "more than the normal copy number" of a gene, this means that it has more than the usual number of genes found in the wild-type organism, e.g., in the diploid mouse or human.
  • a mutant mouse over-expressing MUC5AC may be made by constructing a plasmid having the respective encoding sequence driven by a promoter, such as the mouse mammary tumor virus (MMTV) promoter or the whey acidic protein (WAP) promoter.
  • This plasmid may be introduced into mouse oocytes by microinjection. The oocytes are implanted into pseudopregnant females, and the litters are assayed for insertion of the transgene. Multiple strains containing the transgene are then available for study.
  • WAP is quite specific for mammary gland expression during lactation, and MMTV is expressed in a variety of tissues including mammary gland, salivary gland and lymphoid tissues. Many other promoters might be used to achieve various patterns of expression, e.g., the metallothionein promoter.
  • An inducible system may be created in which the subject expression construct is driven by a promoter regulated by an agent that can be fed to the mouse, such as tetracycline.
  • a mutant knockout animal (e.g., mouse) lacking a mucin gene can be made by removing all or some of the coding regions of the mucin gene from embryonic stem cells.
  • the methods of creating deletion mutations by using a targeting vector have been described (Thomas and Capecchi, Cell 51:503-512, 1987).
  • MUC5AC variant proteins provided herein (e.g., as encoded by transcripts designated by RefSeq ID XM_001714774.1
  • MUC5AC can be expressed in a knockout background, such as the Patch mutant mice, in order to provide model systems for studying the effects of these mutants.
  • the resultant knock-in organisms provide systems for studying fibrosis, and particularly pulmonary fibrosis.
  • Those of ordinary skill in the relevant art know methods of producing knock-in organisms. See, for instance, Rane et al. (Mol. Cell. Biol. 22:644-656, 2002); Sotillo et al.

Abstract

Individuals with polymorphisms in MUC5AC are more likely to develop idiopathic interstitial pneumonia (IIP) or pulmonary fibrosis. This discovery provides methods to identify susceptible individuals, and also provides approaches to treatment in this life-threatening disease that previously had no known beneficial therapy. Given the relatively high prevalence of some of these SNPs in the general population, it is likely that variants of MUC5AC result in susceptibility to other fibroproliferative lung diseases, including asthma, chronic obstructive lung disease, granulomatous lung diseases, and pneumoconioses.

Description

IDENTIFICATION AND DIAGNOSIS OF PULMONARY FIBROSIS USING MUCIN GENES, AND RELATED METHODS AND COMPOSITIONS
FIELD
This disclosure relates to genetic analysis and screening for identification and diagnosis of pulmonary fibrosis. In particular, it relates to use of variation in MUC5AC, a mucin gene, to identify and/or diagnose individuals having or at risk for developing pulmonary fibrosis.
BACKGROUND
The idiopathic interstitial pneumonias (IIP) are a clinically heterogeneous group of fibrosing interstitial lung diseases that lead to hypoxemic respiratory insufficiency. The most common IIP is usual interstitial pneumonia (UIP), the underlying histology of IPF. Typically, IPF (OMIM 178500) presents in late life and is lethal within 4-5 years of diagnosis. Treatment options, apart from lung transplantation, are limited and do not appear to prolong survival.
The evidence for a genetic basis to IIP is substantial. Although for most IIP the family history is negative, studies have suggested that in about 5% of cases a positive family history exists. Cases of UIP with a positive family history are termed familial interstitial pneumonia (FIP). Interstitial lung disease has been associated with a variety of genetic diseases with a known inheritance pattern, such as Hermansky-Pudlak syndrome, neurofibromatosis, and tuberous sclerosis, along with several others; in addition, several families have been shown to have mutations in telomerase genes, and at least one family segregating pulmonary fibrosis has been shown to carry a mutation in surfactant protein C. Development of disease varies among individuals exposed to similar levels of fibrogenic dusts such as asbestos and/or organic antigens, suggesting an underlying genetic susceptibility. Familial aggregation has been confirmed through a variety of studies in twins, in siblings raised apart, and in multigenerational families. Recently, statistically significant aggregation of affected individuals within a subset of families was confirmed, and 20 multigenerational pedigrees consistent with autosomal dominant inheritance were identified. These clinical observations are supported by studies in animal models. For example, C57BL/6 mice are more apt to develop lung fibrosis than BALB/c or 129 mice following exposure to bleomycin or asbestos.
Interstitial lung disease also results from environmental exposures such as inhalation of fibrogenic dusts or air-borne organic antigens including exposures such as coal dust, wood or metal dust, mold, silica, and cigarette smoke. Latent herpesvirus infections have been associated with an increased risk of this disease. Smoking has long been considered an important risk factor for the development of IPF, and we have shown that cigarette smoking is also a risk factor in the development of FIP. It is likely that complex interactions between genes and environmental exposures are involved in the development of IIP. Identifying the underlying genetic risks may help to focus studies on environmental exposures, disease pathogenesis, and targeted interventions.
SUMMARY OF THE DISCLOSURE
Described herein is the discovery that mucin is a protective glycoprotein in the airway that can prevent or inhibit the development of pulmonary fibrosis. Genetic variations are identified in the sequences encoding MUC5AC (a primary mucin secreted in the normal human airway) that are associated with risk or development of the disease. Table 1 below lists important SNPs in MUC5AC that were associated with either FIP or IPF in our initial resequencing of this gene; 7 of the most promising SNPs were subsequently validated in an independent cohort (Table 2 below). A comprehensive table of all MUC5AC SNPs identified by resequencing is presented in the claims section of this document. All P values have been corrected for gender.
Table 1. 32 MUC5AC Variants Significant at P < 0.05 for FIP and IPF in the Re-Sequencing Cohort (Odds Ratio with 95% Confidence Interval).
[Table 1 is reproduced as images in the original publication (Figure imgf000004_0001, Figure imgf000005_0001). Columns: Position; SNP; Re-Sequencing Cohort (FIP, IPF); Minor Allele Frequency (FIP, IPF, Control).]
* bp position: genomic contig, alternate assembly for Homo sapiens chromosome 11, NW_001838016
Amino acid position: from UniProtKB/Swiss-Prot MUC5AC_HUMAN (P98088).
† P<0.05; Fisher's exact test (two-tailed)
‡ P≤0.01; Fisher's exact test (two-tailed)
Re-Sequencing study populations: 69 family-independent FIP cases, 96 unrelated IPF cases, and 54 spouse controls. MAF: minor allele frequency for re-sequencing cohort.
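By way of illustration only, an allele-based odds ratio with a 95% confidence interval, and a minor allele frequency, might be computed as sketched below. The disclosure does not state how its intervals were derived; the log-based (Woolf) interval used here is an assumption, as are the function names and the minor-allele counts. Only the cohort sizes (69 FIP cases and 54 spouse controls, hence 138 and 108 alleles) come from the text.

```python
import math


def minor_allele_frequency(minor_allele_count, n_individuals):
    """Minor allele frequency for diploid individuals (two alleles each)."""
    return minor_allele_count / (2 * n_individuals)


def odds_ratio_with_ci(case_minor, case_major, ctrl_minor, ctrl_major, z=1.96):
    """Allele-based odds ratio with a Woolf (log-scale) confidence interval.

    The Woolf method is an assumed choice; it requires all four counts > 0.
    """
    counts = (case_minor, case_major, ctrl_minor, ctrl_major)
    if min(counts) <= 0:
        raise ValueError("all four allele counts must be positive")
    odds_ratio = (case_minor * ctrl_major) / (case_major * ctrl_minor)
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Cohort sizes from the text; minor-allele counts are hypothetical.
fip_cases, controls = 69, 54
case_minor, ctrl_minor = 30, 10
case_major = 2 * fip_cases - case_minor   # 108 major alleles among cases
ctrl_major = 2 * controls - ctrl_minor    # 98 major alleles among controls

print(minor_allele_frequency(case_minor, fip_cases))
print(odds_ratio_with_ci(case_minor, case_major, ctrl_minor, ctrl_major))
```

An interval that excludes 1.0 would be read as an association at the corresponding significance level under these assumptions.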
Table 2. Allele-based comparison of MUC5AC variants in FIP and IPF in the re-sequencing and validation cohorts (Odds ratios with 95% confidence intervals).
[Table 2 is reproduced as an image in the original publication (Figure imgf000005_0002). Columns: Position; SNP; Re-Sequencing Cohort (FIP, IPF); Validation Cohort (FIP, IPF); Minor Allele Frequency (FIP, IPF, Control).]
* bp position: genomic contig, alternate assembly for Homo sapiens chromosome 11, NW_001838016
** Ala4729Lys represents 2 SNPs that are in perfect linkage disequilibrium in our study populations with r²=1.0
Amino acid position: from UniProtKB/Swiss-Prot MUC5AC_HUMAN (P98088)
† P<0.05; Fisher's exact test (two-tailed)
‡ P≤0.01; Fisher's exact test (two-tailed)
Re-Sequencing study populations: 69 family-independent FIP cases, 96 unrelated IPF cases, and 54 spouse controls
Validation study populations: 88 family-independent FIP cases, 136 unrelated IPF cases, and 54 spouse controls
MAF: Minor allele frequency reported for both cohorts combined
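By way of illustration only, the two-tailed Fisher's exact test named in the table footnotes can be sketched for a 2x2 table of allele counts as follows. This is a generic textbook formulation (summing the hypergeometric probabilities of all tables no more likely than the observed one), not the authors' software; the function name and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Rows might be cases/controls and columns minor/major allele counts.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    x_min = max(0, col1 - row2)
    x_max = min(row1, col1)
    # Small relative tolerance so that ties with the observed table are included.
    total = sum(
        prob(x) for x in range(x_min, x_max + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, total)


# Hypothetical allele counts: 30 minor / 108 major in cases, 10 / 98 in controls.
p_value = fisher_exact_two_tailed(30, 108, 10, 98)
print(p_value < 0.05)
```

The comparison against 0.05 mirrors the threshold used in the footnotes; no particular result is implied for the variants in the tables.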
This discovery provides methods of non-invasively screening individuals to diagnose and/or determine their risk for developing pulmonary fibrosis. It is also contemplated that mucin, and synthetic molecules that mimic its effect, can now be used in the treatment and/or prevention of pulmonary fibrosis, and more generally in conferring a protective effect to the airway of a subject.
The foregoing and other features and advantages will become more apparent from the following detailed description of several embodiments, which proceeds with reference to the accompanying figures.
BRIEF DESCRIPTION AND PRESENTATION OF THE FIGURES
Figure 1 shows multipoint LOD scores across the genome (chromosomes 1-22 and X) for all 82 families (Figure 1A) and specifically on chromosomes 10, 11, and 12 for three diagnostic categories (all families-dashed, homogenous families-blue, and heterogeneous families-green) for pulmonary fibrosis (Figure 1B).
Table 3. Markers and relevant LOD scores on chromosomes 10, 11, and 12.
[Table 3 is reproduced as an image in the original publication (Figure imgf000007_0001).]
Figure 2 shows a fine mapping LOD score graph for Chromosome 11, using >200 markers ~5 cM, in 242 individuals.
Figure 3 is a graph showing the SNP Map (LD tagged) of Chromosome 11 in an association study. The study involved 150 individuals with familial pulmonary fibrosis (FPF), 167 individuals with idiopathic pulmonary fibrosis (IPF), and 237 control individuals. Key: FPF vs. Controls (♦); IPF vs. Controls (■).
Figure 4 is a graphic representation of the MUC5AC gene (solid boxes are exons), showing SNPs and Indels, based on analysis of 69 individuals with FIP, 96 individuals with IPF, and 54 control individuals. Non-synonymous SNPs, intronic Indels, and coding Indels are noted.
Figure 5. Generation of Muc5ac deficient mice and their response to bleomycin.
A. The Muc5ac allele was targeted by inserting LoxP sites into the 5' flanking region and intron 1. Neomycin was used for selection, and the selection cassette was removed by crossing animals with Rosa-FLPe knock-in mice (data not shown). B. Southern blot analysis using a probe located outside of the 5' targeting arm (within the region surrounded by the black triangles in Figure 1A) confirmed homologous insertion of the 5' LoxP site. C. Long range PCR, restriction endonuclease, and sequence analysis confirmed homologous insertion of the 3' LoxP site. The amplified region is depicted within the region identified by open triangles. EcoRI and BamHI (sites present in the recombinant but not in the wild type allele) were used to digest PCR products. D. DNA extracted from mouse tail biopsies was screened by PCR to identify +/+, +/-, and -/- animals. E. Quantitative RT-PCR was used to confirm loss of Muc5ac expression in the stomachs of knockout animals. F. At 21 days after 4 U/kg i.t. bleomycin instillation, there is increased connective tissue deposition in Muc5ac-/- mice compared to Muc5ac+/-, 129/Sv, and C57BL/6 mice (Masson-Trichrome, 40x). G. Muc5ac-/- mice (open bar) have significantly more lung collagen than either Muc5ac+/- mice (grey bar) or the parent strains C57BL/6 and 129/Sv (left and right striped bars) *P<0.05 **P<0.01 ***P<0.001 (N=6 per group). H. Positive MUC5AC staining in bronchiolar cells of a C57BL/6 mouse (arrows) 21 days after bleomycin, but little if any staining is evident in a saline-treated mouse at the same time point (400x magnification).
Figure 6. Mucociliary clearance and apoptosis in bleomycin challenged mice.
Mice were anesthetized, suspended by their upper incisors, and 1, 2, and 3 μm diameter yellow-green fluorescent microspheres (Fluospheres, Invitrogen-Molecular Probes, Carlsbad, CA) were instilled intratracheally using a Microsprayer (Penn-Century, Philadelphia, PA). 100,000 fluospheres of each diameter were instilled in a 25 μl volume microspray. To isolate fluospheres after instillation, mice were euthanized by exsanguination under anesthesia, and the lungs and tracheae were removed and minced into 1-2 mm pieces. These were vortexed in 1 ml PBS, and sequentially extracted over 100 μm and 40 μm nylon mesh filters. Extracted microspheres were then measured by quantitative flow cytometry. For this, 10,000 4 μm red fluospheres were added to 2 ml of filtrate, and the total numbers of 1-3 μm yellow-green fluospheres were counted during the period in which 8,000 red fluospheres were also counted. Panel A: representative flow cytometric analyses of microsphere deposition for unchallenged WT mice at baseline (upper left panel) and after 15 min (upper right panel), bleomycin challenged WT after 15 min (lower left panel), and bleomycin challenged Muc5ac-/- mice after 15 min (lower right panel). Blue=1 μm, red=2 μm, and yellow=3 μm. Numbers indicate counts for each diameter microsphere. Panel B: particle elimination is calculated as the percent clearance of particles deposited at time zero. Data are presented as means ± SEM from 4-6 mice per group. Data were analyzed by Student's t-test. Asterisk denotes a p-value of 0.05 compared to 15 min clearance data from bleomycin and PBS challenged WT and Muc5ac+/- mice and PBS challenged Muc5ac-/- mice. In Panels C-F, lung sections of Muc5ac-/- and Muc5ac+/- mice were stained for caspase 3, 21 days after bleomycin instillation. There are numerous apoptotic cells (red arrows) in fibrotic lung areas of Muc5ac-/- mice (6C) as well as in their airways (insert), compared with Muc5ac+/- mice (6D). (200x, n=6 mice per group). Differences in apoptotic cells between Muc5ac-/- and Muc5ac-/+ mice are statistically significant (6E-F).
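By way of illustration only, the clearance calculation described for Figure 6 can be sketched as follows. The legend states that yellow-green microspheres were counted while 8,000 of the 10,000 added red reference microspheres were counted, and that elimination is the percent clearance of particles deposited at time zero. Scaling the counted events by the fraction of reference beads recovered is an assumed interpretation, and the function names and yellow-green counts are hypothetical.

```python
def estimate_total_particles(counted_test, counted_reference=8000, added_reference=10000):
    """Scale counted test particles to the whole filtrate using reference beads.

    Assumes test and reference beads are sampled in the same proportion.
    """
    if counted_reference <= 0 or added_reference <= 0:
        raise ValueError("reference bead numbers must be positive")
    return counted_test * added_reference / counted_reference


def percent_clearance(retained, deposited_at_time_zero):
    """Percent of particles deposited at time zero that are no longer recovered."""
    if deposited_at_time_zero <= 0:
        raise ValueError("deposition at time zero must be positive")
    return 100.0 * (1.0 - retained / deposited_at_time_zero)


# Hypothetical yellow-green counts for one microsphere diameter.
baseline_total = estimate_total_particles(4000)   # 5000.0 at time zero
after_15_min = estimate_total_particles(3000)     # 3750.0 after 15 min

print(percent_clearance(after_15_min, baseline_total))  # 25.0
```

In practice the calculation would be repeated per microsphere diameter and per animal before group means and SEM are taken.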
SEQUENCES
Nucleic acid and/or amino acid sequences discussed or referenced herein are referred to by way of accession number from a public repository. It is understood that the corresponding sequence is incorporated by reference herein based on the sequence of that accession number in the referenced public database as of the date of filing of this provisional application.
DETAILED DESCRIPTION
I. Abbreviations
2D-PAGE two-dimensional polyacrylamide gel electrophoresis
ASO allele-specific oligonucleotide
ASOH allele-specific oligonucleotide hybridization
DASH dynamic allele-specific hybridization
ELISA enzyme-linked immunosorbent assay
FIP familial interstitial pneumonia
FPF familial pulmonary fibrosis
HPLC high pressure liquid chromatography
IIP idiopathic interstitial pneumonia
IPF idiopathic pulmonary fibrosis
MALDI-TOF matrix-assisted laser desorption/ionization time-of-flight
PCR polymerase chain reaction
RT-PCR reverse-transcription polymerase chain reaction
SSCP single-strand conformation polymorphism
II. Terms
Unless otherwise noted, technical terms are used according to conventional usage. Definitions of common terms in molecular biology may be found in Benjamin Lewin, Genes V, published by Oxford University Press, 1994 (ISBN 0-19-854287-9); Kendrew et al. (eds.), The Encyclopedia of Molecular Biology, published by Blackwell Science Ltd., 1994 (ISBN 0-632-02182-9); and Robert A. Meyers (ed.), Molecular Biology and Biotechnology: a Comprehensive Desk Reference, published by VCH Publishers, Inc., 1995 (ISBN 1-56081-569-8).
In order to facilitate review of the various embodiments of the invention, the following explanations of specific terms are provided:
Antisense, Sense, and Antigene: Double-stranded DNA (dsDNA) has two strands, a 5' -> 3' strand, referred to as the plus strand, and a 3' -> 5' strand (the reverse complement), referred to as the minus strand. Because RNA polymerase adds nucleic acids in a 5' -> 3' direction, the minus strand of the DNA serves as the template for the RNA during transcription. Thus, the RNA formed will have a sequence complementary to the minus strand and identical to the plus strand (except that U is substituted for T). Antisense molecules are molecules that are specifically hybridizable or specifically complementary to either RNA or the plus strand of DNA. Sense molecules are molecules that are specifically hybridizable or specifically complementary to the minus strand of DNA. Antigene molecules are either antisense or sense molecules directed to a dsDNA target.
Array: An arrangement of molecules, particularly biological macromolecules (such as polypeptides or nucleic acids) or biological samples (such as tissue sections) in addressable locations on a substrate, usually a flat substrate such as a membrane, plate or slide. The array may be regular (arranged in uniform rows and columns, for instance) or irregular. The number of addressable locations on the array can vary, for example from a few (such as three) to more than 50, 100, 200, 500, 1000, 10,000, or more. A "microarray" is an array that is miniaturized to such an extent that it benefits from microscopic examination for evaluation.
Within an array, each arrayed molecule (e.g., oligonucleotide) or sample (more generally, a "feature" of the array) is addressable, in that its location can be reliably and consistently determined within at least two dimensions on the array surface. Thus, in ordered arrays the location of each feature is usually assigned to a sample at the time when it is spotted onto or otherwise applied to the array surface, and a key may be provided in order to correlate each location with the appropriate feature. Often, ordered arrays are arranged in a symmetrical grid pattern, but samples could be arranged in other patterns (e.g., in radially distributed lines, spiral lines, or ordered clusters). Arrays are computer readable, in that a computer can be programmed to correlate a particular address on the array with information (such as identification of the arrayed sample and hybridization or binding data, including for instance signal intensity). In some examples of computer readable array formats, the individual spots on the array surface will be arranged regularly, for instance in a Cartesian grid pattern, that can be correlated to address information by a computer.
The sample application spot (or feature) on an array may assume many different shapes. Thus, though the term "spot" is used herein, it refers generally to a localized deposit of nucleic acid or other biomolecule, and is not limited to a round or substantially round region. For instance, substantially square regions of application can be used with arrays, as can be regions that are substantially rectangular (such as a slot blot-type application), or triangular, oval, irregular, and so forth. The shape of the array substrate itself is also immaterial, though it is usually substantially flat and may be rectangular or square in general shape.
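The addressability described above, in which a key correlates each location with its feature and a computer correlates an address with sample identity and signal data, can be sketched as follows. This is a minimal illustration only: the probe names, grid size, and intensity values are hypothetical and are not taken from this application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    sample_id: str  # hypothetical identifier of the arrayed probe or sample
    signal: float   # hypothetical signal intensity measured at that address


def build_key(sample_ids, n_cols):
    """Assign each sample a (row, column) address in a Cartesian grid, in spotting order."""
    return {divmod(i, n_cols): sid for i, sid in enumerate(sample_ids)}


def read_feature(key, intensities, address):
    """Correlate an address on the array with the arrayed sample and its signal."""
    row, col = address
    return Feature(sample_id=key[address], signal=intensities[row][col])


# Six hypothetical probes spotted on a 2 x 3 grid.
key = build_key(
    ["probe_A", "probe_B", "probe_C", "probe_D", "probe_E", "probe_F"], n_cols=3
)
intensities = [[120.0, 15.5, 980.2], [44.1, 0.0, 310.7]]
feature = read_feature(key, intensities, (1, 2))
```

A dictionary keyed by address is used here because it works equally for regular grids and for irregular layouts, where the addresses would simply be assigned differently.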
Binding or interaction: An association between two substances or molecules, such as the hybridization of one nucleic acid molecule to another (or itself). Various methods can be used to detect binding of molecules, many of which are known to those of ordinary skill in the art. Specific examples of binding or interaction are described herein. For instance in some embodiments, a labeled nucleic acid molecule (target) binds to (interacts with) an immobilized nucleic acid molecule (probe) in one or more features of the array.
A labeled target molecule "binds" to a nucleic acid molecule in a spot on an array if, after incubation of the (labeled) target molecule (usually in solution or suspension) with or on the array for a period of time (usually 5 minutes or more, for instance 10 minutes, 20 minutes, 30 minutes, 60 minutes, 90 minutes, 120 minutes or more, for instance overnight or even 24 hours), a detectable amount of that molecule associates with a nucleic acid feature of the array to such an extent that it is not removed by being washed with a relatively low stringency buffer (e.g., higher salt (such as 3 x SSC or higher), room temperature washes). Washing can be carried out, for instance, at room temperature, but other temperatures (either higher or lower) also can be used. Targets will bind probe nucleic acid molecules within different features on the array to different extents, based at least on sequence homology, and the term "bind" encompasses both relatively weak and relatively strong interactions. Thus, some binding will persist after the array is washed in a more stringent buffer (e.g., lower salt (such as about 0.5 to about 1.5 x SSC), 55-65 °C washes).
Where the probe and target molecules are both nucleic acids, binding of the test or reference molecule to a feature on the array can be discussed in terms of the specific complementarity between the probe and the target nucleic acids. Also contemplated herein are protein-based arrays, where the probe molecules are or comprise proteins, and/or where the target molecules are or comprise proteins, and arrays comprising nucleic acids to which proteins/peptides are bound, or vice versa.
Biological Sample: This term is intended to include tissues, cells and biological fluids, including biological fluids containing cells, that are isolated from a subject, as well as tissues, cells and fluids present within a subject.
cDNA: A DNA molecule lacking internal, non-coding segments (e.g., introns) and regulatory sequences that determine transcription. By way of example, cDNA may be synthesized in the laboratory by reverse transcription from messenger RNA extracted from cells.
Complementarity and percentage complementarity: Molecules with complementary nucleic acids form a stable duplex or triplex when the strands bind (hybridize) to each other by forming Watson-Crick, Hoogsteen or reverse Hoogsteen base pairs. Stable binding occurs when an oligonucleotide remains detectably bound to a target nucleic acid sequence under the required conditions. Complementarity is the degree to which bases in one nucleic acid strand base pair with the bases in a second nucleic acid strand. Complementarity is conveniently described by percentage, i.e., the proportion of nucleotides that form base pairs between two strands or within a specific region or domain of two strands. For example, if 10 nucleotides of a 15-nucleotide oligonucleotide form base pairs with a targeted region of a DNA molecule, that oligonucleotide is said to have 66.67% complementarity to the region of DNA targeted.
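The percentage calculation in the example above can be illustrated with a short sketch. It assumes both strands are written 5' -> 3', aligned antiparallel over equal lengths, and it counts only Watson-Crick pairs (Hoogsteen pairing is not modeled). The function name and the two sequences are hypothetical.

```python
# Watson-Crick pairs only; U is accepted so that RNA strands can be compared too.
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def percent_complementarity(oligo, target_region):
    """Percentage of oligo positions that base pair with the target region.

    Both sequences are given 5' -> 3'; the target is read in reverse so the
    strands are compared in antiparallel orientation.
    """
    if len(oligo) != len(target_region):
        raise ValueError("sequences must have the same length")
    paired = sum(
        (a, b) in PAIRS
        for a, b in zip(oligo.upper(), reversed(target_region.upper()))
    )
    return 100.0 * paired / len(oligo)


# Hypothetical 15-nucleotide oligonucleotide in which 10 positions pair with the target.
target = "ATGCATGCATGCATG"
oligo = "GTACGATGCATGCAT"
result = round(percent_complementarity(oligo, target), 2)  # 10 of 15 positions pair
```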
A thorough treatment of the qualitative and quantitative considerations involved in establishing binding conditions that allow one skilled in the art to design appropriate oligonucleotides for use under the desired conditions is provided by Beltz et al. Methods Enzymol 100:266-285, 1983, and by Sambrook et al. (ed), Molecular Cloning: A Laboratory Manual, 2nd ed., vol. 1-3, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1989.
DNA (deoxyribonucleic acid): DNA is a long chain polymer that contains the genetic material of most living organisms (the genes of some viruses are made of ribonucleic acid (RNA)). The repeating units in DNA polymers are four different nucleotides, each of which includes one of the four bases (adenine, guanine, cytosine and thymine) bound to a deoxyribose sugar to which a phosphate group is attached. Triplets of nucleotides (referred to as codons) code for each amino acid in a polypeptide, or for a stop signal. The term "codon" is also used for the corresponding (and complementary) sequences of three nucleotides in the mRNA into which the DNA sequence is transcribed.
Enriched: The term "enriched" means that the concentration of a material is at least about 2, 5, 10, 100, or 1000 times its natural concentration (for example), advantageously at least 0.01% by weight. Enriched preparations of about 0.5%, 1%, 5%, 10%, and 20% by weight are also contemplated.
EST (Expressed Sequence Tag): A partial DNA or cDNA sequence, typically of between 200 and 2000 sequential nucleotides, obtained from a genomic or cDNA library, prepared from a selected cell, cell type, tissue or tissue type, organ or organism, which corresponds to an mRNA of a gene found in that library. An EST is generally a DNA molecule sequenced from and shorter than the cDNA from which it is obtained.
Fibrosis: Formation or development of excess fibrous connective tissue in an organ or tissue as a reparative or reactive process, as opposed to formation of fibrous tissue as a normal constituent of an organ or tissue. Fibrosis-related diseases include, but are not limited to: cystic fibrosis of the pancreas and lungs; endomyocardial fibrosis; idiopathic myocardiopathy; idiopathic interstitial pneumonia; idiopathic pulmonary fibrosis; cryptogenic organizing pneumonia; nonspecific interstitial pneumonia; acute interstitial pneumonia; hypersensitivity pneumonitis; familial interstitial pneumonia; respiratory bronchiolitis interstitial lung disease; desquamative interstitial lung disease; and diffuse parenchymal lung disease.
Fluorophore: A chemical compound, which when excited by exposure to a particular wavelength of light, emits light (i.e., fluoresces), for example at a different wavelength. Fluorophores can be described in terms of their emission profile, or "color." Green fluorophores, for example Cy3, FITC, and Oregon Green, are characterized by their emission at wavelengths generally in the range of 515-540 λ. Red fluorophores, for example Texas Red, Cy5 and tetramethylrhodamine, are characterized by their emission at wavelengths generally in the range of 590-690 λ.
Examples of fluorophores that may be used are provided in U.S. Patent No. 5,866,366 to Nazarenko et al., and include for instance: 4-acetamido-4'-isothiocyanatostilbene-2,2'-disulfonic acid, acridine and derivatives such as acridine and acridine isothiocyanate, 5-(2'-aminoethyl)aminonaphthalene-1-sulfonic acid (EDANS), 4-amino-N-[3-(vinylsulfonyl)phenyl]naphthalimide-3,5 disulfonate (Lucifer Yellow VS), N-(4-anilino-1-naphthyl)maleimide, anthranilamide, Brilliant Yellow, coumarin and derivatives such as coumarin, 7-amino-4-methylcoumarin (AMC, Coumarin 120), 7-amino-4-trifluoromethylcoumarin (Coumarin 151); cyanosine; 4',6-diamidino-2-phenylindole (DAPI); 5',5"-dibromopyrogallol-sulfonephthalein (Bromopyrogallol Red); 7-diethylamino-3-(4'-isothiocyanatophenyl)-4-methylcoumarin; diethylenetriamine pentaacetate; 4,4'-diisothiocyanatodihydro-stilbene-2,2'-disulfonic acid; 4,4'-diisothiocyanatostilbene-2,2'-disulfonic acid; 5-[dimethylamino]naphthalene-1-sulfonyl chloride (DNS, dansyl chloride); 4-(4'-dimethylaminophenylazo)benzoic acid (DABCYL); 4-dimethylaminophenylazophenyl-4'-isothiocyanate (DABITC); eosin and derivatives such as eosin and eosin isothiocyanate; erythrosin and derivatives such as erythrosin B and erythrosin isothiocyanate; ethidium; fluorescein and derivatives such as 5-carboxyfluorescein (FAM), 5-(4,6-dichlorotriazin-2-yl)aminofluorescein (DTAF), 2'7'-dimethoxy-4'5'-dichloro-6-carboxyfluorescein (JOE), fluorescein, fluorescein isothiocyanate (FITC), and QFITC (XRITC); fluorescamine; IR144; IR1446; Malachite Green isothiocyanate; 4-methylumbelliferone; ortho cresolphthalein; nitrotyrosine; pararosaniline; Phenol Red; B-phycoerythrin; o-phthaldialdehyde; pyrene and derivatives such as pyrene, pyrene butyrate and succinimidyl 1-pyrene butyrate; Reactive Red 4 (Cibacron® Brilliant Red 3B-A); rhodamine and derivatives such as 6-carboxy-X-rhodamine (ROX), 6-carboxyrhodamine (R6G), lissamine rhodamine B sulfonyl chloride, rhodamine (Rhod), rhodamine B, rhodamine 123, rhodamine X isothiocyanate, sulforhodamine B, sulforhodamine 101 and sulfonyl chloride derivative of sulforhodamine 101 (Texas Red); N,N,N',N'-tetramethyl-6-carboxyrhodamine (TAMRA); tetramethyl rhodamine; tetramethyl rhodamine isothiocyanate (TRITC); riboflavin; rosolic acid and terbium chelate derivatives.
Other contemplated fluorophores include GFP (green fluorescent protein), Lissamine™, diethylaminocoumarin, fluorescein chlorotriazinyl, naphthofluorescein, 4,7-dichlororhodamine and xanthene and derivatives thereof. Other fluorophores known to those skilled in the art may also be used.
High throughput genomics: Application of genomic or genetic data or analysis techniques that use microarrays or other genomic technologies to rapidly identify large numbers of genes or proteins, or distinguish their structure, expression or function from normal or abnormal cells or tissues, or from cells or tissues of subjects with known or unknown phenotype and/or genotype.
Human Cells: Cells obtained from a member of the species Homo sapiens. The cells can be obtained from any source, for example peripheral blood, urine, saliva, tissue biopsy, surgical specimen, amniocentesis samples and autopsy material. From these cells, genomic DNA, mRNA, cDNA, RNA, and/or protein can be isolated.
Hybridization: Nucleic acid molecules that are complementary to each other hybridize by hydrogen bonding, which includes Watson-Crick, Hoogsteen or reversed Hoogsteen hydrogen bonding between complementary nucleotide units. For example, adenine and thymine are complementary nucleobases that pair through formation of hydrogen bonds. "Complementary" refers to sequence complementarity between two nucleotide units. For example, if a nucleotide unit at a certain position of an oligonucleotide is capable of hydrogen bonding with a nucleotide unit at the same position of a DNA or RNA molecule, then the oligonucleotides are complementary to each other at that position. The oligonucleotide and the DNA or RNA are complementary to each other when a sufficient number of corresponding positions in each molecule are occupied by nucleotide units which can hydrogen bond with each other. "Complementary" is a term that indicates a sufficient degree of complementarity such that stable and specific binding occurs between an oligonucleotide and the DNA or RNA (or PNA) target. An oligonucleotide need not be 100% complementary to its target nucleic acid sequence to be specifically hybridizable. An oligonucleotide is specifically hybridizable when binding of the oligonucleotide to the target DNA or RNA molecule interferes with the normal function of the target DNA or RNA, and there is a sufficient degree of complementarity to avoid non-specific binding of the oligonucleotide to non-target sequences under conditions in which specific binding is desired, for example under physiological conditions in the case of in vivo assays, or under conditions in which the assays are performed.
Hybridization conditions resulting in particular degrees of stringency will vary depending upon the nature of the hybridization method of choice and the composition and length of the hybridizing DNA used. Generally, the temperature of hybridization and the ionic strength (especially the Na+ concentration) of the hybridization buffer will determine the stringency of hybridization. Calculations regarding hybridization conditions required for attaining particular degrees of stringency are discussed by Sambrook et al. in Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press (1989), chapters 9 and 11, herein incorporated by reference.
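To illustrate how ionic strength, base composition, and length enter such calculations, the sketch below uses one widely quoted empirical approximation for the melting temperature of a DNA duplex. The formula and its coefficients are not taken from this application; they are an assumption for illustration, and the cited reference should be consulted for conditions that apply to a particular method. The sequence and salt concentrations are hypothetical.

```python
import math


def estimate_tm_celsius(sequence, na_molar):
    """Rough duplex melting temperature from an empirical approximation (assumed form):

        Tm = 81.5 + 16.6 * log10([Na+]) + 0.41 * (%G+C) - 600 / N

    where [Na+] is the molar sodium ion concentration and N is the duplex length.
    """
    seq = sequence.upper()
    n = len(seq)
    if n == 0 or na_molar <= 0:
        raise ValueError("need a non-empty sequence and a positive Na+ concentration")
    gc_percent = 100.0 * sum(base in "GC" for base in seq) / n
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 600.0 / n


# Hypothetical 30-nucleotide probe evaluated at two sodium concentrations.
probe = "ATGCGCGCTAGCTAGGCTAGCTAGGCATCG"
tm_low_salt = estimate_tm_celsius(probe, na_molar=0.1)
tm_high_salt = estimate_tm_celsius(probe, na_molar=0.6)
```

Under this approximation, lowering the salt concentration lowers the estimated melting temperature, which is consistent with low-salt washes being the more stringent ones.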
In vitro amplification: Techniques that increase the number of copies of a nucleic acid molecule in a sample or specimen. An example of in vitro amplification is the polymerase chain reaction, in which a biological sample collected from a subject is contacted with a pair of oligonucleotide primers, under conditions that allow for the hybridization of the primers to nucleic acid template in the sample. The primers are extended under suitable conditions, dissociated from the template, and then re-annealed, extended, and dissociated to amplify the number of copies of the nucleic acid.
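The repeated anneal, extend, and dissociate cycle described above gives, in the idealized case, a doubling of template copies per cycle. The sketch below shows that arithmetic under a simple, assumed model with a constant per-cycle efficiency; real reactions plateau as reagents are consumed, and the parameter names are hypothetical.

```python
def copies_after_cycles(initial_copies, cycles, efficiency=1.0):
    """Idealized copy number after a number of amplification cycles.

    efficiency is the assumed fraction of templates copied in each cycle
    (1.0 means perfect doubling, 0.0 means no amplification).
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


ideal = copies_after_cycles(initial_copies=10, cycles=3)  # 10 -> 20 -> 40 -> 80
partial = copies_after_cycles(initial_copies=10, cycles=3, efficiency=0.5)
```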
The product of in vitro amplification may be characterized by electrophoresis, restriction endonuclease cleavage patterns, oligonucleotide hybridization or ligation, and/or nucleic acid sequencing, using standard techniques.
Other examples of in vitro amplification techniques include strand displacement amplification (see U.S. Patent No. 5,744,311); transcription-free isothermal amplification (see U.S. Patent No. 6,033,881); repair chain reaction amplification (see WO 90/01069); ligase chain reaction amplification (see EP-A-320 308); gap filling ligase chain reaction amplification (see U.S. Patent No. 5,427,930); coupled ligase detection and PCR (see U.S. Patent No. 6,027,889); and NASBA™ RNA transcription-free amplification (see U.S. Patent No. 6,025,134).
Idiopathic Interstitial Pneumonia (IIP): A group of lung diseases (including idiopathic pulmonary fibrosis, nonspecific interstitial pneumonia, respiratory bronchiolitis interstitial lung disease, desquamative interstitial pneumonia, and acute interstitial pneumonia), affecting the alveolar epithelium, pulmonary capillary endothelium, basement membrane, perivascular and perilymphatic tissues. The term IIP is used to distinguish these diseases from obstructive airways diseases. Most types of IIP involve fibrosis, but this is not essential; indeed fibrosis is often a later feature. Hence the term pulmonary fibrosis has fallen out of favor.
Isolated: An "isolated" biological component (such as a nucleic acid molecule, protein or organelle) has been substantially separated or purified away from other biological components in the cell of the organism in which the component naturally occurs, i.e., other chromosomal and extra-chromosomal DNA and RNA, proteins and organelles. Nucleic acids and proteins that have been "isolated" include nucleic acids and proteins purified by standard purification methods. The term also embraces nucleic acids and proteins prepared by recombinant expression in a host cell as well as chemically synthesized nucleic acids.
Label: Detectable marker or reporter molecules, which can be attached to nucleic acids. Typical labels include fluorophores, radioactive isotopes, ligands, chemiluminescent agents, metal sols and colloids, and enzymes. Methods for labeling and guidance in the choice of labels useful for various purposes are discussed, e.g., in Sambrook et al., in Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press (1989) and Ausubel et al., in Current Protocols in Molecular Biology, Greene Publishing Associates and Wiley-Intersciences (1987).
Mucins: A family of large, heavily glycosylated proteins. Some mucins are membrane-bound due to the presence of a hydrophobic membrane-spanning domain that favors retention in the plasma membrane, while others are secreted onto mucosal surfaces and into saliva. Mucin genes encode mucin monomers that are synthesized as rod-shaped apomucin cores that are post-translationally modified by heavy glycosylation. The amino- and carboxy-terminal regions of mucins are lightly glycosylated, but rich in cysteines, which are proposed to be involved in establishing disulfide linkages within and between mucin monomers. The central region of mucins is formed of multiple tandem repeats of 10 to 80 residue sequences, in which up to half of the amino acids are serine or threonine. This region of the protein becomes post-translationally modified (glycosylated) with hundreds of O-linked oligosaccharides. N-linked oligosaccharides are also found on mucins, but much less abundantly. The dense glycosylation of mucins gives them considerable water-holding capacity, and makes them resistant to proteolysis. See also Perez-Vilar & Hill ("Mucin Family of Glycoproteins", Encyclopedia of Biological Chemistry (Lennarz & Lane, Eds.) Academic Press/Elsevier, Oxford, 2004, vol. 2, pp 758-764). Mucins are secreted as massive aggregates of proteins with molecular masses of roughly 1 to 10 million Da. Within these aggregates, monomers are linked to one another mostly by non-covalent interactions, although intermolecular disulfide bonds may also play a role in this process. At least 19 human mucin genes have been distinguished by cDNA cloning: MUC1, 2, 3A, 3B, 4, 5AC, 5B, 6-9, 11-13, and 15-19. The major secreted airway mucins are MUC5AC and MUC5B, while MUC2 is secreted mostly in the intestine but also in the airway (Ali et al., Otolaryngol Head Neck Surg. 133(3):423-428, 2005).
Increased mucin production occurs in adenocarcinoma, and in lung diseases such as asthma, bronchitis, COPD and cystic fibrosis.
Mutation: Any change of the DNA sequence within a gene or chromosome, including specifically changes in non-coding regions of a chromosome, for instance changes in or near regulatory regions of genes. In some instances, a mutation will alter a characteristic or trait (phenotype), but this is not always the case. Types of mutations include base substitution point mutations (e.g., transitions or transversions), deletions, and insertions. Missense mutations are those that introduce a different amino acid into the sequence of the encoded protein; nonsense mutations are those that introduce a new stop codon. In the case of insertions or deletions, mutations can be in-frame (not changing the frame of the overall sequence) or frame shift mutations, which may result in the misreading of a large number of codons (and often lead to abnormal termination of the encoded product due to the presence of a stop codon in the alternative frame).
This term specifically encompasses variations that arise through somatic mutation, for instance those that are found only in disease cells, but not constitutionally, in a given individual. Examples of such somatically-acquired variations include the point mutations that frequently result in altered function of various genes that are involved in development of cancers. This term also encompasses DNA alterations that are present constitutionally, that alter the function of the encoded protein in a readily demonstrable manner, and that can be inherited by the children of an affected individual. In this respect, the term overlaps with "polymorphism," as discussed herein, but generally refers to the subset of constitutional alterations.
Nucleic acid: A deoxyribonucleotide or ribonucleotide polymer in either single or double stranded form, and unless otherwise limited, encompassing known analogues of natural nucleotides that hybridize to nucleic acids in a manner similar to naturally occurring nucleotides.
Nucleic acid array: An arrangement of nucleic acids (such as DNA or RNA) in assigned locations on a matrix, such as that found in cDNA arrays, or oligonucleotide arrays.
Nucleic acid molecules representing genes: Any nucleic acid, for example DNA (intron or exon or both), cDNA or RNA, of any length suitable for use as a probe or other indicator molecule, and that is informative about the corresponding gene.
Nucleotide: "Nucleotide" includes, but is not limited to, a monomer that includes a base linked to a sugar, such as a pyrimidine, purine or synthetic analogs thereof, or a base linked to an amino acid, as in a peptide nucleic acid (PNA). A nucleotide is one monomer in a polynucleotide. A nucleotide sequence refers to the sequence of bases in a polynucleotide.
Oligonucleotide: A linear single-stranded polynucleotide sequence ranging in length from 2 to about 5,000 bases, for example a polynucleotide (such as DNA or RNA) which is at least 6 nucleotides, for example at least 10, 12, 15, 18, 20, 25, 50, 100, 200, 1,000, or even 5,000 nucleotides long. Oligonucleotides are often synthetic but can also be produced from naturally occurring polynucleotides. An oligonucleotide analog refers to moieties that function similarly to oligonucleotides but have non-naturally occurring portions. For example, oligonucleotide analogs can contain non-naturally occurring portions, such as altered sugar moieties or inter-sugar linkages, such as a phosphorothioate oligodeoxynucleotide. Functional analogs of naturally occurring polynucleotides can bind to RNA or DNA, and include peptide nucleic acid (PNA) molecules. Such analog molecules may also bind to or interact with polypeptides or proteins.
Operably (or Operatively) linked: A first nucleic acid sequence is operably linked with a second nucleic acid sequence when the first nucleic acid sequence is placed in a functional relationship with the second nucleic acid sequence. For instance, a promoter is operably linked to a coding sequence if the promoter affects the transcription or expression of the coding sequence. Generally, operably linked DNA sequences are contiguous and, where necessary to join two protein-coding regions, in the same reading frame.
Open reading frame (ORF): A series of nucleotide triplets (codons) coding for amino acids without any internal termination codons. These sequences are usually translatable into a peptide.
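The definition above can be expressed as a simple check. The sketch assumes the standard genetic code's three termination codons (TAA, TAG, TGA), reads the sequence in frame from its first base, and allows a termination codon only as the final triplet. The function name and test sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # termination codons of the standard genetic code


def is_open_reading_frame(sequence):
    """True if the sequence is a whole number of codons with no internal stop codon.

    RNA input is accepted by treating U as T. A stop codon is permitted only
    as the last triplet.
    """
    seq = sequence.upper().replace("U", "T")
    if len(seq) == 0 or len(seq) % 3 != 0:
        return False
    triplets = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return all(codon not in STOP_CODONS for codon in triplets[:-1])


uninterrupted = is_open_reading_frame("ATGGCCAAATAA")  # stop codon only at the end
interrupted = is_open_reading_frame("ATGTAAGCCAAA")    # TAA is the second codon
```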
Peptide Nucleic Acid (PNA): An oligonucleotide analog with a backbone comprised of monomers coupled by amide (peptide) bonds, such as amino acid monomers joined by peptide bonds.
Pharmaceutically acceptable carriers: The pharmaceutically acceptable carriers useful with compositions provided herein are conventional. By way of example, Martin, in Remington's Pharmaceutical Sciences, published by Mack Publishing Co., Easton, PA, 19th Edition, 1995, describes compositions and formulations suitable for pharmaceutical delivery of the molecules and agents, including but not limited to nucleotides and proteins, herein disclosed.
In general, the nature of the carrier will depend on the particular mode of administration being employed. For instance, parenteral formulations usually comprise injectable fluids that include pharmaceutically and physiologically acceptable fluids such as water, physiological saline, balanced salt solutions, aqueous dextrose, glycerol or the like as a vehicle. For solid compositions (e.g., powder, pill, tablet, or capsule forms), conventional non-toxic solid carriers can include, for example, pharmaceutical grades of mannitol, lactose, starch, or magnesium stearate. In addition to biologically-neutral carriers, pharmaceutical compositions to be administered can contain minor amounts of non-toxic auxiliary substances, such as wetting or emulsifying agents, preservatives, and pH buffering agents and the like, for example sodium acetate or sorbitan monolaurate.
Polymorphism: Variant in a sequence of a gene, usually carried from one generation to another in a population. Polymorphisms can be those variations (nucleotide sequence differences) that, while having a different nucleotide sequence, produce functionally equivalent gene products, such as those variations generally found between individuals, different ethnic groups, or geographic locations. The term polymorphism also encompasses variations that produce gene products with altered function, i.e., variants in the gene sequence that lead to gene products that are not functionally equivalent. This term also encompasses variations that produce no gene product, an inactive gene product, an increased amount of gene product, or a gene product with increased activity.
Polymorphisms can be referred to, for instance, by the nucleotide position at which the variation exists, by the change in amino acid sequence caused by the nucleotide variation, or by a change in some other characteristic of the nucleic acid molecule or protein that is linked to the variation (e.g., an alteration of a secondary structure such as a stem-loop, or an alteration of the binding affinity of the nucleic acid for associated molecules, such as polymerases, RNases, and so forth).
Probes and primers: Nucleic acid probes and primers can be readily prepared based on the nucleic acid molecules provided as indicators of susceptibility to pulmonary fibrosis or a related disease, condition or disorder. It is also appropriate to generate probes and primers based on fragments or portions of these nucleic acid molecules, particularly in order to distinguish between and among different alleles and haplotypes within a single gene. Also appropriate are probes and primers specific for the reverse complement of these sequences, as well as probes and primers to 5' or 3' regions. A probe comprises an isolated nucleic acid attached to a detectable label or other reporter molecule. Typical labels include radioactive isotopes, enzyme substrates, co-factors, ligands, chemiluminescent or fluorescent agents, haptens, and enzymes. Methods for labeling and guidance in the choice of labels appropriate for various purposes are discussed, e.g., in Sambrook et al. (In Molecular Cloning: A Laboratory Manual, CSHL, New York, 1989) and Ausubel et al. (In Current Protocols in Molecular Biology, John Wiley & Sons, New York, 1998).
Primers are short nucleic acid molecules, for instance DNA oligonucleotides 10 nucleotides or more in length. Longer DNA oligonucleotides may be about 15, 20, 25, 30 or 50 nucleotides or more in length. Primers can be annealed to a complementary target DNA strand by nucleic acid hybridization to form a hybrid between the primer and the target DNA strand, and then the primer extended along the target DNA strand by a DNA polymerase enzyme. Primer pairs can be used for amplification of a nucleic acid sequence, e.g., by the polymerase chain reaction (PCR) or other in vitro nucleic-acid amplification methods known in the art. Methods for preparing and using nucleic acid probes and primers are described, for example, in Sambrook et al. (In Molecular Cloning: A Laboratory Manual, CSHL, New York, 1989), Ausubel et al. (ed.) (In Current Protocols in Molecular Biology, John Wiley & Sons, New York, 1998), and Innis et al. (PCR Protocols, A Guide to Methods and Applications, Academic Press, Inc., San Diego, CA, 1990). Amplification primer pairs (for instance, for use with polymerase chain reaction amplification) can be derived from a known sequence such as any of the gene or indicated sequences at 11pter (D11S4046 to D11S4149; Chromosome 11, 2.0 Mb-9.0 Mb), and specific alleles thereof (including specific SNPs or haplotypes described herein), for example, by using computer programs intended for that purpose such as PRIMER (Version 0.5, © 1991, Whitehead Institute for Biomedical Research, Cambridge, MA).
One of ordinary skill in the art will appreciate that the specificity of a particular probe or primer increases with its length. Thus, for example, a primer comprising 30 consecutive nucleotides of a target protein-encoding nucleotide sequence will anneal to a target sequence, such as a homolog of a mucin protein, with a higher specificity than a corresponding primer of only 15 nucleotides. Thus, in order to obtain greater specificity, probes and primers can be selected that comprise at least 20, 23, 25, 30, 35, 40, 45, 50 or more consecutive nucleotides of a gene or sequence discussed herein.
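One way to see why specificity rises with length is to estimate how often a sequence of a given length would be expected to occur by chance. The sketch below assumes a deliberately simplified model, one strand of random sequence with the four bases equally likely at every position, which real genomes do not follow. The background length of 3 billion bases is a hypothetical figure chosen only for illustration.

```python
def expected_chance_matches(primer_length, background_length):
    """Expected number of exact chance occurrences of one primer-length sequence.

    Assumes a random background with equiprobable bases, so each of the
    (background_length - primer_length + 1) start positions matches with
    probability 4 ** -primer_length.
    """
    if primer_length < 1 or background_length < primer_length:
        raise ValueError("primer must be non-empty and fit within the background")
    positions = background_length - primer_length + 1
    return positions * 4.0 ** (-primer_length)


BACKGROUND = 3_000_000_000  # hypothetical background length in bases
matches_15 = expected_chance_matches(15, BACKGROUND)
matches_30 = expected_chance_matches(30, BACKGROUND)
```

Under this model each added nucleotide reduces the expected number of chance matches roughly fourfold, so a 30-nucleotide primer is far less likely than a 15-nucleotide primer to find an unintended exact match.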
Also provided are isolated nucleic acid molecules that comprise specified lengths of nucleotide sequences, for instance sequences from MUC5AC or another gene, EST or non-coding sequence at 11pter. Such molecules may comprise at least 10, 15, 20, 23, 25, 30, 35, 40, 45 or 50 or more (e.g., at least 100, 150, 200, 250, 300 and so forth) consecutive nucleotides of these sequences. These molecules may be obtained from any region of the disclosed sequences (e.g., a specified nucleic acid may be apportioned into halves or quarters based on sequence length, and isolated nucleic acid molecules may be derived from the first or second halves of the molecules, or any of the four quarters, etc.). A cDNA or other encoding sequence also can be divided into smaller regions, e.g., about eighths, sixteenths, twentieths, fiftieths, and so forth, with similar effect.
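The apportioning of a sequence into halves or quarters, and the selection of stretches of consecutive nucleotides from one such region, can be sketched as follows. The function names are hypothetical, and the 40-nucleotide sequence is a placeholder rather than any sequence disclosed herein.

```python
def region(sequence, part, parts):
    """Return the part-th (0-based) of `parts` roughly equal regions of a sequence."""
    if not 0 <= part < parts:
        raise ValueError("part must be in the range 0..parts-1")
    n = len(sequence)
    return sequence[part * n // parts:(part + 1) * n // parts]


def consecutive_fragments(sequence, length):
    """Every stretch of `length` consecutive nucleotides found in the sequence."""
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


placeholder = "ACGT" * 10                   # hypothetical 40-nucleotide sequence
second_quarter = region(placeholder, 1, 4)  # nucleotides 11-20 of the placeholder
fragments = consecutive_fragments(placeholder, 10)
```

Integer division is used for the boundaries so that the regions always tile the full sequence without gaps or overlaps, even when its length is not an exact multiple of the number of parts.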
Another mode of division, provided by way of example, is to divide a protein encoding sequence based on the regions of the sequence that are relatively more or less homologous to equivalent other sequences, such as homologous proteins from other species, or other proteins from a protein family.
Nucleic acid molecules may be selected that comprise at least 10, 15, 20, 25, 30, 35, 40, 50, 100, 150, 200, 250, 300 or more consecutive nucleotides of any of these or other portions of a nucleic acid molecule (such as one encoding MUC5AC or another gene, EST, or corresponding cDNA found in the 11pter region described herein) or a specific allele thereof, such as those disclosed herein. Thus, representative nucleic acid molecules might comprise at least 10 consecutive nucleotides of a nucleic acid sequence shown in any one of the sequences in the gene region of transcripts designated as XM_001714774.1 (MUC5AC) and more particularly any 10 consecutive nucleotides overlapping one of the SNPs illustrated in any of these sequences. More particularly, probes and primers in some embodiments are selected so that they overlap or reside adjacent to at least one of the SNPs indicated in the Sequence Listing or the tables of MUC2 (Table 5) and MUC5AC (Table 6) SNPs included in this application.
Purified: The term purified does not require absolute purity; rather, it is intended as a relative term. Thus, for example, a purified nucleic acid preparation is one in which the specified nucleic acid is more enriched than the nucleic acid is in its generative environment, for instance within a cell or in a biochemical reaction chamber. A preparation of substantially pure nucleic acid may be purified such that the desired nucleic acid represents at least 50% of the total nucleic acid content of the preparation. In certain embodiments, a substantially pure nucleic acid will represent at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, or at least 95% or more of the total nucleic acid content of the preparation. Similarly, a substantially pure protein or peptide will represent at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, or at least 95% or more of the total protein content of the preparation.
Recombinant: A recombinant nucleic acid is one that has a sequence that is not naturally occurring or has a sequence that is made by an artificial combination of two otherwise separated segments of sequence. This artificial combination can be accomplished by chemical synthesis or, more commonly, by the artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques.
Regulatory Sequences or Elements: These terms refer generally to a class of DNA sequences that influence or control expression of genes. Included in the term are promoters, enhancers, locus control regions, boundary elements/insulators, silencers, matrix attachment regions (also referred to as scaffold attachment regions), repressors, transcriptional terminators, origins of replication, centromeres, and meiotic recombination hotspots. Promoters are sequences of DNA near the 5' end of a gene that act as a binding site for RNA polymerase, and from which transcription is initiated. Enhancers are control elements that elevate the level of transcription from a promoter, usually independently of the enhancer's orientation or distance from the promoter. Locus control regions (LCRs) confer tissue-specific and temporally regulated expression to genes to which they are linked. LCRs function independently of their position in relation to the gene, but are copy-number dependent. It is believed that they function to open the nucleosome structure, so other factors can bind to the DNA. LCRs may also affect replication timing and origin usage. Insulators (also known as boundary elements) are DNA sequences that prevent the activation (or inactivation) of transcription of a gene, by blocking effects of surrounding chromatin. Silencers and repressors are control elements that suppress gene expression; they act on a gene independently of their orientation or distance from the gene. Matrix attachment regions (MARs), also known as scaffold attachment regions, are sequences within DNA that bind to the nuclear scaffold. They can affect transcription, possibly by separating chromosomes into regulatory domains. It is believed that MARs mediate higher-order, looped structures within chromosomes. Transcriptional terminators are regions in the vicinity of a gene at which RNA polymerase is released from the template.
Origins of replication are regions of the genome at which DNA replication begins during the DNA synthesis phase of cell division. Meiotic recombination hotspots are regions of the genome that recombine more frequently than the average during meiosis.
RNA: A typically linear polymer of ribonucleic acid monomers, linked by phosphodiester bonds. Naturally occurring RNA molecules fall into three classes, messenger (mRNA, which encodes proteins), ribosomal (rRNA, components of ribosomes), and transfer (tRNA, molecules responsible for transferring amino acid monomers to the ribosome during protein synthesis). Total RNA refers to a heterogeneous mixture of all three types of RNA molecules.
Sequence identity: The similarity between two nucleic acid sequences, or two amino acid sequences, is expressed in terms of the similarity between the sequences, otherwise referred to as sequence identity. Sequence identity is frequently measured in terms of percentage identity (or similarity or homology); the higher the percentage, the more similar the two sequences are. Homologs or orthologs of nucleic acid or amino acid sequences will possess a relatively high degree of sequence identity when aligned using standard methods. This homology will be more significant when the orthologous proteins or nucleic acids are derived from species which are more closely related (e.g., human and chimpanzee sequences), compared to species more distantly related (e.g., human and C. elegans sequences). Typically, orthologs are at least 50% identical at the nucleotide level and at least 50% identical at the amino acid level when comparing human orthologous sequences. Methods of alignment of sequences for comparison are well known. Various programs and alignment algorithms are described in: Smith & Waterman, Adv. Appl. Math. 2:482, 1981; Needleman & Wunsch, J. Mol. Biol. 48:443, 1970; Pearson & Lipman, Proc. Natl. Acad. Sci. USA 85:2444, 1988; Higgins & Sharp, Gene 73:237-44, 1988; Higgins & Sharp, CABIOS 5:151-3, 1989; Corpet et al., Nuc. Acids Res. 16:10881-90, 1988; Huang et al., Computer Appls. Biosci. 8:155-65, 1992; and Pearson et al., Meth. Mol. Bio. 24:307-31, 1994. Altschul et al., J. Mol. Biol. 215:403-10, 1990, presents a detailed consideration of sequence alignment methods and homology calculations.
The NCBI Basic Local Alignment Search Tool (BLAST) (Altschul et al., J. Mol. Biol. 215:403-10, 1990) is available from several sources, including the National Center for Biotechnology Information (NCBI, Bethesda, MD) and on the Internet, for use in connection with the sequence analysis programs blastp, blastn, blastx, tblastn and tblastx. Each of these sources also provides a description of how to determine sequence identity using this program.
Homologous sequences are typically characterized by possession of at least 60%, 70%, 75%, 80%, 90%, 95% or at least 98% sequence identity counted over the full length alignment with a sequence using the NCBI Blast 2.0, gapped blastp set to default parameters. Queries searched with the blastn program are filtered with DUST (Hancock and Armstrong, Comput. Appl. Biosci. 10:67-70, 1994). It will be appreciated that these sequence identity ranges are provided for guidance only; it is entirely possible that strongly significant homologs could be obtained that fall outside of the ranges provided.
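By way of illustration only, percentage identity counted over the full length of an alignment can be sketched as follows. The sketch assumes that the two sequences have already been aligned (for example by one of the programs cited above) and are supplied as equal-length strings with "-" marking gaps; it is not the BLAST calculation itself, and other conventions for the denominator exist.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over all alignment columns of two pre-aligned sequences.

    Columns in which both sequences carry a gap are ignored; a column with a
    gap in only one sequence counts toward the length but never as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Hypothetical six-column alignment with one mismatch and one gap.
print(round(percent_identity("ACGT-A", "ACCTTA"), 1))
```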
Nucleic acid sequences that do not show a high degree of identity may nevertheless encode similar amino acid sequences, due to the degeneracy of the genetic code. It is understood that changes in nucleic acid sequence can be made using this degeneracy to produce multiple nucleic acid sequences that all encode substantially the same protein. An alternative indication that two nucleic acid molecules are closely related is that the two molecules hybridize to each other under stringent conditions, as described under "specific hybridization."
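By way of illustration only, the effect of the degeneracy of the genetic code can be sketched with two short, hypothetical coding sequences that differ at most positions yet encode the same peptide. The sketch assumes the standard genetic code; the sequences and function names are illustrative only.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence into one-letter amino acids."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def count_identical_positions(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences carry the same base."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b))


seq_a = "ATGCTTTCTCGT"  # hypothetical sequence
seq_b = "ATGTTAAGCAGA"  # hypothetical sequence using synonymous codons
print(translate(seq_a), translate(seq_b))
print(count_identical_positions(seq_a, seq_b), "of", len(seq_a), "positions identical")
```

In this sketch both sequences translate to the same four-residue peptide although fewer than half of their nucleotide positions are identical.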
Single Nucleotide Polymorphism (SNP): A single base (nucleotide) difference in a DNA sequence among individuals in a population. SNPs can be causative (actually involved in or influencing the condition or trait to which the SNP is linked) or associative (linked to but not having any direct involvement in or influence on the condition or trait to which the SNP is linked).
Representative SNPs in MUC5AC are displayed in Tables 1 and 2 above.
Specific binding agent: An agent that binds substantially only to a defined target.
Thus a protein-specific binding agent binds substantially only the specified protein. By way of example, as used herein, the term "X-protein specific binding agent" includes anti-X protein antibodies (and functional fragments thereof) and other agents (such as soluble receptors) that bind substantially only to the X protein (where "X" is a specified protein, or in some embodiments a specified domain or form of a protein, such as a particular allelic form of a protein).
Anti-X protein antibodies may be produced using standard procedures described in a number of texts, including Harlow and Lane (Antibodies, A Laboratory Manual, CSHL, New York, 1988). The determination that a particular agent binds substantially only to the specified protein may readily be made by using or adapting routine procedures. One suitable in vitro assay makes use of the Western blotting procedure (described in many standard texts, including Harlow and Lane (Antibodies, A Laboratory Manual, CSHL, New York, 1988)). Western blotting may be used to determine that a given protein binding agent, such as an anti-X protein monoclonal antibody, binds substantially only to the X protein. Shorter fragments of antibodies can also serve as specific binding agents. For instance, Fabs, Fvs, and single-chain Fvs (SCFvs) that bind to a specified protein would be specific binding agents. These antibody fragments are defined as follows: (1) Fab, the fragment which contains a monovalent antigen-binding fragment of an antibody molecule produced by digestion of whole antibody with the enzyme papain to yield an intact light chain and a portion of one heavy chain; (2) Fab', the fragment of an antibody molecule obtained by treating whole antibody with pepsin, followed by reduction, to yield an intact light chain and a portion of the heavy chain; two Fab' fragments are obtained per antibody molecule; (3) (Fab')2, the fragment of the antibody obtained by treating whole antibody with the enzyme pepsin without subsequent reduction; (4) F(ab')2, a dimer of two Fab' fragments held together by two disulfide bonds; (5) Fv, a genetically engineered fragment containing the variable region of the light chain and the variable region of the heavy chain expressed as two chains; and (6) single chain antibody ("SCA"), a genetically engineered molecule containing the variable region of the light chain, the variable region of the heavy chain, linked by a suitable polypeptide linker as a genetically fused single chain molecule. Methods of making these fragments are routine.
Specific hybridization: Specific hybridization refers to the binding, duplexing, or hybridizing of a molecule only or substantially only to a particular nucleotide sequence when that sequence is present in a complex mixture (e.g. total cellular DNA or RNA). Specific hybridization may also occur under conditions of varying stringency.
Hybridization conditions resulting in particular degrees of stringency will vary depending upon the nature of the hybridization method of choice and the composition and length of the hybridizing DNA used. Generally, the temperature of hybridization and the ionic strength (especially the Na+ concentration) of the hybridization buffer will determine the stringency of hybridization. Calculations regarding hybridization conditions required for attaining particular degrees of stringency are discussed by Sambrook et al. (In: Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, New York, 1989 ch. 9 and 11). By way of illustration only, a hybridization experiment may be performed by hybridization of a DNA molecule to a target DNA molecule which has been electrophoresed in an agarose gel and transferred to a nitrocellulose membrane by Southern blotting (Southern, J. Mol. Biol. 98:503, 1975), a technique well known in the art and described in Sambrook et al. (Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, New York, 1989).
Traditional hybridization with a target nucleic acid molecule labeled with [32P]-dCTP is generally carried out in a solution of high ionic strength such as 6x SSC at a temperature that is 20-25° C below the melting temperature, Tm, described below. For Southern hybridization experiments where the target DNA molecule on the Southern blot contains 10 ng of DNA or more, hybridization is typically carried out for 6-8 hours using 1-2 ng/ml radiolabeled probe (of specific activity equal to 10^9 CPM/μg or greater). Following hybridization, the nitrocellulose filter is washed to remove background hybridization. The washing conditions should be as stringent as possible to remove background hybridization but to retain a specific hybridization signal.
The term Tm represents the temperature (under defined ionic strength, pH and nucleic acid concentration) at which 50% of the probes complementary to the target sequence hybridize to the target sequence at equilibrium. Because the target sequences are generally present in excess, at Tm 50% of the probes are occupied at equilibrium. The Tm of such a hybrid molecule may be estimated from the following equation (Bolton and McCarthy, Proc. Natl. Acad. Sci. USA 48:1390, 1962):
$$T_m = 81.5\,^{\circ}\mathrm{C} + 16.6\,\log_{10}[\mathrm{Na}^+] + 0.41\,(\%\,\mathrm{G{+}C}) - 0.63\,(\%\,\mathrm{formamide}) - \frac{600}{l}$$
where l = the length of the hybrid in base pairs.
This equation is valid for concentrations of Na+ in the range of 0.01 M to 0.4 M, and it is less accurate for calculations of Tm in solutions of higher [Na+]. The equation is also primarily valid for DNAs whose G+C content is in the range of 30% to 75%, and it applies to hybrids greater than 100 nucleotides in length (the behavior of oligonucleotide probes is described in detail in Ch. 11 of Sambrook et al. (Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, New York, 1989)).
Thus, by way of example, for a 150 base pair DNA probe derived from a cDNA (with a hypothetical % GC of 45%), a calculation of hybridization conditions required to give particular stringencies may be made as follows: For this example, it is assumed that the filter will be washed in 0.3x SSC solution following hybridization, thereby: [Na+] = 0.045 M; % GC = 45%; formamide concentration = 0; l = 150 base pairs; Tm = 81.5 + 16.6(log10[Na+]) + (0.41 x 45) - (600/150); and so Tm = 74.4° C.
The Tm of double-stranded DNA decreases by 1-1.5° C with every 1% decrease in homology (Bonner et al., J. Mol. Biol. 81:123, 1973). Therefore, for this given example, washing the filter in 0.3x SSC at 59.4-64.4° C will produce a stringency of hybridization equivalent to 90%; that is, DNA molecules with more than 10% sequence variation relative to the target cDNA will not hybridize. Alternatively, washing the hybridized filter in 0.3x SSC at a temperature of 65.4-68.4° C will yield a hybridization stringency of 94%; that is, DNA molecules with more than 6% sequence variation relative to the target cDNA molecule will not hybridize. The above example is given entirely by way of theoretical illustration. It will be appreciated that other hybridization techniques may be utilized and that variations in experimental conditions will necessitate alternative calculations for stringency.
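By way of illustration only, the Tm estimate and the derivation of wash temperatures can be sketched in code. The sketch assumes that the salt term enters with a positive sign, so that Tm rises with [Na+], as in the commonly cited form of this formula, and it takes the 1-1.5° C decrease per 1% mismatch stated above. Function and parameter names are illustrative. With the coefficients as written, the sketch evaluates the worked example to about 73.6° C, slightly below the 74.4° C quoted above; the wash ranges in the text are derived from the quoted figure.

```python
import math
from typing import Tuple


def melting_temperature(na_molar: float, gc_percent: float, length_bp: int,
                        formamide_percent: float = 0.0) -> float:
    """Estimate Tm (deg C) of a DNA hybrid longer than about 100 bp.

    Assumes the salt term is +16.6*log10([Na+]); intended for
    0.01 M <= [Na+] <= 0.4 M and 30% <= G+C <= 75%.
    """
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 0.63 * formamide_percent
            - 600.0 / length_bp)


def wash_temperature_range(tm: float, max_mismatch_percent: float,
                           drop_per_percent: Tuple[float, float] = (1.0, 1.5)
                           ) -> Tuple[float, float]:
    """Wash temperatures (low, high) that tolerate up to the given mismatch."""
    low = tm - max_mismatch_percent * drop_per_percent[1]
    high = tm - max_mismatch_percent * drop_per_percent[0]
    return low, high


# Worked example from the text: 150 bp probe, 45% G+C, 0.3x SSC wash.
tm = melting_temperature(na_molar=0.045, gc_percent=45.0, length_bp=150)
print(round(tm, 1))
# Wash ranges computed from the Tm value quoted in the text.
print(wash_temperature_range(74.4, 10.0))
print(wash_temperature_range(74.4, 6.0))
```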
Stringent conditions may be defined as those under which DNA molecules with more than 25%, 15%, 10%, 6% or 2% sequence variation (also termed "mismatch") will not hybridize. Stringent conditions are sequence dependent and are different in different circumstances. Longer sequences hybridize specifically at higher temperatures. Generally, stringent conditions are selected to be about 5° C lower than the thermal melting point Tm for the specific sequence at a defined ionic strength and pH. An example of stringent conditions is a salt concentration of at least about 0.01 to 1.0 M Na ion concentration (or other salts) at pH 7.0 to 8.3 and a temperature of at least about 30° C for short probes (e.g., 10 to 50 nucleotides). Stringent conditions can also be achieved with the addition of destabilizing agents such as formamide. For example, conditions of 5x SSPE (750 mM NaCl, 50 mM Na Phosphate, 5 mM EDTA, pH 7.4) and a temperature of 25-30° C are suitable for allele-specific probe hybridizations.
The following is an exemplary set of hybridization conditions and is not meant to be limiting:
Very High Stringency (detects sequences that share 90% identity)
Hybridization: 5x SSC at 65°C for 16 hours
Wash twice: 2x SSC at room temperature (RT) for 15 minutes each
Wash twice: 0.5x SSC at 65°C for 20 minutes each
High Stringency (detects sequences that share 80% identity or greater)
Hybridization: 5x-6x SSC at 65°C-70°C for 16-20 hours
Wash twice: 2x SSC at RT for 5-20 minutes each
Wash twice: 1x SSC at 55°C-70°C for 30 minutes each
Low Stringency (detects sequences that share greater than 50% identity)
Hybridization: 6x SSC at RT to 55°C for 16-20 hours
Wash at least twice: 2x-3x SSC at RT to 55°C for 20-30 minutes each.
A perfectly matched probe has a sequence perfectly complementary to a particular target sequence. The test probe is typically perfectly complementary to a portion (subsequence) of the target sequence. The term "mismatch probe" refers to probes whose sequence is deliberately selected not to be perfectly complementary to a particular target sequence.
Transcription levels can be quantitated absolutely or relatively. Absolute quantitation can be accomplished by inclusion of known concentrations of one or more target nucleic acids (for example, control nucleic acids or a known amount of the target nucleic acids themselves) and referencing the hybridization intensity of unknowns with the known target nucleic acids (for example, by generation of a standard curve).
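By way of illustration only, absolute quantitation against a standard curve can be sketched as follows. The sketch assumes that hybridization intensity responds linearly to the amount of target over the range of the standards, fits the curve by ordinary least squares, and reads an unknown off the fitted line. All numeric values are hypothetical.

```python
from typing import Sequence, Tuple


def fit_standard_curve(amounts: Sequence[float],
                       intensities: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line intensity = slope * amount + intercept."""
    if len(amounts) != len(intensities) or len(amounts) < 2:
        raise ValueError("need at least two paired standards")
    n = len(amounts)
    mean_x = sum(amounts) / n
    mean_y = sum(intensities) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    if sxx == 0:
        raise ValueError("standards must span more than one amount")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, intensities))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_amount(intensity: float, slope: float, intercept: float) -> float:
    """Invert the fitted standard curve for an unknown sample."""
    return (intensity - intercept) / slope


# Hypothetical standards (amount of control nucleic acid, measured intensity).
standard_amounts = [1.0, 2.0, 4.0, 8.0]
standard_intensities = [12.0, 22.0, 42.0, 82.0]
slope, intercept = fit_standard_curve(standard_amounts, standard_intensities)
print(estimate_amount(52.0, slope, intercept))
```

Unknowns whose intensity falls outside the range spanned by the standards would be extrapolations and should be treated with corresponding caution.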
Subject: Living, multicellular vertebrate organisms, a category that includes both human and veterinary subjects, for example, mammals, birds and primates.
Transformed: A transformed cell is a cell into which has been introduced a nucleic acid molecule by molecular biology techniques. As used herein, the term transformation encompasses all techniques by which a nucleic acid molecule might be introduced into such a cell, including transfection with viral vectors, transformation with plasmid vectors, and introduction of naked DNA by electroporation, lipofection, and particle gun acceleration.
Vector: A nucleic acid molecule as introduced into a host cell, thereby producing a transformed host cell. A vector may include nucleic acid sequences that permit it to replicate in a host cell, such as an origin of replication. A vector may also include one or more selectable marker genes and other genetic elements known in the art.
Unless otherwise explained, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The singular terms "a," "an," and "the" include plural referents unless context clearly indicates otherwise. Similarly, the word "or" is intended to include "and" unless the context clearly indicates otherwise. Hence "comprising A or B" means including A, or B, or A and B. It is further to be understood that all base sizes or amino acid sizes, and all molecular weight or molecular mass values, given for nucleic acids or polypeptides are approximate, and are provided for description. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including explanations of terms, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.
III. Sequence Variants in Mucin Genes Linked to Pulmonary Fibrosis
Using the techniques discussed herein, associations were found between alleles of the MUC5AC gene and both FIP and IPF. Single nucleotide polymorphisms (SNPs) within the coding region and introns of MUC5AC are demonstrated to be linked to increased likelihood of IIP. Thus, described herein is the identification of a series of single nucleotide polymorphisms (SNPs), and haplotypes, near or in MUC5AC that partially predict susceptibility to IIP. These SNPs include but are not limited to the SNPs in Tables 1 and 2 above.
The discovery that polymorphisms in the sequence of MUC5AC predispose a subject to IIP enables a variety of diagnostic, prognostic, and therapeutic methods that are further embodiments. The new appreciation of the role of MUC5AC in IIP enables detection of predisposition to this condition in a subject. This disclosure also enables early detection of subjects at high risk of this and related conditions, and provides opportunities for prevention, diagnosis, prognostication, and/or early treatment.
Moreover, the finding that SNPs in MUC5AC predispose families and individuals to IIP strongly suggests that airway mucins represent essential glycoproteins that protect the air-lung interface and are very likely to play a role in other fibroproliferative diseases like asthma, chronic obstructive lung disease, granulomatous lung diseases, and pneumoconioses. This is underscored by the relatively high prevalence of SNPs in MUC5AC that may represent common susceptibility variants that predispose the general population to these fibroproliferative processes in the lung.
IV. Mucin 5AC (MUC5AC) and Associated Genomic Sequences
No RefSeq is available for MUC5AC, so a model sequence (XM_001714774.1) was used to construct the primers and comparative sequence for this gene, as illustrated in Table 4 below. Each of these public database entries, as they are available on the date of filing of this document, is incorporated herein in its entirety.
Table 4
[Table 4 is provided as images in the original publication: imgf000030_0001, imgf000031_0001 and imgf000032_0001.]
V. Additional Characterization of Idiopathic Interstitial Pneumonia (IIP) Susceptibility SNPs
In order to more fully understand the IIP susceptibility SNPs and other sequence variants described herein, and particularly to determine which have a causative effect and how they influence or alter the activity or expression of specific molecules, additional characterization can be carried out. The following material describes representative methods useful for such characterization.
Sections of non-coding nucleic acid identified herein, particularly those identified herein as including a variant can be tested for functionality or changes in functionality between two or more alleles. For example, segments of DNA can be amplified separately from individuals homozygous for risk alleles and from individuals homozygous for non-risk alleles. Each segment is cloned upstream of a reporter gene (such as luciferase), the resulting constructs transfected into various cell lines, such as lung cells and other cells, and the relative amount of luciferase reporter expression compared. If there is a significant difference between the levels of luciferase expression between the constructs, this indicates that the SNP(s) in that segment likely affect expression of the corresponding mucin or another linked or associated gene.
Additional possible susceptibility SNPs in the region defined herein also can be identified. By way of example, this can be done by surveying public databases of SNPs, and by sequencing DNA from subjects affected with IIP (or another fibroproliferative condition or disease involving fibrosis of the small airways) and from controls. These SNPs can then be tested for evidence of association with IIP disease status and intermediate quantitative traits by genotyping cases and controls, for instance using methods like those described herein. SNPs that show the strongest evidence for association may be better candidates for the causative SNP. This genotype data can also be used to test haplotypes for evidence of association with disease, to help determine whether as yet unidentified SNPs may be more strongly associated.
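By way of illustration only, a simple test of allelic association between a SNP and disease status in genotyped cases and controls can be sketched as follows. The sketch assumes a 2 x 2 table of allele counts and a Pearson chi-square test with one degree of freedom, without continuity correction; this is one of several possible tests and is not necessarily the analysis used in the studies described herein. The allele counts are hypothetical.

```python
import math
from typing import Tuple


def allelic_association(case_risk: int, case_other: int,
                        control_risk: int, control_other: int
                        ) -> Tuple[float, float, float]:
    """Return (chi-square, p-value, odds ratio) for a 2x2 allele-count table."""
    a, b, c, d = case_risk, case_other, control_risk, control_other
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column must contain observations")
    chi2 = n * (a * d - b * c) ** 2 / margins
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return chi2, p_value, odds_ratio


# Hypothetical allele counts: 100 cases and 100 controls, two alleles each.
chi2, p_value, odds_ratio = allelic_association(60, 140, 40, 160)
print(round(chi2, 2), round(p_value, 4), round(odds_ratio, 2))
```

When many SNPs are tested, the resulting p-values would ordinarily be adjusted for multiple comparisons before ranking candidates.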
The findings reported herein can be further corroborated by collecting and testing additional case-control samples for evidence of association of the identified SNPs and haplotype(s) with IIP and other conditions or diseases involving fibrosis of the small airways. In addition, the locations of all the identified SNPs can be compared to segments of DNA conserved across species, because SNPs located in these segments are believed to be more likely to affect gene expression or function.
It is also advantageous to determine whether SNPs found to be linked to susceptibility to IIP affect the ability of protein(s) to bind to the surrounding segment of DNA. Methods for determining binding are well known to those of ordinary skill in the art, including but not limited to methods described herein.
VI. Isolation of Nucleic Acid(s)
The variant elements (including SNPs and haplotypes) described herein are useful as markers, for instance to identify genetic material as being derived from a particular individual or in making assessments regarding the propensity of an individual to develop a particular disorder or condition (e.g., IPF, etc.), the ability of an individual to respond to a certain course of treatment, or in other diagnostic or prognostic and other methods described in more detail herein. Genetic material (nucleic acids such as genomic DNA, RNA, and cDNA) suitable for use in such methods can be generated or derived from a variety of sources. For example, nucleic acid molecules (preferably genomic DNA) can be isolated from a cell from a living or deceased subject using methods well known to those of ordinary skill in the art. Cells can be obtained from biological samples, for instance from tissue samples or from bodily fluid samples that include cells (e.g., blood, urine, semen, exudates, or saliva). Detection methods of the disclosure can be used to detect variant elements in DNA in a biological sample in intact cells (for instance, using in situ hybridization) or in extracted DNA (for instance, using Southern blot hybridization).
VII. Representative Uses of SNPs and Haplotypes
The variants (including individual SNPs and haplotypes) described herein are useful as markers or indicators in a variety of different methods. They can be used, for instance, in diagnostic and prognostic assays, and in monitoring clinical trials for the purposes of predicting outcomes of developing or ongoing therapeutic or treatment regimens. The results of such methods can be used to develop or recommend a course of prophylactic treatment for an individual who is identified as having a specific SNP or combination of SNPs (or a haplotype), to prescribe or develop a course of therapy after identification that a subject has or suffers from a disease or disorder, or to alter or adapt an ongoing therapeutic regimen.
Certain embodiments therefore include diagnostic methods for detecting one or more SNPs or a haplotype in a biological sample, to thereby determine whether a subject is at risk of developing a disorder or disease or condition linked to one or more of the SNPs or the haplotypes described herein, or whether the subject is afflicted with the disease, condition or disorder. The subject methods also can be used to determine whether a subject is at risk for passing on the susceptibility to develop a disease, condition or disorder to their offspring.
Also provided are prognostic, predictive methods for determining whether a subject is at risk of developing a disease, condition or disorder that affects fibrosis, in particular fibrosis of the lung, and particularly fibrosis of the small airways (including asthma and chronic obstructive pulmonary disease), including for instance IIP such as familial or idiopathic pulmonary fibrosis. For example, SNP sequences or haplotypes can be assayed in a biological sample from a subject. Such assays can be used for prognostic, diagnostic, or predictive purposes to prophylactically or therapeutically treat an individual prior to or after the onset of a disorder, disease or condition (such as IIP) associated with one or more of the SNPs/haplotypes described herein, specifically those located at 11pter.
The nucleotide variants (including individual SNPs and haplotypes) provided herein also can be used for generating polynucleotide reagents. Methods are also provided for identifying or screening for compounds useful for treating or influencing or preventing a disease, disorder or condition associated with a SNP or haplotype located at 11pter.
The following examples are provided to illustrate certain particular features and/or embodiments. These examples should not be construed to limit the invention to the particular features or embodiments described.
EXAMPLE 1: Familial Interstitial Pneumonia is linked to Chromosomes 11, 10, and 12
This example describes the identification of regions of interest in the genome relevant to FIP and/or IIP, and evaluation of smoking as a covariate. We collected 82 families with > 2 members with probable/definite IIP. A genomic screen using 887 microsatellite repeat markers spaced at 4.2 cM intervals was performed. The 82 families were subdivided into those with only idiopathic pulmonary fibrosis (IPF) type disease [homogeneous FIP] and families with ≥ 1 case of IPF and one other type of IIP [heterogeneous FIP]. In brief (see Table 3 above), a maximum multipoint lod score of 3.34 was identified at D11S1318, incorporating an 8.8 cM region bounded by D11S4046 and D11S1760. A second linkage peak spans approximately 15 cM and is bounded by D10S1751 and D10S1664 (maximum multipoint lod score of 2.07 at D10S1649). When considered alone, the homogeneous families identified a region of interest at D12S368 (maximum multipoint lod score 2.50). Families with a lower prevalence of cigarette smoking among affected individuals contributed significantly to evidence for linkage at 11pter (maximum lod = 4.40; p=0.01).
Regions on chromosomes 10, 11, and 12 are identified as containing genes contributing to FIP. Moreover, linkage on chromosome 11 is influenced by cigarette smoking, and linkage on chromosome 12 is influenced by disease phenotype.
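By way of illustration only, the meaning of a lod score can be sketched for the simplest case. The sketch assumes a two-point analysis of fully informative, phase-known meioses, in which the lod score at recombination fraction theta is the base-10 logarithm of the likelihood at theta divided by the likelihood at theta = 0.5 (no linkage). The multipoint analysis of extended pedigrees reported in this example is considerably more involved and is not reproduced by this sketch; the counts shown are hypothetical.

```python
import math
from typing import Tuple


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """Lod score log10[L(theta) / L(0.5)] for phase-known meioses."""
    if meioses < 1 or not 0 <= recombinants <= meioses:
        raise ValueError("need 0 <= recombinants <= meioses and meioses >= 1")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if theta == 0.0:
        # Any observed recombinant is impossible under complete linkage.
        return meioses * math.log10(2) if recombinants == 0 else float("-inf")
    return (recombinants * math.log10(theta)
            + (meioses - recombinants) * math.log10(1.0 - theta)
            + meioses * math.log10(2))


def maximum_lod(recombinants: int, meioses: int,
                steps: int = 500) -> Tuple[float, float]:
    """Grid search over theta in [0, 0.5]; returns (best theta, lod at best)."""
    thetas = [0.5 * i / steps for i in range(steps + 1)]
    best = max(thetas, key=lambda t: two_point_lod(recombinants, meioses, t))
    return best, two_point_lod(recombinants, meioses, best)


# Hypothetical data: 2 recombinants observed among 20 informative meioses.
best_theta, best_lod = maximum_lod(2, 20)
print(round(best_theta, 3), round(best_lod, 2))
```

A lod score of 3 corresponds to odds of 1000:1 in favor of linkage at the given recombination fraction relative to no linkage.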
INTRODUCTION
The idiopathic interstitial pneumonias (IIP) are a clinically heterogeneous group of histologies resulting in fibrosing interstitial lung diseases that lead to hypoxemic respiratory insufficiency. The most common IIP is usual interstitial pneumonia (UIP), the underlying histology of IPF. Typically, IPF (OMIM 178500) presents in late life and is lethal within 4-5 years of diagnosis. Treatment options, apart from lung transplantation, are limited and do not appear to prolong survival (King et al., Amer J Resp Crit Care Med 646-664, 2000; Travis et al., Pneumonias. Amer J Respir Crit Care Med 2:277-304, 2002).
The evidence for a genetic basis to IIP is substantial. Although for most IIP the family history is negative, studies have suggested that in 5-10% of cases such a history exists (Mageto & Raghu, Curr. Opin. Pulm. Med. 3:336-340, 1997; Marshall et al., Int. J. Biochem. Cell Biol. 29:107-120, 1997; Marshall et al., Thorax 55:143-146, 2000), and the true prevalence of the familial cases may be underestimated (Steele et al., Am J Respir Crit Care Med 172:1146-1152, 2006). Cases of UIP with a positive family history are termed familial interstitial pneumonia (FIP). Interstitial lung disease has been associated with a variety of genetic diseases with a known inheritance pattern, such as Hermansky-Pudlak syndrome (DePinho & Kaplan, Medicine (Baltimore) 64:192-202, 1985), neurofibromatosis (Riccardi, N. Engl. J. Med. 305:1617-1627, 1981), and tuberous sclerosis (Makle et al., Chest 538-540, 1970; Harris et al., Am. Rev. Respir. Dis. 100:379-387, 1969), along with several others; and at least one family segregating pulmonary fibrosis has been shown to carry a mutation in surfactant protein C (Thomas et al., Am. J. Respir. Crit. Care Med. 165:1322-1328, 2002). Development of disease varies among individuals exposed to similar levels of fibrogenic dusts such as asbestos and/or organic antigens, suggesting an underlying genetic susceptibility (Polakoff et al., Ann. N.Y. Acad. Sci. 330:333-339, 1979; Selikoff et al., Ann. N.Y. Acad. Sci. 330:295-311, 1979).
Familial aggregation has been confirmed through a variety of studies in twins, in siblings raised apart, and in multigenerational families (Marshall et al., Thorax 55:143-146, 2000; Steele et al., Am J Respir Crit Care Med 172:1146-1152, 2006; Javaheri et al., Chest 78:591-594, 1980; Solliday et al., Am. Rev. Respir. Dis. 108:193-204, 1973; Bonanni et al., Am. J. Med. 39:411-421, 1965; Hughes, Thorax 19:515-525, 1964; Swaye et al., Dis. Chest 55:7-12, 1969; Bitterman et al., N. Engl. J. Med. 314:1343-1347, 1986; Lee et al., Chest 127:2034-2041, 2005; Hodgson et al., Thorax 57:338-342, 2002; Tsukahara & Kajii, Jinrui Idengaku Zasshi 28:263-267, 1983; Adelman et al., Can. Med. Assoc. J. 95:603-610, 1966; McKusick & Fisher, Ann. Intern. Med. 48:774-790, 1958; Musk et al., Chest 89:206-210, 1986). Recently, we confirmed statistically significant aggregation of affected individuals within a subset of families and identified 20 multigenerational pedigrees consistent with autosomal dominant inheritance (Steele et al., Am J Respir Crit Care Med 172:1146-1152, 2006). These clinical observations are supported by studies in animal models. For example, C57BL/6 mice are more prone to develop lung fibrosis than BALB/c or 129 mice following exposure to bleomycin or asbestos (Corsini et al., Am. J. Respir. Cell Mol. Biol. 11:531-539, 1994; Warshamana et al., Am. J. Respir. Cell Mol. Biol. 27:705-713, 2002; Rossi et al., Am. Rev. Respir. Dis. 135:448-455, 1987; Ortiz et al., Exp. Lung Res. 24:721-743, 1998).
Interstitial lung disease also results from environmental exposures, such as inhalation of fibrogenic dusts or air-borne organic antigens, including coal dust, wood or metal dust, mold, silica, and cigarette smoke (Marshall et al., Thorax 55:143-146, 2000; Mullen et al., J. Occup. Environ. Med. 40:363-367, 1998; Baumgartner et al., Am. J. Epidemiol. 152:307-315, 2000). Latent herpesvirus infections have been associated with an increased risk of this disease (Turner-Warwick, Thorax 53 Suppl 2:S3-S9, 1998; Ferri et al., Br. J. Rheumatol. 36:360-365, 1997). Smoking has long been considered an important risk factor for the development of IPF (Hanley et al., Am. Rev. Respir. Dis. 144:1102-1106, 1991; Schwartz et al., Am. Rev. Respir. Dis. 144:504-506, 1991; Wells et al., Am. J. Respir. Crit. Care Med. 155:1367-1375, 1997), and we have shown that cigarette smoking is also a risk factor in the development of FIP (Steele et al., Am J Respir Crit Care Med 172:1146-1152, 2006).
It is likely that complex interactions between genes and environmental exposures are involved in the development of IIP. Identifying underlying genetic risks may help to focus studies on environmental exposures, disease pathogenesis, and targeted interventions.
Results from this study demonstrate that familial interstitial pneumonia (FIP) is influenced by both genetic and environmental factors, and that while linkage on chromosome 11 is influenced by cigarette smoking, the linkage on chromosome 12 is influenced by disease phenotype.
METHODS
Clinical data collection
Subject ascertainment occurred through web-based advertising (www.fpf.duke.edu/ and http://www.nhlbi.nih.gov/studies/fibrosis/) and direct mailings to physician members of the American Thoracic Society (ATS) to identify potential families. A toll-free number (877-487-4411) was established to facilitate subject participation.
Family, Ascertainment, and Phenotyping
Three sites in the United States (National Jewish Medical and Research Center, Denver, CO; Vanderbilt University, Nashville, TN; and Duke University Medical Center, Durham, NC) were established to identify subjects with FIP, and to enroll and phenotype probands and family members. The study was approved by the respective institutions' institutional review boards (IRB) and a certificate of confidentiality was obtained from the National Institutes of Health. Following informed consent, all subjects were asked to complete a detailed health and environmental exposure questionnaire, and to obtain a chest radiograph (PA and lateral) and a carbon monoxide diffusing capacity (DLCO) measurement at a local health facility. Dyspnea was assessed utilizing the assessment described in the ATS-DLD-78 questionnaire (supplemental methods) (Ferris, Am. Rev. Respir. Dis. 118:1-120, 1978). We obtained a high resolution chest CT (HRCT) scan in the prone and supine position on those subjects who had either unexplained dyspnea of grade 2 or greater, an abnormal chest radiograph suggestive of interstitial lung disease (ILD), a DLCO < 80% predicted, or those who self-reported a diagnosis of ILD. All radiologic images were forwarded to Duke University and independently interpreted by two investigators (MPS and DAS) who were blinded to the clinical history. Standard criteria (King et al., Amer J Resp Crit Care Med 646-664, 2000; Travis et al., Pneumonias. Amer J Respir Crit Care Med 2:277-304, 2002) were used to establish the diagnosis of IIP, and inconsistencies between the individual readers were resolved by consensus using a third reader (HPM). Subjects with an HRCT scan suggestive of IIP were recommended to undergo a surgical lung biopsy. All phenotype data, including questionnaires, relevant medical history, digitized radiographic images, and lung function measurements, were entered into PEDIGENE, a secure, coded database.
DNA Specimens
Subject DNA was isolated from whole blood with a Gentra Autopure robotics workstation (Gentra Systems, Minneapolis MN), and quantified by UV spectrophotometry on a Nanodrop ND-1000 spectrophotometer (Nanodrop Technologies, Wilmington DE). All samples were barcoded and entered into an Oracle-based LIMS database (Nautilus™ LIMS, THERMO Electron Corporation, Waltham, MA).
Diagnostic Assignment of Study Subjects
For the purposes of this study, a diagnosis of FIP required the presence of 2 or more cases of probable or definite IIP in individuals related within three degrees. We used criteria established by the American Thoracic Society (ATS) and European Respiratory Society (ERS) to guide the classification of patients with ILD (King et al., Amer J Resp Crit Care Med 646-664, 2000; Travis et al, Pneumonias. Amer J Respir Crit Care Med 2:277-304, 2002). Diagnostic categories were unaffected, possible affected, probable affected, and definite affected. Unaffected was defined as no evidence of interstitial lung disease on chest radiograph, DLCO > 80% predicted, and a dyspnea level of 0 or 1 using the ATS dyspnea scale. Definitely affected was defined as either surgical lung biopsy or autopsy evidence of an IIP with an appropriate clinical history. Lung biopsy samples were classified by one of us (TAS) according to revised criteria for the diagnosis of IIPs (Travis et al. , Pneumonias. Amer J Respir Crit Care Med 2:277-304, 2002). Probably affected was defined as bilateral reticular abnormalities associated with honeycombing on HRCT. If honeycombing was absent, bibasilar reticular abnormalities, with or without ground glass opacities in the absence of other explanations for interstitial abnormalities on HRCT, plus either dyspnea of grade 2 or greater or a DLCO < 80% also met the definition. Possibly affected was defined as those subjects with chest radiographs suggestive of ILD who did not have additional testing to establish a more certain diagnosis. Indeterminate was used for those subjects for whom the investigators thought the technical quality of the data was unreliable. For deceased subjects, medical records, radiology reports, autopsy reports, archived lung biopsy slides, and pathology reports were jointly reviewed by study investigators (MPS and DAS) and classified using the best available evidence. 
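The diagnostic categories above amount to a small decision procedure. The following Python sketch is one illustrative reading of it, not the investigators' actual workflow: the `Subject` fields, the order in which the rules are checked, and the fallback to "indeterminate" for combinations the text does not cover are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subject:
    # All field names are hypothetical; they paraphrase the criteria in the text.
    biopsy_or_autopsy_iip: bool = False   # tissue evidence of IIP with fitting clinical history
    hrct_done: bool = False               # an HRCT scan was obtained
    hrct_reticular: bool = False          # reticular abnormalities, no other explanation
    hrct_honeycombing: bool = False
    cxr_suggests_ild: bool = False        # chest radiograph suggestive of ILD
    dlco_pct_predicted: Optional[float] = None
    dyspnea_grade: Optional[int] = None   # ATS dyspnea scale
    data_reliable: bool = True            # technical quality judged adequate


def classify(s: Subject) -> str:
    """Assign one of the study's diagnostic categories (illustrative only)."""
    if not s.data_reliable:
        return "indeterminate"
    if s.biopsy_or_autopsy_iip:
        return "definite"
    if s.hrct_done and s.hrct_reticular:
        if s.hrct_honeycombing:
            return "probable"
        dyspneic = s.dyspnea_grade is not None and s.dyspnea_grade >= 2
        low_dlco = s.dlco_pct_predicted is not None and s.dlco_pct_predicted < 80
        if dyspneic or low_dlco:
            return "probable"
    if s.cxr_suggests_ild and not s.hrct_done:
        return "possible"
    if (not s.cxr_suggests_ild
            and s.dlco_pct_predicted is not None and s.dlco_pct_predicted > 80
            and s.dyspnea_grade is not None and s.dyspnea_grade <= 1):
        return "unaffected"
    # Assumption: anything not covered by the stated rules is left unclassified.
    return "indeterminate"
```

For example, `classify(Subject(hrct_done=True, hrct_reticular=True, dlco_pct_predicted=70))` yields "probable", matching the rule that reticular abnormalities without honeycombing also require dyspnea of grade 2 or greater or a DLCO below 80%.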
Genotyping Methods
The initial genomic screen of 50 families included 1198 microsatellite repeat markers; however, as additional families were added at a later date, the composition of the DeCode linkage panel evolved. Thus, markers not typed on all 82 families, along with markers with less than 80% efficiency (% called genotypes), were eliminated from the analysis, leaving a total of 887 markers with an average inter-marker distance of 4.2 cM (range = 0 to 27.4 cM).
Error Checking
Mendelian pedigree inconsistencies were identified using PEDCHECK (O'Connell & Weeks, Am. J. Hum. Genet. 63:259-266, 1998) and checked by laboratory technicians who were blinded to the pedigree structure. Further verification of inter- and intra-familial genetic relationships was performed using RELPAIR (Epstein et al., Am. J. Hum. Genet. 67:1219-1231, 2000; Boehnke & Cox, Am. J. Hum. Genet. 61:423-429, 1997) at the beginning of the study using the first 50 genotyped markers and then later using all 887 genotyped markers.
Linkage Analysis
Linkage analysis was performed in a series of 82 multiplex families. Eighty of the 111 families described in our clinical description (Steele et al., Am J Respir Crit Care Med 172:1146-1152, 2006) were included in the genomic screen; the remainder of the 111 families were excluded from the genomic screen because of lack of DNA or lack of informativeness for linkage analysis. Two newly ascertained families, identified using the identical ascertainment strategies as the first series of families, were also included in this linkage analysis.
Our primary analysis consisted of all 82 families regardless of the type of IIP within the family. However, to address the possibility of genetic heterogeneity, we divided our 82 families into homogeneous families, those in which affected family members only had IPF/UIP diagnoses, and heterogeneous families, in which at least one affected family member had the IPF/UIP form of IIP and at least one affected individual had another type of IIP, such as non-specific interstitial pneumonia (NSIP), cryptogenic organizing pneumonia (COP), respiratory bronchiolitis-associated interstitial lung disease (RBILD), or other IIP. For all three diagnostic strategies (all families, homogeneous families, and heterogeneous families), we used a rigorous definition of unaffected (no evidence of IIP on chest radiograph, DLCO > 80%, and dyspnea class ≤ 1). Individuals who were classified as possibly affected or indeterminate were considered of unknown status and their data were not included in the linkage analysis. Using this classification strategy, of the 82 families in the genomic screen, 64 were nuclear families only and 18 were extended pedigrees (including at least one affected relative pair more distantly related than siblings); in the homogeneous pedigrees, 39 were nuclear and 3 were extended; and in the heterogeneous pedigrees, 25 were nuclear and 15 were extended.
Linkage analysis for all three analytic models was performed using the Merlin statistical genetic software package (Abecasis et al., Nat. Genet. 30:97-101, 2002) and applying a non-parametric linkage approach using an exponential model. Non-parametric identity-by-descent sharing statistics (LOD) between all pairs of affected individuals within a pedigree were calculated using the Spairs option since our sample consisted of pedigrees of varied sizes. Thus, though all genotyped individuals were used to help infer the genotype of unavailable affected individuals, only affected individuals were used to calculate the LOD statistic. Genetic marker distance was based on maps from DeCode Genetics. Map order was verified using Map-O-Mat (Matise et al., Am. J. Hum. Genet. 73:271-284, 2003). Marker allele frequencies were estimated from the data using all individuals (Broman, Genet. Epidemiol. 20:307-315, 2001). Multipoint LOD scores > 2.0 were considered to be interesting. Approximate 95% confidence intervals were determined using the one-LOD-score-down method.
To evaluate our families for genetic heterogeneity, an ordered subset analysis (OSA) (Hauser et al., Genet. Epidemiol. 27:53-63, 2004) was applied to the 82 families in the genomic screen. Briefly, the OSA approach uses a family-specific continuous covariate to rank families according to the covariate value. A subset of families with the maximum evidence for linkage to a particular genetic marker(s), conditional on the covariate ranking, is identified. This approach may identify subsets of the data that are more homogeneous than others, thereby potentially identifying regions of linkage evidence previously unrecognized. The evidence for an increased linkage signal in the subset of families is assessed statistically using permutation. Non-parametric multipoint family-specific LOD scores calculated in the genomic screen of the full set of 82 pedigrees were used as input to the computer program.
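The ordered subset procedure described above can be illustrated with a simplified single-locus sketch. This is not the OSA program of Hauser et al.; it assumes one LOD score per family at a single position, assumes family LOD scores can simply be summed, and uses hypothetical function names. The published method works on full multipoint LOD curves and also maximizes over map position.

```python
import random
from typing import List, Sequence, Tuple


def max_prefix_lod(lods_in_rank_order: Sequence[float]) -> Tuple[float, int]:
    """Return (largest cumulative LOD, number of families in that subset)."""
    best, best_n, running = float("-inf"), 0, 0.0
    for n, lod in enumerate(lods_in_rank_order, start=1):
        running += lod
        if running > best:
            best, best_n = running, n
    return best, best_n


def ordered_subset_analysis(family_lods: Sequence[float],
                            covariates: Sequence[float],
                            n_perm: int = 10000,
                            seed: int = 0) -> Tuple[float, int, float]:
    """Rank families by covariate (ascending), find the subset with maximal
    summed LOD, and estimate an empirical p-value by permuting family order."""
    order = sorted(range(len(family_lods)), key=lambda i: covariates[i])
    ranked = [family_lods[i] for i in order]
    observed, n_families = max_prefix_lod(ranked)

    rng = random.Random(seed)
    shuffled: List[float] = list(family_lods)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random ordering breaks the link to the covariate
        if max_prefix_lod(shuffled)[0] >= observed:
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0
    return observed, n_families, p_value
```

Ranking in ascending order corresponds to asking whether families with low covariate values (for instance, a low proportion of smokers) carry most of the linkage evidence; ranking in descending order would test the opposite hypothesis.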
The potential contributions of disease age-at-onset and smoking exposure to disease risk were evaluated. For the OSA, disease age-at-onset was defined as an average patient-reported age of first recognition of breathlessness or, when not available, age-at-first diagnostic CT scan was used as a surrogate. For cigarette smoking, the family-specific variable was defined as the proportion of affected individuals within a family who were current or former smokers among those who had smoking history data available.
RESULTS
Linkage analysis in 82 families
Demographic and clinical characteristics of these 82 families are shown in Table 4 below.
[Table 4 is provided as images in the original publication: imgf000041_0001, imgf000042_0001, imgf000043_0001.]
1 Not all affected individuals were genotyped. 42 affected individuals were deceased and DNA therefore unavailable. Genotypes on these individuals were reconstructed in the linkage analysis by the inclusion of offspring and spouse (when available).
2 Age-at-diagnosis is defined as self-report of age-at-first-recognized breathlessness; or, when not available, age at first diagnostic HRCT scan.
3 p-value for homogeneous vs. heterogeneous < 0.03
These characteristics, as well as those for the heterogeneous and homogeneous families, are based on all affected individuals, regardless of whether or not they were genotyped. This approach allows as clear a phenotypic characterization on as many diagnosed cases as possible. Although some affected individuals have died (or may not have been available for genotyping), they were phenotyped from medical records, imaging studies, and tissue review. Of the affected individuals in the 82 families, 55.6% were male. The average age at diagnosis was 66.3 +/- 10.4 years. Overall, 68.3% of affected subjects were current or former smokers, with an average of 16.3 pack-years. These findings are similar to those reported previously (Steele et al., Am J Respir Crit Care Med 172:1146-1152, 2006) in a series of 111 families with ≥ 2 cases of IIP that included 309 characterized affected individuals, a subset of which are incorporated into the genomic screen.
Genetic analysis of the first 50 genotyped markers identified 2 individuals with incorrect gender and 2 individuals who were genetically inconsistent with reported pedigree structure. These four individuals were eliminated from the genomic screen. Additionally, three individuals formerly reported as siblings were identified as half-siblings. The 82 families included 559 genotyped individuals (202 affected). Error checking of the remaining genotyping demonstrated > 99.5% accuracy in genotypes when compared against internal controls.
The linkage analysis in the 82 families demonstrated two regions of interest, LOD score > 2, (Table 3 and Figure 1) on chromosomes 10 and 11. The most convincing evidence for linkage occurred on chromosome 11 where the maximum multipoint LOD score peaked at 3.3 at D11S1318. The approximate 95% confidence interval for this marker was bounded by markers D11S4046 and D11S1760, spanning approximately 8.8 cM (Figure 2).
A second region of interest was identified on chromosome 10, where the maximum multipoint LOD score of 2.1 occurred at D10S1649. The region of interest on chromosome 10 has an approximate 95% confidence interval that spans 15 cM and is bounded by D10S1751 and D10S1664.
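The one-LOD-score-down rule used for the approximate 95% confidence intervals above can be sketched as follows. This is a minimal illustration assuming LOD scores evaluated at discrete map positions; the function name is hypothetical, and the sketch reports the outermost contiguous positions still within one LOD unit of the peak, whereas an analysis may instead report the flanking markers just outside that range.

```python
from typing import Sequence, Tuple


def one_lod_down_interval(positions_cm: Sequence[float],
                          lods: Sequence[float]) -> Tuple[float, float, float]:
    """Return (left bound, peak position, right bound) in cM, where the bounds
    are the outermost contiguous positions with LOD >= peak LOD - 1."""
    peak = max(range(len(lods)), key=lambda i: lods[i])
    threshold = lods[peak] - 1.0

    left = peak
    while left > 0 and lods[left - 1] >= threshold:
        left -= 1

    right = peak
    while right < len(lods) - 1 and lods[right + 1] >= threshold:
        right += 1

    return positions_cm[left], positions_cm[peak], positions_cm[right]


# Hypothetical LOD curve, not data from the study.
positions = [0.0, 4.0, 8.0, 12.0, 16.0, 20.0]
scores = [0.5, 1.9, 2.6, 3.3, 2.5, 1.0]
print(one_lod_down_interval(positions, scores))
```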
Linkage analysis in homogeneous and heterogeneous families
Among the 42 homogeneous families (IPF/UIP only), 244 individuals (96 affected) were genotyped. All affected individuals in these families had an HRCT or lung biopsy consistent with IPF/UIP. Clinical and demographic characteristics for the affected individuals in the homogeneous and heterogeneous families [Table 4] were calculated using all affected members of the pedigrees, regardless of whether they were genotyped or not, as described above. Families with homogeneous disease (IPF/UIP only) tended to have a later age-at-diagnosis, later average age-at-death, were more often male, and were more likely to be cigarette smokers than heterogeneous families (Table 4). Limiting the analysis to homogeneous families, we identified a region of interest on chromosome 12 with a maximum multipoint LOD score of 2.5 at D12S368 (approximate 95% CI spanning an approximate 22.6 cM region bounded by D12S1704 - D12S83). In the 40 heterogeneous families, 315 individuals (106 affected) were genotyped. The heterogeneous families included 88 cases of UIP (61 probable and 27 definite), 14 cases of NSIP (6 probable and 8 definite), 2 cases of COP (definite), 1 case of RBILD (probable), and 32 cases of other, unclassified IIP (31 probable and 1 definite). No specific locus was identified within the heterogeneous pedigrees.
Effect of cigarette smoking and age-at-onset on the linkage analysis
Using ordered subset analysis (OSA), we found that families with a lower proportion of smokers among affected individuals contributed significantly to evidence in favor of linkage at 11pter (p=0.01; maximum LOD = 4.4). Families in which fewer than 67% of affected individuals with smoking data had ever smoked were considered low-smoking families. Age-at-onset did not significantly affect the LOD scores.
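The family-level smoking covariate and the low-smoking definition can be written out explicitly. The sketch below is illustrative only: the function names are hypothetical, smoking status is coded as `True`/`False` for ever-smokers with `None` for missing data, and the 0.67 cutoff is taken from the definition stated above.

```python
from typing import Optional, Sequence


def smoking_proportion(ever_smoked: Sequence[Optional[bool]]) -> Optional[float]:
    """Proportion of affected family members who ever smoked, computed only
    among those with smoking history available (None = missing)."""
    known = [status for status in ever_smoked if status is not None]
    if not known:
        return None  # no usable smoking data for this family
    return sum(known) / len(known)


def is_low_smoking_family(ever_smoked: Sequence[Optional[bool]],
                          cutoff: float = 0.67) -> bool:
    """True when fewer than `cutoff` of affected members with data ever smoked."""
    proportion = smoking_proportion(ever_smoked)
    return proportion is not None and proportion < cutoff
```

A family with affected members coded `[True, False, None, True]` has a proportion of 2/3 among the three members with data and would count as low-smoking under this cutoff.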
DISCUSSION
These findings demonstrate that familial interstitial pneumonia (FIP) is influenced by genetic and environmental factors. We have found that regions on chromosomes 10, 11, and 12 likely contain genes that contribute to the development of FIP. Moreover, our findings indicate that evidence for the linkage on chromosome 11 is increased among those families with a lower proportion of cigarette smoking among affected individuals and the linkage on chromosome 12 is evident only in the homogeneous families. In aggregate, these findings suggest that FIP is a complex disease, associated with multiple phenotypes, and influenced by both multiple genes and at least one defined environmental exposure, cigarette smoke.
Traditionally, genomic screens in complex diseases depend on large datasets of multiplex pedigrees; however, genomic screens of as few as 31-80 pedigrees have identified regions of interest and ultimately genes in many complex diseases (Namjou et al., Arthritis Rheum. AfrlVhl-lWi, 2002; Winn et al., Science 308:1801-1804, 2005; Rampersaud et al., J. Med. Genet. 42(12):940-946, 2005; Wang et al., Science 302:1578-1581, 2003; Ashley-Koch et al., Neurosci. Lett. 379:199-204, 2005; Blanton et al., J. Med. Genet. 39:567-570, 2002; Tlili et al., Hum Hered 60:123-128, 2005; Puranam et al., Ann. Neurol. 58:449-458, 2005). For example, the initial genomic screen in late-onset Alzheimer disease included only 31 multiplex pedigrees (Pericak-Vance et al., Am. J. Hum. Genet. 48:1034-1050, 1991) and identified a region of interest on chromosome 19 that was subsequently found to harbor APOE, a major susceptibility gene for Alzheimer disease. Thus, small sample sizes in genomic screens can identify regions of interest when the genetic effect is strong. To our knowledge, only one other genomic screen has been reported in pulmonary fibrosis (Hodgson et al., Am. J. Hum. Genet. 79:149-154, 2006). This screen, performed in 6 multiplex pulmonary fibrosis families from Finland, identified a region of haplotypic sharing at 4q31. The Finnish phenotype most closely matches our homogeneous IPF/UIP subset of families. Markers from our genomic screen completely overlapped the haplotype on 4q31, and there was no evidence for linkage to this region within our homogeneous families; similarly, there is no evidence for linkage to this region in either the heterogeneous families or the full dataset of 82 families. Such population-specific differences in genetic susceptibility are not uncommon in complex diseases.
However, the evidence in favor of linkage to 11pter is substantial, with multipoint LOD scores > 3.0 among all 82 families that increase significantly when limited to families with a low proportion of cigarette smokers. There are about 206 known genes in the interval spanning the approximate 95% confidence interval of the chromosome 11 linkage peak. Many of the genes are of unknown function, but there are many of known function and of potential relevance to pulmonary fibrosis. Of note, there is a 1 Mb region spanning 11p15 having at least 7 imprinted genes, representing one of the largest imprinted domains in the human genome. IGF-2 is one of the imprinted genes and, in contrast to others in the region, is expressed from the paternal allele. The presence of an FIP susceptibility gene that is imprinted may represent one explanation for the apparent autosomal dominant inheritance with reduced penetrance seen in FIP in some pedigrees, although many of our pedigrees are inconsistent with an imprinting hypothesis. Several genes in this region are either known or putative growth-related genes, including the insulin precursor, insulin-like growth factor 2, tumor suppressor genes TSSC4, cyclin-dependent kinase inhibitor 1C (p57), and FGF receptor activating protein 1. The Ro/SSA sicca syndrome antigen is in the region, and Sjogren's syndrome is known to be associated with fibrosing interstitial lung disease. Matrix metalloproteinase 26, MMP26, is also in the interval. Matrix metalloproteinases are involved in tissue remodeling and wound healing and are, therefore, attractive biological candidates. MMP7 has been identified as a differentially expressed gene in pulmonary fibrosis, and targeted deletion of MMP7 is protective against bleomycin-induced lung injury and pulmonary fibrosis (Marzuola, Nature 417:679, 2002).
Recently, Armanios et al. (Armanios, M. Y., et al. Telomerase mutations in families with idiopathic pulmonary fibrosis. N Engl J Med 356, 1317-1326, 2007) reported that 8% of families (N=73) with familial idiopathic pulmonary fibrosis had heterozygous mutations in TERT/TERC. They hypothesize that the "fibrotic lesion in patients with short telomeres due to TERT/TERC mutations is provoked by a loss of alveolar cells rather than by a primary fibrogenic process, such as one that would seem to occur in autoimmune disease associated with lung fibrosis." This hypothesis suggests a novel mechanism for development of pulmonary fibrosis. To our knowledge, none of our families have evidence of an autoimmune component to disease. Specific evaluation of markers flanking TERT/TERC in our families failed to identify families that are conclusively linked to this gene, although some families did show slightly positive LOD scores.
While cigarette smoking has long been associated with an increase in risk for IIP (Hanley et al., Am. Rev. Respir. Dis. 144:1102-1106, 1991; Schwartz et al., Am. Rev. Respir. Dis. 144:504-506, 1991; Wells et al., Am. J. Respir. Crit. Care Med. 155:1367-1375, 1997), its contribution to genetic susceptibility is novel. In many complex diseases, early age-at-onset or diagnosis appears to be associated with a more genetically-loaded phenotype. However, among families with FIP, the most compelling genetic signal was observed in families with a low proportion of smokers, not among families with an early age of onset. These results suggest that the strongest genetic signal in FIP may be observed among those with less environmental stress (less exposure to cigarette smoke), and that when environmental exposures are more substantial, genetic factors may be more difficult to identify.
The identification of a gene or genes influencing the development of IIP could be of critical importance in developing novel approaches to prevention and treatment. For instance, identification of a gene predisposing to IIP may enable pre-symptomatic genetic counseling, earlier disease recognition and treatment, and gene-targeted interventions. Our finding that affected individuals with less smoking tend to have a stronger contribution to genetic linkage at 11pter underscores the importance of considering environmental risk factors in FIP and other genetic diseases.
EXAMPLE 2: Identification of SNPs in Mucin 5AC (MUC5AC) associated with pulmonary fibrosis.
We previously observed segregation of affected individuals within families consistent with autosomal dominant inheritance (Steele, M.P., et al. Clinical and pathologic features of familial interstitial pneumonia. Am J Respir Crit Care Med 172, 1146-1152, 2005), and identified linkage to the p-terminus of chromosome 11 with a LOD score of 3.3 (11p15.4-15.5) (Speer, M., et al. Familial interstitial pneumonia is linked to chromosomes 10, 11, and 12. submitted, 2008). Fine mapping was conducted in the 8.8 cM region of interest on chromosome 11 (bounded by D11S4046 and D11S1760) (Figure 2), selecting 306 SNPs using LD bins and the SNPselector program (Xu, H., et al. SNPselector: a web tool for selecting SNPs for genetic association studies. Bioinformatics 21, 4181-4186, 2005) (Figure 3). Association testing revealed a SNP, rs7944723 (C/G), that was significantly associated with FIP (145 familial cases versus 233 controls; P=0.000001) and IPF (152 cases of IPF versus 233 controls; P=0.002). rs7944723 is a synonymous SNP (Pro4205Pro) in the mucin 2 gene, MUC2, that maps to a region harboring 4 mucin genes (telomere to centromere: MUC6, MUC2, MUC5AC, and MUC5B). While there are recombination hotspots located between MUC6 and MUC2, and within the proximal portion of MUC5B (Rousseau, K., et al. Allelic association and recombination hotspots in the mucin gene (MUC) complex on chromosome 11p15.5. Ann Hum Genet 71, 561-569, 2007), markers within MUC2 (primarily expressed in the colon and induced in inflamed lungs) and MUC5AC (primarily expressed in stomach and lungs and induced in inflamed lungs) exhibit strong LD (Rousseau, K., et al. Allelic association and recombination hotspots in the mucin gene (MUC) complex on chromosome 11p15.5. Ann Hum Genet 71, 561-569, 2007). Thus, both MUC2 and MUC5AC were selected for re-sequencing using the oligonucleotide primers for PCR and sequencing listed in Table 4.
Re-sequencing of MUC2 in 82 family-independent FIP cases, 96 unrelated IPF cases, and 54 spouse controls, and re-sequencing of MUC5AC in 69 family-independent FIP cases, 96 unrelated IPF cases, and 54 spouse controls, identified 209 sequence variations in MUC2 and 197 sequence variations in MUC5AC (Tables 5 and 6, respectively). 7 SNPs of MUC2 and 32 SNPs of MUC5AC (Table 1) resulted in P values < 0.05 in either FIP or IPF cases compared to controls. While only 1 of the 7 SNPs in MUC2 was significant in both FIP and IPF, 11 of the 32 SNPs in MUC5AC were significant in both FIP and IPF. To screen for large insertions or deletions that may be missed by re-sequencing of MUC5AC, we performed Southern blots on DNA samples from representative study subjects with FIP (N=20), IPF (N=20), and control subjects (N=20) and found no evidence of insertions or deletions.
Table 5: MUC2 variants identified by re-sequencing.
[Table 5 is provided as images in the original publication: imgf000049_0001 through imgf000057_0001.]
Based on genome build NCBI 36.1
[Table images in the original publication: imgf000058_0001 through imgf000065_0001.]
Based on genome build NCBI 36.1
To test the validity of these findings, the 7 most promising MUC5AC SNPs (5 with p-values < 0.05 in both the FIP and IPF case/control analyses, and 2 non-synonymous MUC5AC SNPs that were significant (P < 0.05) in the FIP analysis) were selected for genotyping in separate validation study populations consisting of 88 family-independent FIP cases, 136 cases of IPF, and 54 spouse controls (Table 2). Of 7 SNPs tested, two non-synonymous SNPs (Ala497Val and Ala4729Lys) replicated statistically and the 5 remaining SNPs demonstrated consistent odds ratios in the validation cohorts (Table 2), suggesting that common SNPs in MUC5AC are associated with the risk of developing both FIP and IPF. Furthermore, among all cases (FIP [N=157] and IPF [N=232]) and controls (N=108), an association was observed in both FIP (OR=3.3; 95% CI=1.1-9.5; p-value=0.0006) and IPF (OR=2.9; 95% CI=1.0-8.0; p-value=0.003) for a haplotype containing 3 SNPs: the synonymous MUC2 SNP (Pro4205Pro) and the replicating non-synonymous MUC5AC SNPs (Ala497Val and Ala4729Lys). This haplotype is present in 12.7% of the cases of FIP, 11.0% of the cases of IPF, and only 4.2% of controls. The 2 non-synonymous MUC5AC SNPs, Ala497Val and Ala4729Lys, are also in strong LD with one another, with r2=0.94 in FIP cases, r2=0.98 in IPF cases, and r2=1.0 in the controls.
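For readers unfamiliar with the r2 measure of linkage disequilibrium reported above, a standard definition from haplotype and allele frequencies is r2 = D^2 / (pA(1-pA)pB(1-pB)), where D = pAB - pA·pB. The text does not state how r2 was estimated in this study, so the following is a generic sketch with a hypothetical function name and made-up frequencies.

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared correlation between alleles at two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("r2 is undefined when either locus is monomorphic")
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / denom


# Hypothetical frequencies: two minor alleles that always occur together.
print(ld_r2(p_ab=0.10, p_a=0.10, p_b=0.10))
```

When two alleles are always inherited together, as in the hypothetical call above, r2 approaches 1; when they are independent (pAB = pA·pB), r2 is 0.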
To explore the potential role of MUC5AC in lung fibrosis, we generated genetically engineered Muc5ac deficient mice (Figure 5A-E). Mice were backcrossed to C57BL/6 for four generations. Using a murine model of bleomycin-induced fibrosis, we compared the responses of Muc5ac deficient mice to heterozygous littermates and founder strains. Muc5ac deficient mice, their heterozygous littermates, and mice from the founder strains C57BL/6 and 129Sv were exposed to 4 U/kg bleomycin by intratracheal instillation and sacrificed after 21 days. Muc5ac deficient mice showed more connective tissue deposition by Masson trichrome staining (Figure 5F) and significantly more lung collagen by Sircol assay (Figure 5G) compared to either of the wild type founder strains or the Muc5ac heterozygous littermates. Furthermore, mucous metaplasia, as determined by Muc5ac protein expression, was detected by immunohistochemistry in bronchial cells of C57BL/6 mice following bleomycin instillation (Figure 5H). In contrast, there was negligible Muc5ac staining in bronchial or alveolar tissue of saline-treated C57BL/6 mice (Figure 5H). Consistent with these pathological findings, mucociliary clearance function was reduced in Muc5ac deficient mice following bleomycin challenge. Seven days post exposure (during mucous metaplasia but prior to bleomycin-induced fibrosis), clearance of inhaled particles < 3 μm was dramatically reduced in Muc5ac deficient mice (Figure 6A-B). Furthermore, we found that Muc5ac deficient mice had significantly more apoptotic bronchial epithelial cells and apoptotic cells in fibrotic regions of the lung than heterozygotes (Muc5ac-/+) (Figure 6C-F). These data suggest that in the setting of lung injury, defects in mucociliary clearance or enhanced apoptosis in Muc5ac deficient animals may adversely impact clearance of inhaled particles, possibly precipitating a cycle of lung injury.
Because a single MUC2 SNP (Pro4205Pro) was found to be significantly associated with both FIP and IPF in our human cohort, we also examined the effect of Muc2 on bleomycin-induced pulmonary fibrosis. In contrast to our findings with Muc5ac, C57BL/6 Muc2-/- mice (Velcich, A., et al. Colorectal cancer in mice genetically deficient in the mucin Muc2. Science 295, 1726-1729, 2002) receiving bleomycin by intratracheal instillation had no difference in lung collagen content or histological fibrosis grading compared to heterozygous littermates or C57BL/6 mice (data not presented).
Since MUC5AC is a major mucin gene in human airway epithelia (Rose, M. C. & Voynow, J. A. Respiratory tract mucin genes and mucin glycoproteins in health and disease. Physiol Rev 86, 245-278, 2006), we used an shRNA targeted to MUC5AC in a human airway epithelial cell line (NCI H-292) to examine the effect of MUC5AC silencing on WNT and TGF-β/BMP signaling pathways, as both pathways have previously been shown to be involved in lung fibroproliferation. Treatment of cells with MUC5AC shRNA reduced expression of MUC5AC mRNA by approximately 75%. PCR arrays specific for the WNT and TGF-β/BMP signaling pathways showed enhanced expression of several profibrotic mediators over cells transfected with control plasmid. For the WNT signaling pathway, fibroblast growth factor 4 (FGF4) was enhanced ten-fold, the transcription factor FOXN1 was increased fifteen-fold, and secreted frizzled-related protein (SFRP4), a WNT binding antagonist shown to be upregulated in IIP (Yang, I.V., et al. Gene expression profiling of familial and sporadic interstitial pneumonia. Am J Respir Crit Care Med 175, 45-54, 2007) and systemic sclerosis (Bayle, J., et al. Increased expression of Wnt2 and SFRP4 in Tsk mouse skin: role of Wnt signaling in altered dermal fibrillin deposition and systemic sclerosis. J Invest Dermatol 128, 871-881, 2008), was increased over fifty-fold in NCI H-292 cells treated with MUC5AC shRNA compared to control shRNA. While the ligands WNT2 and WNT8A were induced in the cells when MUC5AC was knocked down, WNT1 expression was decreased significantly. For the TGF-β/BMP signaling pathway, TGF-β1 message was enhanced over five-fold and TGF-β1 receptor was enhanced approximately two-fold in NCI H-292 cells treated with MUC5AC shRNA compared to control shRNA.
Interestingly, a number of other TGF-β1 related genes were also enhanced significantly, including a six-fold increase in both BMP binding endothelial regulator (BMPER), a molecule that regulates signaling of the TGF-β superfamily, and cyclin-dependent kinase inhibitor 2B (CDKN2B), a SMAD pathway target gene. However, the most impressive increase in expression occurred with goosecoid (GSC), a SMAD pathway target gene associated with cell differentiation (Izzi, L., et al. Foxh1 recruits Gsc to negatively regulate Mixl1 expression during early mouse development. EMBO J 26, 3132-3143, 2007) and migration (Niehrs, C., Keller, R., Cho, K. W. & De Robertis, E. M. The homeobox gene goosecoid controls cell migration in Xenopus embryos. Cell 72, 491-503, 1993), which was enhanced ninety-fold in NCI H-292 cells treated with MUC5AC shRNA. GSC is an embryonic transcription factor that has recently been shown to promote tumor metastasis (Hartwell, K.A., et al. The Spemann organizer gene, Goosecoid, promotes tumor metastasis. Proc Natl Acad Sci U S A 103, 18969-18974, 2006) and our results suggest that this transcription factor may also play a role in lung fibrosis. Thus, reduced expression of MUC5AC in human airway epithelial cells resulted in enhanced expression of WNT and TGF-β/BMP signaling genes that have been shown to be involved in lung fibroproliferation.
To determine the expression and distribution of MUC5AC glycoprotein in FIP (usual interstitial pneumonia (UIP) pathology) and other IIPs (IPF/UIP, fibrotic nonspecific interstitial pneumonia (NSIP), and cryptogenic organizing pneumonia (COP)), we performed immunohistochemistry on lung tissue obtained from a broad range of patients with FIP and IIP. Tissue sections stained for MUC5AC distribution in both the normal and diseased lung show strong specific cytoplasmic staining within secretory columnar cells of the bronchi and larger proximal bronchioles. As the bronchiolar diameter decreases in the distal airways, there are fewer MUC5AC-positive secretory cells. Distal bronchioles less than 200 μm in diameter are negative. Fibrotic interstitial processes, including FIP/UIP, IPF/UIP, and fibrotic NSIP, show collections of MUC5AC-positive extracellular mucus within fibrotic airspaces, despite their peripheral distribution. While some of the MUC5AC-positive mucus may have been secreted from proximal airways, there is also MUC5AC-positive staining in patches of metaplastic epithelium lining honeycomb cysts. The airspace MUC5AC-containing mucus permeates the regions of fibrosis and has physical contact with the pneumocyte epithelium, denuded regions secondary to injury, and the underlying stromal tissue. We also stained fibrotic and non-fibrotic human lungs for MUC2, but did not identify significant staining.
DISCUSSION
Our findings highlight three potential roles for MUC5AC in the development of FIP and IPF. The most straightforward possibility is that polymorphisms we have identified in MUC5AC decrease mucosal host defense by impairing mucociliary clearance. In fact, the exon 12 SNP (Ala497Val) lies within one of two von Willebrand factor D-like domains in MUC5AC that is believed to play a key role in mucin oligomerization and higher order structure of mucus (Thornton, D.J. & Sheehan, J.K. From mucins to mucus: toward a more coherent understanding of this essential barrier. Proc Am Thorac Soc 1, 54-61, 2004; Thornton, D.J., Rousseau, K. & McGuckin, M. A. Structure and function of the polymeric mucins in airways mucus. Annu Rev Physiol 70, 459-486, 2008; Voynow, J.A., Gendler, S.J. & Rose, M.C. Regulation of mucin genes in chronic inflammatory airway diseases. Am J Respir Cell Mol Biol 34, 661-665, 2006; Williams, O. W., Sharafkhaneh, A., Kim, V., Dickey, B.F. & Evans, C.M. Airway mucus: From production to secretion. Am J Respir Cell Mol Biol 34, 527-536, 2006). This suggests that this polymorphism may limit the formation of the mucus gel layer, or its interaction with either the surfactant layer at the air-liquid interface or macromolecular components of the periciliary layer. The hypothesis that polymorphisms in MUC5AC result in impaired host defense and, consequently, increased susceptibility to IIP, is supported by our observation of impaired mucociliary clearance in Muc5ac deficient mice treated with bleomycin, and by the fact that MUC5AC is hyper-secreted in a number of lung diseases (Williams, O. W., Sharafkhaneh, A., Kim, V., Dickey, B.F. & Evans, C.M. Airway mucus: From production to secretion. Am J Respir Cell Mol Biol 34, 527-536, 2006; Young, H. W., et al. Central role of Muc5ac expression in mucous metaplasia and its regulation by conserved 5' elements. Am J Respir Cell Mol Biol 37, 273-290, 2007).
Alternatively, novel MUC5AC polymorphisms we observed more frequently in patients with FIP and IPF may affect other basic biological processes that enhance fibroproliferation. While others have shown that the expression of MUC5AC is regulated by TGF-α, EGF, IL-13, STAT6, HIF-1, and the p42/44 MAPK pathway, our findings indicate that MUC5AC expression affects expression of genes in the WNT and TGF-β/BMP signaling pathways involved in lung fibroproliferation, suggesting that MUC5AC may specifically regulate critical genes involved in the development of FIP and fibrotic forms of IIP. The specific mechanism(s) by which polymorphisms in a secreted gel mucin, such as MUC5AC, may enhance the fibroproliferative response remain unclear. Perhaps the most intriguing hypothesis for the role of MUC5AC in fibrotic lung disease is based on our observation that absence of MUC5AC enhances apoptosis, particularly in areas of fibroproliferation. In fibrosing IIPs, there is a loss of type I alveolar epithelia, and although type II alveolar epithelia proliferate, the type II alveolar epithelia do not re-epithelialize the alveolus. While this may be caused by aberrant WNT signaling with inhibition of β-catenin, it is also possible that failure of re-epithelialization is a result of accelerated apoptosis in the distal lung. In fact, the other genetic mutations associated with FIP and IPF (surfactant protein C and the telomerase genes) are known to enhance apoptosis. Specifically, reduced expression of telomerase or rare loss of function mutations in telomerase genes result in shortened telomeres that markedly diminish the half-life of stem cells (Hao, L. Y., et al. Short telomeres, even in the presence of telomerase, limit tissue renewal capacity. Cell 123, 1121-1131, 2005) and are reported to be associated with fibrosing IIP.
Thus, our observation that reduced expression of MUC5AC also enhances apoptosis provides further support for the hypothesis that regulation of apoptosis is critical in the development of this progressive disease that occurs more commonly in those over 60 years of age. It is also possible that these hypotheses are not mutually exclusive, but represent complementary processes that underlie the pathogenesis of IIP in individuals harboring the non-synonymous MUC5AC polymorphisms we report.
In summary, we have discovered common variants of MUC5AC that are more frequently observed in individuals with FIP and IPF. Two common non-synonymous variants of MUC5AC, Ala497Val and Ala4729Lys, in strong LD with one another, were more frequently observed in individuals with FIP and IPF than in control individuals and likely represent a risk allele for the development of FIP and IPF. In support of the role of MUC5AC as a candidate disease gene for development of FIP and IPF, we found evidence of enhanced bleomycin-induced fibrosis in Muc5ac deficient mice that was associated with loss of mucociliary clearance function and enhanced apoptosis. In addition, we demonstrated that MUC5AC regulates expression of numerous genes in the WNT and TGF-β/BMP signaling pathways in human airway epithelial cells, and found that MUC5AC is expressed at the site of fibroproliferation in humans with fibrotic forms of either FIP or IIP. These findings implicate MUC5AC in the development of both FIP and IPF, and suggest that MUC5AC may be fundamental to disease pathogenesis, biomarker development, and treatment of the IIPs.
EXAMPLE 3: Clinical Uses of Mucin Variants
To perform a diagnostic test for the presence or absence of a polymorphism in a mucin-encoding sequence (e.g., MUC5AC) of an individual, a suitable genomic DNA-containing sample from a subject is obtained and the DNA extracted using conventional techniques. For instance, a blood sample, a buccal swab, a hair follicle preparation, or a nasal aspirate is used as a source of cells to provide the DNA sample; similarly, a surgical specimen, biopsy, or other biological sample containing genomic DNA could be used. It is particularly contemplated that tumor biopsies or tumor DNA found in plasma or other blood products can serve as a source. The extracted DNA is then subjected to amplification, for example according to standard procedures. The allele of the single base-pair mutation is determined by conventional methods including manual and automated fluorescent DNA sequencing, primer extension methods (Nikiforov et al., Nucl. Acids Res. 22:4167-4175, 1994), oligonucleotide ligation assay (OLA) (Nickerson et al., Proc. Natl. Acad. Sci. USA 87:8923-8927, 1990), allele-specific PCR methods (Rust et al., Nucl. Acids Res. 6:3623-3629, 1993), RNase mismatch cleavage, single strand conformation polymorphism (SSCP), denaturing gradient gel electrophoresis (DGGE), Taq-Man™, oligonucleotide hybridization, and the like. Also, see the following U.S. Patents for descriptions of methods or applications of polymorphism analysis to disease prediction and/or diagnosis: 4,666,828 (RFLP for Huntington's); 4,801,531 (prediction of atherosclerosis); 5,110,920 (HLA typing); 5,268,267 (prediction of small cell carcinoma); and 5,387,506 (prediction of dysautonomia).
Combinations of the SNPs identified in MUC5AC (Tables 1, 2, and 5) could be used to identify individuals at risk for pulmonary fibrosis either in families with at least one case of pulmonary fibrosis or even in the general population. In addition to these particular variations in MUC5AC, other variations and mutations of these genes can be detected that may be associated with variable predisposition to development of pulmonary fibrosis or likelihood of having pulmonary fibrosis, and used in combination with the disclosed mucin SNPs, to predict the probability that a subject will develop pulmonary fibrosis or another disease involving fibrosis of the lung parenchyma or small airways.
SNPs in MUC5AC could also be used for purposes of early diagnosis. Patients with IIP are often diagnosed very late in the course of their disease. In fact, most patients with IIP are diagnosed 3-5 years prior to their death from this crippling disease. However, respiratory symptoms arise much earlier and in these subjects, SNPs in MUC5AC could be used to identify individuals that need more aggressive testing and follow up because of their higher risk of IIP.
The SNPs of the present disclosure can ultimately be utilized for the development of personalized treatment for this disease. The value of identifying individuals who carry a susceptible allele of mucin (i.e., individuals who are heterozygous or homozygous for an allele that contains the MUC5AC polymorphisms listed above; or any combination thereof; or another sequence variation in one or proximal to one of the variable regions indicated herein) is that these individuals could then initiate customized therapies (such as specific drug therapies that replace or supplement the function of the variant mucin), or undergo more aggressive treatment of the condition, and thereby beneficially alter its course.
EXAMPLE 4: Variant Gene Probes and Markers
Sequences surrounding and overlapping single base-pair mutations and deletions and insertions in a mucin gene (e.g., MUC5AC) can be useful for a number of gene mapping, targeting, and detection procedures. For example, genetic probes can be readily prepared for hybridization and detection of the SNPs identified in MUC5AC. As will be appreciated, probe sequences may be about 8 or more nucleotides in length and possess sufficient complementarity to distinguish between the variant sequence and the reference, for instance, between the A (at position chr11:1144294 in the MUC5AC susceptible allele) and corresponding G (in the reference allele). Similarly, sequences surrounding and overlapping any of the specifically disclosed SNPs (or other polymorphisms or mutations found in accordance with the present teachings, including those encompassed in or proximal to the variable regions), or longer sequences, can be utilized in allele specific hybridization procedures. A similar approach can be adopted to detect other mucin sequence variations. Sequences surrounding and overlapping a mucin variation, or any portion or subset thereof that allows one to identify the variant, are highly useful. Thus, another embodiment provides a genetic marker predictive of one or more of AADDOl 112371.1 1 1838, rs34474233, and rs34815853 of MUC5AC, comprising a partial sequence of the human genome including at least about 10 contiguous nucleotide residues such as those shown and discussed herein, and sequences complementary therewith.
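By way of illustration only, the way an allele-specific probe distinguishes a variant sequence from the reference can be sketched in a few lines of Python. The 11-nucleotide context sequences below are hypothetical placeholders and are not actual MUC5AC sequence; the function names are likewise assumptions made for this sketch. A probe designed as the reverse complement of the variant context anneals to the variant with no mismatches and to the reference with a single mismatch at the polymorphic position.

```python
# Hypothetical sketch: counting mismatches between an allele-specific probe
# and two target alleles. Sequences are made-up placeholders, not MUC5AC.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(probe: str, target: str) -> int:
    """Count mismatched positions when the probe anneals to the target strand."""
    if len(probe) != len(target):
        raise ValueError("probe and target must have the same length")
    annealed = reverse_complement(probe)  # strand the probe would pair with
    return sum(1 for p, t in zip(annealed, target) if p != t)


# Hypothetical context sequences differing only at the central position.
reference_allele = "CCTGAGTCAGG"  # G at the polymorphic site
variant_allele = "CCTGAATCAGG"    # A at the polymorphic site

# Probe specific for the variant (A) allele.
variant_probe = reverse_complement(variant_allele)

print(mismatches(variant_probe, variant_allele))    # perfect match
print(mismatches(variant_probe, reference_allele))  # single-base mismatch
```

In practice, discrimination depends on probe length, base composition, and hybridization stringency, none of which this mismatch count models.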
EXAMPLE 5: Detecting Single Nucleotide Alterations
Single nucleotide alterations, whether categorized as SNPs or new mutations, can be detected by a variety of techniques in addition to merely sequencing the target sequence. Constitutional single nucleotide alterations can arise either from new germline mutations, or can be inherited from a parent who possesses a SNP or mutation in their own germline DNA. The techniques used in evaluating either somatic or germline single nucleotide alterations include hybridization using allele specific oligonucleotides (ASOs) (Wallace et al., CSHL Symp. Quant. Biol. 51:257-261, 1986; Stoneking et al., Am. J. Hum. Genet. 48:370-382, 1991), direct DNA sequencing (Church and Gilbert, Proc. Natl. Acad. Sci. USA 81:1991-1995, 1988), the use of restriction enzymes (Flavell et al., Cell 15:25, 1978; Geever et al., 1981), discrimination on the basis of electrophoretic mobility in gels with denaturing reagent (Myers and Maniatis, Cold Spring Harbor Symp. Quant. Biol. 51:275-284, 1986), RNase protection (Myers et al., Science 230:1242, 1985), chemical cleavage (Cotton et al., Proc. Natl. Acad. Sci. USA 85:4397-4401, 1985), and the ligase-mediated detection procedure (Landegren et al., Science 241:1077, 1988).
Allele-specific oligonucleotide hybridization (ASOH) involves hybridization of probes to the sequence, stringent washing, and signal detection. Other new methods include techniques that incorporate more robust scoring of hybridization. Examples of these procedures include the ligation chain reaction (ASOH plus selective ligation and amplification), as disclosed in Wu and Wallace (Genomics 4:560-569, 1989); mini-sequencing (ASOH plus a single base extension) as discussed in Syvanen (Meth. Mol. Biol. 98:291-298, 1998); and the use of DNA chips (miniaturized ASOH with multiple oligonucleotide arrays) as disclosed in Lipshutz et al. (BioTechniques 19:442-447, 1995). Alternatively, ASOH with single- or dual-labeled probes can be merged with PCR, as in the 5'-exonuclease assay (Heid et al., Genome Res. 6:986-994, 1996), or with molecular beacons (as in Tyagi and Kramer, Nat. Biotechnol. 14:303-308, 1996).
Another technique is dynamic allele-specific hybridization (DASH), which involves dynamic heating and coincident monitoring of DNA denaturation, as disclosed by Howell et al. (Nat. Biotech. 17:87-88, 1999). A target sequence is amplified by PCR in which one primer is biotinylated. The biotinylated product strand is bound to a streptavidin-coated microtiter plate well, and the non-biotinylated strand is rinsed away with alkali wash solution. An oligonucleotide probe, specific for one allele, is hybridized to the target at low temperature. This probe forms a duplex DNA region that interacts with a double strand-specific intercalating dye. When subsequently excited, the dye emits fluorescence proportional to the amount of double-stranded DNA (probe-target duplex) present. The sample is then steadily heated while fluorescence is continually monitored. A rapid fall in fluorescence indicates the denaturing temperature of the probe-target duplex. Using this technique, a single-base mismatch between the probe and target results in a significant lowering of melting temperature (Tm) that can be readily detected.
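As a minimal sketch of how the DASH readout described above might be interpreted, the Python below estimates the melting temperature as the midpoint of the temperature interval over which fluorescence falls most steeply, and then assigns the sample to whichever reference Tm (matched or mismatched duplex) is closer. The function names, the example readings, and the reference Tm values are assumptions made for illustration and are not taken from Howell et al. or from the present disclosure.

```python
# Hypothetical sketch: locating the steepest fluorescence drop in a melt curve.

def estimate_tm(temps, fluorescence):
    """Return the midpoint of the interval with the most negative slope."""
    if len(temps) != len(fluorescence) or len(temps) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    best_tm, best_slope = None, 0.0
    for i in range(len(temps) - 1):
        slope = (fluorescence[i + 1] - fluorescence[i]) / (temps[i + 1] - temps[i])
        if best_tm is None or slope < best_slope:
            best_slope = slope
            best_tm = (temps[i] + temps[i + 1]) / 2
    return best_tm


def call_probe_match(tm_sample, tm_matched, tm_mismatched):
    """Assign the sample to the closer of two reference melting temperatures."""
    if abs(tm_sample - tm_matched) <= abs(tm_sample - tm_mismatched):
        return "match"
    return "mismatch"


# Made-up readings: fluorescence collapses between 60 and 65 degrees C.
temps = [50, 55, 60, 65, 70]
signal = [100, 98, 95, 40, 38]

tm = estimate_tm(temps, signal)
print(tm)
# Reference Tm values below are illustrative assumptions only.
print(call_probe_match(tm, tm_matched=68.0, tm_mismatched=60.0))
```

A real analysis would typically smooth the curve and use finer temperature steps; the coarse five-point series here is only meant to show the logic.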
Oligonucleotides specific to normal or allelic sequences can be chemically synthesized using commercially available machines. These oligonucleotides can then be labeled radioactively with isotopes (such as 32P) or non-radioactively, with tags such as biotin (Ward and Langer et al., Proc. Natl. Acad. Sci. USA 78:6633-6657, 1981), and hybridized to individual DNA samples immobilized on membranes or other solid supports by dot-blot or transfer from gels after electrophoresis. These specific sequences are visualized by methods such as autoradiography or fluorometric (Landegren et al., Science 242:229-237, 1989) or colorimetric reactions (Gebeyehu et al., Nucleic Acids Res. 15:4513-4534, 1987). Using an ASO specific for a normal allele, the absence of hybridization would indicate a mutation in the particular region of the gene, or a deleted gene. In contrast, if an ASO specific for a mutant allele hybridizes to a sample then that would indicate the presence of a mutation in the region defined by the ASO.
A variety of other techniques can be used to detect the mutations or other variations in DNA. Merely by way of example, see U.S. Patents No. 4,666,828; 4,801,531; 5,110,920; 5,268,267; 5,387,506; 5,691,153; 5,698,339; 5,736,330; 5,834,200; 5,922,542; and 5,998,137 for such methods. Additional methods include fluorescence polarization methods such as those developed by Pui Kwok and colleagues (see, e.g., Kwok, Hum. Mutat., 19(4):315-23, 2002), microbead methods such as those developed by Mark Chee at Illumina (see, e.g., Oliphant et al., Biotechniques. 2002 Jun;Suppl:56-8, 60-61; Shen et al., Genet. Eng. News, 23(6), 2003), and mass spectrometry methods such as those being developed at Sequenom (see, e.g., Jurinke et al., Methods Mol Biol. 187:179-92, 2002; Amexis et al., Proc Natl Acad Sci USA 98(21):12097-102, 2001; Jurinke et al., Adv Biochem Eng Biotechnol. 2002;77:57-74; Storm et al., Meth. Mol. Biol., 212:241-262, 2002; Rodi et al., BioTechniques, 32:S62-S69, 2002; U.S. Patent No. 6,300,076; and WO 9820166).
EXAMPLE 6: Differentiation of Individuals Homozygous versus Heterozygous for SNP(s)
Since it is believed that the haplotype of MUC5AC can influence the pulmonary fibrosis susceptibility of a subject, it may sometimes be beneficial to determine whether a subject is homozygous or heterozygous for SNPs within MUC5AC or one of the other genes or ESTs in the mapped region described herein.
By way of example, the oligonucleotide ligation assay (OLA), as described at Nickerson et al. (Proc. Natl. Acad. Sci. USA 87:8923-8927, 1990), allows the differentiation between individuals who are homozygous versus heterozygous for alleles indicated herein. This feature allows one to rapidly and easily determine whether an individual is homozygous for at least one tyrosine kinase activating mutation, which condition is linked to a relatively high predisposition to developing neoplastic disease and/or an increased likelihood of having a tumor. Alternatively, OLA can be used to determine whether a subject is homozygous for either of these mutations.
As an example of the OLA assay, when carried out in microtiter plates, one well is used for the determination of the presence of the major allele in the MUC5AC gene that contains a G at nucleotide position chr11:1144294 (numbering from Human Genome Build 36) and a second well is used for the determination of the presence of the minor allele in the same gene that contains an A at that nucleotide position in the alternate allele sequence. Thus, the results for an individual who is heterozygous for the mutation will show a signal in each of the G and A wells.
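The two-well readout described above maps directly onto a genotype call. The following Python sketch is illustrative only: the function name, the signal units, and the detection threshold are assumptions, since the disclosure does not specify how a well is scored as positive.

```python
# Hypothetical sketch: calling a genotype from the G-well and A-well signals
# of an OLA assay. The threshold is an assumed, assay-specific cutoff.

def call_genotype(signal_g: float, signal_a: float, threshold: float) -> str:
    """Return 'G/G', 'G/A', 'A/A', or 'no call' from two well signals."""
    g_present = signal_g >= threshold
    a_present = signal_a >= threshold
    if g_present and a_present:
        return "G/A"      # heterozygous: signal in both wells
    if g_present:
        return "G/G"      # homozygous for the major (G) allele
    if a_present:
        return "A/A"      # homozygous for the minor (A) allele
    return "no call"      # neither well positive, e.g. failed reaction


# Made-up absorbance-like readings with an assumed cutoff of 0.5.
print(call_genotype(1.2, 1.1, threshold=0.5))
print(call_genotype(1.3, 0.1, threshold=0.5))
print(call_genotype(0.05, 0.9, threshold=0.5))
```

The explicit "no call" branch keeps a failed reaction from being misread as a homozygous result.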
EXAMPLE 7: Protein-Based Diagnosis
An alternative method of diagnosing mucin variation, gene amplification, or deletion as well as abnormal mucin (e.g., MUC5AC) expression, is to quantitate the level of mucin protein in an individual. Such evaluations can be performed, for example, in lysates prepared from cells, in fresh or frozen cells, in cells that have been smeared or touched on glass slides and then either fixed and/or dried, or in cells that have been fixed, embedded (e.g., in paraffin), and then prepared as histological sections on glass slides. In addition, since mucins (including particularly MUC5AC) are secreted from the cells, the presence and amount of mucins (and the proportions which make up a mixed mucin sample) can be evaluated in samples taken from mucous membranes, including but not limited to fluids of the oropharyngeal tract, such as sputum. Such samples may be taken from, for instance, bronchoalveolar lavage (BAL), sputum, and induced sputum samples. Techniques for acquisition of such samples are well known in the art. Oropharyngeal tract fluids can be acquired through conventional techniques, including sputum induction, bronchoalveolar lavage (BAL), and oral washing. Obtaining a sample from oral washing involves having the subject gargle with an amount of normal saline for about 10-30 seconds and then expectorate the wash into a sample cup.
This diagnostic tool would also be useful for detecting reduced levels of the mucin protein that result from, for example, mutations in the promoter regions of the MUC5AC gene or mutations within the coding region of the gene that produced truncated, non-functional or unstable polypeptides, as well as from deletions of a portion of or the entire respective mucin gene. Alternatively, amplification of a mucin-encoding sequence may be detected as an increase in the expression level of mucin protein. Such an increase in protein expression may also be a result of an up-regulating mutation in the promoter region or other regulatory or coding sequence within the mucin gene, or by virtue of a point mutation within the coding sequence, which protects the mucin protein from degradation.
Localization and/or coordination of MUC5AC expression (temporally or spatially) can also be examined using known techniques, such as isolation and comparison of mucin from collected fractions, including specific mucus membranes, or from specific cell or tissue types, or at specific time points after an experimental manipulation. Demonstration of reduced or increased mucin protein levels (or a change in the proportion of one mucin to another, e.g., of MUC5AC to any other mucin), in comparison to such expression in a control cell (e.g., normal, as in taken from a subject not suffering from a fibrotic disease, such as pulmonary fibrosis), would be an alternative or supplemental approach to the direct determination of mucin gene deletion, amplification or mutation status by the methods outlined above and equivalents.
The availability of antibodies specific to a mucin protein will facilitate the detection and quantitation of mucin by one of a number of immunoassay methods which are well known in the art and are presented in Harlow and Lane (Antibodies, A Laboratory Manual, CSHL, New York, 1988). Methods of constructing such antibodies are discussed above, and are well known in the art.
Any standard immunoassay format (e.g., ELISA, western blot, or RIA assay) can be used to measure mucin polypeptide or protein levels, and to compare these with expression levels in control or reference cell populations. Altered mucin (e.g., MUC5AC) polypeptide expression may be indicative of an abnormal biological condition related to fibrosis, in particular pulmonary fibrosis, and/or a predilection to development of pulmonary fibrosis. Immunohistochemical techniques may also be utilized for mucin polypeptide or protein detection. For example, a tissue sample or swab or swipe may be obtained from a subject, and a section or portion thereof stained for the presence of mucin (or a particular mucin) using a specific binding agent (e.g., anti-mucin antibody) and any standard detection system (e.g., one which includes a secondary antibody conjugated to horseradish peroxidase). General guidance regarding such techniques can be found in, e.g., Bancroft and Stevens (Theory and Practice of Histological Techniques, Churchill Livingstone, 1982) and Ausubel et al. (Current Protocols in Molecular Biology, John Wiley & Sons, New York, 1998). For the purposes of quantitating a mucin protein (such as MUC5AC), a biological sample of the subject (for instance a mouse or a human), which sample includes cellular or secreted proteins, is required. In particular embodiments, biological samples may be obtained from sputum, bronchoalveolar lavage fluid, a lung biopsy specimen, exhaled breath, possibly glycosylated products of mucin that might be present in the serum, and so forth.
Quantitation of mucin protein can be achieved by immunoassay and compared to levels of the protein found in control cells (e.g., healthy, non-neoplastic cells of the same lineage or type as those under evaluation, or from a patient known not to have a neoplastic disease). A significant (e.g., 10% or greater) reduction in the amount of a mucin protein in the cells or mucus sample of a subject compared to the amount of that mucin protein found in a comparative normal sample could be taken as an indication that the subject may have deletions or mutations in the respective mucin gene, whereas a significant (e.g., 10% or greater) increase would indicate that a duplication (amplification), or mutation that increases the stability of the mucin protein or mRNA, may have occurred. Deletion, mutation, and/or amplification within a mucin encoding sequence, and substantial under- or over-expression of mucin protein (such as MUC5AC), may be indicative of fibrotic disease (such as pulmonary fibrosis) and/or a predilection to develop fibrosis.
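The comparison against a control level can be sketched as a simple relative-change calculation. In the Python below, the 10% cutoff is the example threshold given in the text above; the function name, the measurement units, and the category labels are assumptions introduced for illustration, and a real assay would also need to account for measurement variability.

```python
# Hypothetical sketch: classifying a mucin protein measurement relative to a
# control using the example 10% threshold mentioned in the text.

def classify_mucin_level(sample: float, control: float,
                         threshold: float = 0.10) -> str:
    """Return 'reduced', 'increased', or 'unchanged' relative to control."""
    if control <= 0:
        raise ValueError("control level must be positive")
    relative_change = (sample - control) / control
    if relative_change <= -threshold:
        return "reduced"     # consistent with possible deletion or mutation
    if relative_change >= threshold:
        return "increased"   # consistent with possible amplification
    return "unchanged"


# Made-up immunoassay readings in arbitrary units.
print(classify_mucin_level(sample=80.0, control=100.0))
print(classify_mucin_level(sample=105.0, control=100.0))
print(classify_mucin_level(sample=120.0, control=100.0))
```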
EXAMPLE 8: Expression of MUC5AC or Other Protein Variant Polypeptides, or a Reporter Polypeptide under Control of a Variant Regulatory Sequence
The expression and purification of proteins, such as a mucin variant protein, can be performed using standard laboratory techniques, though techniques are preferentially adapted to be fitted to express the mucin proteins. Examples of such method adaptations are discussed or referenced herein. After expression, purified protein may be used for functional analyses, antibody production, diagnostics, and patient therapy. Furthermore, the DNA sequences of the mucin variant cDNAs and regulatory regions, or gene or EST sequences contained within the genomic region described herein, can be manipulated in studies to understand the expression of the gene and the function of its product. Variant or allelic forms of a human MUC5AC gene, including regulatory regions upstream or downstream of the encoding sequence, may be isolated based upon information contained herein, and may be studied in order to detect alteration in expression patterns in terms of relative quantities, tissue specificity and functional properties of the encoded mucin variant protein (e.g., influence on mucus production, formation or resistance to pulmonary fibrosis, and so forth). Partial or full-length cDNA sequences, which encode for the subject protein, may be ligated into bacterial expression vectors. Methods for expressing large amounts of protein from a cloned gene introduced into Escherichia coli (E. coli) or more preferably baculovirus/Sf9 cells may be utilized for the purification, localization and functional analysis of proteins. For example, fusion proteins consisting of amino terminal peptides encoded by a portion of a gene native to the cell in which the protein is expressed (e.g., an E. coli lacZ or trpE gene for bacterial expression) linked to a variant protein may be used to prepare polyclonal and monoclonal antibodies against these proteins. 
Thereafter, these antibodies may be used to purify proteins by immunoaffinity chromatography, in diagnostic assays to quantitate the levels of protein and to localize proteins in tissues and individual cells by immunofluorescence.
Intact native protein may also be produced in large amounts for functional studies. Methods and plasmid vectors for producing fusion proteins and intact native proteins in culture are well known in the art, and specific methods are described in Sambrook et al. (In Molecular Cloning: A Laboratory Manual, Ch. 17, CSHL, New York, 1989). Such fusion proteins may be made in large amounts, are easy to purify, and can be used to elicit antibody response. Native proteins can be produced in bacteria by placing a strong, regulated promoter and an efficient ribosome-binding site upstream of the cloned gene. If low levels of protein are produced, additional steps may be taken to increase protein production; if high levels of protein are produced, purification is relatively easy. Suitable methods are presented in Sambrook et al. (In Molecular Cloning: A Laboratory Manual, CSHL, New York, 1989) and are well known in the art. Often, proteins expressed at high levels are found in insoluble inclusion bodies. Methods for extracting proteins from these aggregates are described by Sambrook et al. (In Molecular Cloning: A Laboratory Manual, Ch. 17, CSHL, New York, 1989). Vector systems suitable for the expression of lacZ fusion genes include the pUR series of vectors (Ruther and Muller-Hill, EMBO J. 2:1791, 1983), pEX1-3 (Stanley and Luzio, EMBO J. 3:1429, 1984) and pMR100 (Gray et al., Proc. Natl. Acad. Sci. USA 79:6598, 1982). Vectors suitable for the production of intact native proteins include pKC30 (Shimatake and Rosenberg, Nature 292:128, 1981), pKK177-3 (Amann and Brosius, Gene 40:183, 1985) and pET-3 (Studier and Moffatt, J. Mol. Biol. 189:113, 1986).
Fusion proteins may be isolated from protein gels, lyophilized, ground into a powder and used as an antigen. The DNA sequence can also be transferred from its existing context to other cloning vehicles, such as other plasmids, bacteriophages, cosmids, animal viruses and yeast artificial chromosomes (YACs) (Burke et al., Science 236:806-812, 1987). These vectors may then be introduced into a variety of hosts including somatic cells, and simple or complex organisms, such as bacteria, fungi (Timberlake and Marshall, Science 244:1313-1317, 1989), invertebrates, plants (Gasser and Fraley, Science 244:1293, 1989), and animals (Pursel et al., Science 244:1281-1288, 1989), which cell or organisms are rendered transgenic by the introduction of the heterologous cDNA.
For expression in mammalian cells, the cDNA sequence may be ligated to heterologous promoters, such as the simian virus (SV) 40 promoter in the pSV2 vector (Mulligan and Berg, Proc. Natl. Acad. Sci. USA 78:2072-2076, 1981), and introduced into cells, such as monkey COS-1 cells (Gluzman, Cell 23:175-182, 1981), to achieve transient or long-term expression. The stable integration of the chimeric gene construct may be maintained in mammalian cells by biochemical selection, such as neomycin (Southern and Berg, J. Mol. Appl. Genet. 1:327-341, 1982) and mycophenolic acid (Mulligan and Berg, Proc. Natl. Acad. Sci. USA 78:2072-2076, 1981).
DNA sequences can be manipulated with standard procedures such as restriction enzyme digestion, fill-in with DNA polymerase, deletion by exonuclease, extension by terminal deoxynucleotide transferase, ligation of synthetic or cloned DNA sequences, and site-directed sequence alteration via a single-stranded bacteriophage intermediate or with the use of specific oligonucleotides in combination with PCR or other in vitro amplification.
The cDNA sequence (or portions derived from it) or a mini gene (a cDNA with an intron and its own promoter) may be introduced into eukaryotic expression vectors by conventional techniques. These vectors are designed to permit the transcription of the cDNA in eukaryotic cells by providing regulatory sequences that initiate and enhance the transcription of the cDNA and ensure its proper splicing and polyadenylation. Vectors containing the promoter and enhancer regions of the SV40 or long terminal repeat (LTR) of the Rous Sarcoma virus and polyadenylation and splicing signal from SV40 are readily available (Mulligan et al., Proc. Natl. Acad. Sci. USA 78:1078-2076, 1981; Gorman et al., Proc. Natl. Acad. Sci. USA 78:6777-6781, 1982). The level of expression of the cDNA can be manipulated with this type of vector, either by using promoters that have different activities (for example, the baculovirus pAC373 can express cDNAs at high levels in S. frugiperda cells (Summers and Smith, In Genetically Altered Viruses and the Environment, Fields et al. (Eds.) 22:319-328, CSHL Press, Cold Spring Harbor, New York, 1985)) or by using vectors that contain promoters amenable to modulation, for example, the glucocorticoid-responsive promoter from the mouse mammary tumor virus (Lee et al., Nature 294:228, 1982). The expression of the cDNA can be monitored in the recipient cells 24 to 72 hours after introduction (transient expression).
In addition, some vectors contain selectable markers such as the gpt (Mulligan and Berg, Proc. Natl. Acad. Sci. USA 78:2072-2076, 1981) or neo (Southern and Berg, J. Mol. Appl. Genet. 1:327-341, 1982) bacterial genes. These selectable markers permit selection of transfected cells that exhibit stable, long-term expression of the vectors (and therefore the cDNA). The vectors can be maintained in the cells as episomal, freely replicating entities by using regulatory elements of viruses such as papilloma (Sarver et al., Mol. Cell Biol. 1:486, 1981) or Epstein-Barr (Sugden et al., Mol. Cell Biol. 5:410, 1985). Alternatively, one can also produce cell lines that have integrated the vector into genomic DNA. Both of these types of cell lines produce the gene product on a continuous basis. One can also produce cell lines that have amplified the number of copies of the vector (and therefore of the cDNA as well) to create cell lines that can produce high levels of the gene product (Alt et al., J. Biol. Chem. 253:1357, 1978).
The transfer of DNA into eukaryotic cells, in particular human or other mammalian cells, is now a conventional technique. The vectors are introduced into the recipient cells as pure DNA (transfection) by, for example, precipitation with calcium phosphate (Graham and van der Eb, Virology 52:466, 1973) or strontium phosphate (Brash et al., Mol. Cell Biol. 7:2013, 1987), electroporation (Neumann et al., EMBO J. 1:841, 1982), lipofection (Felgner et al., Proc. Natl. Acad. Sci. USA 84:7413, 1987), DEAE dextran (McCutchan et al., J. Natl. Cancer Inst. 41:351, 1968), microinjection (Mueller et al., Cell 15:579, 1978), protoplast fusion (Schaffner, Proc. Natl. Acad. Sci. USA 77:2163-2167, 1980), or pellet guns (Klein et al., Nature 327:70, 1987). Alternatively, the cDNA, or fragments thereof, can be introduced by infection with virus vectors. Systems have been developed that use, for example, retroviruses (Bernstein et al., Gen. Engr'g 7:235, 1985), adenoviruses (Ahmad et al., J. Virol. 57:267, 1986), or Herpes virus (Spaete et al., Cell 30:295, 1982). Protein encoding sequences can also be delivered to target cells in vitro via non-infectious systems, for instance liposomes.
Using the above techniques, the expression vectors containing MUC5AC sequence or cDNA (or a sequence or cDNA corresponding to a gene or EST located in the region of 11pter described herein), or fragments or variants or mutants thereof, can be introduced into human cells, mammalian cells from other species or non-mammalian cells as desired. The choice of cell is determined by the purpose of the treatment. For example, monkey COS cells (Gluzman, Cell 23:175-182, 1981) that produce high levels of the SV40 T antigen and permit the replication of vectors containing the SV40 origin of replication may be used. Similarly, Chinese hamster ovary (CHO), mouse NIH 3T3 fibroblasts or human fibroblasts or lymphoblasts may be used.
The present disclosure thus encompasses recombinant vectors that comprise all or part of MUC5AC variant gene or cDNA sequences, or a regulatory sequence thereof, for expression in a suitable host. The DNA is operatively linked in the vector to an expression control sequence in the recombinant DNA molecule so that a polypeptide can be expressed, or the regulatory sequence is operatively linked to a reporter gene. The expression control sequence may be selected from the group consisting of sequences that control the expression of genes of prokaryotic or eukaryotic cells and their viruses and combinations thereof. The expression control sequence may be specifically selected from the group consisting of the lac system, the trp system, the tac system, the trc system, major operator and promoter regions of phage lambda, the control region of fd coat protein, the early and late promoters of SV40, promoters derived from polyoma, adenovirus, retrovirus, baculovirus and simian virus, the promoter for 3-phosphoglycerate kinase, the promoters of yeast acid phosphatase, the promoter of the yeast alpha-mating factors and combinations thereof.
The host cell, which may be transfected with the vector of this disclosure, may be selected from the group consisting of E. coli, Pseudomonas, Bacillus subtilis, Bacillus stearothermophilus or other bacilli; other bacteria; yeast; fungi; insect; mouse or other animal; or plant hosts; or human tissue cells. It is appreciated that for mutant or variant MUC5AC DNA sequences, similar systems are employed to express and produce the mutant product. In addition, fragments of these proteins can be expressed essentially as detailed above. Such fragments include individual mucin protein domains or sub-domains, as well as shorter fragments such as peptides. Protein fragments having therapeutic properties may be expressed in this manner also, including for instance substantially soluble fragments.
EXAMPLE 9: Production of Protein Specific Binding Agents
Monoclonal or polyclonal antibodies may be produced to either a wildtype or reference protein or specific allelic forms of these proteins, for instance particular portions that contain a differential amino acid encoded by a SNP and therefore may provide a distinguishing epitope, for instance antibodies produced to a mucin protein or peptide. Optimally, antibodies raised (generated) against these proteins or peptides would specifically detect the protein or peptide with which the antibodies are generated. That is, an antibody generated to a specified target protein or a fragment thereof would recognize and bind that protein and would not substantially recognize or bind to other proteins found in target cells, for instance human cells. In some embodiments, an antibody is specific for (or measurably preferentially binds to) an epitope in a variant protein (e.g., an allele of MUC5AC as described herein) versus the reference protein, or vice versa. The determination that an antibody specifically detects a target protein or form of the target protein is made by any one of a number of standard immunoassay methods; for instance, the Western blotting technique (Sambrook et al., In Molecular Cloning: A Laboratory Manual, CSHL, New York, 1989). To determine that a given antibody preparation (such as one produced in a mouse) specifically detects the target protein by Western blotting, total cellular protein is extracted from human cells (for example, lymphocytes) and electrophoresed on a sodium dodecyl sulfate-polyacrylamide gel. The proteins are then transferred to a membrane (for example, nitrocellulose) by Western blotting, and the antibody preparation is incubated with the membrane. After washing the membrane to remove non-specifically bound antibodies, the presence of specifically bound antibodies is detected by the use of an anti-mouse antibody conjugated to an enzyme such as alkaline phosphatase. Application of an alkaline phosphatase substrate 5-bromo-4-chloro-3-indolyl phosphate/nitro blue tetrazolium results in the production of a dense blue compound by immunolocalized alkaline phosphatase. Antibodies that specifically detect the target protein will, by this technique, be shown to bind to the target protein band (which will be localized at a given position on the gel determined by its molecular weight). Non-specific binding of the antibody to other proteins may occur and may be detectable as a weak signal on the Western blot. The non-specific nature of this binding will be recognized by one skilled in the art by the weak signal obtained on the Western blot relative to the strong primary signal arising from the specific antibody-target protein binding.
Substantially pure mucin protein or protein fragment (peptide) suitable for use as an immunogen may be isolated from the transfected or transformed cells as described above, or using equivalent well known techniques. Concentration of protein or peptide in the final preparation is adjusted, for example, by concentration on an Amicon filter device, to the level of a few micrograms per milliliter. Monoclonal or polyclonal antibody to the protein can then be prepared as follows:
A. Monoclonal Antibody Production by Hybridoma Fusion
Monoclonal antibody to epitopes of the target protein identified and isolated as described can be prepared from murine hybridomas according to the classical method of Kohler and Milstein (Nature 256:495-497, 1975) or derivative methods thereof. Briefly, a mouse is repetitively inoculated with a few micrograms of the selected protein over a period of a few weeks. The mouse is then sacrificed, and the antibody-producing cells of the spleen isolated. The spleen cells are fused by means of polyethylene glycol with mouse myeloma cells, and the excess un-fused cells destroyed by growth of the system on selective media comprising aminopterin (HAT media). The successfully fused cells are diluted and aliquots of the dilution placed in wells of a microtiter plate where growth of the culture is continued. Antibody-producing clones are identified by detection of antibody in the supernatant fluid of the wells by immunoassay procedures, such as ELISA, as originally described by Engvall (Meth. Enzymol. 70:419-439, 1980) and derivative methods thereof. Selected positive clones can be expanded and their monoclonal antibody product harvested for use. Detailed procedures for monoclonal antibody production are described in Harlow and Lane (Antibodies, A Laboratory Manual, CSHL, New York, 1988).
B. Polyclonal Antibody Production by Immunization
Polyclonal antiserum containing antibodies to heterogeneous epitopes of a single protein can be prepared by immunizing suitable animals with the expressed protein, which can be unmodified or modified to enhance immunogenicity. Effective polyclonal antibody production is affected by many factors related both to the antigen and the host species. For example, small molecules tend to be less immunogenic than others and may require the use of carriers and adjuvants. Also, host animals vary in response to site of inoculations and dose, with either inadequate or excessive doses of antigen resulting in low titer antisera. Small doses (ng level) of antigen administered at multiple intradermal sites appear to be most reliable. An effective immunization protocol for rabbits can be found in Vaitukaitis et al. (J. Clin. Endocrinol. Metab. 33:988-991, 1971). Booster injections can be given at regular intervals, and antiserum harvested when the antibody titer, as determined semi-quantitatively, for example, by double immunodiffusion in agar against known concentrations of the antigen, begins to fall. See, for example, Ouchterlony et al. (In Handbook of Experimental Immunology, Weir, D. (ed.) chapter 19. Blackwell, 1973). Plateau concentration of antibody is usually in the range of about 0.1 to 0.2 mg/ml of serum (about 12 μM). Affinity of the antisera for the antigen is determined by preparing competitive binding curves, as described, for example, by Fisher (Manual of Clinical Immunology, Ch. 42, 1980).
C. Antibodies Raised against Synthetic Peptides
A third approach to raising antibodies against a specific protein or peptide (e.g., a peptide that is specific to a variant mucin such as those described herein) is to use one or more synthetic peptides synthesized on a commercially available peptide synthesizer based upon the predicted amino acid sequence of the protein or peptide. Polyclonal antibodies can be generated by injecting these peptides into, for instance, rabbits or mice.
D. Antibodies Raised by Injection of Encoding Sequence
Antibodies may be raised against proteins and peptides by subcutaneous injection of a DNA vector that expresses the desired protein or peptide, or a fragment thereof, into laboratory animals, such as mice. Delivery of the recombinant vector into the animals may be achieved using a hand-held form of the Biolistic system (Sanford et al., Particulate Sci. Technol. 5:27-37, 1987) as described by Tang et al. (Nature 356:152-154, 1992). Expression vectors suitable for this purpose may include those that express a protein-encoding sequence (for instance, a sequence encoding a mucin, such as MUC5AC) under the transcriptional control of either the human β-actin promoter or the cytomegalovirus (CMV) promoter.
Antibody preparations prepared according to these protocols are useful in quantitative immunoassays which determine concentrations of antigen-bearing substances in biological samples; they are also used semi-quantitatively or qualitatively to identify the presence of antigen in a biological sample; or for immunolocalization of the specified protein. Optionally, antibodies, e.g., mucin-specific monoclonal antibodies, can be humanized by methods known in the art. Antibodies with a desired binding specificity can be commercially humanized (Scotgene, Scotland, UK; Oxford Molecular, Palo Alto, CA).
E. Antibodies Specific for Specific Protein Variants
Antibodies can be produced that specifically recognize protein variants (and peptides derived therefrom). In particular, production of antibodies (and fragments and engineered versions thereof) that recognize at least one variant protein with a higher affinity than they recognize a corresponding protein is beneficial, as the resultant antibodies can be used in analysis, diagnosis and treatment (e.g., inhibition or enhancement of protein action, such as for instance inhibition or enhancement of a biological activity of MUC5AC), as well as in study and examination of the proteins themselves. In particular embodiments, it is beneficial to generate antibodies from a peptide taken from a variation-specific region of the target protein. By way of example, such regions include any peptide (usually four or more amino acids in length) that overlaps with one or more of the SNP-encoded variants in a coding sequence described herein. Longer peptides also can be used, and in some instances will produce a stronger or more reliable immunogenic response. Thus, it is contemplated in some embodiments that more than four amino acids are used to elicit the immune response, for instance, at least 5, at least 6, at least 8, at least 10, at least 12, at least 15, at least 18, at least 20, at least 25, or more, such as 30, 40, 50, or even longer peptides. Also, it will be understood by those of ordinary skill that it is beneficial in some instances to include adjuvants and other immune response enhancers, including passenger peptides or proteins, when using peptides to induce an immune response for production of antibodies.
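By way of illustration only, the selection of candidate peptides that overlap a SNP-encoded residue can be sketched in Python as below. The function name, the toy protein sequence, the position and the substituted residue are hypothetical placeholders and are not taken from any MUC5AC sequence; the sketch simply enumerates every window of a chosen length that spans the variant position.

```python
def variant_peptides(protein: str, pos: int, variant_aa: str, length: int = 8) -> list[str]:
    """List every peptide of `length` residues that spans the variant residue.

    protein    : reference amino acid sequence (one-letter code)
    pos        : 1-based position of the SNP-encoded residue
    variant_aa : residue present in the variant protein at `pos`
    length     : peptide length (the text suggests four or more residues)
    """
    if not 1 <= pos <= len(protein):
        raise ValueError("pos is outside the protein")
    if length < 4:
        raise ValueError("peptides of at least four residues are expected")

    # Build the variant protein by substituting the residue at `pos`.
    variant = protein[:pos - 1] + variant_aa + protein[pos:]

    i = pos - 1                               # 0-based index of the variant residue
    first = max(0, i - length + 1)            # left-most window that still covers i
    last = min(i, len(variant) - length)      # right-most window that still covers i
    return [variant[s:s + length] for s in range(first, last + 1)]


if __name__ == "__main__":
    # Hypothetical 14-residue sequence with a substitution at position 5.
    for peptide in variant_peptides("ACDEFGHIKLMNPQ", pos=5, variant_aa="W", length=4):
        print(peptide)
```

Each returned peptide contains the substituted residue at a different offset, which gives a small panel of candidates from which an immunogen could be chosen.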
Embodiments are not limited to antibodies that recognize epitopes containing the actual mutation identified in each variant. Instead, it is contemplated that variant-specific antibodies also may each recognize an epitope located anywhere throughout the specified variant molecule, which epitopes are changed in conformation and/or availability because of the mutation. Antibodies directed to any of these variant-specific epitopes are also encompassed herein.
By way of example, the following references provide descriptions of methods for making antibodies specific to mutant proteins: Hills et al. (Int. J. Cancer, 63: 537-543, 1995); Reiter & Maihle (Nucleic Acids Res., 24: 4050-4056, 1996); Okamoto et al. (Br. J. Cancer, 73: 1366-1372, 1996); Nakayashiki et al. (Jpn. J. Cancer Res., 91: 1035-1043, 2000); Gannon et al. (EMBO J., 9: 1595-1602, 1990); Wong et al. (Cancer Res., 46: 6029-6033, 1986); and Carney et al. (J. Cell Biochem., 32: 207-214, 1986). Similar methods can be employed to generate antibodies specific to specific protein variants, including variants of MUC5AC, or another protein encoded by a gene or EST in the region of 11pter discussed herein.
EXAMPLE 10: Kits
Kits are provided which contain the necessary reagents for determining the presence or absence of variation(s) in a mucin-encoding sequence, such as probes or primers specific for the MUC5AC gene or a variable region therein, such as those regions indicated by the gene region of transcripts designated as XM_001714774.1 (MUC5AC). Such kits can be used with the methods described herein to determine whether a subject is predisposed to pulmonary fibrosis or development of fibrosis of the small airways, or whether the subject is expected to respond to one or another therapy, such as a mucin supplement or replacement therapy. The provided kits may also include written instructions. The instructions can provide calibration curves or charts to compare with the determined (e.g., experimentally measured) values.
A. Kits for Amplification of Mucin Sequences
Oligonucleotide probes and primers, including those disclosed herein, can be supplied in the form of a kit for use in detection of a predisposition to pulmonary fibrosis in a subject. In such a kit, an appropriate amount of one or more of the oligonucleotide primers is provided in one or more containers. The oligonucleotide primers may be provided suspended in an aqueous solution or as a freeze-dried or lyophilized powder, for instance. The container(s) in which the oligonucleotide(s) are supplied can be any conventional container that is capable of holding the supplied form, for instance, microfuge tubes, ampoules, or bottles. In some applications, pairs of primers may be provided in pre-measured single use amounts in individual, typically disposable, tubes or equivalent containers. With such an arrangement, the sample to be tested for the presence of a mucin variation (e.g., a SNP in MUC5AC) can be added to the individual tubes and amplification carried out directly.
The amount of each oligonucleotide primer supplied in the kit can be any appropriate amount, depending for instance on the market to which the product is directed. For instance, if the kit is adapted for research or clinical use, the amount of each oligonucleotide primer provided would likely be an amount sufficient to prime several PCR amplification reactions. Those of ordinary skill in the art know the amount of oligonucleotide primer that is appropriate for use in a single amplification reaction. General guidelines may for instance be found in Innis et al. (PCR Protocols, A Guide to Methods and Applications, Academic Press, Inc., San Diego, CA, 1990), Sambrook et al. (In Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, New York, 1989), and Ausubel et al. (In Current Protocols in Molecular Biology, Greene Publ. Assoc. and Wiley-Intersciences, 1992).
A kit may include more than two primers, in order to facilitate the in vitro amplification of mucin sequences, for instance the MUC5AC gene or the 5' or 3' flanking region thereof.
In some embodiments, kits may also include the reagents necessary to carry out nucleotide amplification reactions, including, for instance, DNA sample preparation reagents, appropriate buffers (e.g., polymerase buffer), salts (e.g., magnesium chloride), and deoxyribonucleotides (dNTPs). Kits may in addition include either labeled or unlabeled oligonucleotide probes for use in detection of mucin variant sequence(s). In certain embodiments, these probes will be specific for a potential mutation that may be present in the target amplified sequences. The appropriate sequences for such a probe will be any sequence that includes one or more of the identified polymorphic sites, particularly nucleotide positions that overlap with the variants shown herein, such that the probe is complementary to a polymorphic site and the surrounding mucin-encoding sequence.
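As a hedged illustration of how a probe sequence covering a polymorphic site and its surrounding sequence might be derived, the following Python sketch builds one probe per allele from a reference sequence. The reference string, site position, alleles and flank length are hypothetical placeholders, not MUC5AC data, and no hybridization properties (melting temperature, secondary structure) are considered.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def allele_probes(reference: str, site: int, alleles: tuple[str, ...], flank: int = 10) -> dict[str, str]:
    """Return one probe per allele, complementary to the site plus `flank` bases on each side.

    reference : target-strand sequence containing the polymorphic site
    site      : 1-based position of the polymorphic nucleotide
    alleles   : nucleotides that may occur at `site`
    """
    i = site - 1
    if i - flank < 0 or i + flank >= len(reference):
        raise ValueError("not enough flanking sequence around the site")
    left = reference[i - flank:i]
    right = reference[i + 1:i + 1 + flank]
    # The probe is the reverse complement of the allele-bearing target window.
    return {allele: reverse_complement(left + allele + right) for allele in alleles}


if __name__ == "__main__":
    toy_reference = "ACGTACGTACGTACGTACGTA"   # hypothetical sequence
    for allele, probe in allele_probes(toy_reference, site=11, alleles=("G", "A"), flank=3).items():
        print(allele, probe)
```

The two probes differ only at the base opposite the polymorphic site, which is the property an allele-specific probe relies on.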
It may also be advantageous to provide in the kit one or more control sequences for use in the amplification reactions. The design of appropriate positive control sequences is well known to one of ordinary skill in the appropriate art.
B. Kits for Detection of mRNA Expression
Kits similar to those disclosed above for the direct detection of mucin sequence variations can be used to detect mucin mRNA expression, such as over- or under-expression. Such kits include an appropriate amount of one or more oligonucleotide primers for use in, for instance, reverse transcription PCR reactions, similar to those provided above with art-obvious modifications for use with RNA amplification.
In some embodiments, kits for detection of altered expression of MUC5AC mRNA may also include some or all of the reagents necessary to carry out RT-PCR in vitro amplification reactions, including, for instance, RNA sample preparation reagents (including e.g., an RNase inhibitor), appropriate buffers (e.g., polymerase buffer), salts (e.g., magnesium chloride), and deoxyribonucleotides (dNTPs). Written instructions may also be included.
Such kits may in addition include either labeled or unlabeled oligonucleotide probes for use in detection of the in vitro amplified target sequences. The appropriate sequences for such a probe will be any sequence that falls between the annealing sites of the two provided oligonucleotide primers, such that the sequence the probe is complementary to is amplified during the PCR reaction. In certain embodiments, these probes will be specific for a potential mutation that may be present in the target amplified sequences, for instance specific for the single nucleotide polymorphism AADD01112371.1_11838, rs34474233, or rs34815853 (all in MUC5AC). Additional SNPs are described herein. It may also be advantageous to provide in the kit one or more control sequences for use in the RT-PCR reactions. The design of appropriate positive control sequences is well known to one of ordinary skill in the appropriate art.
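The requirement that a probe target fall between the annealing sites of the two primers can be checked with a short Python sketch such as the following. All sequences are hypothetical, exact string matching stands in for primer annealing, and the helper names are introduced here only for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def interior_region(template: str, forward: str, reverse: str) -> tuple[int, int]:
    """Return the half-open interval of `template` lying between the two primer sites.

    The forward primer must match the template strand; the reverse primer must
    match the opposite strand downstream of the forward primer.
    """
    start = template.find(forward)
    if start == -1:
        raise ValueError("forward primer does not anneal")
    end = template.find(reverse_complement(reverse), start + len(forward))
    if end == -1:
        raise ValueError("reverse primer does not anneal downstream")
    return start + len(forward), end


def probe_fits(template: str, forward: str, reverse: str, probe_target: str) -> bool:
    """True if the sequence the probe is complementary to lies between the primer sites."""
    lo, hi = interior_region(template, forward, reverse)
    return probe_target in template[lo:hi]


if __name__ == "__main__":
    toy_template = "CCATGGCCTTGACCTGAAGTGGATCCTAAA"   # hypothetical amplicon region
    print(probe_fits(toy_template, "ATGGCC", "TAGGATCC", "ACCTGAAG"))
```

A probe target that overlaps either primer site is rejected by this check, since it would not lie wholly within the region between the annealing sites.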
Alternatively, kits may be provided with the necessary reagents to carry out quantitative or semi-quantitative Northern analysis of mucin mRNA. Such kits include, for instance, at least one mucin-specific oligonucleotide for use as a probe. This oligonucleotide may be labeled in any conventional way, including with a selected radioactive isotope, enzyme substrate, co-factor, ligand, chemiluminescent or fluorescent agent, hapten, or enzyme. In certain embodiments, such probes will be specific for a potential mutation that may be present in the target amplified sequence, such as the mutations disclosed herein.
C. Kits for Detection of Mucin Protein Expression
Kits for the detection of mucin protein expression (such as over- or under-expression of MUC5AC) are also encompassed. Such kits may include at least one target protein specific binding agent (e.g., a polyclonal or monoclonal antibody or antibody fragment that specifically recognizes the mucin protein) and may include at least one control (such as a determined amount of mucin protein, or a sample containing a determined amount of mucin protein). The target protein specific binding agent and control may be contained in separate containers.
The mucin protein expression detection kits may also include a means for detecting mucin:binding agent complexes; for instance, the binding agent may be detectably labeled. If the binding agent is not labeled, it may be detected by secondary antibodies or protein A, for example, which may also be provided in some kits in one or more separate containers. Such techniques are well known.
Additional components in specific kits may include instructions for carrying out the assay. Instructions will allow the tester to determine whether MUC5AC expression levels are elevated. Reaction vessels and auxiliary reagents such as chromogens, buffers, enzymes, etc. may also be included in the kits.
D. Kits for Detection of Homozygous versus Heterozygous Allelism
Also provided are kits that allow differentiation between individuals who are homozygous versus heterozygous for the AADD01112371.1_11838, rs34474233, or rs34815853 SNP (all in MUC5AC); or any combination thereof. Additional SNPs are described herein. Such kits provide the materials necessary to perform oligonucleotide ligation assays (OLA), as described by Nickerson et al. (Proc. Natl. Acad. Sci. USA 87:8923-8927, 1990). In specific embodiments, these kits contain one or more microtiter plate assays, designed to detect mutation(s) in the mucin sequence(s) of a subject, as described herein.
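The homozygous-versus-heterozygous determination can be illustrated with a minimal Python sketch that assumes a plate-based assay yields one signal per allele-specific reaction. The background value and the fraction cutoff are arbitrary placeholders chosen for the example, not values from the cited assay, and would need to be calibrated against control sequences in practice.

```python
def call_genotype(signal_a: float, signal_b: float,
                  background: float = 0.1, min_fraction: float = 0.25) -> str:
    """Classify a sample from two allele-specific signals.

    signal_a, signal_b : readings from the allele A and allele B reactions
    background         : assumed reading of a no-template well (placeholder)
    min_fraction       : assumed minimum share of total signal for an allele
                         to be counted as present (placeholder)
    """
    # Subtract background and clip at zero so noise cannot go negative.
    a = max(signal_a - background, 0.0)
    b = max(signal_b - background, 0.0)
    total = a + b
    if total == 0.0:
        return "no call"

    fraction_b = b / total
    if fraction_b < min_fraction:
        return "homozygous A"
    if fraction_b > 1.0 - min_fraction:
        return "homozygous B"
    return "heterozygous"


if __name__ == "__main__":
    # Hypothetical readings for three samples and one blank well.
    for reading in [(1.5, 0.1), (0.9, 1.0), (0.05, 1.2), (0.05, 0.08)]:
        print(reading, call_genotype(*reading))
```

Wells in which neither reaction rises above background are reported as uncalled rather than forced into a genotype.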
Additional components in some of these kits may include instructions for carrying out the assay. Instructions will allow the tester to determine whether a mucin allele is homozygous or heterozygous. Reaction vessels and auxiliary reagents such as chromogens, buffers, enzymes, etc. may also be included in the kits. It may also be advantageous to provide in the kit one or more control sequences for use in the OLA reactions. The design of appropriate positive control sequences is well known to one of ordinary skill in the appropriate art.
EXAMPLE 11: Screening Assays for Compounds that Modulate Expression or Activity of a Target (such as MUC5AC)
The following assays are designed to identify compounds that interact with (e.g., bind to) a variant form of MUC5AC, compounds that interact with (e.g., bind to) intracellular proteins that interact with such a variant form, compounds that interfere with the interaction of MUC5AC with transmembrane or intracellular proteins involved in signal transduction, and compounds that modulate the activity of MUC5AC (i.e., modulate the level of gene expression) or modulate the level of activity of a variant form of MUC5AC. Assays may additionally be utilized which identify compounds which bind to MUC5AC regulatory sequences (e.g., promoter sequences) and which may modulate gene expression. See, e.g., Platt, J. Biol. Chem. 269:28558-28562, 1994. It is contemplated that these assays also can be used to identify compounds that interact in any of the ways listed above with another gene, regulatory sequence, gene corresponding with an EST, or protein encoded thereby, from the region of 11pter described herein as being linked to susceptibility to pulmonary fibrosis.
The compounds which may be screened in accordance with the disclosure include, but are not limited to, peptides, antibodies and fragments thereof, and other organic compounds (e.g., peptidomimetics, small molecules) that bind to one or more variant sequences (including variant regulatory sequences or encoding sequences) as described herein and either mimic the activity triggered by the natural ligand (i.e., agonists) or inhibit the activity triggered by the natural ligand (i.e., antagonists); as well as peptides, antibodies or fragments thereof, and other organic compounds that mimic a variant (or a portion thereof) and bind to and "neutralize" natural ligand.
Such compounds may include, but are not limited to, peptides such as, for example, soluble peptides, including but not limited to members of random peptide libraries (see, e.g., Lam et al., Nature 354:82-84, 1991; Houghten et al., Nature 354:84-86, 1991) and combinatorial chemistry-derived molecular libraries made of D- and/or L-configuration amino acids; phosphopeptides (including, but not limited to, members of random or partially degenerate, directed phosphopeptide libraries; see, e.g., Songyang et al., Cell 72:767-778, 1993); antibodies (including, but not limited to, polyclonal, monoclonal, humanized, anti-idiotypic, chimeric or single chain antibodies, and Fab, F(ab')2 and Fab expression library fragments, and epitope-binding fragments thereof); and small organic or inorganic molecules.
Other compounds which can be screened in accordance with the disclosure include but are not limited to small organic molecules that are able to gain entry into an appropriate cell and affect the expression of the MUC5AC gene or some other gene involved in a related signal transduction pathway (e.g., by interacting with the regulatory region or transcription factors involved in gene expression); or such compounds that affect the activity of a variant MUC5AC or the activity of some other intracellular factor involved in the signal transduction pathway.
Computer modeling and searching technologies permit identification of compounds, or the improvement of already identified compounds, that can modulate expression or activity of a variant target protein. Having identified such a compound or composition, the active/binding/effector sites or regions are identified. Such active sites typically might be ligand binding sites, such as the interaction domains of a molecule with a variant MUC5AC itself or a sequence encoding the protein or regulating the expression thereof, or the interaction domains of a molecule with a specific allelic variant in comparison to the interaction domains of that molecule with another variant of the protein.
The active site can be identified using methods known in the art including, for example, from the amino acid sequences of peptides, from the nucleotide sequences of nucleic acids, or from study of complexes of the relevant compound or composition with its natural ligand. In the latter case, chemical methods can be used to find the active site by finding where on the factor the complexed ligand is found. Next, the three dimensional geometric structure of the active site is determined. This can be done by known methods that can determine a complete molecular structure. On the other hand, solid or liquid phase NMR can be used to determine certain intra-molecular distances. Any other experimental method of structure determination can be used to obtain partial or complete geometric structures, such as high resolution electron microscopy. The geometric structures may be measured with a complexed ligand, natural or artificial, which may increase the accuracy of the active site structure determined. In another embodiment, the structure of the specified target protein is compared to that of a "variant" of the specified protein and, rather than solve the entire structure, the structure is solved for the protein domains that are changed.
If an incomplete or insufficiently accurate structure is determined, the methods of computer based numerical modeling can be used to complete the structure or improve its accuracy. Any recognized modeling method may be used, including parameterized models specific to particular biopolymers such as proteins or nucleic acids, molecular dynamics models based on computing molecular motions, statistical mechanics models based on thermal ensembles, or combined models. For most types of models, standard molecular force fields, representing the forces between constituent atoms and groups, are necessary, and can be selected from force fields known in physical chemistry. The incomplete or less accurate experimental structures can serve as constraints on the complete and more accurate structures computed by these modeling methods.
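As a rough illustration of how a force field term and an experimentally measured distance might be combined, the following sketch adds a harmonic restraint to a 12-6 Lennard-Jones pair energy. The choice of the Lennard-Jones form, the function names, and all parameter values are assumptions made for illustration; the disclosure does not specify any particular force field or restraint scheme.

```python
def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones energy for two atoms separated by distance r."""
    if r <= 0:
        raise ValueError("distance must be positive")
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def restraint_penalty(r, r_measured, k):
    """Harmonic penalty that grows as r departs from a measured distance."""
    return k * (r - r_measured) ** 2


def restrained_energy(r, epsilon, sigma, r_measured, k):
    """Force field term plus the penalty for violating the measured distance."""
    return lennard_jones(r, epsilon, sigma) + restraint_penalty(r, r_measured, k)


# Hypothetical parameters in arbitrary units, chosen only for illustration.
EPSILON = 0.2
SIGMA = 3.4
R_MIN = 2 ** (1 / 6) * SIGMA  # separation at which the Lennard-Jones term is lowest
```

In a scheme of this kind, a partial experimental structure (for instance, a handful of NMR-derived distances) enters the model only through the penalty term, so the computed structure is pulled toward the measurements without being fixed to them.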
Finally, having determined the structure of the active site, either experimentally, by modeling, or by a combination, candidate modulating compounds can be identified by searching databases containing compounds along with information on their molecular structure. Such a search seeks compounds having structures that match the determined active site structure and that interact with the groups defining the active site. Such a search can be manual, but is preferably computer assisted. These compounds found from this search are potential MUC5AC modulating compounds. Alternatively, these methods can be used to identify improved modulating compounds from an already known modulating compound or ligand. The composition of the known compound can be modified and the structural effects of modification can be determined using the experimental and computer modeling methods described above applied to the new composition. The altered structure is then compared to the active site structure of the compound to determine if an improved fit or interaction results. In this manner systematic variations in composition, such as by varying side groups, can be quickly evaluated to obtain modified modulating compounds or ligands of improved specificity or activity.
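The computer-assisted database search described above can be sketched in a few lines. The sketch assumes, purely for illustration, that the active site is summarized as labeled interaction features with three-dimensional coordinates and that each database compound is described the same way; the feature labels, the tolerance, and the compound entries are hypothetical and are not taken from the disclosure.

```python
from itertools import combinations


def pair_distances(features):
    """Map each unordered pair of feature labels to the distance between them."""
    out = {}
    for (name_a, pos_a), (name_b, pos_b) in combinations(sorted(features.items()), 2):
        out[(name_a, name_b)] = sum((a - b) ** 2 for a, b in zip(pos_a, pos_b)) ** 0.5
    return out


def mismatch(site, compound):
    """Root-mean-square difference over the feature pairs the site requires.

    Returns None when the compound lacks a feature the site requires.
    """
    site_d = pair_distances(site)
    if not site_d:
        raise ValueError("the site must define at least two features")
    comp_d = pair_distances(compound)
    if not all(pair in comp_d for pair in site_d):
        return None
    squared = [(site_d[p] - comp_d[p]) ** 2 for p in site_d]
    return (sum(squared) / len(squared)) ** 0.5


def search(site, database, tolerance=1.0):
    """Return names of compounds within tolerance, best match first."""
    hits = []
    for name, compound in database.items():
        score = mismatch(site, compound)
        if score is not None and score <= tolerance:
            hits.append((score, name))
    return [name for _, name in sorted(hits)]


# Hypothetical active-site description and compound library (coordinates in angstroms).
site = {"donor": (0, 0, 0), "acceptor": (3, 0, 0), "hydrophobe": (0, 4, 0)}
database = {
    "cmpd_A": {"donor": (1, 1, 1), "acceptor": (4, 1, 1), "hydrophobe": (1, 5, 1)},
    "cmpd_B": {"donor": (0, 0, 0), "acceptor": (6, 0, 0), "hydrophobe": (0, 4, 0)},
    "cmpd_C": {"donor": (0, 0, 0), "acceptor": (3, 0, 0)},
}
candidates = search(site, database)
```

Comparing inter-feature distances rather than raw coordinates makes the match independent of how each compound happens to be oriented in its database record. The same scoring function could be reused to compare a modified compound against its parent when systematic side-group variations are evaluated.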
In another embodiment, the structure of a specified protein or nucleic acid sequence, such as a regulatory sequence, (the reference form) is compared to that of a variant protein or sequence (encoded by a different allele of the same protein, or a variant non-coding nucleic acid sequence such as a regulatory sequence containing one or more SNPs). Then, potential inhibitors (or enhancers) are designed that bring about a structural change in the reference form so that it resembles the variant form. Or, potential mimics are designed that bring about a structural change in the variant form so that it resembles another variant form, or the form of the reference receptor. In the case of nucleic acid sequences (including for instance regulatory sequences), the inhibitors, enhancers, or mimics may influence the binding of one or more other proteins to the nucleic acid sequence, for instance in a way that affects the transcription of an encoding sequence that is operably linked to that nucleic acid sequence. Further experimental and computer modeling methods useful to identify modulating compounds based upon identification of the active sites of compounds, various variants of MUC5AC, regulatory regions thereof, and other sequences or proteins encoded for in the region of 11pter described herein, and related transduction and transcription factors will be apparent to those of skill in the art. Examples of molecular modeling systems are the CHARMM and QUANTA programs (Polygen Corporation, Waltham, Mass.). CHARMM performs the energy minimization and molecular dynamics functions. QUANTA performs the construction, graphic modeling and analysis of molecular structure. QUANTA allows interactive construction, modification, visualization, and analysis of the behavior of molecules with each other.
A number of articles review computer modeling of drugs interactive with specific proteins, such as Rotivinen et al., Acta Pharmaceutica Fennica 97:159-166, 1988; Ripka, New Scientist 54-57, 1988; McKinlay and Rossmann, Annu Rev Pharmacol Toxicol 29:111-122, 1989; Perry and Davies, QSAR: Quantitative Structure-Activity Relationships in Drug Design pp. 189-193, 1989 (Alan R. Liss, Inc.); Lewis and Dean, Proc R Soc Lond 236:125-140 and 141-162, 1989; and, with respect to a model receptor for nucleic acid components, Askew et al., J Am Chem Soc 111:1082-1090, 1989. Other computer programs that screen and graphically depict chemicals are available from companies such as BioDesign, Inc. (Pasadena, Calif.), Allelix, Inc. (Mississauga, Ontario, Canada), and Hypercube, Inc. (Cambridge, Ontario). Although these are primarily designed for application to drugs specific to particular proteins, they can be adapted to design of drugs specific to regions of DNA or RNA, once that region is identified.
Although described above with reference to design and generation of compounds which could alter binding, one could also screen libraries of known compounds, including natural products or synthetic chemicals, and biologically active materials, including proteins, for compounds which are inhibitors or activators.
Compounds identified via assays such as those described herein may be useful, for example, in elaborating the biological function of a variant MUC5AC gene product, and for designing therapeutic molecules useful in the diagnosis and/or treatment of pulmonary fibrosis.
EXAMPLE 12: In vitro Screening Assays for Compounds that Bind to a Nucleotide Variant
In vitro systems may be used to identify compounds capable of interacting with (e.g., binding to) a variant protein or nucleic acid sequence including one or more of the SNPs described herein. Compounds identified using such systems may be useful, for example, in modulating the activity of "wild type" (reference) and/or "variant" gene products (such as MUC5AC); may be useful in elaborating the biological function of such proteins; may be utilized in screens for identifying compounds that disrupt normal protein-protein or protein-nucleic acid interactions; may in themselves disrupt such interactions; or may be used to study or characterize the regulation of gene expression, for instance expression of MUC5AC or a reporter protein linked to a regulatory sequence from MUC5AC or another gene or EST from 11pter. One type of assay that can be used to identify compounds that bind to a variant molecule (such as a variant protein, peptide, or nucleic acid) involves preparing a reaction mixture of a variant molecule and a test compound under conditions and for a time sufficient to allow the two components to interact and bind, thus forming a complex which can be removed and/or detected in the reaction mixture. The molecular species used can vary depending upon the goal of the screening assay. For example, where agonists or antagonists of a protein are sought, the full length protein (e.g., MUC5AC), or a soluble truncated portion thereof, or a fusion protein containing a variant peptide fused to a protein or polypeptide that affords advantages in the assay system (e.g., labeling, isolation of the resulting complex, etc.) can be utilized. Where compounds that interact with a nucleic acid sequence, such as a regulatory or putative regulatory sequence, are sought to be identified, oligonucleotides corresponding to a variant sequence (containing at least one SNP position as discussed herein) and fusion nucleic acid molecules containing a variant sequence can be used.
The screening assays can be conducted in a variety of ways. For example, one method to conduct such an assay involves anchoring a variant molecule (such as a protein, polypeptide, peptide or fusion protein, or nucleic acid) or the test substance(s), onto a solid phase and detecting variant molecule/test compound complexes anchored on the solid phase at the end of the reaction. In one embodiment of such a method, the variant molecule(s) may be anchored onto a solid surface, and the test compound(s), which is not anchored, may be labeled, either directly or indirectly. In practice, microtiter plates may conveniently be utilized as the solid phase. The anchored component may be immobilized by non-covalent or covalent attachments. Non-covalent attachment may be accomplished by simply coating the solid surface (or a portion thereof) with a solution containing the protein (or nucleic acid) and drying. Alternatively, an immobilized specific binding agent, such as an antibody, preferably a monoclonal antibody, specific for the protein to be immobilized may be used to anchor the protein to the solid surface. The surfaces may be prepared in advance and stored.
In order to conduct the assay, the nonimmobilized component is added to the coated surface containing the anchored component. After the reaction is complete, unreacted components are removed (e.g., by washing) under conditions such that any complexes formed will remain immobilized on the solid surface. The detection of complexes anchored on the solid surface can be accomplished in a number of ways. Where the previously nonimmobilized component is pre-labeled, the detection of label immobilized on the surface indicates that complexes were formed. Where the previously nonimmobilized component is not pre-labeled, an indirect label can be used to detect complexes anchored on the surface; e.g., using a labeled antibody specific for the previously nonimmobilized component (the antibody, in turn, may be directly labeled or indirectly labeled with a labeled anti-Ig antibody).
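One way the label readout from such a solid-phase assay might be turned into a yes/no call for complex formation is sketched below. The sketch assumes that each microtiter well yields a single numeric signal after washing and that control wells without test compound are available; the cutoff of mean plus three standard deviations of the controls is a common screening convention adopted here as an assumption, and the well names and signal values are hypothetical.

```python
from statistics import mean, stdev


def call_hits(test_wells, control_wells, n_sd=3.0):
    """Flag test wells whose retained label exceeds the control-derived cutoff.

    test_wells maps a compound name to its signal; control_wells is a list
    of signals from wells that received no test compound.
    """
    cutoff = mean(control_wells) + n_sd * stdev(control_wells)
    hits = sorted(name for name, signal in test_wells.items() if signal > cutoff)
    return hits, cutoff


# Hypothetical readings in arbitrary signal units.
controls = [100.0, 110.0, 90.0, 105.0, 95.0]
readings = {"compound_1": 480.0, "compound_2": 118.0, "compound_3": 131.0}
hits, cutoff = call_hits(readings, controls)
```

Deriving the cutoff from control wells on the same plate means that background from nonspecific sticking of the label to the coated surface is accounted for before any compound is scored as binding.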
Alternatively, a reaction can be conducted in a liquid phase, the reaction products separated from unreacted components, and complexes detected. Such detection can involve using an immobilized binding agent specific for the variant molecule (such as an antibody or other binding agent specific for a variant protein, polypeptide, peptide or fusion protein (for instance, MUC5AC)) or specific for the test compound, to anchor or capture any complexes formed in solution, and a labeled antibody (or other binding agent) specific for the other component of the possible complex to detect anchored complexes.
Alternatively, cell-based assays, membrane vesicle-based assays and membrane fraction-based assays can be used to identify compounds that interact with a variant molecule. To this end, cell lines that express a variant molecule, such as a variant MUC5AC encoding sequence or a regulatory sequence variant or other non-coding sequence variant (or combination of two or more variants) or cell lines (e.g., COS cells, CHO cells, HEK293 cells, etc.) that have been genetically engineered to express a variant (e.g., by transfection or transduction of protein encoding DNA) can be used. Interaction of the test compound with, for example, a variant protein (e.g., MUC5AC) expressed by the host cell, or a variant nucleic acid sequence present in the host cell, can be determined by comparison or competition with a host cell not treated with the compound, or treated with another compound, or by examining one or more biological characteristics linked to the variant (such as pulmonary fibrosis).
A variant molecule, such as a variant nucleic acid or polypeptide (such as those described herein) may be employed in a screening process for compounds which bind the variant molecule and which activate (agonists) or inhibit activation (antagonists) of the molecule or one linked thereto. Thus, variant molecules described herein also may be used to assess the binding of small molecule substrates and ligands in, for example, cells, cell-free preparations, chemical libraries, and natural product mixtures. These substrates and ligands may be natural substrates and ligands or may be structural or functional mimetics. See Coligan et al. Current Protocols in Immunology 1 (2): Chapter 5, 1991.
In general, such screening procedures involve providing appropriate cells that express a polypeptide of the present disclosure, or a reporter polypeptide operably linked to a non-coding variant nucleic acid found at 11pter. Such cells include cells from mammals, insects, yeast, and bacteria. In particular, a polynucleotide regulatory sequence or polynucleotide encoding the polypeptide is employed to transfect cells to thereby express a variant molecule. The cell expressing the variant polypeptide or variant nucleic acid is then contacted with a test compound to observe binding, stimulation or inhibition of a functional response. The technique may also be employed for screening of compounds which activate a molecule of the present disclosure by contacting such cells with compounds to be screened and determining whether such compound generates a signal, i.e., activates the polypeptide or reporter polypeptide.
Another method involves screening for compounds which are antagonists, and thus inhibit activation of a molecule of the present disclosure by determining inhibition of binding of labeled ligand, such as a factor that binds to a nucleic acid of the disclosure, to cells expressing the variant molecule or a reporter gene operably linked to a non-coding nucleic acid (such as a regulatory region). Such a method involves transfecting a eukaryotic cell with a DNA encoding a variant molecule such that the cell expresses the molecule (or expresses a reporter gene under the control of a non-coding region containing a variant SNP or haplotype as described herein). The cell is then contacted with a potential antagonist in the presence of a labeled form of a ligand or binding factor. The ligand/factor can be labeled, e.g., with radioactivity. The amount of labeled ligand/factor bound to the variant molecule is measured, e.g., by measuring radioactivity associated with transfected cells or with a membrane or other fraction from these cells. If the compound binds to the variant molecule, the binding of labeled ligand/factor to the variant is inhibited as determined by a reduction of labeled ligand/factor that binds.
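The reduction in bound labeled ligand/factor can be expressed as a percent inhibition, as in the minimal sketch below. The correction for nonspecific binding is an assumption added for illustration (the text above mentions only the measured radioactivity), and the counts shown are hypothetical.

```python
def percent_inhibition(bound_with_compound, bound_without_compound, nonspecific=0.0):
    """Percent reduction in specific binding of labeled ligand caused by a compound.

    All three arguments are label measurements in the same units (for example,
    counts per minute). nonspecific is the signal that remains when specific
    binding is fully blocked; it defaults to zero if no such control was run.
    """
    specific_total = bound_without_compound - nonspecific
    if specific_total <= 0:
        raise ValueError("no specific binding measured without the compound")
    specific_remaining = bound_with_compound - nonspecific
    return 100.0 * (1.0 - specific_remaining / specific_total)


# Hypothetical counts per minute from transfected cells.
example = percent_inhibition(3000.0, 10500.0, nonspecific=500.0)
```

With these hypothetical counts, specific binding falls from 10,000 to 2,500 units, which corresponds to 75 percent inhibition. A value near zero would indicate that the test compound does not compete with the labeled ligand/factor under the conditions used.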
EXAMPLE 13: Pharmaceutical Preparations and Methods of Administration
Therapeutic compounds and agents can be administered directly to the mammalian subject for modulation of MUC5AC activity or expression, or the activity or expression of another gene, EST, or protein encoded by a gene or EST found in the 11pter region as described herein. Administration is by any of the routes normally used for introducing a modulator compound into ultimate contact with the tissue to be treated. The compounds or agents, alone or accompanied by one or more additional therapeutic agents, are administered in any suitable manner, optionally with pharmaceutically acceptable carrier(s). Suitable methods of administering such compounds/agents are available and well known to those of ordinary skill in the art, and, although more than one route can be used to administer a particular composition, a particular route can often provide a more immediate and more effective reaction than another route.
Pharmaceutically acceptable carriers are determined in part by the particular composition being administered, as well as by the particular method used to administer the composition. Accordingly, there is a wide variety of suitable formulations of pharmaceutical compositions of the present invention (see, e.g., Remington's Pharmaceutical Sciences, 17th ed. 1985).
Formulations suitable for administration include aqueous and non-aqueous solutions, isotonic sterile solutions, which can contain antioxidants, buffers, bacteriostats, and solutes that render the formulation isotonic, and aqueous and non-aqueous sterile suspensions that can include suspending agents, solubilizers, thickening agents, stabilizers, and preservatives. Compositions can be administered, for example, orally, parenterally, intrathecally, and so forth.
The formulations of compounds can be presented in unit-dose or multi-dose sealed containers, such as ampoules and vials. Solutions and suspensions can be prepared from sterile powders, granules, and tablets of the kind previously described. The compounds/agents also can be optionally administered as part of a prepared food or drug.
The dose administered to a subject, in the context of the present disclosure, should be sufficient to effect a beneficial response in the subject over time. The dose will be determined by the efficacy of the particular compound/agent employed and the condition of the subject, as well as the body weight or surface area of the area to be treated, and whether the subject is being treated prophylactically or after the identification and diagnosis of a specific disease, condition, or disorder. The size of the dose also may be influenced by the existence, nature, and extent of any adverse side-effects that accompany the administration of a particular compound in a particular subject. In determining the effective amounts of the modulator to be administered, a physician may evaluate circulating plasma levels of the modulator, modulator toxicities, and the production of anti-modulator antibodies. In general, the dose equivalent of a modulator is from about 1 ng/kg to 10 mg/kg for a typical subject. For administration, therapeutic compounds of the present disclosure can be administered at a rate determined by the LD50 of the modulator, and the side effects of the inhibitor at various concentrations, as applied to the mass and overall health of the subject. Administration can be accomplished via single or divided doses.
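The per-kilogram range stated above spans seven orders of magnitude, which the following arithmetic sketch makes concrete for a given body weight. It only scales the stated bounds of about 1 ng/kg and 10 mg/kg; the 70 kg body weight and the function name are hypothetical, and the sketch does not model efficacy, toxicity, or any of the other factors listed above.

```python
NG_PER_MG = 1_000_000

# Bounds taken from the stated range of about 1 ng/kg to 10 mg/kg.
LOW_NG_PER_KG = 1.0
HIGH_NG_PER_KG = 10 * NG_PER_MG


def dose_range_ng(body_weight_kg):
    """Scale the per-kilogram bounds to a total dose range in nanograms."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return LOW_NG_PER_KG * body_weight_kg, HIGH_NG_PER_KG * body_weight_kg


# Hypothetical 70 kg subject.
low_ng, high_ng = dose_range_ng(70.0)
high_mg = high_ng / NG_PER_MG
```

For the hypothetical 70 kg subject, the stated range corresponds to a total dose between 70 ng and 700 mg, so the per-kilogram figure serves only as an outer bound within which the factors above determine the actual amount.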
EXAMPLE 14: Gene Therapy
Gene therapy approaches for replacing inactive or reduced activity MUC5AC, or reducing the risk of developing a fibrotic disease such as pulmonary fibrosis, in subjects are now made possible by the present disclosure. Retroviruses have been considered a preferred vector for experiments in gene therapy, with a high efficiency of infection and stable integration and expression (Orkin et al., Prog. Med. Genet. 7:130-142, 1988). The full-length MUC5AC gene or cDNA can be cloned into a retroviral vector and driven from either its endogenous promoter or from the retroviral LTR (long terminal repeat). Other viral transfection systems may also be utilized for this type of approach, including adenovirus, adeno-associated virus (AAV) (McLaughlin et al., J. Virol. 62:1963-1973, 1988), Vaccinia virus (Moss et al., Annu. Rev. Immunol. 5:305-324, 1987), Bovine Papilloma virus (Rasmussen et al., Methods Enzymol. 139:642-654, 1987) or members of the herpesvirus group such as Epstein-Barr virus (Margolskee et al., Mol. Cell. Biol. 8:2837-2847, 1988). Recent developments in gene therapy techniques include the use of RNA-DNA hybrid oligonucleotides, as described by Cole-Strauss et al. (Science 273:1386-1389, 1996). This technique may allow for site-specific integration of cloned sequences, thereby permitting accurately targeted gene replacement.
In addition to delivery of MUC5AC encoding sequence to cells using viral vectors, it is possible to use non-infectious methods of delivery. For instance, lipidic and liposome-mediated gene delivery has recently been used successfully for transfection with various genes (for reviews, see Templeton and Lasic, Mol. Biotechnol. 11:175-180, 1999; Lee and Huang, Crit. Rev. Ther. Drug Carrier Syst. 14:173-206; and Cooper, Semin. Oncol. 23:172-187, 1996). For instance, cationic liposomes have been analyzed for their ability to transfect monocytic leukemia cells, and shown to be a viable alternative to using viral vectors (de Lima et al., Mol. Membr. Biol. 16:103-109, 1999). Such cationic liposomes can also be targeted to specific cells through the inclusion of, for instance, monoclonal antibodies or other appropriate targeting ligands (Kao et al., Cancer Gene Ther. 3:250-256, 1996). To reduce the level of mucin expression, gene therapy can be carried out using antisense or other suppressive constructs, the construction of which is discussed above and is well known in the art.
EXAMPLE 15: Knockout and Overexpression Transgenic Animals
Mutant organisms that under-express or over-express a specific mucin protein (or more than one mucin) are useful for research. Such mutants allow insight into the physiological and/or pathological role of particular mucins in a healthy and/or pathological organism. These mutants are "genetically engineered," meaning that information in the form of nucleotides has been transferred into the mutant's genome at a location, or in a combination, in which it would not normally exist. Nucleotides transferred in this way are said to be "non-native." For example, a non-mucin promoter inserted upstream of a native mucin encoding sequence would be non-native. An extra copy of a mucin gene on a plasmid, transformed into a cell, would be non-native. Mutants may be, for example, produced from mammals, such as mice, that either over-express MUC5AC or under-express MUC5AC, or that do not express MUC5AC at all, or any combination thereof. Over-expression mutants are made by increasing the number of MUC5AC genes in the organism, or by introducing a MUC5AC gene into the organism under the control of a constitutive or inducible or viral promoter such as the mouse mammary tumor virus (MMTV) promoter or the whey acidic protein (WAP) promoter or the metallothionein promoter. Mutants that under-express MUC5AC may be made by using an inducible or repressible promoter, or by deleting the MUC5AC gene, or by destroying or limiting the function of the MUC5AC gene, for instance by disrupting the gene by transposon insertion. Antisense genes or siRNAs may be engineered or introduced into the organism, under a constitutive or inducible promoter, to decrease or prevent MUC5AC expression.
A gene is "functionally deleted" when genetic engineering has been used to negate or reduce gene expression to negligible levels. When a mutant is referred to in this application as having the mucin gene altered or functionally deleted, this refers to the mucin gene and to any ortholog of this gene. When a mutant is referred to as having "more than the normal copy number" of a gene, this means that it has more than the usual number of genes found in the wild-type organism, e.g., in the diploid mouse or human.
A mutant mouse over-expressing MUC5AC may be made by constructing a plasmid having the respective encoding sequence driven by a promoter, such as the mouse mammary tumor virus (MMTV) promoter or the whey acidic protein (WAP) promoter. This plasmid may be introduced into mouse oocytes by microinjection. The oocytes are implanted into pseudopregnant females, and the litters are assayed for insertion of the transgene. Multiple strains containing the transgene are then available for study.
WAP is quite specific for mammary gland expression during lactation, and MMTV is expressed in a variety of tissues including mammary gland, salivary gland and lymphoid tissues. Many other promoters might be used to achieve various patterns of expression, e.g., the metallothionein promoter.
An inducible system may be created in which the subject expression construct is driven by a promoter regulated by an agent that can be fed to the mouse, such as tetracycline. Such techniques are well known in the art.
A mutant knockout animal (e.g., mouse) from which a mucin gene is deleted can be made by removing all or some of the coding regions of the mucin gene from embryonic stem cells. The methods of creating deletion mutations by using a targeting vector have been described (Thomas and Capecchi, Cell 51:503-512, 1987).
EXAMPLE 16: Knock-in Organisms
In addition to knock-out systems, it is also beneficial to generate "knock-ins" that have lost expression of the wildtype protein but have gained expression of a different, usually mutant form of the same protein. By way of example, MUC5AC variant proteins provided herein (e.g., as encoded by transcripts designated by RefSeq ID XM_001714774.1 (MUC5AC)) can be expressed in a knockout background, such as the Patch mutant mice, in order to provide model systems for studying the effects of these mutants. In particular embodiments, the resultant knock-in organisms provide systems for studying fibrosis, and particularly pulmonary fibrosis. Those of ordinary skill in the relevant art know methods of producing knock-in organisms. See, for instance, Rane et al. (Mol. Cell Biol., 22: 644-656, 2002); Sotillo et al. (EMBO J, 20: 6637-6647, 2001); Luo et al. (Oncogene, 20: 320-328, 2001); Tomasson et al. (Blood, 93: 1707-1714, 1999); Voncken et al. (, 86: 4603-4611, 1995); Andrae et al. (Mech. Dev., 107: 181-185, 2001); Reinertsen et al. (Gene Expr., 6: 301-314, 1997); Huang et al. (Mol. Med., 5: 129-137, 1999); Reichert et al. (Blood, 97: 1399-1403, 2001); and Huettner et al. (Nat. Genet., 24: 57-60, 2000), by way of example.
It will be apparent that the precise details of the methods and compositions described herein may be varied or modified without departing from the spirit of the described invention. We claim all such modifications and variations that fall within the scope and spirit of the claims below.

Claims

1. A method of diagnosing or prognosing a subject as being susceptible to a type of pulmonary fibrosis, comprising: detecting presence of a risk allele of a SNP or a risk haplotype in 11pter of the subject, wherein the allele is or the haplotype comprises a variant as indicated in table 6.
2. The method of claim 1, wherein said detecting step is carried out on a biological sample collected from said subject.
3. The method of claim 1 or 2, wherein said detecting step comprises an in vitro amplification step.
4. The method of claim 3, wherein said in vitro amplification step comprises a polymerase chain reaction step.
5. A kit for determining whether or not a subject has or is susceptible to pulmonary fibrosis, comprising: a container comprising at least one oligonucleotide specific for a MUC5AC SNP sequence as described in table 6, and instructions for using the kit, the instructions indicating steps for: performing a method to detect the presence of mutant/polymorphic MUC5AC nucleic acid in the sample; and analyzing data generated by the method, wherein the instructions indicate that presence of the mutant/polymorphic nucleic acid in the sample indicates that the individual has or is predisposed to the biological condition.
6. A method for screening a subject for a predisposition to pulmonary fibrosis or a related disease or condition, the method including: determining from a sample taken from the subject the presence or absence of a SNP in MUC5AC.
7. The method of claim 6, wherein the SNP is selected from those listed in table 6.
8. The method of claim 6 or 7, wherein said determining step comprises an in vitro amplification step.
9. The method of claim 8, wherein said in vitro amplification step comprises a polymerase chain reaction step.
PCT/US2008/013271 2007-12-03 2008-12-02 Identification and diagnosis of pulmonary fibrosis using mucin genes, and related methods and compositions WO2009073167A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US99207907P 2007-12-03 2007-12-03
US60/992,079 2007-12-03

Publications (2)

Publication Number Publication Date
WO2009073167A2 true WO2009073167A2 (en) 2009-06-11
WO2009073167A3 WO2009073167A3 (en) 2009-09-03

Family

ID=40718413

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2008/013271 WO2009073167A2 (en) 2007-12-03 2008-12-02 Identification and diagnosis of pulmonary fibrosis using mucin genes, and related methods and compositions

Country Status (1)

Country Link
WO (1) WO2009073167A2 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011094345A1 (en) 2010-01-26 2011-08-04 National Jewish Health Methods and compositions for risk prediction, diagnosis, prognosis, and treatment of pulmonary disorders
WO2014197713A3 (en) * 2013-06-05 2015-02-12 The Regents Of The University Of Colorado, A Body Corporate Molecular phenotyping of idiopathic interstitial pneumonia identifies two subtypes of idiopathic pulmonary fibrosis
CN115322133A (en) * 2022-07-29 2022-11-11 四川大学华西医院 Application of compound in preparation of pulmonary fibrosis viscosity-responsive fluorescent probe

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1403638A1 (en) * 2002-09-25 2004-03-31 Mondobiotech SA Molecular methods for diagnosing interstitial lung diseases
US20060275808A1 (en) * 2005-05-20 2006-12-07 Young Robert P Methods of analysis of polymorphisms and uses thereof
US20060292562A1 (en) * 2002-05-29 2006-12-28 Pollard Harvey B Methods of identifying genomic and proteomic biomarkers for cystic fibrosis, arrays comprising the biomarkers and methods of using the arrays
US20070099202A1 (en) * 2005-05-19 2007-05-03 Young Robert P Methods and compositions for assessment of pulmonary function and disorders

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011094345A1 (en) 2010-01-26 2011-08-04 National Jewish Health Methods and compositions for risk prediction, diagnosis, prognosis, and treatment of pulmonary disorders
US20110217315A1 (en) * 2010-01-26 2011-09-08 National Jewish Health Methods and compositions for risk prediction, diagnosis, prognosis, and treatment of pulmonary disorders
US8673565B2 (en) 2010-01-26 2014-03-18 National Jewish Health Methods and compositions for risk prediction, diagnosis, prognosis, and treatment of pulmonary disorders
US10858707B2 (en) 2010-01-26 2020-12-08 National Jewish Health Methods and compositions for risk prediction, diagnosis, prognosis, and treatment of pulmonary disorders
US11649503B2 (en) 2010-01-26 2023-05-16 National Jewish Health Methods and compositions for risk prediction, diagnosis, prognosis, and treatment of pulmonary disorders
WO2014197713A3 (en) * 2013-06-05 2015-02-12 The Regents Of The University Of Colorado, A Body Corporate Molecular phenotyping of idiopathic interstitial pneumonia identifies two subtypes of idiopathic pulmonary fibrosis
CN115322133A (en) * 2022-07-29 2022-11-11 四川大学华西医院 Application of compound in preparation of pulmonary fibrosis viscosity-responsive fluorescent probe
CN115322133B (en) * 2022-07-29 2023-07-04 四川大学华西医院 Application of compound in preparation of pulmonary fibrosis viscosity response fluorescent probe

Also Published As

Publication number Publication date
WO2009073167A3 (en) 2009-09-03

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08857847

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08857847

Country of ref document: EP

Kind code of ref document: A2