WO2001054557A2 - Computational subtraction method - Google Patents
- Publication number
- WO2001054557A2 (PCT/US2001/012736)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sequences
- database
- sequence
- host organism
- microbe
- Prior art date
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Definitions
- the invention relates to a method and system for detecting microbes harbored by a host organism.
- the invention relates to a method and system for detecting novel infectious disease organisms associated with the pathogenesis of human diseases.
- Humans and animals are in continuous contact with microorganisms. Generally, because of the effectiveness of host defense mechanisms, these microorganisms do not cause disease. However, some microorganisms (e.g., opportunistic pathogens) can become infective in particular types of individuals, such as those who are immunocompromised. Still other microorganisms are extremely virulent upon contact. For example, microorganisms such as the Ebola virus are associated with close to 100% fatality rates.
- Cummings and Relman report using a DNA microarray comprising sequences from known pathogens to detect the presence of pathogens in patient samples.
- the method will only be able to detect pathogens for which at least some sequence information is known.
- the invention provides a method of using a computer system to identify a microbe inhabiting a host organism which comprises the steps of obtaining sequence information from a plurality of sequences from at least one host organism and searching a database of host organism genomic sequences to determine the presence or absence of the plurality of expressed sequences in the database.
- the absence of at least one of the sequences in the database indicates that the at least one sequence is a candidate microbe sequence.
- Individual sequences can be searched sequentially; however, preferably, sets of sequences are searched at a time.
- the method comprises the steps of obtaining sequence information from a library of genomic DNA from a host organism and searching a database of genomic sequences from host organisms to determine the presence or absence of a sequence in the library in the database.
- a sequence that is present in the library but is absent in the database is identified as a candidate microbe sequence.
- the microbe can be a symbiotic organism, such as a mutualistic organism, a commensal organism or a parasitic organism.
- the microbe can also be a pathogen.
- Microbes which can be identified by the method include, but are not limited to, phage, bacteria, viruses, protozoa and fungi.
- the host organism can be a microorganism, a plant, or an animal, such as a mammal (e.g., a human being).
- the host organism can also be an insect, bird, or a fish.
- the plurality of sequences from the at least one host organism comprises expressed sequences.
- the plurality of sequences can comprise EST and/or cDNA sequences.
- Sequence information relating to expressed sequences can be obtained by sequencing a library of expressed sequences from one or more host organisms. Additionally, or alternatively, expressed sequence information can be obtained from a database of expressed sequences, such as an EST or cDNA database.
- sequences from the at least one host organism suspected of harboring a microbe are enriched for sequences which are present in the at least one host organism and which are not present in a plurality of host organisms which do not harbor the microbe.
- Enrichment can be performed using a subtractive hybridization assay, which can be a differential gene expression assay.
- Subtractive hybridization assays include, but are not limited to, representational difference analysis, SAGE, and suppression subtraction analysis.
- enrichment can be performed by electronically subtracting sequences from the at least one host organism which are stored in a first database from sequences of the plurality of organisms which are stored in a second database.
- the first and second databases are both expressed sequence databases and electronic subtraction is used to enrich for differentially expressed sequences which are expressed in the at least one host organism suspected of harboring a microbe and not expressed in the plurality of organisms which do not harbor the organism.
- enriched sequences are then compared to sequences in a host organism genomic database to identify sequences in the at least one host organism suspected of harboring a microbe which are not present in the host organism genomic database. These sequences are identified as candidate sequences belonging to a microbe.
- one or more of the following sequences are eliminated from the host organism genomic database: vector sequences, mitochondrial sequences, repetitive sequences, sequences from other species, low quality sequences, known host organism mRNA sequences, and combinations thereof.
- the method according to the invention is used to identify the sequence of a pathogen.
- the at least one host organism is an organism which has a pathogenic condition, and sequences from the host organism (expressed or genomic) are compared to genomic sequences in a database from host organisms which do not have the pathogenic condition.
- the pathogenic condition can be a disease selected from the group consisting of an inflammatory disease, an autoimmune disease, and a cell proliferative disease.
- the disease can be selected from the group consisting of: sarcoidosis, inflammatory bowel disease (e.g., such as Crohn's disease), atherosclerosis, multiple sclerosis, rheumatoid arthritis, type I diabetes mellitus, lupus erythematosus, Hodgkin's disease, and bronchioalveolar carcinoma.
- the invention provides a method of using a computer system to identify a microbe inhabiting a host organism, comprising the steps of: obtaining sequence information from a plurality of expressed sequences from at least one host organism; and searching a database of host organism genomic sequences to determine the presence or absence of the plurality of expressed sequences in the database, wherein the absence of an expressed sequence in the database identifies the expressed sequence as a candidate microbe sequence.
- the plurality of sequences are from a library.
- the library is a library of expressed sequences.
- the library comprises human sequences.
- the library comprises human sequences from one or more humans having a disease.
- the disease can be selected from the group consisting of an inflammatory disease, an autoimmune disease, and a cell proliferative disease.
- the disease is selected from the group consisting of sarcoidosis, inflammatory bowel disease, atherosclerosis, multiple sclerosis, rheumatoid arthritis, type I diabetes mellitus, lupus erythematosus, Hodgkin's disease, and bronchioalveolar carcinoma.
- the invention provides a method of using a computer system to identify a microbe inhabiting a host organism comprising the steps of: obtaining expressed sequence information from a plurality of sequences from at least one non-microbial host organism and searching a database of microbial sequences to determine the presence or absence of the plurality of expressed sequences in the database, wherein the presence of an expressed sequence in the database identifies the expressed sequence as a candidate microbe sequence.
- the plurality of sequences are from a library of expressed sequences.
- the library of sequences comprises sequences from one or more humans having a pathological condition, e.g., such as an infectious disease.
- Candidate sequences can be used as query sequences to search a database of microbial sequences, such as a database comprising bacterial and/or viral sequences.
- Candidate sequences also can be used to search databases comprising fungal sequences, parasitic sequences, and/or protozoan sequences.
- Candidate sequences also can be used as query sequences to search a non-redundant expressed sequence database comprising sequences from host organisms.
- Candidate sequences or their complements can be used to probe a library of sequences from at least one microbe to identify first hybridizing sequences, preferably sequences which are longer in length (e.g., numbers of bases) than the candidate sequence.
- Hybridizing sequences can in turn be used to identify second hybridizing sequences which are longer in length than the first hybridizing sequences. Overlapping sequences which are identified can be used to map the genomic structure of the microbe. In some aspects, the complement of the candidate sequence is hybridized to RNA from the microbe and used to generate cDNAs.
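The stepwise extension described above has a simple in silico analogue. The sketch below is an illustration only, not the hybridization procedure itself: it assumes the clone sequences are already available as strings and lets an exact suffix-to-prefix overlap stand in for hybridization. The names `overlap`, `extend`, and `min_len` are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def extend(contig, clones, min_len=3):
    """Greedily extend contig rightward with clones that overlap its 3' end."""
    remaining = list(clones)
    grew = True
    while grew:
        grew = False
        for clone in remaining:
            n = overlap(contig, clone, min_len)
            if n and len(clone) > n:
                contig += clone[n:]      # append only the non-overlapping tail
                remaining.remove(clone)
                grew = True
                break                    # restart the scan with the longer contig
    return contig


# A candidate sequence extended by two overlapping clones.
print(extend("ACGTAC", ["TACGGA", "GGATTT"]))  # ACGTACGGATTT
```

A real assembly would tolerate mismatches and extend in both directions; the greedy exact-overlap rule is chosen here only to keep the idea visible.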
- the candidate sequence can be used to express a peptide; for example, by operably linking the candidate sequence to a promoter sequence in an expression vector.
- sequences identified by probing a library of sequences using the candidate sequence as a probe can be used to express one or more peptides.
- the peptides are antigenic.
- the peptides can be administered to a host organism to elicit a protective immune response.
- Nucleic acid sequences expressing the peptides can also be administered to the host organism to elicit a protective immune response to the peptides expressed by these sequences.
- the candidate sequence and/or other sequences identified by the candidate sequence can be used to detect the presence or absence of the microbe in a sample from the host organism.
- the hybridization of the candidate sequence and/or the other sequences to nucleic acid sequences in the sample from the host organism under stringent conditions can provide an indication of the presence of the microbe in the sample.
- detection of hybridization is used to provide a diagnosis that the host organism is infected by the pathogen.
- Peptides expressed by the candidate sequences and/or sequences identified using the candidate sequence can be used as antigens to generate antibodies which can also be used in diagnostic assays.
- an antibody which specifically binds to a peptide expressed by the candidate sequence and/or sequences identified using the candidate sequence is contacted with a sample from the host organism and binding of the antibody to a polypeptide within the sample provides an indication that the host organism harbors the microbe.
- the complementary sequence of a coding sequence of the candidate sequence or of another sequence identified by the candidate sequence is administered to a host organism harboring the microbe in an amount sufficient to prevent the expression of a polypeptide encoded by the candidate sequence or the sequence identified by the candidate sequence in the host organism.
- the complementary sequence can further comprise a cleaving moiety for cleaving RNA (e.g., the complementary sequence can be a ribozyme).
- a system for performing the method comprises a first database comprising sequences from at least one host organism suspected of harboring a microbe and a second database comprising genomic sequences from host organisms not suspected of harboring the microbe.
- the system further comprises an information management system comprising a search and subtraction function for identifying sequences in the first database which are not present in the second database.
- the information management system comprises a sequence alignment function and can compare a set of sequences in the first database with all sequences in the second database.
- the system preferably comprises at least one user device connectable to the network and, preferably, a high speed, linear array processor.
- the system comprises a program capable of implementing an algorithm for simultaneously comparing a plurality of sequences in a first database with all sequences in a second database, e.g., such as the algorithm implemented by the MEGABLAST program.
- the system comprises a program which sequentially compares a plurality of individual sequences from the first database with a plurality, and preferably all, sequences in the second database.
- the system generates a result sequence set comprising sequences in the first database which do not match sequences in the genomic database.
- the system comprises an identity or scoring matrix which requires a score of greater than or equal to 60 (e.g., equivalent to thirty identical consecutive nucleotides).
- the system iteratively computes the degree of alignment between sequences in the first and second database. Iterative computing preferably is performed using progressively smaller word sizes.
- the system provides one or more programs for performing one or more electronic subtraction functions for eliminating any of: vector sequences, repetitive sequences, mitochondrial sequences, sequences from non-host organisms, low quality sequences, known host organism mRNA sequences, and combinations thereof, from the genomic database.
- the invention additionally provides a computer program product comprising a computer readable memory on which is embedded one or more programs for implementing any of the system functions and/or methods described above.
- Figure 1 is a flow chart demonstrating a method of computational subtraction analysis according to one embodiment of the invention to identify microbes harbored by a human being.
- Figure 2 is a schematic of a system according to one aspect of the invention for performing a computational subtraction analysis.
- the invention provides a method and system for performing computational subtraction to detect microbes harbored by a host organism.
- the microbes are pathogens and the system is used to identify sequences belonging to these pathogens which can then be used in methods of diagnosis and treatment.
- the microbes can be symbiotic organisms, such as commensal or parasitic organisms.
- candidate sequences identified as belonging to a microbe are used to isolate and clone additional sequences from the microbe.
- expressed sequence is a sequence which is transcribed.
- “Expressed sequence information” refers to the nucleotide sequence of an expressed sequence such as an RNA molecule, a cDNA molecule or a portion of genomic DNA which corresponds to an expressed sequence, e.g., such as those portions of a gene whose complement will become part of an RNA transcript.
- An expressed sequence may include both coding sequences (i.e., codons which are translated into polypeptide sequences) as well as non-coding sequence (i.e., untranslated sequences).
- a match between sequences refers to a level of sequence similarity equivalent to a BLAST score ranging from 40 (the equivalent of 20 consecutive identical nucleotides) to 2000 (the equivalent of 1000 consecutive identical nucleotides).
- a query sequence is "present” in a database if the database contains a sequence which matches the query sequence and is "absent” in a database if the database does not contain the matching sequence.
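The stated equivalences (a score of 40 for 20 consecutive identical nucleotides, 2000 for 1000) imply a reward of 2 per identical nucleotide. The sketch below encodes that inference together with the "present"/"absent" definition. It assumes the figure of 2000 marks the top of the illustrated range rather than a cutoff, so only the lower bound is enforced; `MATCH_REWARD`, `run_score`, `is_match`, and `is_present` are hypothetical names.

```python
MATCH_REWARD = 2  # inferred from 40 <-> 20 nt and 2000 <-> 1000 nt


def run_score(n_identical):
    """Score of a run of n consecutive identical nucleotides."""
    return MATCH_REWARD * n_identical


def is_match(score, min_score=40):
    """A pairwise comparison counts as a match at or above the minimum score."""
    return score >= min_score


def is_present(scores_against_database, min_score=40):
    """A query is 'present' if any database sequence matches it, else 'absent'."""
    return any(is_match(s, min_score) for s in scores_against_database)


print(run_score(20), run_score(1000))   # 40 2000
print(is_present([12, 38, 44]))         # True
print(is_present([12, 38]))             # False
```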
- a "low quality sequence” is a sequence which has greater than 2.5% N nucleotides, i.e., nucleotides whose identity cannot be determined at 95% confidence levels.
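The low-quality criterion above translates directly into a check on the fraction of N calls. This is a minimal sketch; the function names are hypothetical, and treating an empty sequence as not low quality is an assumption the source does not address.

```python
def n_fraction(seq):
    """Fraction of bases in seq that are the ambiguity code N."""
    seq = seq.upper()
    return seq.count("N") / len(seq) if seq else 0.0


def is_low_quality(seq, max_n_fraction=0.025):
    """True when more than 2.5% of the bases are N."""
    return n_fraction(seq) > max_n_fraction


print(is_low_quality("ACGT" * 10))             # False: no N at all
print(is_low_quality("ACGT" * 9 + "ACNN"))     # True: 2 of 40 bases (5%) are N
```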
- symbiosis or a “symbiotic relationship” refers to an association between two organisms that live together. Symbiotic relationships include mutualistic relationships, commensalistic relationships, and parasitic relationships.
- mutualism or a “mutualistic relationship” refers to a mutually beneficial association between two organisms.
- parasitism or a “parasitic relationship” refers to an association between two organisms in which one organism lives at the expense of the other organism and can cause damage to the other organism.
- a "pathogen” is an organism that can cause disease in another organism (e.g., the host organism).
- a "microbe” is any organism that can live and/or replicate within a host organism for at least a portion of its life cycle. While some microbes can exist for at least a portion of their life cycle intracellularly within the cells of a host organism, microbes which grow and/or replicate extracellularly are also encompassed within the scope of the invention. Microbes include, but are not limited to, phage, viruses, gram-positive and gram-negative bacteria, protozoa, small unicellular and multicellular eukaryotes (e.g., fungi, such as yeast), and the like. The term “microbe” and “microorganism” are used interchangeably herein.
- a "host organism” can be any organism that can harbor (e.g., provide a habitat and/or nutrients for) another organism.
- a host can be a bacterium which harbors a phage, a simple eukaryote such as yeast which can harbor a bacterium, or a mammal such as a human being which can harbor any of the foregoing.
- infection refers to the growth of a pathogen in a host organism.
- infectious disease refers to a disease that can be transmitted from host organism to host organism.
- a “carrier” refers to a patient who shows full recovery after infection and after displaying symptoms, but who still carries and is capable of spreading the infectious form of a microbe.
- sequence identified by a candidate sequence refers to genomic sequences of microbes to which the candidate sequence or its complement hybridizes, or to which the latter genomic sequences hybridize, under stringent conditions.
- sequences are identified by the candidate sequence electronically, e.g., by searching a database of sequences from one or more microbes. Sequences which are identified as belonging to the same microbe as the organism from which the candidate sequence was obtained are said to be "identified by the candidate sequence.”
- stringent conditions refer to conditions under which a sequence will specifically bind to its complement to enable detection of the complement and to distinguish the complement from other nucleic acid sequences in a sample. Stringency conditions are described in Sambrook et al., In Molecular Cloning: A Laboratory Manual, 2nd edition, vols. 1-2. Cold Spring Harbor Press (1989), the entirety of which is incorporated by reference herein. As used herein, stringent conditions require at least 80% base pairing, more preferably, at least 90-95% base pairing, and most preferably, at least 98% base pairing.
- a "fragment" of a candidate sequence or a sequence identified by the candidate sequence refers to a sequence which is shorter in length than the candidate sequence but sufficiently long to specifically hybridize to the candidate sequence. In one embodiment, a fragment ranges in size from 6 nucleotides to one less nucleotide than the full-length sequence.
- a promoter operably linked to another sequence refers to a promoter and/or promoter element and/or enhancer element(s) capable of inducibly or constitutively causing transcription of the other sequence.
- a "bodily fluid” refers to any of blood, plasma, sera, urine, CSF fluid, sputum, breast exudates, pus, and the like.
- computational subtraction or “electronic subtraction” or “filtering” refers to a computational method of eliminating records (e.g., such as sequences) from a database.
- the invention provides a systematic method to identify sequences of microbes capable of inhabiting a host organism.
- the microbes can be pathogenic and associated with an infectious disease.
- the microbes can also exist symbiotically within a host organism, e.g., in a mutualistic, commensal, or parasitic relationship within the host organism.
- the microbe can be any of a phage, a virus (e.g., an RNA or DNA virus), a bacteria, a protozoa, or other microorganism, a small unicellular or multicellular eukaryotic organism (e.g., a fungus, such as yeast), and the like.
- the host organism can be a microorganism, a fungus, an animal, or a plant.
- the host organism is a mammal, such as a human being or a domestic animal.
- the host organism can also be an insect, bird, or a fish.
- Host organism sequences can be obtained from particular tissues or cells of the host organism, or from cell lines derived from these tissues or cells, or from bodily fluids from the host organism.
- the invention provides a computational subtraction method for detecting and identifying microbe sequences.
- the method comprises comparing the sequence information of a plurality of sequences obtained from one or more host organisms with sequences in a genomic database of host sequences to identify which of the plurality of sequences are not found (i.e., do not match other sequences) in the database. Sequences which are not found in the database are identified as candidate sequences which are likely to belong to a microbe. Preferably, sequence information from sets of sequences (two or more sequences, and preferably ten or more sequences) is compared against the entire genomic database at a time.
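The core subtraction step can be illustrated with a deliberately simplified sketch. It assumes a "match" means sharing at least one exact word of `k` consecutive nucleotides (20 by default, echoing the 20-nucleotide equivalence given in the definitions), holds everything in memory, and ignores the reverse strand. The names `kmers` and `subtract` are hypothetical; a real run would use an alignment program rather than this toy comparison.

```python
def kmers(seq, k):
    """Set of all length-k words in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def subtract(test_seqs, host_genome_seqs, k=20):
    """Names of test sequences that share no length-k word with the host genome."""
    host_words = set()
    for genome_seq in host_genome_seqs:
        host_words |= kmers(genome_seq, k)
    return [name for name, seq in test_seqs.items()
            if not (kmers(seq, k) & host_words)]


host = ["ATGGCGTACGTTAGCCTAGGATCCGATTACA"]
tests = {
    "est1": "GCGTACGTTAGCC",           # copied from the host sequence
    "est2": "TTTTTTTTCCCCCCCCGGGG",    # unrelated to the host sequence
}
print(subtract(tests, host, k=8))  # ['est2']
```

Comparing the whole test set against one prebuilt word set mirrors the preference for searching sets of sequences against the entire database at once.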
- nucleotide sequence alignment algorithms can be used for this purpose, including those known in the art.
- the algorithm of Zhang et al., J Comput. Biol. 7(1-2): 203-14 (2000) which is embodied in one form in the MEGABLAST program, is used to compare sequences in an entire database of sequences from one or more host organisms (a "test database") against a genomic database. Smaller sets of sequences (e.g., at least two or at least ten) can also be compared.
- sequences from the plurality of sequences can be compared sequentially, individually, against genomic databases, e.g., such as by using the BLAST program described in Altschul et al., J. Mol.
- the genomic database is searched for short perfect matches of a set length (i.e., a word size). This enables a more rapid comparison than window/stringency matching.
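Searching for short perfect matches is usually done by indexing every word of the database once and then looking up each word of the query. The sketch below shows that idea under simple assumptions (in-memory dictionary, forward strand only); `build_word_index` and `seed_hits` are hypothetical names, not functions of any existing program.

```python
from collections import defaultdict


def build_word_index(db_seqs, word_size):
    """Map every word of length word_size to the (sequence name, offset) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in db_seqs.items():
        for i in range(len(seq) - word_size + 1):
            index[seq[i:i + word_size]].append((name, i))
    return index


def seed_hits(query, index, word_size):
    """Return (query offset, sequence name, database offset) for each perfect word match."""
    hits = []
    for q in range(len(query) - word_size + 1):
        for name, d in index.get(query[q:q + word_size], ()):
            hits.append((q, name, d))
    return hits


index = build_word_index({"chr": "AAACCCGGGTTT"}, word_size=4)
print(seed_hits("CCCGGG", index, word_size=4))
# [(0, 'chr', 3), (1, 'chr', 4), (2, 'chr', 5)]
```

Each lookup is a dictionary access, which is why word matching is faster than scoring every window of the query against every window of the database.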
- a word size ranging from 10-30 bases is used.
- a series of sequential searches is performed, using progressively smaller word sizes ranging from 30 to 10 bases. More preferably, a first search using a word size of 24 is performed, followed by a second search using a word size of 20, a third using a word size of 16, and a fourth using a word size of 12.
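The series of searches with word sizes 24, 20, 16, and 12 can be sketched as a loop in which each round removes the sequences that match the host and passes the survivors to the next, more sensitive round. This is a toy model under stated assumptions: exact word sharing stands in for alignment, and `words` and `iterative_subtract` are hypothetical names.

```python
def words(seq, k):
    """Set of all length-k words in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def iterative_subtract(test_seqs, host_seqs, word_sizes=(24, 20, 16, 12)):
    """Filter test sequences in rounds of decreasing word size.

    Returns the final survivors and a per-round history of surviving names.
    """
    survivors = dict(test_seqs)
    history = []
    for k in word_sizes:
        host_words = set().union(*(words(h, k) for h in host_seqs))
        survivors = {name: seq for name, seq in survivors.items()
                     if not (words(seq, k) & host_words)}
        history.append((k, sorted(survivors)))
    return survivors, history


host = ["AC" * 15]
tests = {
    "hostlike": "AC" * 13,                         # long host match, removed at 24
    "shared14": "GGGGGG" + "AC" * 7 + "TTTTTT",    # 14-nt host match, removed at 12
    "novel": "GGGGTTTT" * 3 + "GGGG",              # no host match
}
survivors, history = iterative_subtract(tests, host)
for round_info in history:
    print(round_info)
# (24, ['novel', 'shared14'])
# (20, ['novel', 'shared14'])
# (16, ['novel', 'shared14'])
# (12, ['novel'])
```

Running the large word sizes first removes most host sequences cheaply, so the slower small-word rounds only see what is left.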
- a test sequence is shifted to the left or right of sequences in the database to identify maximal regions of alignment.
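One way to picture the shifting step is to slide the test sequence across a database sequence and record, at each offset, the longest run of identical bases. The brute-force sketch below does exactly that; it is an illustration with hypothetical function names, not the optimized procedure an alignment program would use.

```python
def longest_run_at_offset(query, subject, offset):
    """Longest run of identical bases when query[i] is aligned to subject[i + offset]."""
    best = run = 0
    for i, base in enumerate(query):
        j = i + offset
        if 0 <= j < len(subject) and subject[j] == base:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


def best_shift(query, subject):
    """Try every left/right shift and return (longest run, offset) for the best one."""
    offsets = range(-(len(query) - 1), len(subject))
    return max(((longest_run_at_offset(query, subject, o), o) for o in offsets),
               key=lambda pair: pair[0])


print(best_shift("GGATCC", "TTTGGATCCAAA"))  # (6, 3)
```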
- a scoring matrix is used to identify the likelihood that one or more sequences in the test database do not match or are absent from the genomic database. Preferably, scores of greater than or equal to 60 are required. In one aspect, the scoring matrix assigns a match if there is a BLAST score ranging from 40 (the equivalent of 20 consecutive nucleotides) through 2000 (the equivalent of 1000 consecutive nucleotides). In another aspect, a matrix is used which assigns expectation values to matches and mismatches after alignment. Expectation values can be adjusted to require that a score does not grow simply by extending the alignment in a random way. For example, in one embodiment, expectation values from 10^-20 to 10^-3 can be selected, and preferably, expectation values of 10^-7 are used.
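For orientation, BLAST-family programs commonly relate a raw score S to an expectation value through the Karlin-Altschul form E = K * m * n * exp(-lambda * S), where m and n are the query and database lengths. The source does not spell out this formula, so the sketch below is an assumption about how such a cutoff could be applied. The values of `lam` and `k` depend on the scoring system and are placeholders here, not recommended settings.

```python
import math


def expect_value(score, query_len, db_len, lam, k):
    """Karlin-Altschul expectation value for a raw alignment score."""
    return k * query_len * db_len * math.exp(-lam * score)


def passes_cutoff(score, query_len, db_len, lam, k, max_e=1e-7):
    """True when the hit is at least as significant as the chosen expectation cutoff."""
    return expect_value(score, query_len, db_len, lam, k) <= max_e


# Placeholder statistics (lam=1.0, k=0.1) and a 3-billion-base database.
print(passes_cutoff(60, 500, 3e9, lam=1.0, k=0.1))  # True
print(passes_cutoff(20, 500, 3e9, lam=1.0, k=0.1))  # False
```

Because the expectation value scales with database size, the same raw score is less significant against a whole genome than against a small library, which is one reason to state the cutoff as an expectation value rather than a score.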
- Gap values can be set to any desired value as is routine in the art (see, e.g., Smith et al., 1981, J. Mol. Evol. 18(1): 38-46, Levitt et al, 1998, Proc. Natl. Acad. Sci. USA 95(11): 5913-5920, the entireties of which are incorporated herein by reference).
- a results database is created, preferably comprising sequences from the test database which are ranked according to their alignment with sequences in the genomic database.
- sequences which show a high degree of alignment to genomic sequences from the host organism (e.g., having at least 20-1000 consecutive identical nucleotides) are not included in the results database or are subsequently removed from the results database.
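A results database of this kind can be pictured as a ranked list from which host-like sequences are dropped. The sketch below assumes each test sequence has already been assigned the length of its longest identical run against the host genome; `build_results` and `min_host_run` are hypothetical names, and ranking the least host-like sequences first is a design choice, not something the source prescribes.

```python
def build_results(best_run_by_seq, min_host_run=20):
    """Drop sequences with a host run of min_host_run or more, then rank the rest.

    Sequences are ordered by increasing run length, with names breaking ties.
    """
    kept = {name: run for name, run in best_run_by_seq.items() if run < min_host_run}
    return sorted(kept.items(), key=lambda item: (item[1], item[0]))


runs = {"est_a": 5, "est_b": 25, "est_c": 0, "est_d": 19}
print(build_results(runs))  # [('est_c', 0), ('est_a', 5), ('est_d', 19)]
```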
- a subtraction operation is performed to remove sequences from either the genomic database and/or the test database and/or the results database.
- subtraction operations can be used to remove vector sequences, repetitive sequences, mitochondrial sequences, sequences from other species, low quality sequences, known host organism mRNA sequences, and the like. It should be obvious to those of skill in the art that the order of subtraction operations is not critical and that one or more subtraction operations can be used.
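The remark that the order of subtraction operations is not critical follows from the fact that removing fixed sets of records commutes. The sketch below checks this for a small example, assuming each filter has already been resolved to a set of record identifiers (the identifiers and filter names are invented for illustration).

```python
from itertools import permutations


def apply_filters(records, filters):
    """Remove every record listed in any filter, applying the filters in the given order."""
    remaining = set(records)
    for unwanted in filters:
        remaining -= set(unwanted)
    return remaining


def order_independent(records, filters):
    """True if every ordering of the filters leaves the same records."""
    outcomes = {frozenset(apply_filters(records, order)) for order in permutations(filters)}
    return len(outcomes) == 1


records = {"s1", "s2", "s3", "s4", "s5"}
vector, repetitive, mitochondrial = {"s1"}, {"s2", "s3"}, {"s3", "s4"}
print(apply_filters(records, [vector, repetitive, mitochondrial]))      # {'s5'}
print(order_independent(records, [vector, repetitive, mitochondrial]))  # True
```

Order can still matter for running time, since applying the filter that removes the most records first leaves less work for the others.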
- a first candidate sequence set is again compared to the host organism genomic database, and/or to one or more filtering databases, using a smaller word size than was used in the previous series of operations, to generate a second candidate sequence set which is then stored in a results database.
- low quality sequences are removed, before or after filtering.
- the test database is an expressed sequence database of sequences from the host organism, such as an EST or cDNA database (e.g., a library database).
- databases are known in the art and include, but are not limited to, human expressed sequence databases such as the NCBI EST database, the LIFESEQ™ database (Incyte Pharmaceuticals, Palo Alto, Calif.), the random cDNA sequence database from Human Genome Sciences, the EMEST8 database (EMBL, Heidelberg, Germany), and the like (see, also, Boguski et al., 1993, Nat. Genet. 4(4): 332-333, the entirety of which is incorporated by reference herein).
- the test database also can be generated by inputting and storing sequence information obtained by sequencing a plurality of nucleic acids from a library of expressed sequences from one or more host organisms suspected of harboring a microbe, into a user device of a system 1 (shown in Figure 2) as described further below.
- Libraries of expressed sequences can be generated using total RNA or polyadenylated RNA, and by using random priming or oligodT priming or a combination of these methods. Such techniques are known in the art. Libraries of particular interest include, but are not limited to, libraries of expressed sequences from one or more patients with an inflammatory disease, an autoimmune disease, or a cell proliferative disease.
- libraries of expressed sequences from one or more patients with a disease selected from the group consisting of sarcoidosis, inflammatory bowel disease (such as Crohn's disease), atherosclerosis, multiple sclerosis, rheumatoid arthritis, type I diabetes mellitus, lupus erythematosus, Hodgkin's disease, and bronchioalveolar carcinoma are used to obtain expressed sequence information.
- the test database can consist entirely of expressed sequences.
- the test database can also be a genomic sequence database.
- the test database can comprise sequence information from a plurality of sequences in a genomic library from one or more host organisms suspected of harboring a microbe.
- genomic sequence test databases are used to identify expressed sequences of microbes which are not polyadenylated (and/or which have integrated into the genome of the host organism), e.g., such as bacterial expressed sequences which would likely escape detection in expressed sequence libraries generated from polyadenylated RNA.
- the test database can be enriched for sequences which are found in host organism(s) suspected of harboring a microbe and which are not found in host organisms not harboring the microbe.
- the enrichment method comprises combining genomic test sequences with other genomic sequences (reference sequences), expressed test sequences with expressed reference sequences, or expressed test sequences with genomic reference sequences, and removing sequences which are common to both test and reference sequence sets, thereby enriching for test sequences which are not found in a reference set of sequences.
- a subtractive hybridization method is used to enrich for expressed sequences in a sample of nucleic acids from a host organism which is suspected of harboring a microbe and which are not expressed in host organisms which do not harbor the microbe.
- Samples can comprise total nucleic acids, polyadenylated RNA, or total RNA.
- Subtractive hybridization methods to enrich for differentially expressed sequences are known in the art and include, but are not limited to, SAGE (Serial Analysis of Gene Expression) (see, e.g., Velculescu et al., Science 270: 484 (1995) and U.S. Patent No.
- enrichment is performed electronically. For example, sequences from at least one host organism suspected of harboring a microbe stored in a test database can be subtracted from sequences in a "reference database" comprising sequences from a plurality of host organisms not harboring the microbe.
- the test database and reference database are both expressed sequence databases and electronic subtraction is used to enrich for differentially expressed sequences which are expressed in the at least one host organism and which are not expressed in the plurality of host organisms.
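Electronic enrichment can be sketched as removing from the test set every sequence that also occurs in the reference set. The version below makes two simplifying assumptions that the source does not state: sequences are compared as whole strings rather than by alignment, and a sequence and its reverse complement are treated as the same record. All function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def canonical(seq):
    """Strand-independent key: the smaller of the sequence and its reverse complement."""
    forward = seq.upper()
    return min(forward, revcomp(forward))


def enrich(test_seqs, reference_seqs):
    """Keep test sequences that do not occur, on either strand, in the reference set."""
    reference_keys = {canonical(s) for s in reference_seqs}
    return [s for s in test_seqs if canonical(s) not in reference_keys]


test_db = ["ACGTT", "GGGCC", "TTTAA"]
reference_db = ["AACGT", "ggcc"]      # AACGT is the reverse complement of ACGTT
print(enrich(test_db, reference_db))  # ['GGGCC', 'TTTAA']
```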
- the test database is a relational database which segregates particular types of sequences from other types of sequences within the database.
- expressed sequence information can be subdivided within an expressed sequence database according to a particular tissue, or cell type, or cell line, in which the sequence is expressed.
- particular portions of the test database can be compared to the genomic database during a search, sequentially, or simultaneously.
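A relational layout makes it straightforward to pull out one portion of the test database for a search. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table name `est` and its columns are hypothetical, chosen only to show segregation by tissue.

```python
import sqlite3


def make_test_db(rows):
    """Create an in-memory table of expressed sequences tagged by tissue of origin."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE est (name TEXT PRIMARY KEY, tissue TEXT, seq TEXT)")
    conn.executemany("INSERT INTO est VALUES (?, ?, ?)", rows)
    return conn


def sequences_for(conn, tissue):
    """Return (name, sequence) pairs for one tissue, ready to be searched as a set."""
    cursor = conn.execute(
        "SELECT name, seq FROM est WHERE tissue = ? ORDER BY name", (tissue,))
    return cursor.fetchall()


conn = make_test_db([
    ("e1", "liver", "ACGT"),
    ("e2", "lung", "GGCC"),
    ("e3", "liver", "TTAA"),
])
print(sequences_for(conn, "liver"))  # [('e1', 'ACGT'), ('e3', 'TTAA')]
```

Each tissue subset can then be compared to the genomic database on its own, one after another or in parallel.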
- a candidate sequence is identified, it is compared to a nucleotide sequence database comprising sequences from a plurality of species, to identify the microbial organism genus to which the sequence belongs or to which the species is related evolutionarily.
- GenBank's nucleotide or "nt" database can be used.
- the candidate sequence can be used as a query sequence to search a database comprising only microbe sequences.
- the database is a microbial sequence database which can be a viral sequence database, or a fungal or parasite sequence database.
- Such databases are known in the art and include, but are not limited to, the Incyte Microbial Database, the TIGR Microbial Database, the TIGR Parasites Database, TIGR Fungal Database, and the TIGR Viral Genome Sequencing Project Database. This step can be used to identify or evaluate the taxonomic relationship between the candidate sequence and sequences of other known microbes for which genomic sequence information is known.
- candidate sequences are compared to sequences in a non-redundant RNA database to determine whether the sequence matches known host organism RNA molecules.
- a candidate sequence is conceptually translated to identify open reading frames and the amino acid sequences of a polypeptide encoded by the candidate sequence can be used as a query sequence to search a protein sequence database comprising microbial sequences (i.e., the database can comprise multiple species sequences in addition to microbial sequences, such as the GenBank nr database, or the database can comprise exclusively microbial sequences).
- the sequence is also used as a query sequence to search a nucleotide sequence database comprising microbial sequences (e.g., such as the GenBank nt database or an exclusively microbial sequence database) to identify sequences whose conceptual translations match known microbial proteins but whose nucleotide sequences do not match microbial nucleotides.
- These latter classes of sequences, which are preferably stored in a results database, are likely to identify sequences belonging to microbes of the same genus as the microbe whose protein was identified as a match, but which do not necessarily represent microbes belonging to the same species, i.e., the sequences are likely to represent previously uncharacterized microbes.
- Celera Human Genome http://www.celera.com
- GIRI Genetic Information Research Institute
- TIGR databases e.g., the TIGR Human Gene Index Database
- the genomic database also can be generated by inputting and storing sequence information obtained by sequencing a plurality of nucleic acids from a genomic library of sequences from one or more host organisms which do not harbor microbes, into a user device of a system described further below.
- Genomic databases contemplated according to the invention include genomic sequence information from any of the host organisms described above, e.g., from a microorganism, a fungus (e.g., yeast), an animal (insect, bird, fish, or mammal, such as a human being or domestic animal) or a plant.
- the invention further provides a system 1 for performing the computational subtraction analysis described above (see, Figure 2).
- the system 1 comprises a first database 2 (e.g., the test database) comprising sequences from at least one host organism suspected of harboring a microbe and a second database 3 comprising genomic sequences from host organisms not harboring the microbe.
- the system 1 further comprises an information management system 4 comprising a search function for identifying sequences in the first database 2 which are not present in the second database 3.
- the system 1 further comprises a program embodied in a computer readable medium for executing sequence alignments between at least a first sequence in the first database and a plurality of, and preferably all, sequences in the second database.
- the program can be part of a server 5 (which also can store program applications required by the information management system 4) or part of a processor which is part of a user device 9.
- the user device 9 is in communication with the server 5 and/or other servers (not shown).
- the user device 9 can be a computer, a laptop, a wireless device, and the like.
- the system further can include additional user devices 9, output devices 6 (e.g., printers), and input devices (e.g., keyboards 7, mice, joysticks, and the like).
- the user device 9 preferably includes an interface 8 which can be displayed by the device 9 in response to the user accessing the system 1 to activate the information management system 4.
- the system 1 is preferably connectable to the network 10, enabling a user to access the system remotely from any user device 9 that is connectable to the network 10.
- sets of sequences (at least 2, at least 10, at least 100, or at least 500) in the test database 2 are compared at a single time with sequences in the genomic database 3.
- the information management system 4 comprises a program which is capable of implementing an algorithm, such as the one used in the MEGABLAST program for performing this function (see, e.g., Zhang et al., supra).
- the at least 2, at least 10, at least 100, or at least 500 sequences are compared individually and sequentially with sequences in the genomic database 3.
- the user device 9 or the server 5 comprises a high speed, linear array processor that can locate highly similar sequence segments (e.g., having a BLAST score of at least 40) from any two or more sequences.
- the processor comprises a high speed circuit chip that provides an equivalent of about 400,000 transistors or 100,000 gates, as described in U.S. Patent No. 5,964,860, for use in performing high speed sequence analyses.
- the system 1 further comprises an input device 7 that receives a set of sequences (either sequentially or simultaneously), a memory that stores the set of sequences (not shown), and a processor that transfers information from the set of sequences to the memory (e.g., in the form of data characters representing nucleotide bases in the set of sequences).
- the processor can be part of a user device 9, but is preferably part of the server 5.
- the system 1 further includes an identity matrix and a result sequence set (e.g., from the results database described above) (not shown), in which members of a set of compared sequences are ranked according to their degree of match to sequences in the genomic database 3.
- the results sequence set can include sequences which do not match sequences in the genomic database. Sequences which have the least amount of match (as determined using parameters established by the user) can be displayed on an interface 8 of the user device 9 in response to a user query to match sequences.
- the identity matrix is pre-selected by the user to require a match score of greater than or equal to 60 with a word size of between 10 and 30.
- the system 1 iteratively computes the degree of alignment between sequences using progressively smaller word sizes from 30 to 10 (e.g., first using a word size of 24, then a word size of 20, then a word size of 16, then a word size of 12).
- the score value remains the same and is some value greater than or equal to 60.
- the matrix is designed to eliminate low quality sequences (e.g., as determined using a base calling program such as PHRED), short sequences (less than 10 nucleotides), or sequences comprising a maximum number of ambiguous or unreadable nucleotides, such that there is a minimum length of quality sequences (e.g., sequences whose bases have a high confidence (at least 95%) of being accurate) of at least 50 nucleotides, and preferably at least 150 nucleotides.
- the system 1 provides one or more programs for performing one or more electronic subtraction functions analogous to an electronic subtractive hybridization.
- the system 1 is capable of eliminating, in response to a user command or in response to a pre-programmed set of instructions, any of: vector sequences, repetitive sequences, mitochondrial sequences, sequences from other species, low quality sequences, known host mRNA sequences (i.e., sequences known to belong to the host organism), and combinations thereof.
- the invention provides a computer program product comprising a computer readable memory on which is embedded one or more programs for implementing any of the system 1 functions described above.
- candidate sequences identified using the methods and system 1 described above are used as probes to probe a library of sequences from at least one microbe.
- the microbe can be a phage, a virus, a bacterium, a protozoan, or other microorganism, a small unicellular or multicellular eukaryotic organism, such as a fungus (e.g., yeast), and the like.
- Microbes can be cultured from host organisms to provide nucleic acids suitable for generating libraries using methods known in the art.
- the library is a genomic library.
- libraries can be generated from genomic or expressed sequences from which host organism sequences have been subtracted as described above.
- microbe sequences which are enriched in these samples can be ligated to linkers or adapters and amplified using primers which hybridize to these linkers or adapters.
- the linkers or adapters can include promoter sequences and microbe sequences can be amplified by providing polymerases which recognize these sequences and the appropriate nucleotides (e.g., using a transcription-based amplification system).
- candidate sequences are used to identify hybridizing sequences within the library which are longer in length than the candidate sequence, either at the 5' end or 3' end or both. These longer sequences are used, in turn, to identify other sequences which are preferably longer in length either at the 5' end or 3' end or both.
- Overlapping clones can be mapped using restriction enzyme analysis in combination with Southern analysis, and/or sequence analysis, to further characterize the genome structure of the microbe.
- genomic sequence information is inputted into a microbial genomic database (i.e., a database comprising only microbial sequences).
- Microbe sequences can be evaluated using a sequence analysis program such as the Gene Locator and Interpolated Markov Modeler, or Glimmer™, program to identify coding sequences and to distinguish such sequences from non-coding DNA (see, e.g., Salzberg et al., Nucl. Acids Res. 26(2): 544-8 (1998)). A version of Glimmer designed for small eukaryotes is described in Salzberg et al., Genomics 59: 24-31 (1999). The entirety of these references is incorporated by reference herein.
- RNA samples are obtained from host organisms harboring the microbe (e.g., total RNA or polyA RNA if the microbe's RNA is polyadenylated) and a complement of the candidate sequence is used as a primer to generate cDNA molecules from the RNAs obtained.
- cDNAs are generated using a RACE method (see, e.g., Siebert et al., In Gene Cloning and Analysis by RT-PCR (BioTechniques Books, Natick, MA), pp. 305-320 (1998); Don et al, Nucl Acids Res. 19: 4008 (1991); Roux, PCR Methods Appl.
- sequences of cDNA clones are inputted into a results database for comparison to a database comprising microbial nucleotide sequences.
- candidate sequences and/or their complements can be used as primers in PCR or RT-PCR assays to identify additional microbial sequences of interest, for example, in nucleic acids obtained from cultures of these microbes.
- asymmetric or one-directional PCR can be performed using the candidate sequence or its complement as a single primer in primer extension reactions to identify microbial sequences flanking the primer sequence in the microbial genome or in a microbial transcript.
- One-directional PCR is known in the art and is described in U.S. Patent No. 6,184,025, for example, the entirety of which is incorporated by reference.
- At least two primers corresponding to the candidate sequence are used (e.g., primers capable of amplifying a nucleic acid fragment which comprises a subsequence of the candidate sequence of at least 50 nucleotides).
- the primers can be used to verify that the candidate sequences do not represent previously unsequenced host genomic DNA.
- the primers can be used in amplification reactions with host genomic DNA to verify that no amplification of host genomic sequences occurs.
- Candidate sequences, their complements, or sequences identified by candidate sequences can be used in hybridization assays to detect the presence of a microbe in a sample.
- although the methods are described as "diagnostic", this does not imply that the method is necessarily used to determine the presence or absence of a pathogenic condition in an organism.
- diagnostic methods can be used to detect the presence of a commensal microbe within a sample, which can, in some instances, be desirable (e.g., such as when the microbe produces vitamins for the host).
- the hybridization assays can be used to detect the presence of one or more pathogens in a sample from an organism, and the results of such an assay can be used to provide treatment options for the organism.
- the hybridization assays are used to detect carrier organisms which are infected by pathogens but which do not show symptoms of a pathogenic condition.
- nucleic acids from a sample obtained from a host organism are contacted under stringent conditions with a test sequence derived from the candidate sequence.
- a test sequence derived from a candidate sequence refers to the candidate sequence itself, or a fragment thereof, or another sequence from the microbe which the candidate sequence has been used to identify, or to complements of any of these sequences.
- the test sequence can be used as a diagnostic probe to detect expressed sequences or genomic sequences of the microbes in the sample by detecting the formation of a hybridization complex between the test sequence and a nucleic acid in the sample.
- test sequences are labeled with detectable labels.
- the test sequence is bound to a molecule which is detectably labeled or which itself can bind to detectably labeled molecule(s).
- the amount of test sequences bound is used to provide an indication of the number of microbes in a sample (for example, by providing a comparison to test samples comprising a known amount of microbes).
- either the sample sequences, or probe sequences, or both are amplified (e.g., by PCR, LCR or some other means of amplification) to increase the sensitivity of the assay.
- the test sequence itself is used as a primer in an amplification assay or a reverse transcription-based assay.
- Methods of labeling, hybridizing, amplifying and quantitating nucleic acids are known in the art.
- Probes can be obtained by restriction digestion of cloned sequences or can be synthesized using means known in the art.
- PNA probes can also be used to enhance the specificity of assays.
- panels of nucleic acid sequences representing different regions of the genome of the microbe can be used simultaneously or sequentially to detect the microbe.
- panels of nucleic acid probes from different microbes can be used in the diagnostic assays described above.
- the probes, or oligonucleotides comprising probe sequences can be immobilized on a substrate (e.g., a microarray) as described in Cummings et al., supra, to increase the throughput of diagnostic assays.
- the candidate sequence, or a sequence identified by the candidate sequence is used to express a peptide, for example, by operably linking the candidate sequence to a promoter sequence in an expression vector.
- the candidate sequence is linked in frame to a cleavable amino acid sequence whose expression is operably linked to the promoter sequence.
- a peptide can be synthesized using the predicted amino acid sequence of the candidate sequence or a coding sequence of the sequence identified by the candidate sequence.
- the peptide is an antigenic peptide.
- the peptide can be used to generate antibodies which specifically bind to the peptide and to polypeptides or proteins comprising the peptide.
- Antibodies encompassed within the scope of the invention include, but are not limited to, monoclonal antibodies, polyclonal antibodies, double chain antibodies, single chain antibodies, chimeric antibodies, antibody fragments comprising at least one antigen binding site, and the like.
- antibodies specific for peptides expressed by nucleic acids from the microbe are used in histological assays, such as immunohistochemistry, immunofluorescence, immunoelectron microscopy, and the like.
- antibodies can also be used in immunoassays as are routine in the art.
- the detection of binding of an antibody to a sample from a host organism suspected of harboring a microbe can be used to provide a diagnosis that the organism harbors the microbe (e.g., that the microbe may be found on or within its cells, or in bodily fluids from the organism).
- the antibodies according to the invention can be used to detect microbes which are shed by host cells and which may be present in bodily fluids outside of cells or in proximity to cells or tissues from the host organism, or to detect antigens which are presented after processing of polypeptides of a microbe by host cells (e.g., by host cell MHC class I molecules), or to detect microbes which typically exist extracellularly within a host organism, such as bacteria.
- panels of antibodies specific for a single microbe can be used as probes, either simultaneously or sequentially.
- Panels of antibodies specific for a plurality of microbes can also be used.
- antibodies are arrayed on a substrate to increase the throughput of the analysis.
- peptides themselves can be used as diagnostic reagents.
- peptides can be reacted with sera from an organism suspected of containing a microbe to detect the presence of circulating antibodies which react with the peptides.
- the invention provides a sequence which is a complement or an antisense sequence of a coding sequence of the candidate sequence or of the coding sequence of another sequence which has been identified by the candidate sequence.
- the antisense sequence can be administered to a host organism in an amount sufficient to prevent the expression of a polypeptide encoded by the candidate sequence or the other sequence identified by the candidate sequence.
- Antisense nucleic acids according to the invention additionally can be modified to enhance their stability in vivo, as described in Agrawal et al., Proc. Natl. Acad. Sci. USA 85: 7079 (1988), and Sarin et al., Proc. Natl. Acad. Sci. USA 85: 7448 (1988), for example, the entireties of which are incorporated herein by reference.
- Antisense nucleic acids also can be modified to include a cleaving agent for cleaving a molecule to which the antisense nucleic acid binds.
- the nucleic acid can be engineered to include sequences which provide the function of a ribozyme.
- Antisense molecules can be administered directly to a target site.
- antisense molecules can be administered topically (e.g.
- antisense molecules are administered to the patient enterally or parenterally.
- Antisense molecules can be administered with suitable carrier molecules to facilitate delivery to a target site (e.g., by complexing the molecules with liposomes) and/or can be bound to a targeting molecule (e.g., a ligand specific for a receptor expressed on the surface of a host cell infected by the microbe).
- the targeting molecule includes an intracellular localization signal for delivering the antisense molecule to the interior of the cell.
- candidate sequences can be used to generate peptides.
- the peptides are administered to the host organism in an amount sufficient to enable the host organism to mount a protective immune response against the microbe.
- the peptides are used as a vaccine.
- nucleic acid sequences which encode these peptides and which are operably linked to one or more promoter elements can be administered to the host organism in an amount sufficient to enable the host organism to mount a protective immune response against the microbe (e.g., providing a DNA vaccine).
- a protective immune response can include the production of macrophages which specifically recognize the microbe (e.g., during an extracellular portion of its life cycle) and/or the production of cells which produce neutralizing antibodies which specifically bind to the microbe and which prevent the microbe from infecting further cells.
- a plurality of peptides from the same microbe or a nucleic acid expressing the plurality of peptides is administered to the organism.
- the microbe is isolated and nucleic acids removed, and the microbe itself is administered to an organism to generate a protective immune response (see, e.g., as described in U.S. Patent No. 5,698,430, the entirety of which is incorporated by reference herein).
Examples
- EST library Unigene library #271
- This library contains 7,073 EST's. 6,752 of these EST's comprise at least 100 discrete, unambiguous 15-mers (e.g., sequences whose nucleotide identity can be assigned at greater than 95% confidence levels or 0% N's).
- a system 1 according to the invention was used to compare the sequences in the EST library against known human mRNA sequences, human repeat sequences, human mitochondrial sequences, the Human Genome Project (HGP) and Celera Genomics Human Genomic DNA sequences and to eliminate matching sequences. Matches within mouse genomic DNA sequences (Celera) were also searched for and removed under the assumption that these would represent unsequenced regions of the human genome.
- Ten primer sequences were capable of amplifying nucleic acids in all samples of human genomic DNA, while ten primer sequences could not amplify any samples, and two primers (corresponding to HPV sequences) were able to amplify only HeLa cell genomic DNA.
- the ten sequences which amplified all human genomic DNA samples are likely to represent previously unsequenced regions of the human genome, while those primer sequences unable to amplify sequences in any samples are likely to represent sequences brought together by splicing (and which are therefore too far apart in genomic DNA to be amplified), sequences of non-human origin, or sequencing errors.
- computational subtraction was used to scan existing EST databases for candidate microbial sequences using the same method as described in example 1.
- EST's in an NCBI EST database of 3,287,578 sequences were serially compared against filter databases using the MEGABLAST tool with a word-size of 24, to filter or subtract matching sequences.
- sequences were filtered through a known human mRNA database (the NCBI RefSeq human mRNA database), which after subtraction left 1,438,967 sequences; a human mitochondria database, which left 1,409,118 sequences; a vector sequence database (the NCBI UniVec database), which left 1,396,697 sequences; a human repetitive sequence database (GIRI HumRep), which left 1,368,895 sequences; a human genome database, which left 144,498 sequences; and a mouse genome database, which left 137,011 sequences.
- Sequences were subsequently tested by BLASTN searches against GenBank nt databases (i.e., a database comprising multiple species' sequences, including microbial sequences) using a word size of 16, and by BLASTX searches against the nr non-redundant protein databases using a word size of 3.
- a results database of these matches is available at http://www.hcs.harvard.edu/~weber/meyerson2/nrnt.cgi, the entirety of which is incorporated by reference herein.
- EST sequences that passed all filters were compared to GenBank's nt database (a database representing multiple species) using the MEGABLAST algorithm. Alignments with a bit score of 100 or greater were categorized as "matching" those in the nt database. Sequences remaining after subtraction which match viral genome sequences are shown in Table 1. Included in these sequences were sequences belonging to a variety of pathogenic viruses. As shown in Table 1, the most common viral match was to Hepatitis B virus sequences, for which there were 33 EST matches in the databases. Thirty-two of these matches were derived from the library GKC which is made from normal liver tissue from a Chinese patient with hepatocellular carcinoma.
- Hepatitis B virus sequences are abundant in this library, representing 0.2% of the 16,743 total sequences in this library.
- As shown in Table 1, a variety of other pathogenic virus sequences, including human papillomavirus, adenovirus, and a variety of herpesviruses (cytomegalovirus, Epstein-Barr virus, and Kaposi's sarcoma herpesvirus), were identified by computational subtraction methods according to the invention.
- Table 2 summarizes sequences remaining after computational subtraction which match bacterial sequences. After identifying expressed sequences as candidate sequences not found in the human genome, these sequences were compared to the GenBank nt database using the BLASTX algorithm (BLAST 2.0) and alignments with a bit score of 100 or greater were categorized as matches. Table 2 shows the ten most frequently appearing bacterial sequences after computational subtraction. As can be seen from Table 2, there are numerous matches to Pseudomonas aeruginosa sequences, a common pathogen as well as a commensal organism. In addition, there are numerous matches to other Pseudomonas species.
- Table 3 shows the more interesting category of bacterial matches: the set of bacterial sequences whose conceptual translations match known bacterial proteins and which do not share significant nucleotide sequence similarity with known bacterial nucleotide sequences. These sequences were identified by passing EST sequences through the filter databases described above and comparing remaining sequences to the GenBank nt database using the BLASTN algorithm (with a threshold of 60 bits) and to the non-redundant ("nr") protein database using the BLASTX algorithm (setting a threshold of 100 bits). EST's matching the nr database but not the nt database were categorized as "translation-only alignments." This series of operations revealed numerous pathogens with matches only to translated sequences.
- the HeLa cell EST library analyzed is available as Library 271 (Stratagene_HeLa_cell_s3_S37216) in the UniGene resource at the NCBI web-site.
- the Celera draft of the human genome and the 3x coverage of shotgun sequence from the mouse genome were downloaded from Celera's website in January, 2001.
- RepBase6.2 was downloaded from the GIRI database on March 7, 2001.
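The serial filtering described in the examples above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the MEGABLAST-based implementation used in the examples: a test sequence is treated as "matching" a filter database if it shares at least one exact word (k-mer) of the chosen word size with any reference sequence, reverse-complement matches are ignored, and the function names (`words`, `build_filter`, `subtract`, `serial_subtraction`) and the toy sequences are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def words(seq: str, word_size: int) -> Set[str]:
    """Return every contiguous word (k-mer) of length word_size in seq."""
    seq = seq.upper()
    return {seq[i:i + word_size] for i in range(len(seq) - word_size + 1)}


def build_filter(reference_seqs: Iterable[str], word_size: int) -> Set[str]:
    """Index all words found in a filter (reference) database."""
    index: Set[str] = set()
    for ref in reference_seqs:
        index |= words(ref, word_size)
    return index


def subtract(test: Dict[str, str], filter_index: Set[str],
             word_size: int) -> Dict[str, str]:
    """Keep only test sequences that share no word with the filter.

    Assumption: one shared exact word counts as a match. A real alignment
    tool would extend and score the hit instead. Sequences shorter than
    word_size have no words and are therefore kept.
    """
    return {name: seq for name, seq in test.items()
            if words(seq, word_size).isdisjoint(filter_index)}


def serial_subtraction(test: Dict[str, str],
                       filters: List[Tuple[str, List[str]]],
                       word_size: int
                       ) -> Tuple[Dict[str, str], List[Tuple[str, int]]]:
    """Pass test sequences through each filter in order.

    Returns the surviving sequences and a log of how many sequences
    remained after each filter, mirroring the running counts reported
    for the EST subtraction.
    """
    remaining = dict(test)
    log: List[Tuple[str, int]] = []
    for label, refs in filters:
        remaining = subtract(remaining, build_filter(refs, word_size), word_size)
        log.append((label, len(remaining)))
    return remaining, log


# Toy data (hypothetical): est1 lies within the host mRNA, est2 within the
# vector, and est3 shares no 8-mer with either reference.
host_mrna = ["ATGGCGTACGTTAGCCGATAGGCTA"]
vector = ["GGATCCGAATTCAAGCTTGGTACC"]
test_db = {
    "est1": "GCGTACGTTAGCCGAT",
    "est2": "CCGAATTCAAGCTTGG",
    "est3": "TTTTGACCCATGGGAAACTCAG",
}

remaining, log = serial_subtraction(
    test_db, [("host mRNA", host_mrna), ("vector", vector)], word_size=8)

for label, count in log:
    print(f"after {label} filter: {count} sequences remain")
print("candidates:", sorted(remaining))
```

In this toy run only `est3` survives both filters and would be carried forward as a candidate sequence for the nt and nr database searches. Ordering the filters from the cheapest and most frequently matching database to the largest one keeps the working set small before the genome-scale comparisons.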
Landscapes
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Biophysics (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP01928651A EP1390895A4 (en) | 2001-04-19 | 2001-04-19 | Computational subtraction method |
CA002444604A CA2444604A1 (en) | 2001-04-19 | 2001-04-19 | Computational subtraction method |
PCT/US2001/012736 WO2001054557A2 (en) | 2001-04-19 | 2001-04-19 | Computational subtraction method |
AU2001255485A AU2001255485A1 (en) | 2001-04-19 | 2001-04-19 | Computational subtraction method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2001/012736 WO2001054557A2 (en) | 2001-04-19 | 2001-04-19 | Computational subtraction method |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2001054557A2 (en) | 2001-08-02 |
WO2001054557A3 (en) | 2002-03-21 |
Family
ID=21742516
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2001/012736 WO2001054557A2 (en) | 2001-04-19 | 2001-04-19 | Computational subtraction method |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP1390895A4 (en) |
AU (1) | AU2001255485A1 (en) |
CA (1) | CA2444604A1 (en) |
WO (1) | WO2001054557A2 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6996477B2 (en) * | 2001-04-19 | 2006-02-07 | Dana Farber Cancer Institute, Inc. | Computational subtraction method |
EP1723260A2 (en) * | 2004-02-17 | 2006-11-22 | Dana-Farber Cancer Institute | Nucleic acid representations utilizing type iib restriction endonuclease cleavage products |
-
2001
- 2001-04-19 AU AU2001255485A patent/AU2001255485A1/en not_active Abandoned
- 2001-04-19 WO PCT/US2001/012736 patent/WO2001054557A2/en active Application Filing
- 2001-04-19 CA CA002444604A patent/CA2444604A1/en not_active Abandoned
- 2001-04-19 EP EP01928651A patent/EP1390895A4/en not_active Ceased
Non-Patent Citations (2)
Title |
---|
See also references of EP1390895A2 * |
WAN ET AL.: 'Cloning differentially expressed mRNAs' NATURE BIOTECHNOLOGY vol. 14, December 1996, pages 1685 - 1691, XP002947129 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6996477B2 (en) * | 2001-04-19 | 2006-02-07 | Dana Farber Cancer Institute, Inc. | Computational subtraction method |
EP1723260A2 (en) * | 2004-02-17 | 2006-11-22 | Dana-Farber Cancer Institute | Nucleic acid representations utilizing type iib restriction endonuclease cleavage products |
EP1723260A4 (en) * | 2004-02-17 | 2008-05-28 | Dana Farber Cancer Inst Inc | Nucleic acid representations utilizing type iib restriction endonuclease cleavage products |
Also Published As
Publication number | Publication date |
---|---|
CA2444604A1 (en) | 2002-08-02 |
WO2001054557A3 (en) | 2002-03-21 |
AU2001255485A1 (en) | 2001-08-07 |
EP1390895A4 (en) | 2008-04-02 |
EP1390895A2 (en) | 2004-02-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Lecompte et al. | | Multiple alignment of complete sequences (MACS) in the post-genomic era |
WO2020176620A1 (en) | | Systems and methods for using sequencing data for pathogen detection |
US6996477B2 (en) | | Computational subtraction method |
JP2003021630A (en) | | Method of providing clinical diagnosing service |
Pearl et al. | | A rapid classification protocol for the CATH Domain Database to support structural genomics |
Chang et al. | | Including biological literature improves homology search |
KR20200051714A (en) | | HLA TISSUE MATCHING AND METHODS THEREFOR |
Cissé et al. | | Genomic insights into the host specific adaptation of the Pneumocystis genus |
Kellam | | Post‐genomic virology: the impact of bioinformatics, microarrays and proteomics on investigating host and pathogen interactions |
Malik et al. | | Structural and functional annotation of human FAM26F: a multifaceted protein having a critical role in the immune system |
Gazi et al. | | Functional, structural and epitopic prediction of hypothetical proteins of Mycobacterium tuberculosis H37Rv: An in silico approach for prioritizing the targets |
WO2019242445A1 (en) | | Detection method, device, computer equipment and storage medium of pathogen operation group |
Faber-Hammond et al. | | Pseudo-de novo assembly and analysis of unmapped genome sequence reads in wild zebrafish reveal novel gene content |
EP3219809B1 (en) | | Pathology determination assistance device, method, program and storage medium |
EP1390895A2 (en) | | Computational subtraction method |
Das et al. | | Identification of novel MET exon 14 skipping variants in non-small cell lung cancer patients: a prototype workflow involving in silico prediction and RT-PCR |
Suhre | | Genetic associations with ratios between protein levels detect new pQTLs and reveal protein-protein interactions |
WO2021105005A1 (en) | | Method and system for phenotypic profile similarity analysis used in diagnosis and ranking of disease-driving factors |
US20020072862A1 (en) | | Creation of a unique sequence file |
Felice et al. | | Pan-genomic analyses of 47 complete genomes of the Rickettsia genus and prediction of new vaccine targets and virulence factors of the species |
Korth et al. | | Unlocking the Mysteries of Virus‐Host Interactions: Does Functional Genomics Hold the Key? |
Mou et al. | | In Silico Functional Annotation of VP 128 Hypothetical Protein from Vibrio parahaemolyticus |
Mukherjee et al. | | Dual RNA-Seq meta-analysis in Plasmodium infection |
Singh et al. | | Investigation of hub genes and their nonsynonymous single nucleotide polymorphism analysis in Plasmodium falciparum for designing therapeutic methodologies using next-generation sequencing approach |
JP7373843B2 (en) | | Prediction device, prediction program, and prediction method for predicting infection-causing organisms |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states | Kind code of ref document: A2; Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW |
AL | Designated countries for regional patents | Kind code of ref document: A2; Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
AK | Designated states | Kind code of ref document: A3; Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG UZ VN YU ZA ZW |
AL | Designated countries for regional patents | Kind code of ref document: A3; Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
TCAT | At: translation of claims filed | |
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101) | |
WWE | WIPO information: entry into national phase | Ref document number: 2444604; Country of ref document: CA |
WWE | WIPO information: entry into national phase | Ref document number: 2001928651; Country of ref document: EP |
WWP | WIPO information: published in national office | Ref document number: 2001928651; Country of ref document: EP |
REG | Reference to national code | Ref country code: DE; Ref legal event code: 8642 |
NENP | Non-entry into the national phase | Ref country code: JP |