AU2006214332B2

AU2006214332B2 - Replikin peptides and uses thereof

Info

Publication number: AU2006214332B2
Application number: AU2006214332A
Authority: AU
Inventors: Elenore S. Bogoch; Samuel Bogoch; Samuel Winston Bogoch; Anne-Elenore Elizabeth Borsanyi
Original assignee: BORSANYI ANNE ELENORE BOGOCH
Current assignee: BORSANYI ANNE ELENORE BOGOCH
Priority date: 2005-02-16
Filing date: 2006-02-16
Publication date: 2012-03-22
Anticipated expiration: 2026-02-16
Also published as: IL185308A0; CA2598381A1; SG160327A1; JP2008539164A; WO2006088962A9; NZ560415A; EP1859063A4; WO2006088962A3; AU2006214332A1; EP1859063A2; WO2006088962A2

Abstract

The present invention provides a new class of peptides related to rapid replication and high human mortality, and their use in diagnosing, preventing and treating disease including vaccines and therapeutics for emerging viral diseases and methods of identifying the new class of peptides and related structures.

Description

WO 2006/088962 PCT/US2006/005343

SYSTEMS AND METHODS FOR IDENTIFYING REPLIKIN SCAFFOLDS AND USES OF SAID REPLIKIN SCAFFOLDS

[0001] This application claims priority to U.S. Provisional Appln. Ser. No. 60/653,083, filed February 16, 2005, and is a continuation-in-part of U.S. Appln. Ser. No. 11/116,203, filed April 28, 2005, which claims priority to U.S. Provisional Appln. Ser. No. 60/565,847, filed April 28, 2004, and is a continuation-in-part of U.S. Appln. Ser. No. 10/860,050, filed June 4, 2004, which claims priority to U.S. Provisional Applns. 60/531,686, filed December 23, 2003, 60/504,958, filed September 23, 2003, and 60/476,186, filed June 6, 2003, and is a continuation-in-part of U.S. Appln. Ser. No. 10/189,437, filed July 8, 2002, which is a continuation-in-part of U.S. Appln. Ser. No. 10/105,232, filed March 26, 2002, which is a continuation-in-part of U.S. Appln. Ser. No. 09/984,057, filed October 26, 2001, which claims priority from U.S. Provisional Applns. 60/303,396, filed July 9, 2001, and 60/278,761, filed March 27, 2001. Each of the foregoing applications is incorporated herein by reference.

Technical Field of the Invention

[0002] This invention relates generally to two newly discovered classes of peptides that share structural characteristics and the use of bioinformatics to search databases of amino acids, nucleic acids and other biological information to identify shared structural characteristics. Replikins are a newly discovered class of peptides that share structural characteristics and have been correlated with rapid replication of viruses and organisms. Replikin Scaffolds are a sub-set of the class of Replikin peptides. Exoskeleton Scaffolds are another newly discovered class of peptides that share structural characteristics and have been correlated with a decrease in replication.
Background of the Invention

[0003] Rapid replication is characteristic of virulence in certain bacteria, viruses and malignancies, but no chemistry common to rapid replication in different organisms has been described. The inventors have found a family of conserved small protein sequences related to rapid replication, Replikins. Such Replikins offer new targets for developing effective detection methods and therapies. There is a need in the art for methods of identifying patterns of amino acids such as Replikins.

Bioinformatic Identification of Amino Acid Sequences

[0004] Identification of amino acid sequences, nucleic acid sequences and other biological structures may be aided with the implementation of bioinformatics. Publicly available databases containing amino acid and nucleic acid sequence information may be searched to identify and define Replikins, Replikin Scaffolds and Exoskeleton Scaffolds within representative proteins or protein fragments or genomes or genome fragments.

[0005] Databases of amino acids and proteins are maintained by a variety of research organizations, including, for example, the National Center for Biotechnology Information (NCBI) at the U.S. National Library of Medicine, and the Influenza Sequence Database at the Los Alamos National Laboratory. These databases are typically accessible via the Internet through web pages that provide a researcher with capabilities to search for and retrieve specific proteins.

Amino Acid Search Tools

[0006] As is known in the art, databases of proteins and amino acids may be searched using a variety of database tools and search engines. Using these publicly available tools, patterns of amino acids may be described and located in many different proteins corresponding to many different organisms. Several methods and techniques are available by which patterns of amino acids may be described. One popular format is the PROSITE pattern.
A PROSITE pattern description may be assembled according to the following rules:

(1) The standard International Union of Pure and Applied Chemistry (IUPAC) one-letter codes for the amino acids are used (see FIG. 12).
(2) The symbol 'x' is used for a position where any amino acid is accepted.
(3) Ambiguities are indicated by listing the acceptable amino acids for a given position between square brackets '[ ]'. For example: [ALT] would stand for Alanine or Leucine or Threonine.
(4) Ambiguities are also indicated by listing between a pair of curly brackets '{ }' the amino acids that are not accepted at a given position. For example: {AM} stands for any amino acid except Alanine and Methionine.
(5) Each element in a pattern is separated from its neighbor by a '-'.
(6) Repetition of an element of the pattern can be indicated by following that element with a numerical value or a numerical range between parentheses. Examples: x(3) corresponds to x-x-x, x(2,4) corresponds to x-x or x-x-x or x-x-x-x.
(7) When a pattern is restricted to either the N- or C-terminal of a sequence, that pattern either starts with a '<' symbol or respectively ends with a '>' symbol.
(8) A period ends the pattern.

[0007] Examples of PROSITE patterns include:

PA [AC]-x-V-x(4)-{ED}.
This pattern is translated as: [Alanine or Cysteine]-any-Valine-any-any-any-any-{any but Glutamic Acid or Aspartic Acid}.

PA <A-x-[ST](2)-x(0,1)-V.
This pattern, which must be in the N-terminal of the sequence ('<'), is translated as: Alanine-any-[Serine or Threonine]-[Serine or Threonine]-(any or none)-Valine.

[0008] Another popular format for describing amino acid sequence patterns is the regular expression format that is familiar to computer scientists. In computer science, regular expressions are typically used to describe patterns of characters for which finite automata can be automatically constructed to recognize tokens in a language.
Possibly the most notable regular expression search tool is the Unix utility grep.

[0009] In the context of describing amino acid sequence patterns, a simplified set of regular expression capabilities is typically employed. Amino acid sequence patterns defined by these simple regular expression rules end up looking quite similar to PROSITE patterns, both in appearance and in result. A regular expression description for an amino acid sequence may be created according to the following rules:

(1) Use capital letters for amino acid residues and put a "-" between two amino acids (not required).
(2) Use "[...]" for a choice of multiple amino acids in a particular position. [LIVM] means that any one of the amino acids L, I, V, or M can be in that position.
(3) Use "{...}" to exclude amino acids. Thus, {CF} means C and F should not be in that particular position. In some systems, the exclusion capability can be specified with a "^" character. For example, ^G would represent all amino acids except Glycine, and [^ILMV] would represent all amino acids except I, L, M, and V.
(4) Use "x" or "X" for a position that can be any amino acid.
(5) Use "(n)", where n is a number, for multiple positions. For example, x(3) is the same as "xxx".
(6) Use "(n1,n2)" for multiple or variable positions. Thus, x(1,4) represents "x" or "xx" or "xxx" or "xxxx".
(7) Use the symbol ">" at the beginning or end of the pattern to require the pattern to match the N or C terminus. For example, ">MDEL" (SEQ ID NO: 13) finds only sequences that start with MDEL (SEQ ID NO: 13). "DEL>" finds only sequences that end with DEL.

[00010] The regular expression "[LIVM]-[VIC]-x(2)-G-[DENQTA]-x-[GAC]-x(2)-[LIMFY](4)-x(2)-G" illustrates a 17 amino acid peptide that has: an L, I, V, or M at position 1; a V, I, or C at position 2; any residue at positions 3 and 4; a G at position 5; and so on.
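The two notations described above are close enough that one can be translated mechanically into the other. The following is a minimal sketch, assuming Python and its standard `re` module; the function name `prosite_to_regex` is hypothetical and is not taken from the specification. It handles only the PROSITE rules listed above (any-residue 'x', bracketed choices, braced exclusions, repetition counts, and the '<' and '>' terminal anchors).

```python
import re

# Matches one pattern element: a choice "[ABC]", an exclusion "{ABC}",
# or a single letter, optionally followed by a repeat "(n)" or "(n,m)".
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Za-z])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Hypothetical helper for illustration only.
    """
    pattern = "".join(pattern.split())        # drop stray whitespace
    pattern = pattern.rstrip(".")             # rule 8: a period ends the pattern
    anchored_start = pattern.startswith("<")  # rule 7: N-terminal restriction
    anchored_end = pattern.endswith(">")      # rule 7: C-terminal restriction
    pattern = pattern.lstrip("<").rstrip(">")

    parts = []
    for element in pattern.split("-"):        # rule 5: '-' separates elements
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError("unsupported pattern element: " + element)
        core, low, high = match.groups()
        if core in ("x", "X"):
            piece = "."                       # rule 2: any amino acid
        elif core.startswith("{"):
            piece = "[^" + core[1:-1] + "]"   # rule 4: excluded residues
        else:
            piece = core                      # single residue or rule 3 choice
        if low is not None:                   # rule 6: repetition
            piece += "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(piece)

    return ("^" if anchored_start else "") + "".join(parts) + ("$" if anchored_end else "")


# The first PROSITE example above becomes "[AC].V.{4}[^ED]".
example = prosite_to_regex("[AC]-x-V-x(4)-{ED}.")
```

The resulting string can be passed to `re.search` to scan a one-letter amino acid sequence. The sketch does not cover the regular-expression convention of writing ">" at the start of a pattern, which would need separate handling.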
[00011] Other similar formats are in use as well. For example, the Basic Local Alignment Search Tool (BLAST) is a well-known system available on the Internet, which provides tools for rapid searching of nucleotide and protein databases. BLAST accepts input sequences in three formats: FASTA sequence format, NCBI Accession numbers, or GenBank sequence numbers. However, these formats are even simpler in structure than regular expressions or PROSITE patterns. An example sequence in FASTA format is:

>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ZLRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVHCTNLMNTTVTTGLLLNGSYSENRT
QIWQKHRTSNDSALILLNKHYNLTVTCKF3GNKTVLPVTIMAGLVFHSQKYNLRLRQAWC
HFPSNWKGAWKEVKEEIVNLPKERYRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
MDWFLNYLNNLTVDADHNECKNTSGTKSGNKRAPGPCVQRTYVACHIRSVIIWLETISKK
TYAPPREGHLECTSTVTGMTVELNYIPKNRTNVTLSPQIESIWAAELDRYKLVEITPIGF
APTEVRRYTGGHERQKRVPFVXXXXXXXXXXXXXXXXXXXVQSQHLLAGILQQQKNL
LAAVEAQQQMLKLTIWGVK
(SEQ ID NO: 14).

[00012] Features of the BLAST system include sequence comparison algorithms that are used to search sequence databases for regions of local alignments in order to detect relationships among sequences which share regions of similarity. However, the BLAST tools are limited in terms of the structure of amino acid sequences that can be discovered and located. For example, BLAST is not capable of searching for a sequence that has "at least one lysine residue located six to ten amino acid residues from a second lysine residue," as required by a Replikin pattern. Nor is BLAST capable of searching for amino acid sequences that contain a specified percentage or concentration of a particular amino acid, such as a sequence that has "at least 6% lysine residues."

Need for Replikin Search Tools

[00013] As can be seen from its definition, a Replikin pattern description cannot be represented as a single linear sequence of amino acids. Thus, PROSITE patterns and regular expressions, both of which are well suited to describing ordered strings obtained by following logical set-constructive operations such as negation, union and concatenation, are inadequate for describing Replikin patterns.

[00014] In contrast to linear sequences of amino acids, a Replikin pattern is characterized by attributes of amino acids that transcend simple contiguous ordering. In particular, the requirement that a Replikin pattern contain at least 6% lysine residues, without more, means that the actual placement of lysine residues in a Replikin pattern is relatively unrestricted. Thus, in general, it is not possible to represent a Replikin pattern description using a single PROSITE pattern or a single regular expression.

[00015] Accordingly, there is a need in the art for a system and method to scan a given amino acid sequence and identify and count all instances of a Replikin pattern. Similarly, there is a need in the art for a system and method to search protein databases and amino acid databases for amino acid sequences that match a Replikin pattern.
Additionally, there is a need in the art for a generalized search tool that permits researchers to locate amino acid sequences of arbitrary specified length that include any desired combination of the following characteristics: (1) a first amino acid residue located more than N positions and less than M positions away from a second amino acid residue; (2) a third amino acid residue located anywhere in the sequence; and (3) the sequence contains at least R percent of an amino acid residue. Finally, the shortcomings of the prior art are even more evident in research areas relating to disease prediction and treatment. There is a significant need in the art for a system to predict in advance the occurrence of disease (for example, to predict strain-specific influenza epidemics) and similarly to enable synthetic vaccines to be designed based on amino acid sequences or amino acid motifs that are discovered to be conserved over time and which have not been previously detectable by prior art methods of searching proteins and amino acid sequences.

SUMMARY OF THE INVENTION

[00016] The present invention provides a method for identifying nucleotide or amino acid sequences that include a Replikin sequence. The method is referred to herein as a 3-point-recognition method. By use of the "3-point recognition" method, peptides comprising from 7 to about 50 amino acids including (1) at least one lysine residue located six to ten amino acid residues from a second lysine residue; (2) at least one histidine residue; and (3) at least 6% lysine residues and having replication, transformation, or redox functions may be identified.
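The three structural criteria of the 3-point-recognition method can be expressed as a short predicate over a one-letter amino acid string. The following is a minimal sketch in Python, not the procedure of Figure 20. The function name `is_replikin` and its parameters are hypothetical. Two readings are assumed here and are not confirmed by the text: "located six to ten amino acid residues from" is taken as a difference of 6 to 10 between the positions of two lysines, and "about 50" is taken as an upper limit of exactly 50. The functional requirement (replication, transformation, or redox function) cannot be checked from sequence alone and is ignored.

```python
def is_replikin(peptide: str,
                min_len: int = 7,
                max_len: int = 50,
                min_gap: int = 6,
                max_gap: int = 10,
                min_lysine_fraction: float = 0.06) -> bool:
    """Return True if the peptide meets the three structural Replikin criteria.

    Hypothetical helper; the gap is ASSUMED to be the difference between the
    positions of two lysine (K) residues.
    """
    peptide = peptide.upper()
    if not (min_len <= len(peptide) <= max_len):
        return False

    # Criterion 2: at least one histidine (H).
    if "H" not in peptide:
        return False

    # Criterion 3: at least 6% lysine (K) residues.
    if peptide.count("K") / len(peptide) < min_lysine_fraction:
        return False

    # Criterion 1: some pair of lysines separated by min_gap..max_gap positions.
    lysines = [i for i, residue in enumerate(peptide) if residue == "K"]
    return any(min_gap <= later - earlier <= max_gap
               for earlier in lysines
               for later in lysines
               if later > earlier)


# Toy strings (not sequences from the specification):
# "KAAAAAAKH" has lysines 7 positions apart, a histidine, and 2 of 9 residues lysine.
toy_match = is_replikin("KAAAAAAKH")
toy_no_histidine = is_replikin("KAAAAAAKA")
```

Because the lysine-percentage test depends on the whole candidate peptide rather than on residue order, it is applied as a count rather than as a pattern, which is the limitation of PROSITE patterns and regular expressions noted in the background section.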
[00017] An aspect of the present invention provides a method of identifying a Replikin Scaffold in a virus or organism comprising identifying a series of Replikin Scaffold peptides comprising about 16 to about 30 amino acids comprising (1) a terminal lysine and a lysine immediately adjacent to said terminal lysine; (2) a terminal histidine and a histidine immediately adjacent to said terminal histidine; (3) a lysine within about 6 to about 10 amino acids from another lysine; and (4) at least 6% lysines.

[00018] An aspect of the invention may provide a method of identifying a Replikin Scaffold peptide in a virus or organism comprising about 16 to about 30 amino acids comprising (1) a terminal lysine and a lysine immediately adjacent to the terminal lysine; (2) a terminal histidine and a histidine immediately adjacent to the terminal histidine; (3) a lysine within about 6 to about 10 amino acids from another lysine; and (4) at least 6% lysines.

[00019] An aspect of the invention may also provide a method of making a preventive or therapeutic virus vaccine comprising identifying a Replikin Scaffold comprising about 16 to about 30 amino acids and synthesizing said Replikin Scaffold as a preventive or therapeutic virus vaccine wherein said Replikin Scaffold further comprises: (1) a terminal lysine and a lysine immediately adjacent to the terminal lysine; (2) a terminal histidine and a histidine immediately adjacent to the terminal histidine; (3) a lysine within about 6 to about 10 amino acids from another lysine; and (4) at least 6% lysines. The Replikin Scaffold may contain influenza virus peptide Replikins.
A Replikin Scaffold may further comprise a group of Replikins comprising: (1) a terminal lysine and a lysine immediately adjacent to the terminal lysine; (2) a terminal histidine and a histidine immediately adjacent to the terminal histidine; (3) a lysine within about 6 to about 10 amino acids from another lysine; and (4) at least 6% lysines.

[00020] An aspect of the invention may provide a method of identifying an Exoskeleton Scaffold wherein a Replikin Scaffold is identified in a first strain of virus or organism and the Exoskeleton Scaffold is identified in a later-arising strain of said virus or organism wherein said Exoskeleton Scaffold comprises an amino acid sequence comprising the same number of amino acids as the Replikin Scaffold and further comprising (1) two terminal lysines, (2) two terminal histidines, and (3) no lysine within about 6 to about 10 amino acids from another lysine.

[00021] In an aspect of the invention an isolated or synthesized influenza virus peptide is provided with from 7 to about 50 amino acids, at least one lysine residue located six to ten residues from a second lysine residue, at least one histidine residue and at least 6% lysine residues. In a further aspect the peptide comprises a terminal lysine. In yet a further aspect the peptide is present in an emerging strain of influenza virus such as the influenza virus strain H5N1.

[00022] In another aspect of the invention an isolated or synthesized influenza virus peptide is provided comprising the H5N1 peptide KKNSTYPTIKRSYNNTNQEDLLVLWGIHH (SEQ ID NO: 15).
[00023] In another aspect of the invention, an isolated or synthesized influenza virus peptide is provided having about 16 to about 30 amino acids; a terminal lysine and a lysine immediately adjacent to the terminal lysine; a terminal histidine and a histidine immediately adjacent to the terminal histidine; a lysine within about 6 to about 10 amino acids from another lysine; and at least 6% lysines.

[00024] In another aspect of the invention, a preventive or therapeutic virus vaccine is provided having at least one isolated or synthesized peptide of influenza virus with at least one lysine residue located six to ten residues from a second lysine residue; at least one histidine residue; and at least 6% lysine residues. In a further aspect of the invention the isolated or synthesized peptide is present in an emerging strain of influenza virus or is present in an H5N1 strain of influenza virus.

[00025] In yet a further aspect of the invention, a preventive or therapeutic virus vaccine comprises the peptide KKNSTYPTIKRSYNNTNQEDLLVLWGIHH (SEQ ID NO: 15) having alternatively a synthetic UTOPE tail, an adjuvant, or a combination thereof. In yet a further aspect, the preventive or therapeutic virus vaccine comprises a pharmaceutically acceptable carrier.

[00026] In a further aspect of the invention the preventive or therapeutic virus vaccine comprises the peptide KKNSTYPTIKRSYNNTNQEDLLVLWGIHHKKKKHKKKKKHK (SEQ ID NO: 16)-KLH, where -KLH denotes a key limpet hemocyanin.
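The Replikin Scaffold peptide criteria recited in this summary (length of about 16 to about 30 residues, a terminal lysine pair, a terminal histidine pair, a lysine 6 to 10 residues from another lysine, and at least 6% lysines) can be checked in the same style. The following is a minimal Python sketch; the function name `is_replikin_scaffold_peptide` is hypothetical. It assumes that the lysine pair sits at one end of the peptide and the histidine pair at the other, as in SEQ ID NO: 15, that either orientation is acceptable, and that the "about" bounds can be treated as exact. The text does not state these points explicitly.

```python
def is_replikin_scaffold_peptide(peptide: str) -> bool:
    """Check the structural Replikin Scaffold peptide criteria (hypothetical helper).

    ASSUMPTIONS: the lysine pair (KK) and histidine pair (HH) are at opposite
    termini; the length bounds 16..30 and gap bounds 6..10 are treated as exact.
    """
    peptide = peptide.upper()
    if not 16 <= len(peptide) <= 30:
        return False

    # Terminal lysine with an adjacent lysine, and terminal histidine with an
    # adjacent histidine, in either orientation.
    lysines_first = peptide.startswith("KK") and peptide.endswith("HH")
    histidines_first = peptide.startswith("HH") and peptide.endswith("KK")
    if not (lysines_first or histidines_first):
        return False

    # At least 6% lysines.
    lysines = [i for i, residue in enumerate(peptide) if residue == "K"]
    if len(lysines) / len(peptide) < 0.06:
        return False

    # A lysine within 6 to 10 positions of another lysine.
    return any(6 <= later - earlier <= 10
               for earlier in lysines
               for later in lysines
               if later > earlier)


# SEQ ID NO: 15 as printed in paragraph [00025]: 29 residues, lysines at
# positions 1, 2 and 10, ending in two histidines.
h5n1_peptide = "KKNSTYPTIKRSYNNTNQEDLLVLWGIHH"
h5n1_is_scaffold = is_replikin_scaffold_peptide(h5n1_peptide)
```

Replacing the lysine at position 10 of that peptide with another residue leaves only the two adjacent terminal lysines, so the spacing criterion fails under these assumptions. This matches the "no lysine within about 6 to about 10 amino acids from another lysine" condition given for an Exoskeleton Scaffold.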
[00027] In yet another aspect of the invention a method of stimulating the immune system of a subject to produce antibodies to influenza virus is provided comprising administering an effective amount of at least one isolated or synthesized influenza virus Replikin peptide comprising from 7 to about 50 amino acids comprising (1) at least one lysine residue located six to ten amino acid residues from a second lysine residue; (2) at least one histidine residue; and (3) at least 6% lysine residues.

[00028] In a further aspect, in the method of stimulating the immune system the administered Replikin peptide may further comprise a pharmaceutically acceptable carrier and/or adjuvant and prevent or treat an influenza infection. The method of stimulating the immune system may further comprise an isolated or synthesized influenza virus peptide present in an emerging virus or present in an H5N1 strain of influenza virus. The method may further comprise administration of the peptide KKNSTYPTIKRSYNNTNQEDLLVLWGIHHKKKKHKKKKKHK (SEQ ID NO: 16)-KLH, where -KLH denotes a key limpet hemocyanin.

[00029] An aspect of the invention may also provide a method comprising: applying a plurality of criteria to data representing protein sequences; based on the criteria, identifying an arbitrary sub-sequence within the protein sequences; and outputting the identified sub-sequence to a data file; wherein the criteria include: a set {a} of amino acids to be included in the sub-sequence; a set {b} of amino acids to be excluded from the sub-sequence; and a minimum and a maximum permissible gap between members of sets {a} and {b}. Within the method the protein sequences may be obtained via a network. An aspect of the invention may further comprise a machine-readable medium storing computer-executable instructions to perform such a method.
[00030] An aspect of the invention may further provide a method comprising applying a plurality of criteria to data representing protein sequences; based on the criteria, identifying a sub-sequence within the protein sequences, the identified sub-sequence having a predetermined allowed range of distance between lysine amino acids thereof, and a predetermined allowed range of distance between a histidine amino acid and a farthest lysine amino acid thereof; and outputting an identified sub-sequence to a data file. The protein sequences may be obtained via a network. A machine-readable medium storing computer-executable instructions may perform such a method.

BRIEF DESCRIPTION OF THE DRAWINGS

[00031] Figure 1 is a bar graph depicting the frequency of occurrence of Replikins in various organisms.

[00032] Figure 2 is a graph depicting the percentage of malignin per milligram total membrane protein during anaerobic replication of glioblastoma cells.

[00033] Figure 3 is a bar graph showing amount of antimalignin antibody produced in response to exposure to the recognin 16-mer (SEQ ID NO: 4).

[00034] Figure 4A is a photograph of a blood smear taken with ordinary and fluorescent light. Figure 4B is a photograph of a blood smear taken with ordinary and fluorescent light illustrating the presence of two leukemia cells. Figure 4C is a photograph of a dense layer of glioma cells in the presence of antimalignin antibody. Figure 4D and Figure 4E are photographs of the layer of cells in Figure 4C taken at 30 and 45 minutes following addition of antimalignin antibody.

[00035] Figure 4F is a bar graph showing the inhibition of growth of small cell lung carcinoma cells in vitro by antimalignin antibody.

[00036] Figure 5 is a plot of the amount of antimalignin antibody present in the serum of patients with benign or malignant breast disease pre- and post-surgery.
[00037] Figure 6 is a box diagram depicting an aspect of the invention wherein a computer is used to carry out the 3-point-recognition method of identifying Replikin sequences.

[00038] Figure 7 is a graph showing the concentration of Replikins observed in hemagglutinin of influenza B and influenza A strain, H1N1, on a year by year basis from 1940 through 2001.

[00039] Figure 8 is a graph of the Replikin concentration observed in hemagglutinin of influenza A strains, H2N2 and H3N2, as well as an emerging strain defined by its constituent Replikins, designated H3N2(R), on a year by year basis from 1950 to 2001.

[00040] Figure 9 is a graph depicting the Replikin count per year for several virus strains, including the coronavirus nucleocapsid Replikin, from 1917 to 2002.

[00041] Figure 10 is a chart depicting the mean Replikin count per year for nucleocapsid coronavirus isolates.

[00042] Figure 11 is a chart depicting the Replikin count per year for H5N1 Hemagglutinins.

[00043] Figure 12 is a conversion table that enables amino acids to be encoded as single alphabetic characters according to a standard supplied by the International Union of Pure and Applied Chemistry (IUPAC).

[00044] Figure 13 is a printout of a human cancer protein (SEQ ID NO: 472) obtained by searching a protein database maintained by the National Center for Biotechnology Information (NCBI).

[00045] Figure 14 is a conversion table illustrating a correspondence between nucleic acid base triplets and amino acids.

[00046] Figure 15 is a graph illustrating a rapid increase in the concentration of Replikin patterns in the hemagglutinin protein of the H5N1 strain of influenza prior to the outbreak of three "Bird Flu" epidemics. Figure 15 illustrates that increasing replikin concentration ('Replikin Count') of hemagglutinin protein of H5N1 preceded three 'Bird Flu' Epidemics.
In H5N1 influenza, the increasing strain-specific replikin concentration (Replikin Count, Means +/- SD) 1995 to 1997 preceded the Hong Kong H5N1 epidemic of 1997 (E1); the increase from 1999 to 2001 preceded the epidemic of 2001 (E2); and the increase from 2002 to 2004 preceded the epidemic in 2004 (E3). The decline in 1999 occurred with the massive culling of poultry in response to the E1 epidemic in Hong Kong.

[00047] Figure 16 is a table illustrating selected examples of Replikin patterns that have been found in various organisms.

[00048] Figure 17 is a high-level block diagram of a computer system incorporating a system and method for identifying Replikin patterns in amino acid sequences, in accordance with an aspect of the present invention.

[00049] Figure 18 is a simple flow chart illustrating a general method for locating a Replikin pattern in a sequence of amino acids, according to an aspect of the present invention.

[00050] Figure 19 is a flow chart illustrating a generalized method for locating a plurality of Replikin-like patterns in a sequence of amino acids, according to an aspect of the present invention.

[00051] Figure 20 is a source code listing containing a procedure for discovering Replikin patterns in a sequence of amino acids, in accordance with an aspect of the present invention.

[00052] Figure 21 is a table illustrating Replikin Scaffolds occurring in substantially fixed amino acid positions in different proteins. Figure 21 discloses SEQ ID NOS: 473-531, respectively, in order of appearance.

[00053] Figure 22 is a simplified block diagram of a computer system platform useful with the present invention.

DETAILED DESCRIPTION OF THE INVENTION

Definitions

[00054] As used herein, the term "peptide" or "protein" refers to a compound of two or more amino acids in which the carboxyl group of one is united with an amino group of another, forming a peptide bond.
The term peptide is also used to denote the amino acid sequence encoding such a compound. As used herein, "isolated" or "synthesized" peptide or biologically active portion thereof refers to a peptide that is after purification substantially free of cellular material or other contaminating proteins or peptides from the cell or tissue source from which the peptide is derived, or substantially free from chemical precursors or other chemicals when chemically synthesized by any method, or substantially free from contaminating peptides when synthesized by recombinant gene techniques.

[00055] As used herein, a Replikin peptide or Replikin protein is an amino acid sequence having 7 to about 50 amino acids comprising:
(1) at least one lysine residue located six to ten amino acid residues from a second lysine residue;
(2) at least one histidine residue;
(3) at least 6% lysine residues.
Similarly, a Replikin sequence is the amino acid sequence encoding such a peptide or protein.

[00056] As used herein, an "earlier-arising" virus or organism is a specimen of a virus or organism collected from a natural source of the virus or organism on a date prior to the date on which another specimen of the virus or organism was collected. A "later-arising" virus or organism is a specimen of a virus or organism collected from a natural source of the virus or organism on a date subsequent to the date on which another specimen of the virus or organism was collected.
[00057] As used herein, "emerging strain" refers to a strain of a virus, bacterium, fungus, or other organism identified as having an increased or increasing concentration of Replikin sequences in one or more of its protein sequences relative to the concentration of Replikins in other strains of such organism. The increase or increasing concentration of Replikins occurs over a period of at least about six months, and preferably over a period of at least about one year, most preferably over a period of at least about three years or more, for example, in influenza virus, but may be a much shorter period of time for bacteria and other organisms.

[00058] As used herein, "mutation" refers to change in the structure and properties of an organism caused by substitution of amino acids. In contrast, the term "conservation" as used herein, refers to conservation of particular amino acids due to lack of substitution.

[00059] As used herein, "replikin count" refers to the number of replikins per 100 amino acids in a protein or organism. A higher replikin count in a first strain of virus or organism has been found to correlate with more rapid replication of the first virus or organism as compared to a second, earlier- or later-arising strain of the virus or organism having a lower replikin count.

[00060] As used herein "Replikin Scaffold" refers to a series of conserved Replikin peptides wherein each of said Replikin peptide sequences comprises about 16 to about 30 amino acids and further comprises: (1) a terminal lysine; (2) a terminal histidine and a histidine immediately adjacent to the terminal histidine; (3) a lysine within 6 to 10 amino acid residues from another lysine; and (4) about 6% lysine. "Replikin Scaffold" peptides may comprise an additional lysine immediately adjacent to the terminal lysine. "Replikin Scaffold" also refers to an individual member or a plurality of members of a series of a "Replikin Scaffold."

Identification of Replikins

[00061] The identification of a new family of small peptides related to the phenomenon of rapid replication, referred to herein as Replikins, provides targets for detection of pathogens in a sample and developing therapies, including vaccine development. In general, knowledge of and identification of this family of peptides enables development of effective therapies and vaccines for any organism that harbors Replikins. Identification of this family of peptides also provides for the detection of viruses and virus vaccine development.

[00062] For example, identification of this family of peptides provides for the detection of influenza virus and provides new targets for influenza treatment and vaccines including treatment and vaccines for influenza H5N1. Further examples provided by the identification of this family of peptides include the detection of infectious disease Replikins, cancer immune Replikins and structural protein Replikins.

[00063] Rapid replication is characteristic of virulence in certain bacteria, viruses and malignancies, but no chemistry common to rapid replication in different organisms has been described. We have found a family of conserved small protein sequences related to rapid replication, which we have named Replikins. Such Replikins offer new targets for developing effective detection methods and therapies. The first Replikin found was the glioma Replikin, which was identified in brain glioblastoma multiforme (glioma) cell protein, called malignin.

[00064] Hydrolysis and mass spectrometry of malignin revealed the novel 16-mer peptide sequence which contains the glioma Replikin. This Replikin was not found in databases for the normal healthy human genome and therefore appeared to be derived from some source outside the body.
[00065] We have devised an algorithm to search for the glioma Replikin or homologue thereof. Homologues were not common in over 4,000 protein sequences, but were found, surprisingly, in all tumor viruses, and in the replicating proteins of algae, plants, fungi, viruses and bacteria.

[00066] We have identified that both 1) Replikin concentration (number of Replikins per 100 amino acids) and 2) Replikin composition correlate with the functional phenomenon of rapid replication. These relationships provide a functional basis for the determination that Replikins are related quantitatively as well as qualitatively to the rate of replication.

[00067] The first functional basis for Replikins' role in rapid replication was discovered by the Applicants in glioma replication. The fact that glioma malignin was found to be enriched ten-fold compared to the five-fold increase in cell number and membrane protein concentration in rapid replication of glioma cells suggests an integral relationship of the Replikins to replication. When the glioma Replikin was synthesized in vitro and administered as a synthetic vaccine to rabbits, abundant antimalignin antibody was produced. This establishes the antigenic basis of the antimalignin antibody in serum (AMAS) test, and provides the first potential synthetic cancer vaccine and the prototype for Replikin vaccines in other organisms. With the demonstration of this natural immune relationship of the Replikins to replication and this natural immune response to cancer Replikins, which overrides cell type, based upon the shared specificity of cancer Replikins and rapid replication, both passive augmentation of this immunity with antimalignin antibody and active augmentation with synthetic Replikin vaccines now are possible.

[00068] The relationship between the presence of antimalignin antibody and survival in patients was shown in a study of 8,090 serum specimens from cancer patients.
The study showed that the concentration of antimalignin antibody increases with age, as the incidence of cancer in the population increases, and increases further two to three-fold in early malignancy, regardless of cell type. In vitro, the antimalignin antibody is cytotoxic to cancer cells at picograms (femtomoles) per cancer cell, and in vivo the concentration of antimalignin antibody relates quantitatively to the survival of cancer patients. As shown in glioma cells, the stage in cancer at which cells have only been transformed to the immortal malignant state but remain quiescent or dormant now can be distinguished from the more active life-threatening replicating state, which is characterized by the increased concentration of Replikins. In addition, clues to the viral pathogenesis of cancer may be found in the fact that glioma glycoprotein 10B has a 50% reduction in carbohydrate residues when compared to the normal 10B. This reduction is associated with virus entry in other instances, and so may be evidence of the attachment of virus for the delivery of virus Replikins to the 10B of glial cells as a step in the transformation to the malignant state.

[00069] Our study concerning influenza virus hemagglutinin protein sequences and influenza epidemiology over the past 100 years has provided a second functional basis for the relation of Replikins to rapid replication. Only serological hemagglutinin and antibody classification, but no strain-specific conserved peptide sequences, have previously been described in influenza. Further, no changes in concentration and composition of any strain-specific peptide sequences have been described previously that correlate with epidemiologically documented epidemics or rapid replication.
In this study, a four to ten-fold increase in the concentration of strain-specific influenza Replikins in one of each of the four major strains, influenza B, (A)H1N1, (A)H2N2 and (A)H3N2, is shown to relate to influenza epidemics caused by each strain from 1902 to 2001.

[00070] We then showed that these increases in concentration are due to the reappearance of at least one specific Replikin composition from one to up to 64 years after its disappearance, plus the emergence of new strain-specific Replikin compositions. Previously, no strain-specific chemical structures were known with which to predict the strains that would predominate in coming influenza seasons, nor to devise annual mixtures of whole-virus strains for vaccines. The recent sharp increase in H3N2 Replikin concentration (1997 to 2000), the largest in H3N2's history, and the reappearance of specific Replikin compositions that were last seen in the high mortality H3N2 pandemic of 1968, and in the two high mortality epidemics of 1975 and 1977, but were absent for 20-25 years, together may be a warning of coming epidemics. This high degree of conservation of Replikin structures, whereby the identical structure can persist for 100 years, or reappear after an absence of from one to 64 years, indicates that what was previously thought to be change due to random substitution of amino acids in influenza proteins is more likely to be change due to an organized process of conservation of Replikins.

[00071] The conservation of Replikins is not unique to influenza virus but was also observed in other sources, for example in foot and mouth disease virus, type O, HIV tat, and wheat.

[00072] A third functional basis for Replikins' role in rapid replication is seen in the increase in rapid replication in HIV. Replikin concentration was shown to be related to rapid replication in HIV.
We found the Replikin concentration in the slow-growing low-titre strain of HIV (NSI, "Bru"), which is prevalent in early stage infection, to be one-sixth of the Replikin concentration in the rapidly-growing high-titre strain of HIV (SI, "Lai"), which is prevalent in late stage HIV infection.

[00073] Further examples demonstrate the relationship of Replikins to rapid replication. In the "replicating protein" of tomato leaf curl gemini virus, which devastates tomato crops, the first 161 amino acids, the sequence that has been shown to bind to DNA, were shown to contain five Replikins. In malaria, legendary for rapid replication when trypanosomes are released from the liver in the tens of thousands from one trypanosome, multiple, novel, almost 'flamboyant' Replikin structures have been found with concentrations of up to 36 overlapping Replikins per 100 amino acids.

[00074] The conservation of any structure is critical to whether that structure provides a stable invariant target to attack and destroy or to stimulate. When a structure is tied in some way to a basic survival mechanism of the organism, the structure tends to be conserved. A varying structure provides an inconstant target, which is a good strategy for avoiding attackers, such as antibodies that have been generated specifically against the prior structure and thus are ineffective against the modified form. This strategy is used by influenza virus, for example, so that a previous vaccine may be quite ineffective against the current virulent virus.

Replikins as Stable Targets for Treatment

[00075] Bacteria and HIV each have both Replikin and non-Replikin amino acids. In HIV, for example, there has been a recent increase in drug-resistance from 9% to 13% due to mutation, that is, substitution of amino acids not essential to the definition of the Replikin structure. (See detailed analysis of TAT protein of HIV discussed herein).
In bacteria, the development of 'resistant strains' is due to a similar mechanism. However, we have found that Replikin structures do not mutate or change to the same degree as non-Replikin amino acids (see also the discussion of foot and mouth disease virus conservation of Replikins herein; further see the discussion of conservation of coronavirus Replikins herein). The Replikin structures, as opposed to the non-Replikin structures, are conserved and thus provide new constant targets for treatment.

[00076] Certain structures too closely related to survival functions apparently cannot change constantly. Because an essential component of the Replikin structure is histidine (h), which is known for its frequent binding to metal groups in redox enzymes and is a probable source of energy needed for replication, and since this histidine structure remains constant, this structure remains all the more attractive a target for destruction or stimulation.

[00077] From a proteomic point of view, the inventors' construction of a template based on the newly determined glioma peptide sequence led them to the discovery of a wide class of proteins with related conserved structures and a particular function, in this case replication. Examples of the increase in Replikin concentration with virulence of a disease include influenza, HIV, cancer and tomato leaf curl virus. This newly recognized class of structures is related to the phenomenon of rapid replication in organisms as diverse as influenza, yeast, algae, plants, the gemini curl leaf tomato virus, HIV and cancer.

[00078] Replikin concentration and composition provide new quantitative methods to detect and control the process of replication, which is central to the survival and dominance of each biological population.
The sharing of immunological specificity by diverse members of the class, as demonstrated with antimalignin antibody for the glioma and related cancer Replikins, suggests that B cells and their product antibodies may recognize Replikins by means of a similar recognition language.

[00079] Examples of peptide sequences identified as cancer Replikins or as containing a Replikin, i.e., a homologue of the glioma peptide, kagvaflhkk (SEQ ID NO: 1), may be found in cancers of, but not limited to, the lung, brain, liver, soft-tissue, salivary gland, nasopharynx, esophagus, stomach, colon, rectum, gallbladder, breast, prostate, uterus, cervix, bladder, eye, forms of melanoma, lymphoma, leukemia, and kidney.

[00080] Replikins provide for: 1) detection of pathogens by qualitative and quantitative determinations of Replikins; 2) treatment and control of a broad range of diseases in which rapid replication is a key factor, by targeting native Replikins and by using synthetic Replikins as vaccines; and 3) fostering increased growth rates of algal and plant foods.

[00081] The first Replikin sequence to be identified was the cancer cell Replikin found in a brain cancer protein, malignin, which was demonstrated to be enriched ten-fold during rapid anaerobic replication of glioblastoma multiforme (glioma) cells (Figure 2). Malignin is a 10 kDa portion of the 250 kDa glycoprotein 10B, which was isolated in vivo and in vitro from membranes of glioblastoma multiforme (glioma) cells. Hydrolysis and mass spectroscopy of malignin revealed a 16-mer peptide sequence, ykagvaflhkkndide (SEQ ID NO: 4), which is referred to herein as the glioma Replikin and which includes the shorter peptide, kagvaflhkk (SEQ ID NO: 1), both of which apparently are absent in the normal human genome.
Table 1
16-mer peptide sequence YKAGVAFLHKKNDIDE (SEQ ID NO: 4) obtained from malignin by hydrolysis and mass spectrometry

Fragment identified (residue positions)    Sequence
1-3      ()yka(g)
1-5      ()ykagv(a)
2-6      (y)kagva(f)
2-7      (y)kagvaf(l)
4-11     (a)gvaflhkk(n)
5-7      (g)vaf(l)
6-7      (v)af(l)
6-10     (v)aflhk(k)
6-10     (v)aflhk(k)
6-12     (v)aflhkkn(d)
6-12     (v)aflhkkn(d)
7-8      (a)fl(h)
10-16    (h)kkndide()
11-14    (k)kndi(d)
12-15    (k)ndid(e)

The columns recording the method by which each fragment was obtained (auto-hydrolysis of malignin, including malignin immobilized on bromoacetyl cellulose, and microwaving for 30 seconds) are not legible in this copy.

[00082] When the 16-mer glioma Replikin was synthesized and injected as a synthetic vaccine into rabbits, abundant antimalignin antibody was produced (Bogoch et al., Cancer Detection and Prevention, 26 (Suppl. 1): 402 (2002)). The concentration of antimalignin antibody in serum in vivo has been shown to relate quantitatively to the survival of cancer patients (Bogoch et al., Protides of Biological Fluids, 31:739-747 (1984)). In vitro, antimalignin antibodies have been shown to be cytotoxic to cancer cells at a concentration of picograms (femtomolar) per cancer cell (Bogoch et al., Cancer Detection and Prevention, 26 (Suppl. 1): 402 (2002)).

[00083] Studies carried out by the inventors showed that the glioma Replikin is not represented in the normal healthy human genome. Consequently, a search for the origin and possible homologues of the Replikin sequence was undertaken by analysis of published sequences of various organisms.

[00084] By using the 16-mer glioma Replikin sequence as a template and constructing a recognition proteomic system to visually scan the amino acid sequences of proteins of several different organisms, a new class of peptides, the Replikins, was identified. The present invention provides a method for identifying nucleotide or amino acid sequences that include a Replikin sequence.
The method is referred to herein as a 3-point-recognition method. The three point recognition method identifies a peptide of from 7 to about 50 amino acids that includes (1) at least one lysine residue located six to ten amino acid residues from a second lysine residue; (2) at least one histidine residue; and (3) at least 6% lysine residues (a Replikin). These peptides or proteins constitute a new class of peptides in species including algae, yeast, fungi, amoebae, bacteria, plant, virus and cancer proteins having replication, transformation, or redox functions. Replikin peptides have been found to be concentrated in larger 'replicating' and 'transforming' proteins (so designated by their investigators, see Table 2) and cancer cell proteins. No sequences were found to be identical to the malignin 16-mer peptide.

[00085] The present invention further provides a method for identifying nucleotide or amino acid sequences that include a Replikin sequence comprising from 7 to about 50 amino acids including (1) at least one first lysine located at either terminus of the isolated or synthesized peptide; (2) a second lysine located six to ten residues from the first lysine residue; (3) at least one histidine; and (4) at least 6% lysines. In another aspect of the invention, the isolated or synthesized peptides are influenza virus peptides. In yet another aspect of the invention, the isolated or synthesized peptides are H5N1 influenza virus peptides.

Table 2
Examples of Replikins in various organisms. Prototype: Glioma Replikin* kagvaflhkk (SEQ ID NO: 1)

SEQ ID NO    Source    Replikin

Algae:
34    Caldophera prolifera    kaskftkh
35    Isolepis prolifera    kaqaetgeikgh

Yeast:
36    Schizosaccharomyces pombe    ksfkypkkhk
37    Oryza sativa    kkaygnelhk
2     Sacch. cerevisiae replication binding protein    hsikrelgiifdk

Fungi:
38    Isocitrate lyase ICI 1, Penicillium marneffei    kvdivthqk
39    DNA-dependent RNA polymerase II, Diseula destructiva    kleedaayhrkk
40    Ophiostoma novo-ulmi 1, RNA in Dutch elm disease fungus    kvilplrgnikgiffkh

Amoeba:
41    Entamoeba invadens, histone H2B    klilkgdlnkh

Bacteria:
42    Pribosomal protein replication factor, Helicobacter pylori    ksvhaflk
10    Replication-associated protein, Staph. aureus; Mycoplasma pulmonic, chromosome replication    kkektthnk
43    Macrophage infectivity potentiator, L. legionella    kvhffqlkk
90    Bacillus anthracis    kihlisvkk
91    Bacillus anthracis    hvkkekeknk
92    Bacillus anthracis    khivkievk
93    Bacillus anthracis    kkkkikdiygkdallh
94    Bacillus anthracis    kwekikqh
95    Bacillus anthracis    kklqipppiepkkddiih
96    Bacillus anthracis    hnryasnivesayllilnewknniqsdlikk
97    Bacillus anthracis    havddyagylldknqsdlvtnskk
98    Bacillus anthracis    haerlkvqknapk

Plants:
44    Arabidopsis thaliana, prolifera    kdhdfdgdk
45    Arabidopsis thaliana, cytoplasmic ribosomal    kmkglkqkkah
46    Arabidopsis thaliana, DNA binding protein    kelssttgeksh

Viruses:
9     Replication associated protein A [Maize streak virus]    kekkpskdeimrdiish
11    Bovine herpes virus 4, DNA replication protein    hkinitngqk
12    Meleagrid herpesvirus 1, replication binding protein    hkdlyrllmk
47    Feline immunodeficiency    hlkdyklvk
3     Foot and Mouth Disease (O)    hkqkivapvk
5     HIV Type 1    kcfncgkegh
7     HIV Type 2    kcwncgkegh
99    Small Pox Virus (Variola)    khynnitwyk
100   Small Pox Virus (Variola)    kysqtgkeliih
101   Small Pox Virus (Variola)    hyddvrikndivvsrck
102   Small Pox Virus (Variola)    hrfklildski
103   Small Pox Virus (Variola)    kerghnyyfek

Tumor Viruses:
48    Rous sarcoma virus tyrosine-protein kinase    kklrhek
49    v-yes, avian sarcoma    kklrhdk
50    c-yes, colon cancer, malignant melanoma    kklrhdk
51    v-srcC, avian sarcoma    kklrhek
52    c-src, colon, mammary, pancreatic cancer    kklrhek
53    Neuroblastoma RAS viral (v-ras) oncogene    kqahelak
54    VP1 (major capsid protein) [Polyomavirus sp.]    kthrfskh
55    Sindbis    knihekik
56    E1 [Human papillomavirus type 71]    khrpllqlk
57    v-erbB from AEV and c-erb    kspnhvk
58    v-fms (feline sarcoma)    knihlekk
59    c-fms (acute and chronic myelomonocytic tumors)    knihlekk
60    large t-antigen I [Polyomavirus sp.]    kphlaqslek
61    middle t-antigen [Polyomavirus sp.]    kqhrelkdk
62    small t-antigen [Polyomavirus sp.]    kqhrelkdk
63    v-abl, murine acute leukemia    kvpvlisptlkh
64    Human T-cell lymphotropic virus type 2    ksillevdkdish
65    c-kit, GI tumors, small cell lung carcinoma    kagitimyyh
(SEQ ID NO not legible)    Hepatitis C    hyppkpgclvpak

Transforming Proteins:
66    Transforming protein myb    ksgkhlgk
67    Transforming protein myc, Burkitt lymphoma    krreqlkhk
68    Ras-related GTP-binding protein    kfevikvih
69    Transforming protein ras (teratocarcinoma)    kkkhtykk
70    TRAF-associated NF-kB activator TANK    kaqkdhlsk
71    RFP transforming protein    hlkrvkdlkk
72    Transforming protein D (S.C.)    kygspkhrlik
73    Papilloma virus type 11, transforming protein    kikhilgkarfik
74    Protein tyrosine kinase (EC 2.7.1.112) sik    kgdhvkhykirk
75    Transforming protein (axl(-))    keklrdvmvdrhk
76    Transforming protein (N-myc)    klqarqqqllkkieh
77    Fibroblast growth factor 4 (Kaposi sarcoma)    kkgnrvspmkth

Cancer Cell Proteins:
78    Matrix metalloproteinase 7 (uterine)    kciplhfrk
79    Transcription factor 7-like    kkkphikk
80    Breast cancer antigen NY-BR-87    ktrhdplak
81    BRCA-1-Associated Ring Domain Protein (breast)    khhpkdnlik
82    'Autoantigen from a breast tumor'    khkrkkfrqk
83    Glioma Replikin (this study)    kagvaflhkk
84    Ovarian cancer antigen    khkrkkfrqk
85    EE L leukemia    kkkskkhkdk
86    Proto-oncogene tyrosine-protein kinase C-ABLE    hksekpalprk
87    Adenomatosis polyposis coli    kkkkpsrikgdnek
88    Gastric cancer transforming protein    ktkkprysptmkvth
89    Transforming protein (K-RAS 2B), lung    khkekmskdgkkkkkksk

[00086] Identification of an amino acid sequence as a Replikin or as containing a Replikin, i.e., a homologue of the glioma peptide, kagvaflhkk (SEQ ID NO: 1), requires that the
three following requirements be met. According to the three point recognition system the sequences have three elements: (1) at least one lysine residue located six to ten residues from another lysine residue; (2) at least one histidine residue; and (3) a composition of at least 6% lysine within an amino acid sequence of 7 to about 50 residues. An exemplary non-limiting Replikin comprises a terminal lysine.

[00087] Databases were searched using the National Library of Medicine keyword "PubMed" descriptor for protein sequences containing Replikin sequences. Over 4,000 protein sequences were visually examined for homologues. Sequences of all individual proteins within each group of PubMed-classified proteins were visually scanned for peptides meeting the three above-listed requirements. An infrequent occurrence of homologues was observed in "virus peptides" as a whole (1.5%) (N=953), and in other peptides not designated as associated with malignant transformation or replication such as "brain peptides" and "neuropeptides" (together 8.5%) (N=845). However, surprisingly, homologues were significantly more frequently identified in large "replicating proteins," which were identified as having an established function in replication in bacteria, algae, and viruses. Even more surprising was the finding that Replikin homologues occurred in 100% of "tumor viruses" (N=250), in 97% of "cancer proteins" (N=401), and in 85% of "transforming viruses" (N=248). These results suggest that there are shared properties of cancer pathogenesis regardless of cell type and suggest a role of viruses in carcinogenesis, i.e., conversion of cells from a transformed albeit dormant state to a more virulent actively replicating state.
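The three requirements of the three point recognition system in paragraph [00086] can be sketched as a short check. This is a minimal illustration under stated assumptions, not the inventors' procedure (the text describes visual scanning). The function name `is_replikin` and its parameter names are hypothetical. "About 50" is treated here as a hard upper bound of 50. The text does not state how "six to ten residues from" is counted; the sketch assumes that the first lysine counts as residue 1 and the second lysine must then be residue 6 through 10, a convention under which the Table 2 entry kwekikqh (SEQ ID NO: 94) satisfies the spacing requirement.

```python
def is_replikin(peptide, min_len=7, max_len=50,
                min_pos=6, max_pos=10, min_lys_fraction=0.06):
    """Return True if `peptide` meets the three-point criteria.

    Assumptions (not specified in the source text):
      * "about 50" is treated as exactly `max_len` = 50;
      * lysine spacing is counted inclusively: the first lysine is
        residue 1 and the second must be residue `min_pos`..`max_pos`.
    """
    seq = peptide.lower()

    # Length window: 7 to (about) 50 residues.
    if not (min_len <= len(seq) <= max_len):
        return False

    # Requirement (2): at least one histidine.
    if "h" not in seq:
        return False

    # Requirement (3): at least 6% lysine over the peptide length.
    lysines = [i for i, aa in enumerate(seq) if aa == "k"]
    if len(lysines) / len(seq) < min_lys_fraction:
        return False

    # Requirement (1): some pair of lysines with the required spacing.
    return any(min_pos <= j - i + 1 <= max_pos
               for i in lysines for j in lysines if j > i)


if __name__ == "__main__":
    print(is_replikin("kagvaflhkk"))  # glioma peptide, SEQ ID NO: 1 -> True
    print(is_replikin("kwekikqh"))    # SEQ ID NO: 94 -> True
    print(is_replikin("kagvaflkk"))   # no histidine -> False
```

The spacing bounds are exposed as parameters so that a different counting convention can be tried without changing the rest of the check.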
[00088] Homologues of the following amino acid sequence, kagvaflhkk (SEQ ID NO: 1), as defined by the three point recognition method, were found in such viruses, or viral peptides, as, but not limited to, adenovirus, lentivirus, α-virus, retrovirus, adeno-associated virus, human immunodeficiency virus, hepatitis virus, influenza virus, maize streak virus, herpes virus, bovine herpes virus, feline immunodeficiency virus, foot and mouth disease virus, small pox virus, Rous sarcoma virus, neuroblastoma RAS viral oncogene, polyomavirus, sindbis, human papilloma virus, myelomonocytic tumor virus, murine acute leukemia, T-cell lymphotropic virus, and tomato leaf curl virus.

[00089] Furthermore, homologues of the amino acid sequence kagvaflhkk (SEQ ID NO: 1) are present in known classes of coronavirus, which are members of a family of enveloped viruses that replicate in the cytoplasm of host cells. Additionally, a homologue of the amino acid sequence kagvaflhkk (SEQ ID NO: 1) is present in the recently identified class of coronavirus responsible for severe acute respiratory syndrome, or SARS. The Replikin is located in the nucleocapsid whole protein sequence of the SARS coronavirus. In addition, Replikins are present in other members of the coronavirus class and, more specifically, in the nucleocapsid protein sequences of these coronaviruses.

[00090] Replikins are present in such bacteria as, but not limited to, Acetobacter, Achromobacter, Actinomyces, Aerobacter, Alcaligenes, Arthrobacter, Azotobacter, Bacillus, Brevibacterium, Chainia, Clostridium, Corynebacterium, Erwinia, Escherichia, Klebsiella, Lactobacillus, Haemophilus, Flavobacterium, Methylomonas, Micrococcus, Mycobacterium, Micromonospora, Mycoplasma, Neisseria, Nocardia, Proteus, Pseudomonas, Rhizobium, Salmonella, Serratia, Staphylococcus, Streptococcus, Streptomyces, Streptosporangium, Streptoverticillium, Vibrio peptide, and Xanthomonas.
Replikins are present in such fungi as, but not limited to, Penicillium, Diseula, Ophiostoma novo-ulmi, Mycophycophta, Phytophthora infestans, Absidia, Aspergillus, Candida, Cephalosporium, Fusarium, Hansenula, Mucor, Paecilomyces, Pichia,
Rhizopus, Torulopsis, Trichoderma, and Erysiphe. Replikins are present in such yeast as, but not limited to, Saccharomyces, Cryptococcus, including Cryptococcus neoformans, Schizosaccharomyces, and Oryza. Replikins are present in algae such as, but not limited to, Caldophera, Isolepis prolifera, Chondrus, Gracilaria, Gelidium, Caulerpa, Laurencia, Cladophora, Sargassum, Penicillos, Halimeda, Laminaria, Fucus, Ascophyllum, Undari, Rhodymenia, Macrocystis, Eucheuma, Ahnfeltia, and Pteroclasia. Replikins are present in amoeba such as, but not limited to, Entamoeba (including Entamoeba invadens), Amoebidae, Acanthamoeba and Naegleria. Replikins are present in plants such as, but not limited to, Arabidopsis, wheat, rice, and maize.

Auxiliary Specifications

[00091] To permit classification of subtypes of Replikins, additional or "auxiliary specifications" to the basic "3-point-recognition" requirements may be added: (a) on a structural basis, such as the common occurrence of adjacent di- and polylysines in cancer cell proteins (e.g., transforming protein P21B (K-RAS 2B), lung, Table 2, SEQ ID NO: 89), and other adjacent di-amino acids in TOLL-like receptors, or (b) on a functional basis, such as exhibiting ATPase, tyrosine kinase or redox activity as seen in Table 2.

Functional Derivatives

[00092] "Functional derivatives" of the Replikins as described herein are fragments, variants, analogs, or chemical derivatives of the Replikins, which retain at least a portion of the immunological cross reactivity with an antibody specific for the Replikin. A fragment of the Replikin peptide refers to any subset of the molecule. Variant peptides may be made by direct chemical synthesis, for example, using methods well known in the art. An analog of a Replikin refers to a non-natural protein substantially similar to either the entire protein or a fragment thereof.
Chemical derivatives of a Replikin contain additional chemical moieties not normally a part of the peptide or peptide fragment.

Replikins and Replication

[00093] As seen in Figure 2, during anaerobic respiration when the rate of cell replication is increased, malignin is enriched. That is, malignin is found to increase not simply in proportion to the increase in cell number and total membrane proteins, but is enriched as much as ten-fold in concentration, starting with 3% at rest and reaching 30% of total membrane protein. This clear demonstration of a marked increase in Replikin concentration with glioma cell replication points to, and is consistent with, the presence of Replikins identified with the 3-point recognition method in various organisms. For example, Replikins were identified in such proteins as "Saccharomyces cerevisiae replication binding protein" (SEQ ID NO: 2) (hsikrelgiifdk); the "replication associated protein A of maize streak virus" (SEQ ID NO: 8) (kyivcareahk) and (SEQ ID NO: 9) (kekkpskdeimrdiish); the "replication associated protein of Staphylococcus aureus" (SEQ ID NO: 10) (kkektthnk); the "DNA replication protein of bovine herpes virus 4" (SEQ ID NO: 11) (hkinitngqk); and the "Meleagrid herpesvirus 1 replication binding protein" (SEQ ID NO: 12) (hkdlyrllmk). Previous studies of tomato leaf curl gemini virus show that the regulation of virus accumulation appears to involve binding of amino acids 1-160 of the "replicating protein" of that virus to leaf DNA and to other replication protein molecules during virus replication. Analysis of this sequence showed that amino acids 1-135 of this "replicating protein" contain a Replikin count (concentration) as high as 20.7 (see section on tomato leaf curl gemini virus).

[00094] Table 2 shows that Replikin-containing proteins also are associated frequently with redox functions, and protein synthesis or elongation, as well as with cell replication.
The association with metal-based redox functions, the enrichment of the Replikin-containing glioma malignin concentration during anaerobic replication, and the cytotoxicity of antimalignin at low concentrations (picograms/cell) (Figures 4C-4F) all suggest that the Replikins are related to central respiratory survival functions, which have been found to be less often subjected to the mutations characteristic of non-Replikin amino acids.

Replikins in Influenza Epidemics

[00095] Of particular interest, it was observed that at least one Replikin per 100 amino acids was found to be present in the hemagglutinin proteins of almost all of the individual strains of influenza viruses examined. The Replikin sequences that were observed to occur in the hemagglutinin proteins of isolates of each of the four prevalent strains of influenza virus, influenza B, H1N1, H2N2, and H3N2, for each year that amino acid sequence data are available (1902-2001), are shown in Tables 3, 4, 5 and 6.

[00096] Both the concentration and type, i.e., composition, of Replikins observed were found to relate to the occurrence of influenza pandemics and epidemics. The concentration of Replikins in influenza viruses was examined by visually scanning the hemagglutinin amino acid sequences published in the National Library of Medicine "PubMed" data base for influenza strains isolated worldwide from human and animal reservoirs year by year over the past century, i.e., 1900 to 2001. These Replikin concentrations (number of Replikins per 100 amino acids, mean +/- SD) were then plotted for each strain.

[00097] The concentration of Replikins was found to directly relate to the occurrence of influenza pandemics and epidemics.
The concentration of Replikins found in influenza B hemagglutinin and influenza A strain H1N1 is shown in Figure 7, and the concentration of Replikins found in the two other common influenza virus A strains, H2N2 and H3N2, is shown in Figure 8 (H2N2, H3N2). The data in Figure 8 also demonstrate an emerging new strain of influenza virus as defined by its constituent Replikins (H3N2(R)).

[00098] Each influenza A strain has been responsible for one pandemic: in 1918, 1957, and 1968, respectively. The data in Figures 7 and 8 show that at least one Replikin per 100 amino acids is present in each of the influenza hemagglutinin proteins of all isolates of the four common influenza viruses examined, suggesting a function for Replikins in the maintenance of survival levels of replication. In the 1990s, during the decline of the H3N2 strain, there were no Replikins in many isolates of H3N2, but a high concentration of new Replikins appeared in H3N2 isolates, which define the emergence of the H3N2(R) strain. See Tables 3, 4, 5 and 6.

[Tables 3, 4, 5 and 6: Replikin sequences observed in the hemagglutinin proteins of isolates of influenza B, H1N1, H2N2 and H3N2, by year. Table contents are not legible in this copy.]
[00099] Several properties of Replikin concentration are seen in Figure 7 and Figure 8 to be common to all four influenza virus strains. First, the concentration is cyclic over the years, with a single cycle of rise and fall occurring over a period of two to thirty years. This rise and fall is consistent with the known waxing and waning of individual influenza virus strain predominance by hemagglutinin and neuraminidase classification. Second, peak Replikin concentrations of each influenza virus strain previously shown to be responsible for a pandemic were observed to relate specifically and individually to each of the three years of the pandemics. For example, for the pandemic of 1918, where the influenza virus strain H1N1 was shown to be responsible, a peak concentration of the Replikins in H1N1 independently occurred (P1); for the pandemic of 1957, where H2N2 emerged and was shown to be responsible, a peak concentration of the Replikins in H2N2 occurred (P2); and for the pandemic of 1968, where H3N2 emerged and was shown to be the cause of the pandemic, a peak concentration of the Replikins in H3N2 occurred (P3). Third, in the years immediately following each of the above three pandemics, the specific Replikin concentration decreased markedly, perhaps reflecting the broadly distributed immunity generated in each case. Thus, this post-pandemic decline is specific for H1N1 immediately following the pandemic (P1) for which it was responsible, and is not a general property of all strains at the time. An increase of Replikin concentration in influenza B repeatedly occurred simultaneously with the decrease in Replikin concentration in H1N1, e.g., EB1 in 1951 and EB2 in 1976, both associated with influenza B epidemics having the highest mortality.
(Stuart-Harris, et al., Edward Arnold Ltd. (1985)). Fourth, a secondary peak concentration, which exceeded the primary peak increase in concentration, occurred seven to 15 years after each of the three pandemics, and this secondary peak was accompanied by an epidemic: 15 years after the 1918 pandemic in an H1N1 'epidemic' year (E1); eight years after the 1957 pandemic in an H2N2 'epidemic' year (E2); and seven years after the 1968 pandemic in an H3N2 'epidemic' year (E3). These secondary peak concentrations of specific Replikins may reflect recovery of the strain. Fifth, peaks of each strain's specific Replikin concentration frequently appear to be associated with declines in Replikin concentration of one or both other strains, suggesting competition between strains for host sites. Sixth, there is an apparent overall tendency for the Replikin concentration of each strain to decline over a period of 35 years (H2N2) to 60 years (influenza B). This decline cannot be ascribed to the influence of vaccines because it was evident in the case of influenza B from 1940 to 1964, prior to common use of influenza vaccines. In the case of influenza B, Replikin recovery from the decline is seen to occur after 1965, but Replikin concentration declined again between 1997 and 2000 (Figure 7). This correlates with the low occurrence of influenza B in recent case isolates. H1N1 Replikin concentration peaked in 1978-1979 (Figure 7) together with the reappearance and prevalence of the H1N1 strain, and then peaked in 1996 coincident with an H1N1 epidemic (Figure 7). H1N1 Replikin concentration also declined between 1997 and 2000, and the presence of H1N1 strains decreased in isolates obtained during these years. For H2N2 Replikins, recovery from a 35 year decline has not occurred (Figure 8), and this correlates with the absence of H2N2 from recent isolates.
For H3N2, the Replikin concentration of many isolates fell to zero during the period from 1996 to 2000, but other H3N2 isolates showed a significant, sharp increase in Replikin concentration. This indicates the emergence of a substrain of H3N2, which is designated herein as H3N2(R).
[000100] Figures 7 and 8 demonstrate that frequently, a one to three year stepwise increase is observed before Replikin concentration reaches a peak. This stepwise increase precedes the occurrence of an epidemic, which occurs concurrently with the Replikin peak. Thus, the stepwise increase in concentration of a particular strain is a signal that the particular strain is the most likely candidate to cause an epidemic or pandemic.
[000101] Currently, Replikin concentration in the H3N2(R) strain of influenza virus is increasing (Figure 8, 1997 to 2000). Three similar previous peak increases in H3N2 Replikin concentration are seen to have occurred in the H3N2-based pandemic of 1968 (Figure 8), when the strain first emerged, and in the H3N2-based epidemics of 1972 and 1975 (Figure 8). Each of these pandemic and epidemics was associated with excess mortality. (Alling et al., Am. J. Epidemiol., 113(1):30-43 (1981)). The rapid ascent in concentration of the H3N2(R) subspecies of the H3N2 Replikins in 1997-2000, therefore, statistically represents an early warning of an approaching severe epidemic or pandemic. An H3N2 epidemic occurred in Russia in 2000 (Figure 8, E4); and the CDC report of December 2001 states that currently, H3N2 is the most frequently isolated strain of influenza virus worldwide. (Morbidity and Mortality Weekly Reports (MMWR), Center for Disease Control; 50(48):1084-68 (Dec. 7, 2001)).
[000102] In each case of influenza virus pandemic or epidemic new Replikins emerge. There has been no observation of two of the same Replikins in a given hemagglutinin in a given isolate.
To what degree the emergence of a new Replikin represents mutations versus transfer from another animal or avian pool is unknown. In some cases, each year one or more of the original Replikin structures is conserved, while at the same time, new Replikins emerge. For example, in influenza virus B hemagglutinin, five Replikins were constantly conserved between 1919 and 2001, whereas 26 Replikins came and went during the same period (some recurred after several years' absence). The disappearance and re-emergence years later of a particular Replikin structure suggests that the Replikins return from another virus host pool rather than through de novo mutation.
[000103] In the case of H1N1 Replikins, the two Replikins present in the P1 peak associated with the 1918 pandemic were not present in the recovery E1 peak of 1933, which contains 12 new Replikins. Constantly conserved Replikins, therefore, are the best choice for vaccines, either alone or in combination. However, even recently appearing Replikins accompanying one year's increase in concentration frequently persist and increase further for an additional one or more years, culminating in a concentration peak and an epidemic, thus providing both an early warning and time to vaccinate with synthetic Replikins (see, for example, H1N1 in the early 1990s, Figure 7; see also, for example, H5N1 1995-2002, Figures 11 and 15, where "Replikin Count" (number of Replikins per 100 amino acids) refers to Replikin concentration).
[000104] The data in Figures 7, 8, 11 and 15 demonstrate a direct relationship between the presence and concentration of a particular Replikin in influenza protein sequences and the occurrence of pandemics and epidemics of influenza. Thus, analysis of the influenza virus hemagglutinin protein sequence for the presence and concentration of Replikins provides a predictor of influenza pandemics and/or epidemics, as well as a target for influenza vaccine formulation.
It is worth noting again, with reference to these data, that previously no strain-specific chemical structures were known with which to predict the strains that would predominate in coming influenza seasons, nor to devise annual mixtures of whole-virus strains for vaccines.
[000105] Similar to the findings of strain-specific Replikin Count increases in the influenza group one to three years prior to the occurrence of a strain-specific epidemic, an increase in the Replikin Count of the coronavirus nucleocapsid protein has also been identified. Replikin Counts of the coronavirus nucleocapsid protein have increased as follows: 3.1 (±1.8) in 1999; 3.9 (±1.2) in 2000; 3.9 (±1.3) in 2001; and 5.1 (±3.6) in 2002. This pre-pandemic increase supports the finding that a coronavirus is responsible for the current (2003) SARS pandemic. (See Table 7.)
[000106] Thus, monitoring Replikin structure and Replikin Count provides a means for developing synthetic strain-specific preventive vaccination and antibody therapies against the 1917-1918 Goose Replikin and its modified and accompanying Replikins as observed in both influenza and coronavirus strains.
[000107] Figure 10 depicts the automated Replikin analysis of nucleocapsid coronavirus proteins for which the protein sequence is available on isolates collected from 1962 to 2003. Each individual protein is represented by an accession number and is analyzed for the presence of Replikins. The Replikin Count (number of Replikins per 100 amino acids) is automatically calculated as part of the automated Replikin analysis. For each year, the mean (± standard deviation (S.D.)) Replikin Count per year is automatically calculated for all Replikin Counts that year.
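The per-protein Replikin Count and the yearly mean (± S.D.) described for Figure 10 can be sketched as follows. This is only an illustration, not the inventors' software: the function names, the `(year, number of Replikins, protein length)` record layout, and the use of the sample standard deviation (with 0.0 reported when a year has a single protein) are all assumptions made here.

```python
from collections import defaultdict
from statistics import mean, stdev


def replikin_count(num_replikins: int, protein_length: int) -> float:
    """Replikin Count = number of Replikins per 100 amino acids."""
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")
    return 100.0 * num_replikins / protein_length


def yearly_summary(records):
    """Map each year to (mean, S.D.) of the Replikin Counts of its proteins.

    records: iterable of (year, num_replikins, protein_length) tuples,
    a hypothetical layout chosen for this sketch.
    """
    by_year = defaultdict(list)
    for year, num_replikins, protein_length in records:
        by_year[year].append(replikin_count(num_replikins, protein_length))

    summary = {}
    for year, counts in sorted(by_year.items()):
        # Assumption: sample standard deviation; 0.0 when only one protein.
        sd = stdev(counts) if len(counts) > 1 else 0.0
        summary[year] = (mean(counts), sd)
    return summary


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    demo = [(2000, 6, 200), (2001, 2, 100), (2001, 4, 100)]
    for year, (m, sd) in yearly_summary(demo).items():
        print(f"{year}: {m:.1f} (±{sd:.1f})")
```

A rising sequence of yearly means from such a summary is the kind of signal the text reads as an early warning; how many years of increase should count as a warning is not fixed by this sketch.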
This example of early warning of increasing replication, before an epidemic, of a particular protein (the nucleocapsid protein) in a particular virus strain (the coronavirus) is comparable to the increase seen in strains of influenza virus preceding influenza epidemics and pandemics (Figures 7, 8, 11 and 15). It may be seen that the Replikin Count rose from 1999 to 2002, consistent with the SARS coronavirus pandemic, which emerged at the end of 2002 and has persisted into 2003. Figure 9 provides a graph of the Replikin Counts for several virus strains, including the coronavirus nucleocapsid Replikin, from 1917 to 2002.


SARS and H3N2-Fujian influenza virus replikins traced back to a 1918 pandemic replikin
[000108] The origin of the SARS virus is as yet unknown. We report evidence that certain SARS virus peptides can be traced back through homologous peptides in several strains of influenza virus isolates from 2002 to a sequence in the strain of the 1918 influenza pandemic responsible for the deaths of over 20 million people.
[000109] By quantitative analysis of primary protein sequences of influenza virus and other microorganisms recorded through the last century, we have found a new class of peptide structures rich in lysine and histidine, related to the phenomenon of rapid replication itself and to epidemics rather than to the type of organism (e.g., Table 1). These structures satisfy the following obligatory algorithm: at least two lysines 6 to 10 residues apart, a lysine concentration of 6% or greater, and at least one histidine, in 7 to 50 amino acids. Because these peptides relate to the phenomenon of rapid replication itself and to epidemics, we named them Replikins. We have found a quantitative correlation of strain-specific replikin concentration (replikin count = number of replikins per 100 amino acids) in the hemagglutinin protein with influenza epidemics and pandemics (Figure 7). No previous correlation of influenza epidemics with strain-specific viral protein chemistry has been reported. Conservation, condensation and concentration of replikin structure have also been found in influenza (e.g., in Table 7a), HIV and malaria. The detection of replikins in SARS coronavirus, in addition to tracing its possible evolution, has permitted the synthesis of small SARS antigens for vaccines.
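The obligatory algorithm stated above can be written as a simple predicate over a one-letter amino acid string. The sketch below is a minimal illustration under stated assumptions, not the inventors' implementation: the function name `is_replikin` is hypothetical, "6 to 10 residues apart" is read as a difference of 6 to 10 between the positions of two lysines, and the histidine requirement is read as at least one histidine.

```python
def is_replikin(peptide: str) -> bool:
    """Check the stated Replikin criteria for a one-letter amino acid string.

    Criteria as read in this sketch:
      * length of 7 to 50 amino acids;
      * at least one histidine (H);
      * lysine (K) concentration of 6% or greater;
      * at least two lysines whose positions differ by 6 to 10
        (an assumed reading of "6 to 10 residues apart").
    """
    seq = peptide.upper()
    if not 7 <= len(seq) <= 50:
        return False
    if "H" not in seq:
        return False

    lysines = [i for i, aa in enumerate(seq) if aa == "K"]
    if len(lysines) / len(seq) < 0.06:
        return False

    # Any pair of lysines spaced 6 to 10 positions apart satisfies the rule.
    return any(
        6 <= later - earlier <= 10
        for n, earlier in enumerate(lysines)
        for later in lysines[n + 1:]
    )


if __name__ == "__main__":
    # Glioma replikin quoted in the text (SEQ ID NO: 1).
    print(is_replikin("kagvaflhkk"))
    # 1918 Goose Replikin structure quoted in the text (SEQ ID NO: 324).
    print(is_replikin("KKGSSYPKLSKSYVNNKGKEVLVLWGVHH"))
```

Counting every qualifying subsequence of a whole protein, as a Replikin Count requires, needs an enumeration rule (for example, how overlapping Replikins are counted) that this predicate does not attempt to fix.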
[000110] We have found a quantitative correlation of strain-specific replikin concentration (count) in the influenza hemagglutinin proteins with influenza epidemics and with each of the three pandemics of the last century, in 1918, 1957, and 1968. A similar course was observed for each of these three pandemics: after a strain-specific high replikin count, an immediate decline followed, then a 'rebound' increase with an accompanying epidemic occurred. Also, a 1 to 3 year warning increase in count preceded most epidemics.
[000111] We found that the replikin in the hemagglutinin of an influenza virus isolated from a goose in 1917 (which we named the Goose Replikin) appeared in the next year in the H1N1 strain of influenza responsible for the 1918 pandemic, with only two substitutions as follows: kkg(t/s)sypklsksy(t/v)nnkgkevlvlwgvhh (SEQ ID NO: 323). Table 7a shows that the influenza 1917 Goose Replikin (GR) then was essentially conserved for 85 years, despite multiple minor substitutions and apparent translocations to other influenza strains. We have found that the 1917 influenza GR demonstrated apparent mobility between several influenza strains, appearing in H1N1 (the pandemic of 1918), in H2N2 (pandemic of 1957-58), in H3N2 (pandemic of 1968, epidemic in China and Russia 2000, Fujian strain epidemic 2003) and in H5N1 (epidemic in China 1997). In 1997 its structure was restored in H1N2 exactly to its 1918 structure KKGSSYPKLSKSYVNNKGKEVLVLWGVHH (SEQ ID NO: 324).
[000112] The SARS coronavirus first appeared in the 2002-2003 influenza season.
The dual origin in 2002 of SARS replikins, from influenza GR and coronavirus replikins (or from some unknown shared precursor), is suggested by the following events, all of which occurred in 2002: 1) a condensation for the first time in 85 years is seen in the GR-H1N2 Replikin sequence from 29 to 28 amino acids (Table 7a) (a similar condensation was found in H3N2 Fujian from 29 to 27 amino acids in the current epidemic (Table 7a)); 2) the replikin count of GR-H1N2 showed a marked decline consistent with GR moving out of H1N2; 3) the replikin count of coronavirus nucleocapsid proteins showed a marked increase; and 4) SARS coronavirus appeared in 2002-2003 with replikins containing the following motifs: 'kkg' and 'k-k', previously seen in GR 1918 and GR-H1N2 2001; 'k-kk', 'kk' and 'kl' seen in influenza GR-H1N2 2001; 'kk' seen in the avian bronchitis coronavirus replikin; and 'kk-kk-k' (SEQ ID NO: 325), 'k-k', 'kk', 'kl' and 'kt' seen in the replikin of porcine epidemic diarrhea coronavirus (Table 7a). (SARS is believed to have made its first appearance in humans as the epidemic pneumonia which erupted in a crowded apartment house where there was a severe back-up of fecal sewage, which was then made airborne by ventilating fans.)
Table 7a. Goose Replikin (GR) sequences in different influenza strains from 1917 to 2003; SARS and H3N2-Fujian appearance 2002-2003.
Replikins related to the Goose Replikin:
Columns: SEQ ID NO; Replikin length (number of amino acids); virus or other organism containing the replikin. The sequence column (continuous amino acid sequences; complete replikins except for the Fujian strain; shared motif and/or position on a clear background; amino acid substitutions marked) is not reproduced in this text. 'Condensed' indicates condensation of sequence length in H1N2 and H3N2-Fujian.
326   29   1917 H1N1 Influenza Goose Replikin (GR)
327   29   1918 GR in H1N1 Human Influenza
328   29   1958 GR in H2N2 Influenza
329   29   1964, 1965, 1968 GR in H2N2 Influenza
330   29   1976, '77, '80, '81, '5 GR in H1N1 Influenza
331   29   1996-2001 GR in H5N1 Influenza
332   29   1996 GR in H1N1 Influenza
333   29   1997, 1998 GR in H1N1 Influenza
334   29   1999 GR in H1N2 Influenza
335   29   2000 GR in H1N2 Influenza
336   29   2001 GR in H1N2 Influenza
337   29   2001 GR in H1N2 Influenza
338   28   2002 GR in H1N2 Influenza (condensed)
339   21   2002-3 Human SARS nucleocapsid protein
340   29   1968-2001 GR in H3N2 Influenza (complete)
341   29   1996 H3N2 Fujian Influenza (incomplete)
342   27   2003 H3N2 Fujian (condensed, incomplete)
343   24   Porcine epidemic diarrhea coronavirus
344   27   Avian bronchitis coronavirus
345   27   2000 shrimp white spot syndrome virus
346   27   Avian bronchitis coronavirus
347   20   2002-3 Human SARS spike protein
348   14   2002-3 Human SARS nucleocapsid protein
349   20   2002-3 Human SARS spike protein
350   14   2002-3 Human SARS nucleocapsid protein
351   11   2002-3 Human SARS spike protein
352   9    2002-3 Human SARS envelope protein
353   9    2002-3 Human SARS spike protein
354   7    2002-3 Human SARS nucleocapsid protein
355   11   Nipah virus, v-protein
356   11   Hendra virus, v-protein
357   10   Sindbis virus
358   10   EEL leukemia
359   10   BRCA-1 breast cancer
360   10   Ovarian cancer
361   10   Glioma Replikin
363   9    Smallpox virus
364   9    HIV TAT protein
365   8    Smallpox virus
366   8    B. anthracis, HATPase
367   8    Ebola virus polymerase
[000113] The recent increasingly high replikin count peaks, including the presence of the 1917 Goose Replikin (Figure 7), now in H1N2 (Table 7a), approaching the 1917 replikin count, could be a warning of a coming pandemic which may already have begun, since the SARS virus and the H3N2-Fujian virus are the current carriers of the short replikin derivatives of the Goose Replikin seen in Tables 7 and 7a to be associated with high mortality.
[000114] Since the Goose Replikin has at least an 85 year history involving most or all of the A-strains of influenza and SARS, it and its components are conserved vaccine candidates for pan-strain protection. Condensed short SARS replikins, 7 to 21 amino acids long, enriched in % lysine and histidine compared to the Goose Replikin, occurred in association with the higher mortality rate of SARS (10-55%) when compared to that (2.5%) of the Goose Replikin, 29 amino acids long. Short replikins here mixed with long replikins in SARS may be responsible for high mortality. This is also the case for replikins of other organisms such as the ebola and smallpox viruses and anthrax bacteria (Table 7a).
These short SARS replikins showed surprising homology with short replikins of other organisms such as smallpox, anthrax, and ebola, which are associated with even higher untreated mortality rates (Table 7a).
[000115] Short synthetic vaccines, besides being much more rapidly produced (days rather than months) and far less expensive, should avoid the side effects attendant on the contamination and the immunological interference engendered by multiple epitopes of thousands of undesired proteins in current whole virus vaccines in general. In any case, for influenza, current whole virus vaccines are ineffective in more than half of the elderly. But would short replikins be sufficiently immunogenic? The short glioma replikin 'kagvaflhkk' (SEQ ID NO: 1) proved to be a successful basis for a synthetic anti-glioblastoma multiforme and anti-bronchogenic carcinoma vaccine. It produced anti-malignin antibody, which is cytotoxic to cancer cells at picograms/cell and relates quantitatively to the survival of cancer patients. In order to prepare for a recurrent SARS attack, which appears likely because of the surge we found in the coronavirus nucleocapsid replikin count in 2002, we synthesized four SARS short replikins, found in nucleocapsid, spike, and envelope proteins. We found that these synthetic short SARS replikins, when injected into rabbits, also produced abundant specific antibody. For example, the 21 amino acid SARS nucleocapsid replikin antibody binds at dilutions greater than 1 in 204,800. Because of previous unsuccessful attempts by others to achieve with various small peptides a strong immune response without the unwanted side effects obtained with a whole protein or the thousands of proteins or nucleic acids as in smallpox vaccine, the ability of small synthetic replikin antigens to achieve strong immune responses is significant for the efficacy of these SARS vaccines.
[000116] We examined the relationship of Replikin structure in influenza and SARS viruses to increased mortality, with results as shown in Table 7. The relation of high mortality to short or condensed Replikin sequences is seen in the high mortality organisms shown in Section B of Table 7, in viruses other than influenza and SARS, and in bacteria, malaria and cancer. In support of the unifying concept of Replikin structure and of the relation of Replikins to rapid replication rather than any cell type or infectious organism, in addition to the prevalence of the basic Replikin structure in a broad range of viral, bacterial, malarial and cancer organisms in which replication is crucial to propagation and virulence, the following homologous sequences have been observed: note the "k"s in positions 1 and 2; note the alignment of "k"s as they would present to DNA, RNA or other receptor or ligand for incorporation or to stimulate rapid replication; note the frequency of "double k"s and "multiple k"s; note the frequency of "g" in position 3 and the occurrence of the triplets "kkg", "hek", "hdk" and "hkk" in the most condensed shortened Replikins associated with the highest mortality organisms, cancer cells and genes as diverse as the smallpox virus, the anthrax bacterium, Rous sarcoma virus and glioblastoma multiforme (glioma), c-src in colon and breast cancer, and c-yes in melanoma and colon cancer. Note also the almost identical Replikin structure for two recently emerging high mortality viruses in Australia and Southeast Asia, Nipah and Hendra viruses. These two viruses are reported to have similar or identical antibodies formed against them, but no structural basis for this similar antibody response was known until our finding of their two almost identical Replikins.
[000117] Table 7 also shows the relationship of five SARS Replikins of 2003 which we have found both to the influenza Goose Replikin of 1917 and to two coronaviruses, the avian bronchitis coronavirus and the porcine epidemic diarrhea virus. The first 2003 human SARS Replikin in Table 7 shows certain sequence homologies to the influenza virus goose 1917 and human 1918 Replikins through an intermediary structure of influenza H1N2 in 2002 (e.g., see Replikin "k" in positions 1, 18 and 19). The 1917 Goose Replikin sequence is seen in Table 7 to have been largely conserved through 1999 despite many substitutions in amino acids which are not crucial to the definition of Replikins (substitutions are shown in italics). The original 29 amino acid 1917 Replikin sequence was then found to have been almost exactly restored to its structure of 1917-1918 in the 2001 H1N2 Replikin. However, the 2002 H1N2 influenza Replikin has been shortened from 29 to 28 amino acids and the "shift to the left" of amino acids kevl(i/v)wg (v/i)hh (SEQ ID NO: 367) is clearly evident.
[000118] In 2003, one Replikin was further shortened (or compacted) to the 21 amino acid Replikin of the first listed 2003 human SARS virus. The % k of the 2003 SARS Replikin is now 38.1% (8/21) in comparison to 20.7% of the Goose Replikin and the 1918 Human Pandemic Replikin. Compared to the influenza 29 amino acid Replikin, three SARS Replikins were found to be further shortened (or compacted) to 19, 11 and 9 amino acid long sequences, respectively. In the SARS 9 amino acid sequences shown, the % k is 44.4% (4/9). With the shortening of the SARS Replikin, the SARS mortality rate in humans rose to 10% in the young and 55.5% in the elderly compared to the 2.5% mortality in the 1918 influenza pandemic.
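The "% k" figures quoted above are simple lysine fractions, and the value for the 29 amino acid Replikin can be reproduced from the 1918 structure given in the text (SEQ ID NO: 324). The snippet below is only a worked check; the helper name `lysine_percent` is hypothetical, and the 21 and 9 amino acid SARS values are recomputed from the stated fractions because those sequences are not reproduced here.

```python
def lysine_percent(peptide: str) -> float:
    """Percentage of residues in a one-letter sequence that are lysine (K)."""
    seq = peptide.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * seq.count("K") / len(seq)


if __name__ == "__main__":
    goose_1918 = "KKGSSYPKLSKSYVNNKGKEVLVLWGVHH"  # SEQ ID NO: 324
    print(len(goose_1918), round(lysine_percent(goose_1918), 1))  # 29 20.7
    # Fractions stated in the text for the condensed SARS Replikins.
    print(round(100 * 8 / 21, 1))  # 38.1
    print(round(100 * 4 / 9, 1))   # 44.4
```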
[000119] The amino acid sequences are shown in Table 7 to emphasize the degree of homology and conservation for 85 years (1917-2002) of the influenza Replikin, for which evidence has first been observed in the 1917 Goose Replikin. No such conservation has ever been observed before. Table 7 also illustrates that the Replikins in the 2003 human SARS virus, in addition to having homologies to the influenza Replikins which first appeared as the 1917 Goose Replikin and the 1918 Human Pandemic influenza Replikin, show certain sequence homologies to both the coronavirus avian bronchitis virus Replikin (e.g., "k" in positions 1 and 2, end in "h") and to the coronavirus acute diarrhea virus Replikin (e.g., "k" in positions 1 and 11, "h" at the end of the Replikin). This evidence of relation to both influenza and coronavirus Replikins is of interest because SARS arose in Hong Kong, as did several recent influenza epidemics and earlier pandemics, and the SARS virus has been classified as a new coronavirus partly because of its structure, including nucleocapsid, spike, and envelope proteins. Certain epidemiological evidence also is relevant in that SARS made its first appearance in humans as the epidemic pneumonia which erupted in a crowded Hong Kong apartment house where there was a severe back-up of fecal sewage, which was made airborne by ventilating fans.
[000120] Composition of Replikins in Strains of Influenza Virus B: Of a total of 26 Replikins identified in this strain (Table 3), the following ten Replikins are present in every influenza B isolate examined from 1940-2001. Overlapping Replikin sequences are listed separately. Lysines and histidines are in bold type to demonstrate homology consistent with the "3-point recognition."
KSHFANLK (SEQ ID NO: 104)
KSHFANLKGTK (SEQ ID NO: 105)
KSHFANLKGTKTRGKLCPK (SEQ ID NO: 106)
HEKYGGLNK (SEQ ID NO: 107)
HEKYGGLNKSK (SEQ ID NO: 108)
HEKYGGLNKSKPYYTGEHAK (SEQ ID NO: 109)
HAKAIGNCPIWVK (SEQ ID NO: 110)
HAKAIGNCPIWVKTPLKLANGTK (SEQ ID NO: 111)
HAKAIGNCPIWVKTPLKLANGTKYRPPAK (SEQ ID NO: 112)
HAKAIGNCPIWVKTPLKLANGTKYRPPAKLLK (SEQ ID NO: 113)
[000121] Tables 3 and 4 indicate that there appears to be much greater stability of the Replikin structures in influenza B hemagglutinins compared with H1N1 Replikins. Influenza B has not been responsible for any pandemic, and it appears not to have an animal or avian reservoir. (Stuart-Harris et al., Edward Arnold Ltd., London (1985)).
Replikins in Influenza over Time
[000122] Only one Replikin, "hp(v/i)tigecpkyv(r/k)(s/t)(t/a)k" (SEQ ID NO: 135), is present in every H1N1 isolate for which sequences are available from 1918, when the strain first appeared and caused the pandemic of that year, through 2000 (Table 4). ("(v/i)" indicates that the amino acid v or i is present in the same position in different years.) Although H1N1 contains only one persistent Replikin, H1N1 appears to be more prolific than influenza B. There are 95 different Replikin structures in 82 years of H1N1 isolates versus only 31 different Replikins in 62 years of influenza B isolates (Table 4). An increase in the number of new Replikin structures occurs in years of epidemics (Tables 3, 4, 5 and 6) and correlates with increased total Replikin concentration (Figures 7, 8, 11 and 15).
[000123] Influenza H2N2 Replikins: Influenza H2N2 was responsible for the human pandemic of 1957. Three of the 20 Replikins identified in that strain for 1957 were conserved in each of the H2N2 isolates available for examination on PubMed until 1995 (Table 5).
ha(k/q/m)(d/n)ilekthngk (SEQ ID NO: 232)
ha(k/q/m)(d/n)ilekthngkic(k/r) (SEQ ID NO: 233)
kgsnyp(v/i)ak(g/r)synntsgeqmlilwq(v/i)h (SEQ ID NO: 238)
[000124] However, in contrast to H1N1, only 13 additional Replikins have been found in H2N2 beginning in 1961. This paucity of appearance of new Replikins correlates with the decline in the concentration of the H2N2 Replikins and in the appearance of H2N2 in isolates over the years (Figure 8).
[000125] Influenza H3N2 was responsible for the human pandemic of 1968. Five Replikins which appeared in 1968 disappeared after 1977, but reappeared in the 1990s (Table 6). The only Replikin structure which persisted for 22 years was hcd(g/q)f(q/r)nekwdlf(v/i)e(s/t)k (SEQ ID NO: 277), which appeared first in 1977 and persisted through 1998. The emergence of twelve new H3N2 Replikins in the mid 1990s (Table 6) correlates with the increase in Replikin concentration at the same time (Figure 8) and with the prevalence of the H3N2 strain in recent isolates. Together with the concurrent disappearance of all Replikins from some of these isolates (Figure 8), this suggests the emergence of the new substrain H3N2(R). The current epidemic in November-December 2003 of a new strain of H3N2 (Fujian) confirms this prediction, made first in the Provisional Application US 60/303,396, filed July 9, 2001.
[000126] Figures 7, 8, 11 and 15 show that influenza epidemics and pandemics correlate with the increased concentration of Replikins in influenza virus, which is due to the reappearance of at least one Replikin from one to 59 years after its disappearance. Also, in the A strain only, there is an emergence of new strain-specific Replikin compositions (Tables 4-6; see also the increase in number of new Replikins, pre-epidemic, for H5N1 in Figures 11 and 15). Increase in Replikin concentration by repetition of individual Replikins within a single protein appears not to occur in influenza virus, but is seen in other organisms.
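The substitution notation used for the conserved sequences above, in which a group such as "(d/n)" marks a position occupied by d or n in different years, corresponds directly to a regular-expression character class. The following sketch shows one way such a pattern could be checked against an isolate's sequence; the function name `notation_to_regex` and the test peptides are hypothetical and are not part of the specification.

```python
import re


def notation_to_regex(pattern: str) -> re.Pattern:
    """Translate notation like 'ha(k/q/m)(d/n)ilek' into a compiled regex.

    Each '(x/y/...)' group becomes the character class '[xy...]'.
    Raises ValueError if anything other than residues and well-formed
    groups remains after translation.
    """
    body = re.sub(
        r"\(([a-z](?:/[a-z])+)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        pattern.lower(),
    )
    if not re.fullmatch(r"(?:[a-z]|\[[a-z]+\])+", body):
        raise ValueError(f"malformed substitution notation: {pattern!r}")
    return re.compile(body)


if __name__ == "__main__":
    # SEQ ID NO: 233 as written in the text.
    rx = notation_to_regex("ha(k/q/m)(d/n)ilekthngkic(k/r)")
    print(rx.pattern)
    # Hypothetical variants: the first fits the pattern, the second does not.
    print(bool(rx.fullmatch("hakdilekthngkick")))
    print(bool(rx.fullmatch("hardilekthngkick")))
```

Using `search` instead of `fullmatch` would locate the conserved Replikin inside a full hemagglutinin sequence rather than test an isolated peptide.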
[000127] It has been believed that changes in the activity of different influenza strains are related to sequence changes in influenza hemagglutinins, which in turn are the products of substitutions effected by one of two poorly understood processes: i) antigenic drift, thought to be due to the accumulation of a series of point mutations in the hemagglutinin molecule, or ii) antigenic shift, in which the changes are so great that genetic reassortment is postulated to occur between the viruses of human and non-human hosts. First, the present data suggest that the change in activity of different influenza strains, rather than being related to non-specific sequence changes, is based upon, or relates to, the increased concentration of strain-specific Replikins and strain-specific increases in the replication associated with epidemics. In addition, the data were examined for a possible insight into which sequence changes are due to "drift" or "shift", and which are due to conservation, storage in reservoirs, and reappearance. The data show that the epidemic-related increase in Replikin concentration is not due to the duplication of existing Replikins per hemagglutinin, but is due to the reappearance of at least one Replikin composition from 1 to up to 59 years after its disappearance, plus, in the A strains only, the emergence of new strain-specific Replikin compositions (Tables 3-6). Thus the increase in Replikin concentration in the influenza B epidemics of 1951 and 1977 is not associated with the emergence of new Replikin compositions in the year of the epidemic but only with the reappearance of Replikin compositions which had appeared in previous years and then disappeared (Table 3). In contrast, for the A strains, in addition to the reappearance of previously disappeared virus Replikins, new compositions appear (e.g.
in H1N1 in the year of the epidemic of 1996, in addition to the reappearance of 6 earlier Replikins, 10 new compositions emerged). Since the A strains only, not influenza B, have access to non-human animal and avian reservoirs, totally new compositions probably derive from non-human host reservoirs rather than from mutations of existing human Replikins, which appear to bear no resemblance to the new compositions other than the basic requirements of "3-point recognition" (Tables 2-5). The more prolific nature of H1N1 compared with B, and the fact that pandemics have been produced by the three A strains only, but not by the B strain, both may also be a function of the ability of the human A strains to receive new Replikin compositions from non-human viral reservoirs.
[000128] Some Replikins have appeared in only one year, disappeared, and not reappeared to date (Tables 3-6). Other Replikins disappear for one to up to 81 years, after which the identical Replikin sequence reappears. Key Replikin 'k' and 'h' amino acids, and the spaces between them, are conserved during the constant presence of particular Replikins over many years, as shown in Tables 2 and 3-6 for the following strain-specific Replikins: ten of influenza B, the single Replikin of H1N1, and the single Replikin of H3N2, as well as for the reappearance of identical Replikins after an absence. Despite the marked replacement or substitution activity of other amino acids both inside the Replikin structure and outside it in the rest of the hemagglutinin sequences, influenza Replikin histidine (h) appears never to be replaced, and lysine (k) is rarely replaced.
Examples of this conservation are seen in the H1N1 Replikin "hp(v/i)tigecpkyv(r/k)(s/t)(t/a)k" (SEQ ID NO: 135), constant between 1918 and 2000; in the H3N2 Replikin "hcd(g/q)f(q/r)nekwdlf(v/i)er(s/t)k" (SEQ ID NO: 277), constant between 1975 and 1998; and in the H3N2 Replikin "hqn(s/e)(e/q)g(t/s)g(q/y)aad(l/q)kstq(a/n)a(i/l)d(q/g)I(n/t)(g/n)k(l/v)n(r/s)vi(e/c)k" (SEQ ID NO: 276), which first appeared in 1975, disappeared for 25 years, and then reappeared in 2000. While many amino acids were substituted, the basic Replikin structure of two lysines 6 to 10 residues apart, one histidine, and a minimum of 6% lysine in not more than approximately 50 amino acids was conserved.
[000129] Totally random substitution would not permit the persistence of these H1N1 and H3N2 Replikins, nor from 1902 to 2001 in influenza B the persistence of 10 Replikin structures, nor the reappearance in 1993 of a 1919 18-mer Replikin after an absence of 74 years. Rather than a random type of substitution, the constancy suggests an orderly controlled process, or at the least, protection of the key Replikin residues so that they are fixed or bound in some way: lysines, perhaps bound to nucleic acids, and histidines, perhaps bound to respiratory redox enzymes. The mechanisms which control this conservation are at present unknown.
H5N1 Influenza Conservation of Replikin Scaffold
[000130] There is concern that the current outbreak of high mortality H5N1 "bird flu" in several countries may represent the first phase of an overdue influenza pandemic. A recent report suggests that in the first probable person-to-person transmission of H5N1, "sequencing of the viral genes identified no change in the receptor-binding site of hemagglutinin or other key features of the virus. The sequences of all eight viral gene segments clustered closely with other H5N1 sequences from recent avian isolates in Thailand."
Phylogenetic analysis suggested, from the absence of evidence of "reassortment with human influenza viruses," that H5N1 is not a new variant. However, we now report three recent changes in a specific H5N1 protein sequence at sites which had not been changed in the last two H5N1 epidemics and in fact had been conserved since 1959.
[000131] Previously, there has been no protein chemistry which correlated with virus epidemics and dormancy. We found that each of the three influenza pandemics of the last century, H1N1, H2N2 and H3N2, retrospectively was predicted by and correlated with an increase in the concentration of a specific class of peptides in the virus, rich in lysine and histidine, associated with rapid replication, called replikins. We have now again found the replikins to be predictive in each of the three H5N1 epidemics, in 1997, 2001, and 2003-2004 (Figure 15). Each year that they appear in isolates, the replikins can now be counted per 100 amino acids as in Figure 15, and their sequences analyzed and compared as in Table 9. Analysis of replikins may be accomplished manually or, in a preferred aspect of the present invention, automatically by software designed by the inventors for the purpose of counting replikin concentration in available sequence information.
[000132] A graph illustrating a rapid increase in the concentration of Replikin patterns in the hemagglutinin protein of the H5N1 strain of influenza prior to the outbreak of three "Bird Flu" epidemics may be seen in Figure 15. A review of Figure 15 illustrates that an increasing replikin concentration ('Replikin Count') in the hemagglutinin protein of H5N1 preceded three 'Bird Flu' epidemics. For example, an increase in the Replikin Count (mean ± SD) in 1995 to 1997 preceded the Hong Kong H5N1 epidemic of 1997 (E1). An increase in the Replikin Count from 1999 to 2001 preceded the epidemic of 2001 (E2).
And an increase in Replikin Count from 2002 to 2004 preceded the epidemic in 2004 (E3). The decline in 1999 occurred with the massive culling of poultry in response to the E1 epidemic in Hong Kong.
[000133] In addition to the total number of replikins in the virus protein, the structure of each replikin through time is informative. Table 8 shows a replikin first observed in a goose infected with influenza in 1917 (Goose Replikin). Constant length, constant lysines at the amino terminal and constant histidine residues at the carboxy terminal were conserved in different strains in a fixed scaffold for decades. Homologues of the Goose Replikin appeared from 1917 to 2006 in strains including each of those responsible for the three pandemics of 1918, 1957 and 1968 (H1N1, H2N2 and H3N2, respectively), with further substitutions among H1N2, H7N7, H5N2 and H5N1. Even certain substitutions which have occurred in the Goose Replikin tend to be selective and retained for years, rather than random. Thus, despite the common assumption that amino acid substitutions should occur at random, it would appear that not all substitutions in influenza are, in fact, random. This replikin conservation over decades allows the production of synthetic influenza vaccines which can be prepared rapidly and inexpensively in advance and can be effective for more than one year.
[000134] Therefore a target for synthetic influenza vaccines is the conserved Replikin Scaffold in influenza virus. A Replikin Scaffold comprises a series of conserved peptides comprising a sequence of about 16 to about 30 amino acids and further comprising
(1) a terminal lysine;
(2) a terminal histidine and another histidine in the residue portion immediately adjacent to the terminal histidine;
(3) at least one lysine within about 6 to about 10 amino acid residues from at least one other lysine; and
(4) at least about 6% lysines within the 16 to about 30 amino acid peptide.
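The four criteria above lend themselves to a mechanical check. The following Python sketch is one possible reading of them and is not the inventors' software. It assumes lowercase one-letter amino acid codes, treats "about 16 to about 30" as the closed range 16 to 30, measures lysine spacing as the difference between residue positions, and accepts the terminal lysine at either end with the histidine pair at the opposite end. The function names are hypothetical.

```python
from itertools import combinations


def lysine_pair_in_range(peptide, lo=6, hi=10):
    """True if any two lysines lie lo..hi positions apart (assumed spacing rule)."""
    positions = [i for i, aa in enumerate(peptide) if aa == "k"]
    return any(lo <= b - a <= hi for a, b in combinations(positions, 2))


def is_replikin_scaffold(peptide, min_len=16, max_len=30, min_lys_fraction=0.06):
    """Check the four Replikin Scaffold criteria as read in this sketch."""
    p = peptide.lower()
    if not (min_len <= len(p) <= max_len):
        return False
    # Criteria (1) and (2): a terminal lysine at one end, and a terminal
    # histidine with an adjacent histidine at the other end.
    forward = p.startswith("k") and p.endswith("hh")
    reverse = p.startswith("hh") and p.endswith("k")
    if not (forward or reverse):
        return False
    # Criterion (3): at least one lysine 6 to 10 residues from another lysine.
    if not lysine_pair_in_range(p):
        return False
    # Criterion (4): at least 6% lysines.
    return p.count("k") / len(p) >= min_lys_fraction


# A 29-residue H1N1 homologue of the Goose Replikin, as best it can be read
# from the 1977 H1N1 row of Table 8 (the digit 1 taken as the letter l).
h1n1_1977 = "kkgtsypklsksytnnkgkevlvlwgvhh"
scaffold_ok = is_replikin_scaffold(h1n1_1977)

# Artificial peptide (not from the source): same ends, but no lysine pair
# at the required spacing.
artificial = "kk" + "a" * 25 + "hh"
artificial_ok = is_replikin_scaffold(artificial)
```

Under these assumptions the H1N1 peptide satisfies all four criteria, while the artificial peptide fails criterion (3) despite having the same terminal residues.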
A Replikin Scaffold may further comprise an additional lysine immediately adjacent to the terminal lysine. "Replikin Scaffold" peptide also refers to an individual member or a plurality of members of a series of a "Replikin Scaffold."
[000135] A non-limiting and preferred target for synthetic influenza vaccines may be a Replikin Scaffold in influenza virus further comprising a sequence of about 29 amino acids and a lysine immediately adjacent to the terminal lysine.
[000136] A non-preferred target for synthetic influenza vaccines may be an Exoskeleton Scaffold in a first strain of influenza virus comprising a first peptide of about 29 amino acids and
(1) a terminal lysine and a lysine immediately adjacent to the terminal lysine;
(2) a terminal histidine and a histidine immediately adjacent to the terminal histidine;
(3) no lysine within 6 to 10 amino acid residues from any other lysine,
wherein an earlier-arising specimen of the first strain or another strain of virus comprises a Replikin Scaffold of about 29 amino acids.
[000137] In the 1997 H5N1 Hong Kong epidemic, the human mortality rate was approximately 27%. In 2004, of the fifty-two people reported to have been infected by H5N1 in Asia, approximately 70% died. Most recently, nine of the eleven people infected in Vietnam from December 28, 2004 to January 27, 2005 died. Although the virulence of the virus appears to have increased, the changes thought to be required for further human-to-human spread had been thought not yet to have occurred. However, we have now observed recent substitutions in three H5N1 replikin amino acid residues at position numbers 18, 24 and 28 of the Goose Replikin scaffold from isolates in Vietnam, Thailand and China in 2004 (see Table 1).
Substitution at site number 24 has not occurred since the appearance of H5N1 in 1959 but was present in the last two influenza pandemics caused by other strains, H2N2 in 1957 and H3N2 in 1968, together responsible for over two million human deaths, and in a recent virulent epidemic caused by H7N7 (see Table 8). While these are only hints of possible danger, these data on substitution, combined with the rising Replikin count shown in Figure 15 and the past correlation of such replikin data with pandemics, do not give the same reassurance as that obtained from phylogenetic analysis that the virus is unlikely to spread human to human.
[000138] With respect to the H5N1 influenza, FIG. 15 illustrates a rapid increase in the concentration of Replikins per 100 amino acids just prior to epidemics in 1997 (indicated as E1), 2001 (indicated as E2) and 2004 (indicated as E3).
TABLE 8: Replikin Scaffold showing ordered substitution in the 89 year conservation of influenza virus replikin peptides related to rapid replication, from a 1917 goose influenza replikin and the 1918 human pandemic replikin to 2006 H5N1 "Bird Flu" homologues. (SEQ ID NOS: 368-429, respectively, in order of appearance)
29 Amino Acids                          Year       Strain
kkgt                                    1917       H1N Influenza Goose Replikin
~kg sypklskgYffkLkevlvlwgvbh            1918       H1N1 Human Influenza Pandemic
k sypklB gkevlvlwgvbh                   1930       H1N1
kk gsypk1 gkevivlwgvhb                  1933       H0N1
kkgtsypklsksytnnkgkeylvlwgvhb           1976       H1N1
kkgtsypk1sksytnnkgkevlvlwgvhh           1977       H1N1
kkgsypki)csytnnkgkevi gvhh              1979       H1N1
kkgsypksksytnnkgkev1V hh                1980       H1N1
kkgtsypk1sksytnnkgkevvlw9vah            1980       H1N1
kkgisypk1sksytnnkgkevlvwgvha            1981       H1N1
kkgtsypkisksytnnkgkevlvlwgvhh           1981       H1N1
kkgtsypklsksytnnkgkevivigvhh            1985       H1N1
kk grsk1Sks6ytnnkgkevl. Mgvh            1991       H1N1
kk. sypk1sksytnnkgkevl h                1992       H1N1
kksypklsksytnnkgkevl gvhh               1996       H1N1
kkg sypklsksytnnkgkevlvi.wgvhh          1996       H1N1
kkg sypk1sksyinnkgkev1v1wgvhh           1997       H1N1
kkg sypk1sks ynnkgkevlvlwgvhh           1998       H1N1
kkg sypklsksytnnkgkevlv wgvhh           1999       H1N1
kkg sypksksytnnkgkevlv wgvhh            2000       H1N1
kkg sypklsksytnnkgkevlviwgvhh           2001       H1N1
kkg sypklsksytnnkgkevlv wgvhh           2002       H1N1
kkg sypk~sks nnkgkevlvlwg hh            1999       H1N2 Influenza
kkg sypksksjnnk kevlviwg1 hh            2000       H1N2
kkg sypklsks nnkgkgv1v1wg hh            2001       H1N2
kkgtsypklsksytnnkkevlvlwgvhh            2001       H1N2
-k/glypllsksyannkkevivlwgvhh            2002       H1N2
-khglyphlsksynnkekevllwgvhh             2002       H1N2
kkgn yp aki syr n i g q 1 iwgvhh        1957       H2N2 Human Influenza Pandemic
kkglpypr c g grliwg hh                  1957       H2N2 Human Influenza Pandemic
kkegsypkl ks: L.inkkevlvp1wg hh         1968       H3N2 Human Influenza Pandemic
k sy nA~rkdp a l iiwg hh                1979-2003  H7N7 Influenza
kkyp ikr t an ved 11 >lwg hh            2002       H5N2 Influenza
kk nafgp tikr sy' sn nqed1 v1wg hh      1959       H5N1 Influenza (Scotland)
kknnayp ikr t e lwg hh                  1975       H5N1 (Wisconsin)
kknnyptikr ninmedll lwg hh              1981       H5N1 (Minnesota)
kkghayptikrt a nhvedll lwg hh           1983       H5N1 (Pennsylvania)
kknn t yptikrsy n edll lwg hh           1988       H5N1 (Scotland)
kkhsyptikr sy ncgedllvlwg hh            1996       H5N1 (China)
kk yp~tikrjsy nnged3Ilvlwghh            1997       H5N1 (China)
kkl sypLikrsy uti edllvlwg hh           1998       H5N1 (China)
kk.s yptiksyn nged llvlwg hh            1999       H5N1 (China)
kknsaptik syn nqgedllvlwg hh            2000       H5N1 (China)
kk ypik sy n ngedljlvlwg h              2001       H5N1 (China)
kknna'yptikr sy nqedl lv1wg jhh         2001       H5N1 (China)
kknsayp ikr sy ntngedl lvlwg h          2002       H5N1 (China)
kkstyptikrsy a ntngedllvlwg hh          2002       H5N1 (Thailand)
kknst yp tik sy ntnqedL1v1wg hh         2002       H5N1 (Vietnam)
kkistypikisy nnge dllvlwg hh            2003       H5N1 (Vietnam)
k styptikroy nynaedllvlwg hh            2003       H5N1 (Thailand)
kkn tsyp t ikr oy nt qed1 1v1wg hh      2003       H5N1 (Sindong, China)
yp ikr y ngedl lvlwg hh                 2003       H5N1 (China)
kkit k syn ngedllymwg hh                2004       H5N1 (Vietnam, highly pathogenic)
kkny"p tijk syn tnqedl lvlwg bh         2004       H5N1 (Vietnam, "highly pathogenic", gull)
kknstyp - ikt sy n tnqed lvlwg., hh     2004       H5N1 (Vietnam, highly pathogenic)
kknstyp tikrjsy n ned) lvlwg hh         2004       H5N1 (Thailand, highly pathogenic)
kkis p l ikrsyn ned) lvlwg qh           2004       H5N1 (Thailand, highly pathogenic)
kk45 ay iik slvlwg hh                   2004       H5N1 (China, highly pathogenic)
kkn syp ik sxintnh-ed lvlwg hh          2004       H5N1 (China, "highly pathogenic", goose)
kksypik sy ntnged1 lvlwg hh             2004       H5N1 Japan
kkjayp iki sy n nged lvlwghh            2005       H5N1 Turkey
kknyp aiksy nagesilvlwg hh              2006       H5N1 China (Anhui)
* Residues identical to Goose Replikin amino acids unshaded; amino acid substitutions shaded lightly and darkly to show scaffold pattern across years and strains.
[000139] Table 8, above, provides further support for the role of replikins in epidemics and pandemics in humans and birds. In Table 8, the history of the Goose Replikin and its homologues is tracked from 1917 to the present outbreak of avian H5N1 virus. Table 8 demonstrates conservation of the "scaffold" homology of the Goose Replikin in virulent strains of influenza.
[000140] Table 8 illustrates the history, by year or smaller time period, of the existence in the protein structure of the Goose Replikin and its homologues in other influenza Replikins. Table 8 further illustrates the history of amino acid substitutions in those homologues and the conservation of certain amino acids of the Replikin structure which are essential to the definition of a Replikin and the function of rapid replication supplied by Replikins.
[000141] A review of Table 8 illustrates that if random substitution of amino acids were to occur in virulent strains of influenza from 1917 through the present, certain framework amino acids of the Goose Replikin would not be conserved from year to year in strains in which epidemics occurred.
However, contrary to what would result from random substitution, virulent strains of influenza from year to year consistently contain conserved amino acids at those positions that define a Replikin. That is, if a substitution were to occur in one of the amino acids that define a Replikin, e.g. a lysine or a histidine, the definition of the Replikin would be lost. Nevertheless, the Replikin sequence is conserved over more than 85 years. Thus, since there is conservation of certain amino acids over decades, substitution cannot be said to be completely at random. The fact that substitutions do occur in amino acids that are not essential to the definition of a Replikin (i.e., amino acids other than lysines or histidines) demonstrates the importance of the Replikin in the pathogenicity of the strain.
[000142] It may be further noted from Table 8 that when substitutions do occur, they are seen to occur at certain apparently preferred positions of the Replikin Scaffold. Table 8 illustrates recurring substitutions at positions 1, 3-24 and 26-27. Further, while substitutions occur throughout these positions, a lysine continues to exist at a position 6 to 10 amino acids from the second lysine (which has not been substituted in these virulent strains).
[000143] Even when there is a substitution of a lysine position within the 29 amino acid stretch, as is seen in 1957, when K at position 11 shifts to position 10, that new position has been maintained until 2005, as have YP, AY, N (position 15), and LVLWG (SEQ ID NO: 430), to conserve the homologous structure of the Replikin Scaffold with few exceptions.
[000144] Table 8 demonstrates the integrity of the Replikin Scaffold in virulent strains of influenza. As discussed above, degeneration of the Replikin Scaffold into an Exoskeleton Scaffold is seen to decrease pathogenicity.
The integrity and conservation of the Replikin Scaffold, therefore, is seen in the fact that there is generally a fixed 29 amino acid sequence that begins with two lysines and ends with two histidines.
[000145] It is important to note that an extra K has appeared in the Replikin Scaffold of a 2006 strain of H5N1 in China (Anhui). The presence of this extra K signals an increase in the Replikin count within the Replikin Scaffold. The 2006 China (Anhui) strain has a Replikin count of 6.6 (as discussed below). A Replikin count of 6.6 is the highest ever observed for an H5N1 strain and, across all influenza A strains, is comparable only to the Replikin count of the influenza strain that caused the 1918 Pandemic. If this initial 2006 report is repeated and maintained, it may indicate that the Counts of 4.5 and 4.0 in 2004 and 2005, respectively, will be substantially increased, and foretell a continuing or increased epidemic of H5N1 "Bird Flu".
[000146] An aspect of the present invention is a combination of replikin structure and function to track the pathogenicity or rate of replication of a virus, epidemic or pandemic, or to predict the occurrence of epidemics or pandemics. An example of this combination is the ability of the Replikin algorithm of the invention to be used to count increases in Replikin counts in influenza strains such as the strain of 1918 and the current strain of H5N1. The Replikin Counts of the 1918 influenza pandemic and the current outbreak of "Bird Flu" demonstrate the predictive capacity of this exemplary aspect in accordance with and made possible by the invention.
Relation of Some Shrimp White Spot Virus Replikins to Influenza Fixed Scaffold Replikin Structures
[000147] The inventors have also established a relationship between virulent influenza virus and white spot virus in the Replikin Scaffold portions of the viruses. No relationship between these two viruses has been suggested previously.
Although there is extensive substitution, several short Replikins of the Shrimp White Spot Syndrome Virus found by the applicants demonstrate significant homologies to the influenza virus Replikin sequences, especially with regard to length and key lysine (k) and histidine (h) residues (Fixed Scaffold or Replikin Scaffold), suggesting that similar mechanisms of Replikin production are used in both virus groups.
Table 8A - Shrimp White Spot Scaffolding (SEQ ID NOS: 431-440, respectively, in order of appearance)
1917       H1N Influenza goose peptide
2002       H1N Swine Influenza
2000       Shrimp White Spot Syndrome Virus
2000       Shrimp White Spot Syndrome Virus
1968       H3N2 Human Influenza Pandemic
1979-2003  H7N7 Influenza
1957       H2N2 Human Influenza Pandemic
1957       H2N2 Human Influenza Pandemic
2002       H5N2 Influenza
1959       H5N1 Influenza
* Residues identical to the original 1917 Goose Replikin residues and amino acid substitutions are distinguished by shading.
[000148] In addition, since many species, including but not limited to swine and birds, are known to provide animal "reservoirs" for human influenza infection, marine forms such as the shrimp virus can now be examined, with early warning diagnostic benefits possible for outbreaks such as swine flu and bird flu. While similarities of some influenza viruses were noted between species, and the interspecies transfer of these viruses was known, there was no previous quantitative method to gauge virus activity. It has not previously been possible to examine potential reservoirs for increased activity which might move into a different species, and thereby to provide advance warning. The activity of the Replikins in each species can now be monitored constantly for evidence of increased viral replication rate and thus emergence of epidemics in that species which may then transfer to other species.
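The quantitative monitoring described above rests on the Replikin Count, the number of Replikins per 100 amino acids. The Python sketch below illustrates that arithmetic under stated assumptions and should not be read as the inventors' software. The membership test follows the criteria quoted in this description (two lysines 6 to 10 residues apart, at least one histidine, at least 6% lysine, not more than about 50 residues), with spacing taken as the difference between residue positions. The enumeration rule, in which every window that begins and ends on a lysine or histidine is a candidate and overlapping Replikins are counted separately, is an assumption of this sketch and is not stated in the text; other conventions would give different counts. All function names are hypothetical.

```python
def is_replikin(peptide, max_len=50, min_lys_fraction=0.06):
    """Membership test using the criteria quoted in the description."""
    p = peptide.lower()
    if not p or len(p) > max_len or "h" not in p:
        return False
    lys = [i for i, aa in enumerate(p) if aa == "k"]
    # At least one lysine 6 to 10 positions after another lysine.
    spaced = any(6 <= b - a <= 10 for a in lys for b in lys)
    return spaced and len(lys) / len(p) >= min_lys_fraction


def find_replikins(protein):
    """ASSUMPTION: candidates are windows that start and end on k or h."""
    p = protein.lower()
    ends = [i for i, aa in enumerate(p) if aa in "kh"]
    return [
        p[a:b + 1]
        for a in ends
        for b in ends
        if b > a and is_replikin(p[a:b + 1])
    ]


def replikin_count(protein):
    """Replikins per 100 amino acids under the enumeration assumption above."""
    if not protein:
        return 0.0
    return 100.0 * len(find_replikins(protein)) / len(protein)


# The HIV Replikin quoted in this description (SEQ ID NO: 5), used here as a
# stand-in for a full protein sequence.
example = "kcfncgkegh"
found = find_replikins(example)
count = replikin_count(example)
```

In practice the input would be a full protein sequence, such as a hemagglutinin sequence for one isolate, and the counts for the isolates of a given year would be summarized as a mean and standard deviation.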
[000149] These data further support the Replikins as a new class of peptides, with a history of its own, and a shared function of rapid replication and disease of its hosts. With the high mortality for its shrimp host, white spot syndrome virus can now have its Replikins examined as earlier forms of the virus Replikins, or as parallel morphological branches, which in either case may well act as reservoirs for bird and animal Replikins such as those in influenza viruses. The diagnostic and preventive uses of these Replikin findings in shrimp follow as they do in influenza and for other organisms containing Replikins.
Conservation of Replikin Structures
[000150] Whether Replikin structures are conserved or are subject to extensive natural mutation also was examined by scanning the protein sequences of various isolates of foot and mouth disease virus (FMDV), where mutations in proteins of these viruses have been well documented worldwide for decades. Protein sequences of FMDV isolates were visually examined for the presence of both the entire Replikin and each of the component Replikin amino acid residues observed in a particular Replikin.
[000151] Rather than being subject to extensive substitution over time as occurs in neighboring amino acids, the amino acids which comprise the Replikin structure are substituted little or not at all; that is, the Replikin structure is conserved.
[000152] For example, in the protein VP1 of FMDV type O, the Replikin (SEQ ID NO: 3) "hkqkivapvk" was found to be conserved in 78% of the 236 isolates reported in PubMed, and each amino acid was found to be conserved in individual isolates as follows: his, 95.6%; lys, 91.8%; gln, 92.3%; lys, 84.1%; ile, 90.7%; val, 91.8%; ala, 97.3%; pro, 96.2%; ala, 75.4%; and lys, 88.4%.
The high rate of conservation suggests structural and functional stability of the Replikin structure and provides constant targets for treatment.
[000153] Similarly, sequence conservation was found in different isolates of HIV for its Replikins, such as (SEQ ID NO: 5) "kcfncgkegh" or (SEQ ID NO: 6) "kvylawvpahk" in HIV Type 1 and (SEQ ID NO: 7) "kcwncgkegh" in HIV Type 2 (Table 2). Further examples of sequence conservation were found in the HIV tat proteins, such as (SEQ ID NO: 441) "hclvckqkkglgisygrkk," wherein the key lysine and histidine amino acids are conserved. (See Table 9).
[000154] Similarly, sequence conservation was observed in plants, for example in wheat, such as in wheat ubiquitin activating enzyme E (SEQ ID NOs. 454 - 456). The Replikins in wheat even provided a reliable target for stimulation of plant growth as described within. Other examples of conservation are seen in the constant presence of malignin in successive generations, over ten years of tissue culture of glioma cells, and in the constancy of affinity of the glioma Replikin for antimalignin antibody isolated by immunoadsorption from 8,090 human sera from the U.S., U.K., Europe and Asia (e.g., Figure 5 and U.S. Patent 6,242,578 B1).
[000155] Similarly, conservation was observed in trans-activator (Tat) proteins in isolates of HIV. Tat proteins are early RNA binding proteins regulating lentiviral transcription. These proteins are necessary components in the life cycle of all known lentiviruses, such as the human immunodeficiency viruses (HIV). Tat is a transcriptional regulator protein that acts by binding to the trans-activating response sequence (TAR) RNA element and activates transcription initiation and/or elongation from the LTR promoter. HIV cannot replicate without tat, but the chemical basis of this has been unknown.
In the HIV tat protein sequence from residues 89 to 102, we have found a Replikin that is associated with rapid replication in other organisms. The amino acid sequence of this Replikin is "HCLVCKQKKGLGISYGRKK" (SEQ ID NO: 441). In fact, we found that this Replikin is present in every HIV tat protein. Some tat amino acids are substituted frequently by alternate amino acids (shown in Table 9 in small fonts lined up below the most frequent amino acid, together with the percentage of conservation for the predominant Replikin "HCLVCFQKKGLGISYGRKK" (SEQ ID NO: 442)). These substitutions have appeared for most of the individual amino acids. However, the key lysine and histidine amino acids within the Replikin sequence, which define the Replikin structure, are conserved 100% in the sequence; while substitutions are common elsewhere in other amino acids, both within and outside the Replikin, none occurs on these key lysine and histidine amino acids.
[000156] As shown in Table 9, it is not the case that lysines are never substituted in the tat protein amino acid sequence. From the left side of the table, the very first lysine in the immediate neighboring sequence, which lies outside the Replikin sequence, and the second lysine (k) in the sequence, which lies inside the Replikin but is "extra" in that it is not essential for the Replikin formation, are both substituted frequently. However, the 3rd, 4th and 5th lysines and the one histidine, shown in parentheses, which together set up the Replikin structure, are never substituted. Thus, these key amino acids are 100% conserved. As observed in the case of the influenza virus Replikins, random substitution would not permit this selective substitution and selective non-substitution to occur due to chance.
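The per-residue conservation figures discussed above amount to a column-by-column tally over aligned isolate sequences. A minimal Python sketch of that tally follows. It assumes the sequences are already aligned to equal length and uses only the two tat Replikin forms quoted in this description (SEQ ID NOS: 441 and 442) as input, so the percentages it produces are illustrative and are not the values reported for the 117 isolates. The function name is hypothetical.

```python
from collections import Counter


def conservation_by_position(aligned):
    """Return (predominant residue, percent of sequences carrying it) per column."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    n = len(aligned)
    profile = []
    for column in zip(*(s.lower() for s in aligned)):
        residue, hits = Counter(column).most_common(1)[0]
        profile.append((residue, 100.0 * hits / n))
    return profile


# The two forms of the tat Replikin quoted in the text.
tat_variants = [
    "hclvckqkkglgisygrkk",  # SEQ ID NO: 441
    "hclvcfqkkglgisygrkk",  # SEQ ID NO: 442
]
profile = conservation_by_position(tat_variants)

# 1-based positions at which the two forms differ.
variable = [i + 1 for i, (_, pct) in enumerate(profile) if pct < 100.0]

# 1-based positions of lysine or histidine that are identical in both forms.
fixed_kh = [
    i + 1
    for i, (aa, pct) in enumerate(profile)
    if aa in "kh" and pct == 100.0
]
```

With these two inputs the only variable position is the sixth, the "extra" lysine that the text describes as frequently substituted, while the histidine and the remaining lysines are identical in both forms.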
Table 9: % Replikin CONSERVATION of each constituent amino acid in the first 117 different isolates of HIV tat protein as reported in PubMed (SEQ ID NOS: 443-453, respectively, in order of appearance)
              Neighboring amino acids   tat Replikin
% conserved   3     (100) 57    86      (100) (100) 66    76    (100) 99    57    49    (100) 94    (100) 97    98    85    97    99    (100) (100) (100)
Amino acid    k     (c)   s     y      [(h)   (c)   l     v     (c)   f     q     k     (k)   g     (l)   g     i     s     y     g     (r)   (k)   (k)]
Below are the amino acid substitutions observed for each amino acid above:
h c f q i lb t a a ly hq r wp II i h q v y s I m r s s m s r a f P q
[000157] The conservation of the Replikin structure suggests that the Replikin structure has a specific survival function for the HIV virus which must be preserved and conserved, and cannot be sacrificed to the virus 'defense' maneuver of amino acid substitution created to avoid antibody and other 'attack.' These 'defense' functions, although also essential, cannot 'compete' with the virus survival function of HIV replication.
[000158] Further conservation was observed in different isolates of HIV for its Replikins such as "kcfncgkegh" (SEQ ID NO: 5) or "kvylawvpahk" (SEQ ID NO: 6) in HIV Type 1 and "kcwncgkegh" (SEQ ID NO: 7) in HIV Type 2. The high rate of conservation observed in FMDV and HIV Replikins suggests that the conservation also observed in influenza Replikins is a general property of viral Replikins. This conservation makes them a constant and reliable target for either destruction, for example by using specific Replikins for influenza, FMDV or HIV vaccines as illustrated for the glioma Replikin, or stimulation.
[000159] As provided in examples found in viruses including influenza viruses, FMDV, and HIV, where high rates of conservation in Replikins suggest that conservation is a general property of viral Replikins, thus making Replikins a constant and reliable target for destruction or stimulation, conservation of Replikin structures also occurs in plants. For example, in wheat plants, Replikins are conserved and provide a reliable target for stimulation. Examples of conserved Replikins in wheat plant ubiquitin activating enzyme E include:
E3  HKDRLTKKVVDIAREVAKVDVPEYRRH  (SEQ ID NO: 454)
E2  HKERLDRKVVDVAREVAKVEVPSYRRH  (SEQ ID NO: 455)
E1  HKERLDRKVVDVAREVAKMEVPSYRRH  (SEQ ID NO: 456)
    * * * ** *
[000160] Similarly to the conservation found in the HIV tat protein, the Replikin in the wheat ubiquitin activating enzyme E is conserved. As with the HIV tat protein, substitutions of amino acids (designated with an '*') adjacent to the Replikin variant forms in wheat ubiquitin activating enzyme E are common. The key k and h amino acids that form the Replikin structure, however, do not vary, whereas the 'unessential' k that is only 5 amino acids from the first k on the left is substituted.
Anti-Replikin Antibodies
[000161] An anti-Replikin antibody is an antibody against a Replikin. Data on anti-Replikin antibodies also support Replikin class unity. An anti-Replikin antibody response has been quantified by immunoadsorption of serum antimalignin antibody to immobilized malignin (see Methods in U.S. Patent No. 5,866,690). The abundant production of antimalignin antibody upon administration to rabbits of the synthetic version of the 16-mer peptide whose sequence was derived from malignin, absent carbohydrate or other groups, has established rigorously that this peptide alone is an epitope, that is, that it provides a sufficient basis for this immune response (Figure 3).
The 16-mer peptide produced both IgM and IgG forms of the antibody. Antimalignin antibody was found to be increased in concentration in serum in 37% of 79 cases of hepatitis B and C in the U.S. and Asia, early, in the first five years of infection, long before the usual observance of liver cancer, which develops about fifteen to twenty-five years after infection. Relevant to both infectious hepatitis and HIV infections, transformed cells may be one form of safe haven for the virus: prolonging cell life and avoiding virus eviction, so that the virus remains inaccessible to anti-viral treatment.
[000162] Because administration of Replikins stimulates the immune system to produce antibodies having a cytotoxic effect, peptide vaccines based on the particular influenza virus Replikin or group of Replikins observed to be most concentrated over a given time period provide protection against the particular strain of influenza most likely to cause an outbreak in a given influenza season, e.g., an emerging strain or re-emerging strain. For example, analysis of the influenza virus hemagglutinin amino acid sequence on a yearly or bi-yearly basis provides data which are useful in formulating a specifically targeted influenza vaccine for that year. It is understood that such analysis may be conducted on a region-by-region basis or at any desired time period, so that strains emerging in different areas throughout the world can be detected and specifically targeted vaccines for each region can be formulated.
Influenza Vaccines, Treatments and Therapeutics
[000163] Currently, vaccine formulations for influenza are changed twice yearly at international WHO and CDC meetings. Vaccine formulations are based on serological evidence of the most current preponderance of influenza virus strain in a given region of the world.
However, prior to the present invention there has been no correlation of influenza virus strain specific amino acid sequence changes with the occurrence of influenza epidemics or pandemics.
[000164] The observations of specific Replikins and their concentration in influenza virus proteins provide the first specific quantitative early chemical correlates of influenza pandemics and epidemics and provide for production and timely administration of influenza vaccines tailored specifically to treat the prevalent emerging or re-emerging strain of influenza virus in a particular region of the world. By analyzing the protein sequences of isolates of strains of influenza virus, such as the hemagglutinin protein sequence, for the presence, concentration and/or conservation of Replikins, influenza virus pandemics and epidemics can be predicted. Furthermore, the severity of such outbreaks of influenza can be significantly lessened by administering an influenza peptide vaccine based on the Replikin sequences found to be most abundant or shown to be on the rise in virus isolates over a given time period, such as about one to about three years.
[000165] An influenza peptide vaccine of the invention may include a single Replikin peptide sequence or may include a plurality of Replikin sequences observed in influenza virus strains. Preferably, the peptide vaccine is based on Replikin sequence(s) shown to be increasing in concentration over a given time period and conserved for at least that period of time. However, a vaccine may include a conserved Replikin peptide(s) in combination with a new Replikin peptide(s) or may be based on new Replikin peptide sequences. The Replikin peptides can be synthesized by any method, including chemical synthesis or recombinant gene technology, and may include non-Replikin sequences, although vaccines based on peptides containing only Replikin sequences are preferred.
Preferably, vaccine compositions of the invention also contain a pharmaceutically acceptable carrier and/or adjuvant.
[000166] The influenza vaccines of the present invention can be administered alone or in combination with antiviral drugs, such as ganciclovir; interferon; interleukin; M2 inhibitors, such as amantadine and rimantadine; neuraminidase inhibitors, such as zanamivir and oseltamivir; and the like, as well as with combinations of antiviral drugs.
[000167] The influenza vaccine of the present invention may be administered to any animal capable of producing antibodies in an immune response. For example, the influenza vaccine of the present invention may be administered to a rabbit, a chicken, a pig or a human. Because of the universal nature of replikin sequences, an influenza vaccine of the invention may be directed at a range of strains of influenza or a specific strain of influenza.
[000168] In a non-limiting aspect in accordance with the present invention, an influenza vaccine may be directed to an immune response against an animal or human strain of influenza including influenza B, (A)H1N1, (A)H2N2 and (A)H3N2, or any human variant of the virus that may arise hereafter, as well as strains of influenza predominantly in animals such as the current avian H5N1. An influenza vaccine may further be directed to a particular replikin amino acid sequence in any portion of an influenza protein.
[000169] In a non-limiting aspect in accordance with the present invention, an influenza vaccine may comprise a Replikin Scaffold of the H5N1 virus such as KKNSTYPTKRSYNNTNQEDLLVLWGlHH (SEQ ID NO: 15). In a further non-limiting aspect, an influenza vaccine may comprise a UTOPE such as KKKKH (SEQ ID NO: 457) or KKKKHKKKKKH (SEQ ID NO: 458). In a further alternative, a vaccine may comprise the addition of an adjuvant such as the well known key limpet hemocyanin, denoted with the abbreviation -KLH.
In yet a further preferred non-limiting aspect, an influenza vaccine may comprise a Replikin Scaffold of influenza H5N1 further comprising two UTOPES and an adjuvant sequence such as KKNSTYPT]KRSY TNQEDLNLVLWGHHKKKKHKKKKKHK (SEQ ID NO: 16) -KLH (denoting a key limpet hemocyanin adjuvant) (Vaccine V120304U2). An aspect of the present invention may comprise the Replikin Scaffold previously constructed and shown in Table 8 as one of the Bird Flu Replikins labelled "2004 H5N1 Vietnam, highly pathogenic." With administration of 100 ug of the peptide of Vaccine V120304U2 injected subcutaneously into rabbits and chickens, an antibody response was observed that rose from unvaccinated titers of less than 1:50 to a peak, in the third to fourth week after vaccination, at dilutions of 1:120,000 to greater than 1:240,000. (See Example 7.)
Repetition and Overlapping Replikin Structures
[000170] Analysis of the primary structure of a Plasmodium falciparum malaria antigen located at the merozoite surface and/or within the parasitophorous vacuole revealed that this organism, like influenza virus, also contains numerous Replikins. However, there are several differences between the observation of Replikins in Plasmodium falciparum and influenza virus isolates. For example, Plasmodium falciparum contains several partial Replikins. Another difference seen in Plasmodium falciparum is a frequent repetition of individual Replikin structures within a single protein, which was not observed with influenza virus. Repetition may occur by (a) sharing of lysine residues between Replikins, and (b) repetition of a portion of a Replikin sequence within another Replikin sequence.
High Concentrations of Replikins Correlate with Rapid Replication
[000172] Tomato leaf curl Gemini virus has devastated tomato crops in China and in many other parts of the world.
Its Replikins reach high counts because of overlapping Replikins, as illustrated below in a virus isolated in Japan where the Replikin count was 20.7.

[000173] The relationship of higher Replikin concentration to rapid replication is also confirmed by analysis of HIV isolates. It was found that the slow-growing low titer strain of HIV (NSI, "Bru," which is prevalent in early stage HIV infection) has a Replikin concentration of 1.1 (+/- 1.6) Replikins per 100 amino acids, whereas the rapidly-growing high titer strain of HIV (SI, "Lai", which is prevalent in late stage HIV infection) has a Replikin concentration of 6.8 (+/- 2.7) Replikins per 100 amino acid residues.

Passive Immunity

[000174] In another aspect of the invention, isolated Replikin peptides may be used to generate antibodies, which may be used, for example, to provide passive immunity in an individual. Passive immunity to the strain of influenza identified by the method of the invention to be the most likely cause of future influenza infections may be obtained by administering antibodies to Replikin sequences of the identified strain of influenza virus to patients in need. Similarly, passive immunity to malaria may be obtained by administering antibodies to Plasmodium falciparum Replikin(s).

[000175] Various procedures known in the art may be used for the production of antibodies to Replikin sequences. Such antibodies include but are not limited to polyclonal, monoclonal, chimeric, humanized, single chain, Fab fragments and fragments produced by an Fab expression library. Antibodies that are linked to a cytotoxic agent may also be generated. Antibodies may also be administered in combination with an antiviral agent. Furthermore, combinations of antibodies to different Replikins may be administered as an antibody cocktail.

[000176] For the production of antibodies, various host animals or plants may be immunized by injection with a Replikin peptide or a combination of Replikin peptides, including but not limited to rabbits, mice, rats, and larger mammals.

[000177] Monoclonal antibodies to Replikins may be prepared by using any technique that provides for the production of antibody molecules.
These include but are not limited to the hybridoma technique originally described by Kohler and Milstein (Nature, 1975, 256:495-497), the human B-cell hybridoma technique (Kozbor et al., 1983, Immunology Today, 4:72), and the EBV hybridoma technique (Cole et al., Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96). In addition, techniques developed for the production of chimeric antibodies (Morrison et al., 1984, Proc. Natl. Acad. Sci. USA, 81:6851-6855) or other techniques may be used. Alternatively, techniques described for the production of single chain antibodies (US 4,946,778) can be adapted to produce Replikin-specific single chain antibodies.

[000178] Particularly useful antibodies of the invention are those that specifically bind to Replikin sequences contained in peptides and/or polypeptides of influenza virus. For example, antibodies to any of the peptides observed to be present in an emerging or re-emerging strain of influenza virus, and combinations of such antibodies, are useful in the treatment and/or prevention of influenza. Similarly, antibodies to any Replikins present on malaria antigens, and combinations of such antibodies, are useful in the prevention and treatment of malaria.

[000179] Antibody fragments which contain binding sites for a Replikin may be generated by known techniques. For example, such fragments include but are not limited to F(ab')2 fragments, which can be produced by pepsin digestion of the antibody molecules, and the Fab fragments that can be generated by reducing the disulfide bridges of the F(ab')2 fragments. Alternatively, Fab expression libraries can be generated (Huse et al., 1989, Science, 246:1275-1281) to allow rapid and easy identification of monoclonal Fab fragments with the desired specificity.
[000180] The fact that antimalignin antibody is increased in concentration in human malignancy (see Figure 5), regardless of cancer cell type, and that this antibody binds to malignant cells regardless of cell type, may now be explained by the presence of the Replikin structures herein found to be present in most malignancies (Figure 1 and Table 2). Population studies have shown that antimalignin antibody increases in concentration in healthy adults with age, and more so in high-risk families, as the frequency of cancer increases. An additional two-fold or greater antibody increase, which occurs in early malignancy, has been independently confirmed with a sensitivity of 97% in breast cancers 1-10 mm in size. Shown to localize preferentially in malignant cells in vivo, histochemically the antibody does not bind to normal cells but selectively binds to (Figure 4A,B) and is highly cytotoxic to transformed cells in vitro (Figure 4C-F). Since in these examples the same antibody is bound by several cell types, that is, brain glioma, hematopoietic cells (leukemia), and small cell carcinoma of lung, malignant Replikin class unity is again demonstrated.

[000181] Antimalignin does not increase with benign proliferation, but specifically increases only with malignant transformation and replication in breast in vivo and returns from elevated to normal values upon elimination of malignant cells (Figure 5). Antimalignin antibody concentration has been shown to relate quantitatively to the survival of cancer patients, that is, the more antibody, the longer the survival. Taken together, these results suggest that anti-Replikin antibodies may be a part of a mechanism of control of cell transformation and replication.
Augmentation of this immune response may be useful in the control of replication, either actively with synthetic Replikins as vaccines, or passively by the administration of anti-Replikin antibodies, or by the introduction of non-immune based organic agents, such as, for example, carbohydrates, lipids and the like, which are similarly designed to target the Replikin specifically.

[000182] In another aspect of the invention, immune serum containing antibodies to one or more Replikins obtained from an individual exposed to one or more Replikins may be used to induce passive immunity in another individual or animal. Immune serum may be administered via i.v. to a subject in need of treatment. Passive immunity also can be achieved by injecting a recipient with preformed antibodies to one or more Replikins. Passive immunization may be used to provide immediate protection to individuals who have been exposed to an infectious organism. Administration of immune serum or preformed antibodies is routine and the skilled practitioner can readily ascertain the amount of serum or antibodies needed to achieve the desired effect.

Synthetic Replikin Vaccines (Active Immunity)

[000183] Synthetic Replikin vaccines, based on Replikins such as the glioma Replikin (SEQ ID NO: 1) "kagvaflhkk" or the hepatitis C Replikin (SEQ ID NO: 18) "hyppkpgcivpak", or HIV Replikins such as (SEQ ID NO: 5) "kcfncgkegh" or (SEQ ID NO: 6) "kvylawvpahk", or, preferably, an influenza vaccine based on conserved and/or emerging or re-emerging Replikin(s) over a given time period, may be used to augment antibody concentration in order to lyse the respective virus-infected cells and release virus extracellularly where chemical treatment can then be effective. Similarly, a malaria vaccine, based on Replikins observed in Plasmodium falciparum malaria antigens on the merozoite surface or within the parasitophorous vacuole, for example, can be used to generate cytotoxic antibodies to malaria.
Table 7 shows the relation of shortening or compacting of Replikin sequences, to as short as seven amino acids, to the mortality rate caused by the organisms which contain these Replikins. This correlation has been found by us to be a general phenomenon regardless of the type of organism. We have also found that there may be a progression over time to the shortened Replikin structure, as in influenza and SARS viruses.

[000184] There is abundant evidence that there are constant evolutionary and competitive pressures for the emergence of constantly increasing "efficacy" of each infectious organism. Based upon these observations, and by projection, it would appear that if evolutionary pressures are towards shorter and shorter Replikins, with higher and higher concentrations of lysine (k), to as high as 70% as in EEL leukemia (Table 7), then the projected theoretical ideal would be the shortest possible Replikin permitted by the algorithm which defines a Replikin, that is six amino acids (two ks six to ten amino acids apart), with the highest possible % k (see deduced Replikin "kkkkhk" (SEQ ID NO: 459), which contains 83.3% k, 5/6, and one obligatory "h"). We have therefore, so to speak, taken what appears to be, or might be, the next evolutionary step, not apparently as yet taken by the organisms themselves, and devised the resultant deduced Replikins to use as general vaccines.

[000185] These Replikins which we have deduced have maximum % 'k's, therefore maximum potential binding capacity, plus the constituent 'h' by definition required for the Replikin, giving the potential for 'h' connection to redox energy systems.
These devised Replikins are least likely to be cleaved by organisms because of their short length (proteins are cleaved to 6 to 10 amino acids long in processing for presentation to and recognition by immune cells), and are therefore most likely to present intact to immune-forming apparatuses in the organism to which they are administered; and, because of their high k content, they are most likely to generate a maximum immune response which mimics and may increase the maximum such response which can be generated against short homologous high mortality Replikins.

[000186] Further, we have found that high % k Replikins generate the highest antibody responses when administered to rabbits. These synthetic peptides, designed by us, are designated as Universal synthetic epitopes, or "UTOPEs", and the vaccines based upon these UTOPEs are designated "UVAXs". UVAXs, deduced synthetic vaccines, may be used as sole vaccines or as adjuvants when administered with more specific Replikin vaccines or other vaccines. The following are examples of deduced UTOPEs and UVAXs:

DEVISED SYNTHETIC REPLIKIN      SEQ ID NO:
(UTOPE OR UVAX)
KKKKHK                          459
KKKHKK                          460
KKHKKK                          461
KHKKKK                          462
KKKXK1                          463
KKKKKHK                         464
KKKKHKK                         465
KKKHKKK                         466
KKHKKKK                         467
KHKKKKK                         468
HKKKKKK                         469

[000187] Recognin and/or Replikin peptides may be administered to a subject to induce the immune system of the subject to produce anti-Replikin antibodies. Generally, a 0.5 to about 2 mg dosage, preferably a 1 mg dosage, of each peptide is administered to the subject to induce an immune response. Subsequent dosages may be administered if desired.

[000188] The Replikin sequence structure is associated with the function of replication.
Thus, whether the Replikins of this invention are used for targeting sequences that contain Replikins for the purpose of diagnostic identification, promoting replication, or inhibiting or attacking replication, for example, the structure-function relationship of the Replikin is fundamental.

[000189] It is preferable to utilize only the specific Replikin structure when seeking to induce antibodies that will recognize and attach to the Replikin fragment and thereby cause destruction of the cell. Even though the larger protein sequence may be known in the art as having a "replication associated function," vaccines using the larger protein often have failed or proven ineffective.

[000190] Although the present inventors do not wish to be held to a single theory, the studies herein suggest that the prior art vaccines are ineffective because they are based on the use of the larger protein sequence. The larger protein sequence invariably has one or more epitopes (independent antigenic sequences that can induce specific antibody formation); Replikin structures usually comprise one of these potential epitopes. The presence of other epitopes within the larger protein may interfere with adequate formation of antibodies to the Replikin, by "flooding" the immune system with irrelevant antigenic stimuli that may preempt the Replikin antigens. See, e.g., Webster, R.G., J. Immunol., 97(2):177-183 (1966); Webster et al., J. Infect. Dis., 134:48-58 (1976); and Klenerman et al., Nature 394:421-422 (1998) for a discussion of this well-known phenomenon of antigenic primacy, whereby the first peptide epitope presented and recognized by the immune system subsequently prevails and antibodies are made to it even though other peptide epitopes are presented at the same time.
This is another reason that, in a vaccine formulation, it is important to present the constant Replikin peptide to the immune system first, before presenting other epitopes from the organism, so that the Replikin is not pre-empted but lodged in immunological memory.

[000191] The formation of an antibody to a non-Replikin epitope may allow binding to the cell, but not necessarily lead to cell destruction. The presence of structural "decoys" on the C-termini of malaria proteins is another aspect of this ability of other epitopes to interfere with binding of effective anti-Replikin antibodies, since the decoy epitopes have many lysine residues, but no histidine residues. Thus, decoy epitopes may bind anti-Replikin antibodies, but may keep the antibodies away from histidine-bound respiratory enzymes. Treatment may therefore be most efficacious in two stages: 1) proteases to hydrolyze decoys, then 2) anti-Replikin antibodies or other anti-Replikin agents.

[000192] It is well known in the art that in the course of antibody production against a "foreign" protein, the protein is first hydrolyzed into smaller fragments. Usually fragments containing from about six to ten amino acids are selected for antibody formation. Thus, if hydrolysis of a protein does not result in Replikin-containing fragments, anti-Replikin antibodies will not be produced. In this regard, it is interesting that Replikins contain lysine residues located six to ten amino acids apart, since lysine residues are known to bind to membranes.

[000193] Furthermore, Replikin sequences contain at least one histidine residue. Histidine is frequently involved in binding to redox centers.
Thus, an antibody that specifically recognizes a Replikin sequence has a better chance of inactivating or destroying the cell in which the Replikin is located, as seen with anti-malignin antibody, which is perhaps the most cytotoxic anti-cancer antibody yet described, being active at picograms per cell.

[000194] One of the reasons that vaccines directed towards a particular protein antigen of a disease-causing agent have not been fully effective in providing protection against the disease (such as foot and mouth vaccine, which has been developed against the VP1 protein or large segments of the VP1 protein) is that the best antibodies have not been produced; that is, it is likely that the antibodies to the Replikins have not been produced. This may be either because epitopes other than Replikins present in the larger protein fragments interfere according to the phenomenon of antigenic primacy referred to above, and/or because the hydrolysis of larger protein sequences into smaller sequences for processing to produce antibodies results in loss of integrity of any Replikin structure that is present, e.g., the Replikin is cut in two and/or the histidine residue is lost in the hydrolytic processing. The present studies suggest that for an effective vaccine to be produced, the Replikin sequences, and no other epitope, should be used as the vaccine. For example, a vaccine of the invention can be generated using any one of the Replikin peptides identified by the three-point recognition system.
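The three-point recognition referred to above can be illustrated with a minimal Python sketch. It applies the criteria recited elsewhere in this description (7 to about 50 amino acids; at least one lysine located six to ten residues from a second lysine; at least one histidine; at least 6% lysine). The function names, the parameter defaults, and in particular the spacing convention (a difference of 6 to 10 between residue positions) are assumptions made for illustration only and are not taken from the applicants' software.

```python
def lysine_fraction(peptide: str) -> float:
    """Fraction of residues in the peptide that are lysine (K)."""
    seq = peptide.upper()
    return seq.count("K") / len(seq) if seq else 0.0


def is_replikin(peptide: str, min_len: int = 7, max_len: int = 50,
                min_gap: int = 6, max_gap: int = 10,
                min_lysine: float = 0.06) -> bool:
    """Apply the three recited criteria to a one-letter-code peptide.

    Assumption: "six to ten residues from a second lysine" is read as a
    difference of 6..10 between the positions of the two lysines.
    """
    seq = peptide.upper()
    if not (min_len <= len(seq) <= max_len):
        return False
    if "H" not in seq:                      # at least one histidine
        return False
    if lysine_fraction(seq) < min_lysine:   # at least 6% lysine
        return False
    k_positions = [i for i, aa in enumerate(seq) if aa == "K"]
    # at least one pair of lysines with the required spacing
    return any(min_gap <= j - i <= max_gap
               for n, i in enumerate(k_positions)
               for j in k_positions[n + 1:])


# Peptides quoted in this description
for pep in ("kagvaflhkk", "hyppkpgcivpak", "kcfncgkegh", "kvylawvpahk"):
    print(pep, is_replikin(pep), f"{100 * lysine_fraction(pep):.1f}% k")
```

Under this reading the four quoted peptides satisfy all three points. The six-residue deduced peptides such as "kkkkhk" would require a different counting convention (for example `min_len=6, min_gap=5`), which is why the limits are left as parameters rather than fixed.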
[000195] Particularly preferred peptides for, e.g., an influenza vaccine include peptides that have been demonstrated to be conserved over a period of one or more years, preferably about three years or more, and/or which are present in a strain of influenza virus shown to have the highest increase in concentration of Replikins relative to Replikin concentration in other influenza virus strains, e.g., an emerging strain. The increase in Replikin concentration preferably occurs over a period of at least about six months to one year, preferably at least about two years or more, and most preferably about three years or more. Among the preferred Replikin peptides for use in an influenza virus vaccine are those Replikins observed to "re-emerge" after an absence from the hemagglutinin amino acid sequence for one or more years.

[000196] The Replikin peptides of the invention, alone or in various combinations, are administered to a subject, preferably by i.v. or intramuscular injection, in order to stimulate the immune system of the subject to produce antibodies to the peptide. Generally the dosage of peptides is in the range of from about 0.1 µg to about 10 mg, preferably about 10 µg to about 1 mg, and most preferably about 50 µg to about 500 µg. The skilled practitioner can readily determine the dosage and number of dosages needed to produce an effective immune response.

Quantitative Measurement of Early Response(s) to Replikin Vaccines

[000197] The ability to measure quantitatively the early specific antibody response in days or a few weeks to a Replikin vaccine is a major practical advantage over other vaccines for which only a clinical response months or years later can be measured.
Adjuvants

[000198] Various adjuvants may be used to enhance the immunological response, depending on the host species, including but not limited to Freund's (complete and incomplete), mineral gels, such as aluminum hydroxide, surface active substances such as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole limpet hemocyanin, dinitrophenol, and potentially useful human adjuvants such as BCG and Corynebacterium parvum. In addition to the use of synthetic UTOPEs as vaccines in themselves, UTOPEs can be used as adjuvants to other Replikin vaccines and to non-Replikin vaccines.

Replikin Nucleotide Sequences

[000199] Replikin DNA or RNA may have a number of uses for the diagnosis of diseases resulting from infection with a virus, bacterium or other Replikin-encoding agent. For example, Replikin nucleotide sequences may be used in hybridization assays of biopsied tissue or blood, e.g., Southern or Northern analysis, including in situ hybridization assays, to diagnose the presence of a particular organism in a tissue sample or an environmental sample. The present invention also contemplates kits containing antibodies specific for particular Replikins that are present in a particular pathogen of interest, or containing nucleic acid molecules (sense or antisense) that hybridize specifically to a particular Replikin, and optionally, various buffers and/or reagents needed for diagnosis.

[000200] Also within the scope of the invention are oligoribonucleotide sequences that include antisense RNA and DNA molecules and ribozymes that function to inhibit the translation of Replikin- or recognin-containing mRNA. Both antisense RNA and DNA molecules and ribozymes may be prepared by any method known in the art. The antisense molecules can be incorporated into a wide variety of vectors for delivery to a subject.
The skilled practitioner can readily determine the best route of delivery, although generally i.v. or i.m. delivery is routine. The dosage amount is also readily ascertainable.

[000201] Particularly preferred antisense nucleic acid molecules are those that are complementary to a Replikin sequence contained in an mRNA encoding, for example, an influenza virus polypeptide, wherein the Replikin sequence comprises from 7 to about 50 amino acids including (1) at least one lysine residue located six to ten residues from a second lysine residue; (2) at least one histidine residue; and (3) at least 6% lysine residues. More preferred are antisense nucleic acid molecules that are complementary to a Replikin present in the coding strand of the gene or to the mRNA encoding the influenza virus hemagglutinin protein, wherein the antisense nucleic acid molecule is complementary to a nucleotide sequence encoding a Replikin that has been demonstrated to be conserved over a period of six months to one or more years and/or which is present in a strain of influenza virus shown to have an increase in concentration of Replikins relative to Replikin concentration in other influenza virus strains. The increase in Replikin concentration preferably occurs over a period of at least six months, preferably about one year, most preferably about two or three years or more.

[000202] Similarly, preferred antisense nucleic acid molecules are those that are complementary to an mRNA encoding bacterial Replikins comprising a Replikin sequence of from 7 to about 50 amino acids including (1) at least one lysine residue located six to ten residues from a second lysine residue; (2) at least one histidine residue; and (3) at least 6% lysine residues. More preferred are antisense nucleic acid molecules that are complementary to the coding strand of the gene or to the mRNA encoding a protein of the bacteria.
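As a minimal illustration of what "complementary to a Replikin sequence contained in an mRNA" means at the sequence level, the following Python sketch computes the antisense RNA (reverse complement) of a short mRNA segment. The function name and the nine-nucleotide example, which encodes Lys-Lys-His under the standard genetic code, are hypothetical and are not sequences disclosed in this description; real antisense design involves further considerations not shown here.

```python
# Watson-Crick pairing for RNA: A-U and G-C
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense_rna(mrna: str) -> str:
    """Return the reverse complement of an mRNA segment, written 5'->3'."""
    seq = mrna.upper().replace("T", "U")   # accept DNA-style input as well
    if set(seq) - set("ACGU"):
        raise ValueError("unexpected nucleotide symbol")
    return seq.translate(COMPLEMENT)[::-1]


# Hypothetical mRNA fragment: AAG AAG CAC encodes Lys-Lys-His
print(antisense_rna("AAGAAGCAC"))  # GUGCUUCUU
```

Because the genetic code is degenerate, a given Replikin peptide can be encoded by many different mRNA sequences, so an antisense molecule would in practice be derived from the actual nucleotide sequence of the organism rather than from the peptide alone.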
Further Aspects of Replikins

[000203] In an aspect of the present invention, there is provided a method of preventing or treating a virus infection comprising administering to a patient in need thereof a preventive or therapeutic virus vaccine comprising at least one isolated Replikin present in a protein of an emerging strain of the virus and a pharmaceutically acceptable carrier and/or adjuvant. In a further aspect of the invention, the isolated or synthesized peptides are influenza virus peptides. In yet a further aspect of the invention, the isolated or synthesized peptides are H5N1 influenza virus peptides.

[000204] The present invention also provides a method of making a preventive or therapeutic virus vaccine comprising:
    (1) identifying a strain of a virus as an emerging strain,
    (2) selecting at least one Replikin sequence present in the emerging strain as a peptide template for the virus vaccine manufacture,
    (3) synthesizing peptides having the amino acid sequence of the at least one Replikin sequence selected in step (2), and
    (4) combining a therapeutically effective amount of the peptides of step (3) with a pharmaceutically acceptable carrier and/or adjuvant.
In a further aspect of the method of making a preventive or therapeutic virus vaccine, the isolated Replikin is from influenza virus. In still a further aspect, the isolated Replikin is from an influenza H5N1 virus.
[000205] In another aspect, the invention is directed to a method of identifying an emerging strain of a virus for diagnostic, preventive or therapeutic purposes comprising:
    (1) obtaining at least one isolate of each strain of a plurality of strains of the virus;
    (2) analyzing the amino acid sequence of the at least one isolate of each strain of the plurality of strains of the virus for the presence and concentration of Replikin sequences;
    (3) comparing the concentration of Replikin sequences in the amino acid sequence of the at least one isolate of each strain of the plurality of strains of the virus to the concentration of Replikin sequences observed in the amino acid sequence of each of the strains from at least one earlier time period to provide the concentration of Replikins for at least two time periods, said at least one earlier time period being within about six months to about three years prior to step (1); and
    (4) identifying the strain of the virus having the highest increase in concentration of Replikin sequences during the at least two time periods.

[000206] In another aspect of the invention there is provided a process for stimulating the immune system of a subject to produce antibodies that bind specifically to a Replikin sequence, said process comprising administering to the subject an effective amount of a dosage of a composition comprising at least one Replikin peptide. A further aspect of the present invention comprises at least one peptide that is present in an emerging strain of the organism if such new strain emerges. Another aspect of the present invention comprises at least one peptide that is present in influenza H5N1.

[000207] The present invention also provides antibodies that bind specifically to a Replikin, as defined herein, as well as antibody cocktails containing a plurality of antibodies that specifically bind to Replikins.
Another aspect of the present invention provides compositions comprising an antibody or antibodies that specifically bind to a Replikin and a pharmaceutically acceptable carrier.

[000208] In one aspect of the invention there are provided isolated, or separated from other proteins, recombinant, or synthesized peptides or other methods containing a viral Replikin sequence.

[000209] The present application also provides isolated, or separated from nucleocapsid proteins, amongst others, recombinant, or synthesized peptides or other methods containing a viral Replikin sequence.

[000210] In another aspect of the invention there is provided a process for stimulating the immune system of a subject to produce antibodies that bind specifically to a viral Replikin sequence, said process comprising administering to the subject an effective amount of a dosage of a composition comprising at least one Replikin peptide. Another aspect of the present invention comprises at least one peptide that is present in an emerging strain of the virus if such new strain emerges.

[000211] The present invention also provides antibodies that bind specifically to a viral Replikin, as defined herein, as well as antibody cocktails containing a plurality of antibodies that specifically bind to viral Replikins. Another aspect of the present invention provides compositions comprising an antibody or antibodies that specifically bind to a viral Replikin and a pharmaceutically acceptable carrier.

[000212] The present invention also provides therapeutic compositions comprising one or more of isolated Replikin virus peptides and a pharmaceutically acceptable carrier.
[000213] In another aspect of the invention there is provided an antisense nucleic acid molecule complementary to a virus Replikin mRNA sequence, said Replikin mRNA sequence denoting from 7 to about 50 amino acids comprising:
    (1) at least one lysine residue located six to ten residues from a second lysine residue;
    (2) at least one histidine residue; and
    (3) at least 6% lysine residues.

[000214] In yet another aspect of the invention there is provided a method of stimulating the immune system of a subject to produce antibodies to viruses, said method comprising administering an effective amount of at least one virus Replikin peptide.

[000215] In another aspect, there is provided a method of selecting a virus peptide for inclusion in a preventive or therapeutic virus vaccine comprising:
    (1) obtaining at least one isolate of each strain of a plurality of strains of said virus;
    (2) analyzing the amino acid sequence of the at least one isolate of each strain of the plurality of strains of the virus for the presence and concentration of Replikin sequences;
    (3) comparing the concentration of Replikin sequences in the amino acid sequence of the at least one isolate of each strain of the plurality of strains of the virus to the concentration of Replikin sequences observed in the amino acid sequence of each of the strains from at least one earlier time period to provide the concentration of Replikins for at least two time periods, said at least one earlier time period being within about six months to about three years prior to step (1);
    (4) identifying the strain of the virus having the highest increase in concentration of Replikin sequences during the at least two time periods; and
    (5) selecting at least one Replikin sequence present in the strain of the virus peptide identified in step (4) as a peptide for inclusion in the virus vaccine.
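Steps (3) and (4) of the selection method above amount to a comparison of Replikin concentrations across time periods. The following Python sketch shows one way the comparison might be organized, assuming concentrations are expressed as Replikins per 100 amino acids as elsewhere in this description. The function names, the data layout, and the strain names and figures in the example are hypothetical placeholders, not data from this application.

```python
from __future__ import annotations


def replikin_concentration(replikin_count: int, sequence_length: int) -> float:
    """Replikins per 100 amino acid residues."""
    return 100.0 * replikin_count / sequence_length


def select_emerging_strain(history: dict[str, dict[int, float]]) -> tuple[str, float]:
    """Return (strain, increase) for the strain with the largest increase.

    `history` maps strain name -> {time period: concentration}. The increase
    is taken between the earliest and the latest period recorded for each
    strain; strains with fewer than two periods are skipped.
    """
    best: tuple[str, float] | None = None
    for strain, by_period in history.items():
        if len(by_period) < 2:
            continue
        periods = sorted(by_period)
        increase = by_period[periods[-1]] - by_period[periods[0]]
        if best is None or increase > best[1]:
            best = (strain, increase)
    if best is None:
        raise ValueError("need at least one strain with two time periods")
    return best


# Hypothetical concentrations for two strains over two time periods
history = {
    "strain_A": {2002: 2.0, 2004: 2.5},
    "strain_B": {2002: 1.5, 2004: 4.0},
}
print(select_emerging_strain(history))  # ('strain_B', 2.5)
```

Step (5) would then draw candidate peptides from the Replikin sequences found in the selected strain. How the Replikins in each isolate are counted in step (2) is a separate question that this sketch deliberately leaves to the caller.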
[000216] In one aspect of the invention there are provided isolated or synthesized influenza virus peptides comprising a Replikin sequence.

[000217] In another aspect of the invention, there is provided a process for stimulating the immune system of a subject to produce antibodies that bind specifically to an influenza virus Replikin sequence, said process comprising administering to the subject an effective amount of a dosage of a composition comprising at least one influenza virus Replikin peptide. A further aspect of the present invention comprises at least one Replikin peptide that is present in an emerging strain of influenza virus. Yet another aspect of the present invention comprises a composition comprising at least one influenza H5N1 Replikin peptide.

[000218] The present invention also provides antibodies that bind specifically to an influenza virus Replikin, as defined herein, as well as antibody cocktails containing a plurality of antibodies that specifically bind to influenza virus Replikins. In another aspect of the present invention, there are provided compositions comprising an antibody or antibodies that specifically bind to an influenza Replikin and a pharmaceutically acceptable carrier.

[000219] The present invention also provides therapeutic compositions comprising one or more of isolated influenza virus peptides having from 7 to about 50 amino acids comprising:
    (1) at least one lysine residue located six to ten residues from a second lysine residue;
    (2) at least one histidine residue; and
    (3) at least 6% lysine residues,
and a pharmaceutically acceptable carrier.
[000220] In another aspect of the invention there is provided an antisense nucleic acid molecule complementary to an influenza virus hemagglutinin Replikin mRNA sequence, said Replikin mRNA sequence denoting from 7 to about 50 amino acids 25 comprising: (1) at least one lysine residue located six to ten residues from a second lysine residue; (2) at least one histidine residue; and (3) at least 6% lysine residues. 30 [000221] In yet another aspect of the invention there is provided a method of simulating the immune system of a subject to produce antibodies to influenza virus comprising administering an effective amount of at least one influenza virus Replikin peptide having from 7 to about 50 amino acids comprising: 80 WO 2006/088962 PCT/US2006/005343 (1) at least one lysine residue located six to ten amino acid residues from a second lysine residue; (2) at least one histidine residue; and (3) at least 6% lysine residues. 5 [000222] In another aspect, there is provided a method of selecting an influenza virus peptide for inclusion in a preventive or therapeutic influenza virus vaccine comprising: (1) obtaining at least one isolate of each strain of a plurality of strains of influenza virus; 10 (2) analyzing the hemagglutinin amino acid sequence of the at least one isolate of each strain of the plurality of strains of influenza virus for the presence and concentration of Replikin sequences; (3) comparing the concentration of Replikin sequences in the hemagglutinin amino acid sequence of the at least one isolate of each 15 strain of the plurality of strains of influenza virus to the concentration of Replikin sequences observed in the hemagglutinin amino acid sequence of each of the strains at least one earlier time period to provide the concentration of Replikins for at least two time periods, said at least one earlier time period being within about six months to 20 about three years prior to step (1); (4) identifying the strain of influenza virus having the highest 
increase in concentration of Replikin sequences during the at least two time periods;
(5) selecting at least one Replikin sequence present in the strain of influenza virus peptide identified in step (4) as a peptide for inclusion in an influenza virus vaccine.

[000223] The present invention also provides a method of making a preventive or therapeutic influenza virus vaccine comprising:
(1) identifying a strain of influenza virus as an emerging strain;
(2) selecting at least one Replikin sequence present in the emerging strain as a peptide template for influenza virus vaccine manufacture;
(3) synthesizing peptides having the amino acid sequence of the at least one Replikin sequence selected in step (2); and
(4) combining a therapeutically effective amount of the peptides of step (3) with a pharmaceutically acceptable carrier and/or adjuvant.

[000224] In another aspect, the invention is directed to a method of identifying an emerging strain of influenza virus for diagnostic, preventive or therapeutic purposes comprising:
(1) obtaining at least one isolate of each strain of a plurality of strains of influenza virus;
(2) analyzing the hemagglutinin amino acid sequence of the at least one isolate of each strain of the plurality of strains of influenza virus for the presence and concentration of Replikin sequences;
(3) comparing the concentration of Replikin sequences in the hemagglutinin amino acid sequence of the at least one isolate of each strain of the plurality of strains of influenza virus to the concentration of Replikin sequences observed in the hemagglutinin amino acid sequence of each of the strains at at least one earlier time period to provide the concentration of Replikins for at least two time periods, said at least one earlier time period being within about six months to about three years prior to step (1); and
(4) identifying the strain of influenza virus having the highest increase in concentration of
Replikin sequences during the at least two time periods.

[000225] In yet another aspect of the invention, there is provided a preventive or therapeutic influenza virus vaccine comprising at least one isolated Replikin present in the hemagglutinin protein of an emerging strain of influenza virus and a pharmaceutically acceptable carrier and/or adjuvant.

[000226] Also provided by the present invention is a method of preventing or treating influenza virus infection comprising administering to a patient in need thereof a preventive or therapeutic vaccine comprising at least one isolated Replikin present in the hemagglutinin protein of an emerging strain of influenza virus and a pharmaceutically acceptable carrier and/or adjuvant.

COMPUTER SOFTWARE FOR IDENTIFYING REPLIKINS AND RELATED STRUCTURES

[000227] Identification of Replikin structures, Replikin Scaffold structures and degenerate Exoskeleton Scaffold structures may be accomplished with the aid of bioinformatics.

[000228] Embodiments of the present invention are directed to a system and method for identifying and/or locating complex patterns in an amino acid sequence such as Replikin patterns, Replikin Scaffold structures, Exoskeleton Scaffold structures and other complex patterns in amino acid and nucleic acid sequences. According to an aspect of the present invention, techniques are provided to facilitate queries of protein databases. For protein descriptions received in response to the queries, aspects of the present invention may include a scan of the received protein descriptions to identify and locate Replikin patterns.
According to an aspect of the present invention, a Replikin pattern is a sequence of from 7 to about 50 amino acids that includes the following three (3) characteristics, each of which may be recognized as an aspect of the present invention: (1) the sequence has at least one lysine residue located six to ten amino acid residues from a second lysine residue; (2) the sequence has at least one histidine residue; and (3) at least 6% of the amino acids in the sequence are lysine residues. Another aspect of the present invention may identify and/or locate a complex amino acid sequence having specified length constraints, which further includes any combination of the following characteristics: (1) a first amino acid residue located more than N positions and less than M positions away from a second amino acid residue; (2) a third amino acid residue located anywhere in the sequence; and (3) at least R percent of a fourth amino acid residue. According to yet another aspect, the present invention may count occurrences of the identified amino acid sequences and may report the counted occurrences, either as raw absolute values or as ratios of the number of identified amino acid sequences per N amino acids in the protein. Still another aspect of the present invention may analyze the evolution of identified amino acid sequence patterns in variants of a given protein over time, and may also analyze the similarities and differences between instances of identified amino acid sequence patterns across a plurality of different proteins over time. As a result of the analysis, yet another aspect of the present invention may identify potential amino acid scaffolding structures that appear to be preserved over time and across different proteins, as component elements of the identified amino acid sequence patterns mutate and/or evolve.
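The three characteristics of a Replikin pattern stated above can be expressed as a single predicate. The following Python sketch is illustrative only and is not taken from the specification: the function name is_replikin and its parameter names are hypothetical, "about 50" is treated as a hard upper limit of 50, and the distance between two lysines is assumed to be the difference between their positions in the string.

```python
def is_replikin(peptide, rmin=7, rmax=50, kmin=6, kmax=10, kpercent=6.0):
    """Return True if `peptide` satisfies the three Replikin characteristics.

    Assumptions (not fixed by the text): distance between lysines is the
    difference of their 0-based positions; 'about 50' is taken as exactly 50.
    """
    s = peptide.lower()  # the description writes lysine as 'k', histidine as 'h'
    # Length constraint: from rmin to rmax amino acids.
    if not (rmin <= len(s) <= rmax):
        return False
    # Characteristic (2): at least one histidine residue.
    if "h" not in s:
        return False
    # Characteristic (3): at least kpercent of the residues are lysine.
    k_positions = [i for i, residue in enumerate(s) if residue == "k"]
    if 100.0 * len(k_positions) / len(s) < kpercent:
        return False
    # Characteristic (1): some lysine lies kmin..kmax positions from another.
    return any(
        kmin <= b - a <= kmax
        for a in k_positions
        for b in k_positions
        if b > a
    )
```

Changing the default arguments gives the generalized form described in the same paragraph, in which the spacing limits and the required percentage are variables rather than fixed values.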
[000229] Embodiments of the present invention will be described with reference to the accompanying drawings, wherein like parts are designated by like reference numerals throughout, and wherein the leftmost digit of each reference number refers to the drawing number of the figure in which the referenced part first appears.

[000230] FIG. 17 is a high-level block diagram of a computer system incorporating a system and method for identifying Replikin patterns in amino acid sequences, in accordance with an aspect of the present invention. As shown in FIG. 17, computer workstation 610 may be a computer having a processor and a memory configured to permit a researcher to search protein databases and to scan protein descriptions for selected amino acid patterns. To accomplish these functions, computer workstation 610 may include protein and amino acid research system 630, which may receive instructions from a user/researcher to conduct protein searching and amino acid scanning operations. According to an aspect, protein and amino acid research system 630 may further include amino acid sequence scanner 640 that scans and searches retrieved protein and amino acid sequences for specific patterns of amino acids, including Replikin patterns. Protein and amino acid research system 630 may communicate with network interface 620 to obtain protein sequences and amino acid sequences from resources on network 660, which may include the Internet. Alternatively, protein and amino acid research system 630 may obtain protein sequences and amino acid sequences from a local protein database 650. In addition, protein and amino acid research system 630 may obtain protein sequences and amino acid sequences directly from other input means, such as keyboard input. Protein and amino acid research system 630 may also communicate with network interface 620 to transmit results to other computers on network 660.
Automated Scanning for Replikin Patterns

[000231] Embodiments of the present invention may include a generalized method and system for identifying complex patterns of amino acids within proteins. For any protein definition identified or selected by protein and amino acid research system 630, the user may direct aspects of the invention to search for a variety of complex patterns of amino acids. As an example of one pattern of amino acids, the present invention provides a method for identifying nucleotide or amino acid sequences that include a Replikin pattern. FIG. 18 is a simple flow chart illustrating a general method for locating a Replikin pattern in a sequence of amino acids, according to an aspect of the present invention. The method 700 may begin after a sequence of amino acids has been obtained. Typically, the sequence of amino acids may be represented by alphabetic characters according to the code supplied in FIG. 12. However, other encodings are envisioned by the present invention as well.

[000232] Referring to FIG. 18, once a sequence of amino acids has been obtained, the sequence is searched for a Replikin pattern (710), which comprises a subsequence (or string) of amino acids that includes the following characteristics:
(1) the string contains from 7 to about 50 amino acids;
(2) the string contains at least one lysine residue located 6 to 10 positions from a second lysine residue;
(3) the string contains at least one histidine residue; and
(4) the string contains at least 6% lysine residues.

[000233] Once a string of amino acids is found to match the Replikin pattern, the string may be identified or marked (720) accordingly.

[000234] A given sequence of amino acids may contain many subsequences or strings that match the Replikin pattern. Additionally, Replikin patterns may overlap each other.
Thus, to locate and identify all possible Replikin patterns in a sequence of amino acids, method 700 may be invoked iteratively for each subsequence of amino acids contained within the original sequence of amino acids.

[000235] When method 700 is invoked iteratively to identify and locate all possible Replikin patterns in an amino acid sequence, an aspect of the present invention may count the number of resulting Replikin patterns. A Replikin count may be reported as an absolute number. Additionally, aspects of the invention may also determine a ratio of the number of Replikins per N amino acids in the sequence. For example, an aspect of the present invention may determine that a given protein contains a ratio of 6 Replikins for every 100 amino acids. Replikin ratios have been shown by laboratory experiment and by epidemiological evidence to correlate directly to the rate at which a given protein replicates. Rapid replication of proteins may be an indication of disease. For example, the presence of relatively high ratios of Replikin patterns has been correlated to epidemics of influenza. Similarly, an increase in the count of Replikin patterns observed in a protein over time may also be an indication of future disease caused by the organism from which the protein was obtained (see, e.g., FIG. 15). Thus, the ability to detect and count Replikin patterns within sequences of amino acids is a significant advantage of the present invention.

[000236] Still referring to FIG. 18, aspects of the present invention may utilize method 700 to identify and locate other complex patterns of amino acids, which exhibit characteristics similar to Replikin patterns.
That is, although some aspects of the present invention may specify exact values for: (1) distances between amino acids, (2) acceptable lengths of recognized amino acid sequences, and (3) the percentage or concentration of specific amino acids, these exact values may also be expressed as variables. Thus a researcher may employ an aspect of the present invention to identify sequences of amino acids in a protein that have the following characteristics:
(1) the sequence contains from rmin to rmax amino acids;
(2) the sequence contains at least one lysine residue located kmin to kmax amino acid residues from a second lysine residue;
(3) the sequence contains at least one histidine residue; and
(4) the sequence contains at least kpercent lysine residues.

[000237] FIG. 19 is a flow chart illustrating a generalized method 800 for locating a plurality of Replikin-like patterns in a given sequence of amino acids, according to an aspect of the present invention. The method 800 begins by locating a first lysine residue in the given sequence (810). Then, the method 800 may determine whether a second lysine residue resides within kmin to kmax positions of the first lysine residue (820). As indicated in FIG. 19, kmin and kmax define the limits on the distance between the first and second lysine residues. For a typical Replikin pattern, kmin will equal 6 and kmax will equal 10. However, these values may be varied by a researcher interested in discovering other similar patterns.

[000238] Once method 800 has identified two lysine residues that are close enough to each other (820), the method 800 may examine every histidine residue that resides within rmax positions of both the first and second lysine residues (830). When method 800 is employed to identify and locate typical Replikin patterns, rmax will usually be set to equal 50.
For every histidine residue that resides within rmax positions of the two lysine residues identified in steps (810) and (820), method 800 will construct the shortest string of amino acid residues that includes the first lysine residue, the second lysine residue, and the identified histidine residue (840). Then, method 800 will determine whether the length of that shortest string is within the desired range, that is, whether it contains at least rmin amino acid residues and no more than rmax amino acid residues (850). Finally, if the identified string of amino acids also contains at least kpercent of lysine residues (860), the string will be identified as matching the desired Replikin-like pattern (870).

[000239] Still referring to FIG. 19, it is apparent that method 800 may identify several Replikin-like patterns from a single given amino acid sequence. This may happen because method 800 may examine more than one histidine residue that resides within rmax positions of the two identified lysine residues. Each identified histidine residue may, in combination with the two lysine residues, match the desired Replikin-like pattern.

[000240] One aspect of the method illustrated by FIG. 19 is shown in FIG. 20, which is a source code listing containing a procedure for discovering all Replikin patterns present in a given sequence of amino acids, in accordance with an aspect of the present invention. The "match" procedure shown in FIG. 20 is programmed in an interpreted shell language called "Tcl" and recognizes Replikins in a straightforward fashion. As known in the art, the "Tool Command Language" or Tcl (pronounced "tickle") is a simple interpreted scripting language that has its roots in the Unix command shells, but which has additional capabilities that are well-suited to network communication, Internet functionality and the rapid development of graphical user interfaces.
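The steps of method 800 (810 through 870 of FIG. 19) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions and is not the Tcl "match" procedure of FIG. 20: the names find_replikin_like and replikins_per_100 are hypothetical, positions are 0-based, the distance between lysines is taken as the difference of their positions, and matches are reported as (start, end) ranges with an exclusive end.

```python
def find_replikin_like(seq, rmin=7, rmax=50, kmin=6, kmax=10, kpercent=6.0):
    """Return sorted (start, end) ranges of Replikin-like strings in `seq`.

    Follows steps 810-870 of method 800; the end index is exclusive.
    Distance between lysines is assumed to be the index difference.
    """
    seq = seq.lower()
    lysines = [i for i, r in enumerate(seq) if r == "k"]
    histidines = [i for i, r in enumerate(seq) if r == "h"]
    matches = set()
    for first in lysines:                               # step 810
        for second in lysines:                          # step 820
            if not (kmin <= second - first <= kmax):
                continue
            for h in histidines:                        # step 830
                # Step 840: shortest string holding both lysines and h.
                start, end = min(first, h), max(second, h)
                length = end - start + 1
                if not (rmin <= length <= rmax):        # step 850
                    continue
                lysine_count = seq[start:end + 1].count("k")
                if 100.0 * lysine_count / length >= kpercent:   # step 860
                    matches.add((start, end + 1))       # step 870
    return sorted(matches)


def replikins_per_100(seq, **params):
    """Replikin-like strings found per 100 amino acids of `seq`."""
    if not seq:
        return 0.0
    return 100.0 * len(find_replikin_like(seq, **params)) / len(seq)
```

For the sequence "haakaaaaaakah", the single lysine pair combines with each of the two histidines, so two overlapping ranges are returned, which corresponds to the situation described for method 800 when more than one histidine lies near the same two lysines.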
[000241] Alternative methods of recognizing Replikin patterns are also covered by the teachings of the present invention. For example, the match procedure shown in FIG. 20 could be implemented in other programming languages such as Java or C or C++. Additionally, alternative aspects of the Replikin recognizing algorithm may identify the characteristics of a Replikin pattern in any order, and may also traverse component amino acid sequences and subsequences using recursive techniques, iterative techniques, parallel processing techniques, divide-and-conquer techniques or any combination thereof.

Protein Search Engine

[000242] Returning to FIG. 17, the present invention may include a search engine to access and interact with amino acid and protein databases, either locally or over a network such as the Internet, to retrieve protein definitions. For example, protein and amino acid research system 630 may accept protein search criteria from a user, and may then access a plurality of on-line amino acid and protein database search engines to retrieve protein definitions that match the supplied search criteria. Protein database search criteria may comprise any text string that may form a valid search term in any of the on-line protein or amino acid search engines. Typically, these search criteria relate to text that may be found in the printout that describes each specific protein. For example, if the user supplied the search criteria "influenza type A," aspects of the present invention may forward this text string to a plurality of Internet protein and amino acid search engines, each of which may then return any protein descriptions found in their databases that contained the terms "influenza type A." Employing amino acid sequence scanner 640, each of the returned protein descriptions may be scanned for the presence of Replikin patterns.
[000243] Additional aspects of the present invention may permit a user to select or de-select a plurality of Internet protein search engines and to customize the search criteria and protein retrieval capabilities of the present invention for each of the selected on-line protein search engines. Moreover, aspects of the invention may also permit a user to access a local protein database 650 or to supply a specific protein definition directly, for example, by supplying a local file name containing the protein definition, or by other methods known in the art for supplying parameters to computer software.

[000244] Another aspect of the present invention may include a search engine to access and interact with amino acid and protein databases on the Internet to retrieve protein definitions or amino acid sequence definitions. After accepting protein or amino acid sequence search criteria from a user, the present invention may access a plurality of amino acid and protein database search engines, through on-line access, to retrieve protein definitions or amino acid sequence definitions that match the supplied search criteria.

[000245] Initial existing protein search criteria based on existing definitions may comprise any text string that may form a valid search term in any of the on-line protein or amino acid search engines. Typically, these search criteria relate to text that may be found in the printout that describes each specific protein. For example, if the user supplied the search criteria "influenza type A," the present invention would forward this text string to the plurality of Internet protein and amino acid search engines, each of which would then return any protein definitions in their databases that contained the terms "influenza type A."

[000246] A non-limiting aspect of the present invention comprising a protein search engine entitled "Genome Explorer" is included in Appendix A.
The Tcl procedure named "GenomalEnquirer" may control the macro level operation of the protein search engine (see "proc GenomalEnquirer {database term additionalCriteria}"). Within the procedure GenomalEnquirer, a series of specific on-line protein search engines may be accessed and queried using the user-supplied protein search terms and additional criteria. Additional aspects of the invention may permit a user to select or de-select a plurality of Internet protein search engines and to customize the search criteria and protein retrieval capabilities of the present invention for each of the selected on-line protein search engines. Moreover, aspects of the invention may also permit a user to access local protein databases or to supply a specific protein definition directly, for example, by supplying a local file name containing the protein definition, or by other methods known in the art for supplying parameters to computer software.

[000247] Instructions for running the Genome Explorer are included in Appendix B. Screen snapshots of the Genome Explorer application are included in Appendix C.

Replikin Analysis

[000248] Embodiments of the present invention may be employed not only to identify and locate Replikin patterns in amino acid sequences. Embodiments may also be used to discover and analyze similarities in the structure of Replikin patterns occurring in different proteins, or to analyze different Replikin patterns occurring in the same protein over time. FIG. 21, for example, is a table illustrating a Replikin Scaffold or "fixed scaffold" structure that was preserved in a "Bird Flu" influenza virus over an 87-year period from 1917 to 2004. Embodiments of the present invention may assemble a number of discovered Replikin patterns in proteins, including Replikin patterns discovered in variants of the same protein. Along with each Replikin pattern, aspects of the present invention may also associate a date when each protein was first identified.
When directed by a researcher, an aspect of the present invention may include sorting and displaying a plurality of selected Replikin patterns according to content, date or other criteria, in order to reveal substantially fixed amino acid structures that have been preserved in Replikin patterns over time and which may be present in different proteins as well as variants of the same protein. Further, when directed by a researcher, an aspect of the invention may employ known methods of pattern analysis to compare a plurality of selected Replikin patterns in order to identify such fixed amino acid structures automatically. As an example, in FIG. 21, the illustrated Replikin patterns appear to demonstrate, in this case, a relatively fixed scaffold structure of (usually) 29 amino acids that begins with a pair of lysine residues (kk) at the amino terminal, ends with a pair of histidine residues (hh) at the carboxyl terminal, and contains a lysine residue in either position 8, 10 or 11. This conservation of scaffold structure over decades permits synthetic vaccines to be prepared rapidly and inexpensively. To synthesize such vaccines after a Replikin scaffolding structure has been identified, a researcher may select elements of that scaffolding structure that are conserved over time and which are also present in a current variant of a protein. A vaccine may then be prepared based on the selected elements from the scaffolding structure. Because such vaccines are based on conserved scaffolding structures, they may be effective for multiple years and may also be developed well in advance of an anticipated outbreak.

[000249] The discovery of Replikins themselves, as well as aspects of the present invention for identifying and locating Replikin patterns, provides targets for the identification of pathogens, and facilitates the development of anti-pathogen therapies, including vaccines.
In general, knowledge of and identification of the Replikin family of peptides enables development of effective therapies and vaccines for any organism that harbors Replikins. Specifically, identification of Replikins provides for the detection of viruses and virus vaccine development, including the influenza virus. Further, identification of Replikins also provides for the detection of other pathogens, such as malaria, anthrax and smallpox virus, in addition to enabling the development of therapies and vaccines that target Replikin structures. Additional examples provided by the identification of Replikins include the detection of infectious disease Replikins, cancer immune Replikins and structural protein Replikins.

[000250] Embodiments of the present invention enable important Replikin patterns of amino acids to be recognized, located and analyzed in manners that are not found in the prior art. Using prior art capabilities, researchers have been limited by existing techniques for describing sequences of amino acids. Indeed, limitations of the prior art have in some ways dampened research in this field, since heretofore it has not been possible to specify sequences of amino acids that comprise non-linear attributes. Until the development of the methods and aspects of the present invention, descriptions of amino acid sequences were limited to linear sequences containing, at most, repetitive substrings and logical constraints on substring content. Embodiments of the present invention enable a new class of amino acid sequences to be discovered, located and analyzed using tools not found in the prior art. This new class of amino acid sequences is characterized by attributes such as specific amino acid concentration and distance relationships between specific amino acids. These attributes transcend simple contiguous ordering and thus are not easily described, discovered or located by existing methods known in the art.
[000251] For example, rather than examining strict amino acid sequence matches (homologies) as is done by other widely used programs such as BLAST, the present inventors have discovered a unique quantitative "language" related to rapid replication which defines a new class of amino acid grouping. Novel computer programs described herein detect instances of this new language.

[000252] These programs include functionality to search electronic data for amino acid sub-sequences meeting predetermined criteria. The data, which may be obtained online, may include data defining a specified group of protein sequences. The criteria may include:
i) the occurrence within a protein sequence of two amino acids, in this case lysine (K) and histidine (H), in specific concentrations in the sequence;
ii) the spacing of one of these (K) to a second K in the sequence; and
iii) the concentration of one or more amino acids (e.g. K) in a percentage greater than a defined percentage.

[000253] Amino acid sequences meeting the above criteria relate to a particular biological function such as rapid replication.

[000254] The programs include the capability to identify Replikin sub-sequences in genome sequences. One source of the genome sequences may be published genome sequences obtained from online, electronic databases, using search criteria provided by a user. In aspects of the invention, the databases may be NCBI (National Center for Biotechnology Information) or LANL (Los Alamos National Laboratory) databases. The programs further include the capability to search for arbitrary sub-sequences (i.e., not only Replikin sub-sequences), based on user supplied criteria.

[000255] In one aspect, a program herein entitled "Genome Explorer" may generate a user interface to prompt a user for search terms. Genome Explorer may apply the search terms to online databases, such as NCBI or LANL databases, to obtain raw sequence data.
Additional data may be further obtained, such as article names, protein source, strain, serotype and year of discovery for all the raw sequences which match the search terms. Once the raw data has been acquired, Genome Explorer may further apply additional search criteria to identify Replikin sub-sequences within the raw sequences. The search criteria can be specified by the user in such a way as to implement relatively strict, or relatively relaxed, definitions of what can be included in the set of matching sub-sequences to be reported by Genome Explorer. As it identifies Replikin sub-sequences, Genome Explorer may compile ongoing statistics and display a progress bar in a user interface. When Genome Explorer completes its processing, it may save resulting statistics in a data file. For example, the data file may be an HTML file that can be opened in any word processor for inspection of results.

[000256] In another aspect, in a program herein entitled "Dr. Peptide," search criteria may be applied to identify sub-sequences other than Replikin sub-sequences. With Dr. Peptide it is possible to search for, e.g., all instances of the sequence hk......hk (SEQ ID NO: 470), separated by not more than 15 amino acids, in publicly available genome databases. Such searches allow the creation of new statistical profiles and new groupings of proteins based on meeting these criteria. Dr. Peptide may include much the same functionality as Genome Explorer. For example, like Genome Explorer, Dr. Peptide may, via a user interface, prompt a user for search terms and apply the search terms to online databases, such as NCBI or LANL databases, to obtain raw sequence data. Additional data may be further obtained, such as article names, protein source, strain, serotype and year of discovery for all the raw sequences which match the search terms. Once the raw data has been acquired, Dr. Peptide may further process the data to identify arbitrary sub-sequences and present its output in a data file, for example in the form of HTML pages that can be opened in any word processor.

[000257] Below is a description of one example of a logic sequence that could be included in the Genome Explorer program. In the description, an "initial server inquiry" refers to search criteria to be applied to one or more network elements, such as server computers, storing electronic data representing protein sequences. The network elements may be included in private networks or, for example, the Internet. The data may be in the form of a "protein page," i.e., a quantum of data representing protein sequences. The character "k" represents a lysine amino acid, and the character "h" represents a histidine amino acid.

Genome Explorer Logic Sequence

[000258] Initialize user interface procedures and input fields for search parameters.
Construct user interface.
Wait for user to specify search parameters. Search parameters include:
  (1) Words or phrases to be matched in the initial server inquiry to obtain summaries and protein pages.
  (2) The allowed distance between k's, expressed as range kmin...kmax, for a sub-sequence to qualify for a set.
  (3) The allowed range of distances between an h and the farthest k, expressed as kmin+1..hmax, for a sub-sequence to qualify for the set.
  (4) The allowed fraction of k's in the sub-sequence, expressed as x percent or larger, for the sub-sequence to qualify for the set.

[000259] Once search parameters are specified,
  Initialize output files in HTML format - these will be used to display reports.
  Compare specified search parameters with previous search.
  If the search parameters are identical, reuse cached protein pages as data input.
  If the search parameters are not identical (cached protein pages are not relevant),
    Send the inquiry to the server (NCBI or LANL).
    If it did not return all summaries,
      Re-send the inquiry requesting all summaries.
    For each summary,
      Fetch and save the protein page retrieved.
  For each protein page retrieved,
    If from NCBI,
      Parse ASN page.
      Extract found sequence data (seq-data.ncbieaa).
      Extract article names (descr.*.article.title.*.name).
      Extract protein source (source.org.taxname).
      Extract strain (subtype).
      Derive year discovered.
      Derive serotype.
    If from LANL,
      Parse HTML page for strain, definition, source, year, serotype, and raw nucleotide sequence.
      Convert nucleotides to amino acids by mapping every three nucleotides in sequence to the corresponding amino acid.
    Save parsed value for this protein.
  For each parsed page, update user interface as to progress via progress bar, and:
    For each sequence data found on the page,
      Scan the amino acid sequence data for each sub-sequence matching
        (a) The distance between k's is in the range kmin...kmax as defined in parameter (2) from the user interface above.
        (b) The distance between an h and the farthest k is in the range kmin+1..hmax as defined in parameter (3) from the user interface above.
        (c) The fraction of k units in the sub-sequence, expressed as x percent or larger as defined in parameter (4) from the user interface above.
      and save the range of each matching sub-sequence, including overlaps.
    Ignore sequences with no matches.
    Accept the sequence with the most sub-sequence matches.
    If a sequence was accepted,
      Catalog each sequence by the year it was discovered.
    For each additional set of criteria,
      Check the additional criteria against other parsed fields.
      If does not match, do not accept the page.
    If the page was accepted,
      Add it as a passed page.
      Create an HTML page showing the full sequence and all matched sub-sequences.
    If the page was not accepted,
      Add it as a failed page.
  For each unique matched replikin sequence,
    Create an amino acid history HTML page,
    Show every protein it occurs in ordered by year.
  Create a statistics HTML page displaying the following:
    For each year,
      Show number of matched proteins and replikin sub-sequences.
  Update user interface to reflect that the operation is complete;
  Re-initialize input fields to allow next set of search parameters to be specified by user.
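Two steps of the logic sequence above lend themselves to a short illustration: converting a raw nucleotide sequence to amino acids three nucleotides at a time, and compiling the per-year statistics. The Python sketch below is illustrative only and is not the Genome Explorer code of Appendix A. The function names translate and yearly_statistics are hypothetical, the standard genetic code is assumed (the text does not say which codon table is used), and each record is assumed to be a (year, matched sub-sequences) pair.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code (assumed), codons enumerated with bases in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(_BASES, repeat=3), _AMINO)
}


def translate(nucleotides):
    """Map every three nucleotides to one amino acid (lowercase letters).

    RNA input (U) is accepted; a trailing partial codon is ignored;
    codons containing unknown bases become 'x'; stop codons become '*'.
    """
    dna = nucleotides.upper().replace("U", "T")
    residues = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        residues.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(residues).lower()


def yearly_statistics(records):
    """Count matched proteins and matched sub-sequences for each year.

    `records` is an iterable of (year, list_of_matched_sub_sequences);
    proteins with no matches are ignored, as in the logic sequence.
    """
    stats = defaultdict(lambda: [0, 0])
    for year, matches in records:
        if matches:
            stats[year][0] += 1             # one more matched protein
            stats[year][1] += len(matches)  # its matched sub-sequences
    return {year: tuple(counts) for year, counts in sorted(stats.items())}
```

The lowercase output of translate matches the convention of the logic sequence, in which "k" denotes lysine and "h" denotes histidine, so its result can be passed directly to a sub-sequence scanner.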
[000260] In view of the foregoing description, it may be understood that Genome Explorer implements a method including applying a plurality of criteria to data representing protein sequences, and based on the criteria, identifying a sub-sequence within the protein sequences, the identified sub-sequence having a predetermined allowed range of distance between lysine amino acids thereof, and a predetermined allowed range of distance between a histidine amino acid and a farthest lysine amino acid thereof. An identified sub-sequence may be output to a data file.

[000261] The functionality of the aspects described herein may be provided on various computer platforms executing program instructions. One such platform 1100 is illustrated in the simplified block diagram of FIG. 22. There, the platform 1100 is shown as being populated by a processor 1160, which communicates with a number of peripheral devices via a bus subsystem 1150. These peripheral devices typically include a memory subsystem 1110, a network interface subsystem 1170, and an input/output (I/O) unit 1180. The processor 1160 may be any of a plurality of conventional processing systems, including microprocessors, digital signal processors and field programmable logic arrays. In some applications, it may be advantageous to provide multiple processors (not shown) in the platform 1100. The processor(s) 1160 execute program instructions stored in the memory subsystem 1110. The memory subsystem 1110 may include any combination of conventional memory circuits, including electrical, magnetic or optical memory systems. As shown in FIG. 22, the memory subsystem 1110 may include read only memories 1120, random access memories 1130 and bulk storage 1140. Memory subsystem 1110 not only stores program instructions representing the various methods described herein but also may store the data items on which these methods operate.
Network interface subsystem 1170 may provide an interface to outside networks, including an interface to communications network 1190 comprising, for example, the Internet. I/O unit 1180 would permit communication with external devices, which are not shown.

[000262] Several aspects of the present invention are specifically illustrated and described herein. However, it will be appreciated that modifications and variations of the present invention are covered by the teachings of the present invention without departing from the spirit and intended scope of the invention. Additionally, the teachings of the present invention may be adaptable to other sequence-recognition problems that have heretofore been addressed using sequential linear analyses limited to the identification of specific sequences of component elements.

[000263] Using the exemplary software contained in Appendix A, the inventors have discovered, in a non-limiting aspect in accordance with the present invention, that the nucleocapsid protein of the shrimp white spot virus has an exceptionally high Replikin Count as compared to all other viruses and organisms surveyed for Replikins up to the present time (with the exception of malaria). While Replikins have been shown to be essential accompaniments of rapid replication in fungi, yeast, viruses, bacteria, algae, and cancer cells, the inventors have provided the first demonstration of the presence of Replikins in marine organisms other than algae. And, as with algae, the presence of Replikins is again related to rapid infestations. In shrimp, the white spot virus has destroyed millions of dollars of shrimp harvest, first in eastern countries, and now in western hemisphere countries. At present, there is no effective prevention or treatment.
Other examples of Replikin-associated, high-mortality marine viral disease have been demonstrated by us in fish, such as carp, and in hemorrhagic disease in salmon, and are probably widespread in marine ecology and disease.

[000264] The presence of repeat sequences of the Replikins of the nucleocapsid protein of shrimp white spot syndrome virus (WSSV) accounts for the unusually high Replikin Count of 103.8. This Replikin Count is much higher than the Replikin Counts of, for example, influenza viruses, which usually range from less than 1 up to 5 or 7, and is comparable only to the record Replikin Count (so far) of 111 observed in Plasmodium falciparum (malaria). Interestingly, while the shrimp white spot syndrome organism is a virus and P. falciparum is a protozoan parasite, both spend an essential part of their reproductive cycles in red blood cells, an unusual host cell whether in shrimp (white spot virus) or man (malaria); both are fulminating, rapidly replicating diseases with high mortality rates of their hosts; and both appear to use the same methods of increasing their Replikin Counts to such record highs, namely, Replikin Repeats and Replikin Overlap.

[000265] As illustrated in Table 10, examples of Replikin Repeats and Replikin Overlap were found by the applicants in the above nucleocapsid protein of the shrimp white spot syndrome virus. 497 Replikins were discovered in the white spot virus using the exemplary software provided in Appendix A. Of those 497, the Replikins illustrated in Table 10 were selected for their short sequences and high concentration of lysine which, as demonstrated throughout this application, appears to be associated with high mortality. The chosen sequences are easier and less expensive to synthesize than the longer sequences that are not included in Table 10.
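The Replikin Count quoted above is the number of Replikins per 100 amino acids. As a worked check of the arithmetic, the short Python sketch below reproduces the figure of 103.8 from the 497 Replikins and the 479 amino acids reported for the WSSV nucleocapsid protein. The function name replikin_count is an assumption made for this illustration and does not come from Appendix A.

```python
def replikin_count(num_replikins, num_amino_acids):
    """Replikins per 100 amino acids (hypothetical helper name)."""
    if num_amino_acids <= 0:
        raise ValueError("sequence length must be positive")
    return 100.0 * num_replikins / num_amino_acids


# Values reported for the WSSV nucleocapsid protein: 497 Replikins, 479 residues.
print(round(replikin_count(497, 479), 1))   # 103.8
```

Because overlapping and repeated Replikins are each counted, the count can exceed 100, as it does here.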
[000266] Table 10 illustrates intramolecular Replikin Repeats and Replikin Overlap in the shrimp white spot syndrome virus (WSSV) nucleocapsid protein (VP35) gene, which has a Replikin Count (number of Replikins per 100 amino acids) of 103.8 (497 total Replikins per 479 amino acids).

TABLE 10 - Intramolecular Replikin Repeats and Replikin Overlap in shrimp white spot syndrome virus (WSSV) nucleocapsid protein (VP35) gene with Replikin Count = Number of Replikins per 100 amino acids = (497/479) x 100 = 103.8 and with thymidine kinase and thymidylate kinase activity.

Individual Replikins at Different Positions in the same Molecule, in order of appearance in the sequence
Replikin ID Number (to be assigned)

k30 v31 h32 l33 d34 v35 k36
k30 v31 h32 l33 d34 v35 k36 g37 v38 g4oll 1 _3 UP q 40' 141142 h43 4415
k66 k67 n68 v69 k70 s71 a72 k73 a74 l75 p76 1i77
k70 s71 a72 K73 q'741175 p76 h7717 k79
k160 k161 n162 v163 k164 s165 a166 k167 g168 l169 p170 h171
k239 k240 n241 v242 k243 s244 a245 246 a247 l248 p249 425d
k303 k304 n305 v306 k307 s308 a309k13,10 p311 l312 v313 h314
k397 k398 n399 v400 k401 s402 a403 k404 y405 l406 v407 h408

*Note in the shrimp virus the repeated use of identical whole Replikin sequences (underlined) and partial Replikin sequences (shaded) in different positions in the one molecule (each amino acid is numbered according to its order in the sequence).

[000267] Now that we have been able to identify these Replikins using the software described in this application, we can synthesize each of them and use them as targets for antibody and other inhibitory products and for specific synthetic vaccines against the shrimp white spot syndrome virus, specifically directed against each repeating Replikin.

[000268] The phenomenon of repeats is well known in protein structure. What is unique and specific in this case is that these are Replikin repeats.
Thus, while repeat of a specific Replikin sequence increases the Replikin Count within a specific molecule and is associated with more rapid replication, as in the case of ATPase in P. falciparum in malaria, and thus has apparent survival value for the molecule and the organism which contains it, at the same time it provides an increasing vulnerability, an 'Achilles Heel' so to speak. Thus the Replikin Repeat provides, in higher concentration per molecule, additional target sites for attack by specific antibodies as generated by specific synthetic vaccines produced against these Replikins and by other specific anti-Replikin agents. These new targets were previously unavailable because they could not be identified.

Complex Amino Acid Analysis

[000269] A further aspect of the present invention comprises a protein search engine directed to recognizing generalized amino acid and nucleic acid patterns in online databases. Appendix D is an exemplary protein search engine directed to recognizing complex amino acid patterns such as Scaffold Exoskeletons. Appendix D is entitled "Dr. Peptide." Appendix D is an exemplary non-limiting aspect of the present invention and is designed to recognize generalized amino acid patterns in addition to the Replikin pattern.

[000270] Below is a description of one example of a logic sequence that could be included in the Dr. Peptide program. In the description, an "initial server inquiry" refers to search criteria to be applied to one or more network elements, such as server computers, storing electronic data representing protein sequences. The network elements may be included in private networks or, for example, the Internet. The data may be in the form of a "protein page," i.e., a quantum of data representing protein sequences.

Dr. Peptide Logic Sequence

[000271] Initialize user interface procedures and input fields for search parameters.
Construct user interface.
Wait for user to specify search parameters, including:
    (1) words or phrases to be matched in the initial server inquiry to obtain summaries and protein pages,
    (2) a set of specific amino acids which must be included in any sub-sequences qualifying for a set,
    (3) a set of specific amino acids which must be excluded from any sub-sequences qualifying for the set,
    (4) minimum m and maximum n sizes for the permissible spacing gap which is to be applied to the set inclusion and exclusion criteria (2) and (3).

[000272] Once search parameters are specified,
Query:
    If the saved protein pages are not relevant,
        Send the inquiry to the server (NCBI or LANL).
        If it did not return all summaries,
            Re-send the inquiry requesting all summaries.
        For each summary,
            Fetch and save the protein page.
For each protein page,
    If from NCBI,
        Parse ASN page.
        Extract found sequence data (seq-data.ncbieaa).
        Extract article names (descr.*.article.title.*.name).
        Extract protein source (source.org.taxname).
        Extract strain (subtype).
        Derive year discovered.
        Derive serotype.
    If from LANL,
        Parse HTML page for strain, definition, source, year, serotype, and raw nucleotide sequence.
        Convert nucleotides to amino acids by mapping every three nucleotides in sequence to the corresponding amino acid.
    Save parsed value for this protein.
For each parsed page,
    For each sequence data found on the page,
        Scan the amino acid sequence data for each sub-sequence matching.
        The match patterns are a sequence of alternative steps:
            (a) An amino acid in the amino acid sequence data is in a set of specific amino acids as defined in user parameter (2) above.
            (b) An amino acid in the amino acid sequence data is not in the set of specific amino acids defined in user parameter (3) above.
            (c) An amino acid in the amino acid sequence data has a spacing gap of m to n amino acids from another amino acid in the amino acid sequence data as defined in user parameter (4) above.
        The initial sub-sequence set is all possible terminal sequences, or "tails," of the sequence data at the first pattern step.
        While the set of sub-sequences is not empty,
            Remove one sub-sequence and record how far in the pattern string its evaluation has reached.
            If the amino acid at the current pattern step
                - Is in a set of specific amino acids,
                    If the next amino acid of the sub-sequence is also in the set of amino acids,
                        Add the elongated sub-sequence and next pattern step to the sub-sequence set.
                - Is not in a set of specific amino acids,
                    If the next amino acid of the sub-sequence is not one of the set of amino acids,
                        Add the elongated sub-sequence and next pattern step to the sub-sequence set.
                - Has a gap of m to n amino acids of any kind,
                    First, elongate each sub-sequence for each possible length m through n.
                    Then add each elongated version of the sub-sequence to the sub-sequence set.
            If the above pattern is exhausted,
                The sub-sequence is a matched sub-sequence.
    Ignore sequences with no matches.
    Accept the sequence with the most matches.
    If a sequence has been accepted,
        Catalog each sub-sequence by the year it was discovered.
        For each additional criterion,
            Check the additional criterion against other parsed fields.
            If it does not match, do not accept the page.
    If the page was accepted,
        Add it as a passed page.
        Create an HTML page showing the full sequence and all matched sub-sequences.
    If the page was not accepted,
        Add it as a failed page.

[000273] In view of the foregoing description, it may be understood that Dr. Peptide implements a method including applying a plurality of criteria to data representing protein sequences, and based on the criteria, identifying arbitrary sub-sequences within the protein sequences.
An identified sub-sequence may be output to a data file. The criteria may include:
a set {a} of amino acids to be included in the sub-sequence;
a set {b} of amino acids to be excluded from the sub-sequence; and
a minimum and a maximum permissible gap between members of sets {a} and {b}.

[000274] A non-limiting and exemplary aspect of the invention employs the complex amino acid analysis aspect of the invention to analyze Replikin Scaffold sequences in earlier strains of influenza that have degenerated into non-Replikin sequences but maintained the scaffold structure of the Replikin Scaffold. As an example of the use of the exemplary and non-limiting software program in Appendix D to recognize generalized amino acid patterns, the inventors first discovered by visual scanning of protein sequences (now by Dr. Peptide software) that what was, in earlier-arising specimens of a particular influenza species, a Replikin Scaffold was in later specimens changed as follows:
1) The length of 29 amino acids was preserved;
2) The first two amino acid positions (1 and 2) were preserved, i.e. KK;
3) The last two amino acid positions (28 and 29) were preserved, i.e. HH;
4) But there was no longer a K which was 6 to 10 amino acids from KK (needed for the definition of a Replikin).

[000275] Thus this Scaffold is no longer a Replikin Scaffold, but now is a Scaffold Exoskeleton, so to speak. While Replikin Scaffolds are associated with high Replikin Counts and the occurrence of epidemics, Scaffold Exoskeletons are associated with virus dormancy and the reduction or end of the epidemic. Thus Scaffold Exoskeletons appear to be degenerative structures left as residues when Replikin Scaffolds and specific viral outbreaks are declining, and thus a useful diagnostic structure for this purpose.
This confirms the revelation and use of Replikin Scaffolds as 1) targets for anti-rapid replication agents such as antibodies or small inhibitory RNAs and 2) the basis of anti-viral vaccines. Software according to aspects of the present invention may comprise logic to obtain and analyze protein sequences to identify sequences having characteristics 1, 2, 3 and 4 above. For example, Scaffold Exoskeletons can now be detected and counted in any protein sequence by the exemplary software in Appendix D.

[000276] Another non-limiting aspect in accordance with the present invention is a method of identifying a Replikin Scaffold comprising identifying a series of peptides comprising about 17 to about 30 amino acids and further comprising
(1) identifying a terminal lysine;
(2) identifying a terminal histidine and another histidine in the residue position immediately adjacent to the terminal histidine;
(3) identifying at least one lysine within about 6 to about 10 amino acid residues from at least one other lysine; and
(4) identifying at least about 6% lysines.

[000277] In a non-limiting aspect in accordance with the present invention the method of identifying a Replikin Scaffold may comprise identifying a single or plurality of individual members of the series of a Replikin Scaffold.

[000278] In a preferred non-limiting aspect in accordance with the present invention the method of identifying a Replikin Scaffold further comprises the identification of a second lysine immediately adjacent to the terminal lysine. Software according to aspects of the present invention may comprise logic to obtain and analyze protein sequences to identify sequences using steps 1, 2, 3 and 4 above.
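The scaffold criteria above, together with the four characteristics given earlier for a Scaffold Exoskeleton, can be sketched as a small Python classifier. This is an illustrative sketch under stated assumptions and not the Dr. Peptide code of Appendix D. The function name classify_scaffold and the returned labels are hypothetical. The sketch assumes the preferred form with two adjacent lysines at one end (KK) and two adjacent histidines at the other (HH), treats "about 17 to about 30" as the closed range 17 to 30, and treats a peptide that keeps the KK...HH frame but has no lysine pair spaced 6 to 10 positions apart as a Scaffold Exoskeleton.

```python
def classify_scaffold(peptide, min_len=17, max_len=30, gap=(6, 10),
                      min_k_percent=6.0):
    """Label a peptide as 'replikin_scaffold', 'scaffold_exoskeleton' or
    'neither'. Hypothetical helper; thresholds are assumptions."""
    pep = peptide.lower()
    # Frame: permitted length, KK at the start, HH at the end.
    frame = (min_len <= len(pep) <= max_len
             and pep.startswith("kk") and pep.endswith("hh"))
    if not frame:
        return "neither"
    lys = [i for i, aa in enumerate(pep) if aa == "k"]
    # Is any lysine 6 to 10 positions after another lysine?
    spaced_pair = any(gap[0] <= j - i <= gap[1] for i in lys for j in lys)
    if not spaced_pair:
        return "scaffold_exoskeleton"      # frame kept, Replikin spacing lost
    k_percent = 100.0 * len(lys) / len(pep)
    return "replikin_scaffold" if k_percent >= min_k_percent else "neither"


# Constructed 29-residue test peptides (not sequences from the specification).
with_spaced_k = "kk" + "a" * 5 + "k" + "a" * 19 + "hh"
without_spaced_k = "kk" + "a" * 25 + "hh"
print(classify_scaffold(with_spaced_k))      # replikin_scaffold
print(classify_scaffold(without_spaced_k))   # scaffold_exoskeleton
```

Counting the two labels across isolates from successive years would give the kind of tally of Replikin Scaffolds and Scaffold Exoskeletons that the text describes as detected and counted by software.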
The Tcl Programming Language

[000279] Tcl (the "Tool Command Language," pronounced "tickle") is a simple interpreted scripting language that has its roots in the Unix command shells, but which has additional capabilities that are well-suited to network communication, Internet functionality and the rapid development of graphical user interfaces. Tcl was created by John K. Ousterhout at the University of California at Berkeley in 1988. Originally conceived as a reusable, embeddable language core for various software tools, it is now widely used in applications including web scripting, test automation, network and system management, and in a variety of other fields.

[000280] In aspects, Genome Explorer and Dr. Peptide may be coded in Tcl/Tk, a scripting programming language that includes powerful facilities for internet access, user interface design, and string manipulation. Because Tcl/Tk has been ported to nearly all available computer architectures and is familiar to those skilled in the art, programs written in Tcl/Tk can be run on nearly any operating system. Source code for specific implementations of Genome Explorer and Dr. Peptide is provided in Appendices A and D. The specific implementations are provided by way of illustration and example only, and the present invention is not in any way limited to the specific implementations illustrated.

OTHER USES OF THE THREE POINT RECOGNITION METHOD

[000281] Since "3-point-recognition" is a proteomic method that specifies a particular class of proteins, using three or more different recognition points for other peptides similarly should provide useful information concerning other protein classes. Further, the "3-point-recognition" method is applicable to other recognins, for example to the TOLL 'innate' recognition of lipopolysaccharides of organisms.
The three point recognition method may also be modified to identify other useful compounds of covalently linked organic molecules, including other covalently linked amino acids, nucleotides, carbohydrates, lipids or combinations thereof. In this aspect of the invention a sequence is screened for subsequences containing three or more desired structural characteristics. In the case of screening compounds composed of covalently linked amino acids, lipids or carbohydrates, the subsequence of 7 to about 50 covalently linked units should contain (1) at least one first amino acid, carbohydrate or lipid residue located seven to ten residues from a second of the first amino acid, carbohydrate or lipid residue; (2) at least one second amino acid, lipid or carbohydrate residue; and (3) at least 6% of the first amino acid, carbohydrate or lipid residue. In the case of screening nucleotide sequences, the subsequence of about 21 to about 150 nucleotides should contain (1) at least one codon encoding a first amino acid located within eighteen to thirty nucleotides from a second codon encoding the first amino acid residue; (2) at least one codon encoding a second amino acid residue; and (3) should encode at least 6% of said first amino acid residue.

[000282] Several aspects of the present invention are specifically illustrated and described herein. However, it will be appreciated that modifications and variations of the present invention are encompassed by the above teachings and within the purview of the appended claims without departing from the spirit and intended scope of the invention.

EXAMPLE 1

PROCESS FOR EXTRACTION, ISOLATION AND IDENTIFICATION OF REPLIKINS AND THE USE OF REPLIKINS TO TARGET, LABEL OR DESTROY REPLIKIN-CONTAINING ORGANISMS

a) Algae

[000283] The following algae were collected from Bermuda water sites and either extracted on the same day or frozen at -20 degrees C and extracted the next day.
The algae were homogenized in a cold room (at 0 to 5 degrees C) in 1 gram aliquots in neutral buffer, for example 100 cc. of 0.005M phosphate buffer solution, pH 7 ("phosphate buffer") for 15 minutes in a Waring blender, centrifuged at 3000 rpm, and the supernatant concentrated by perevaporation and dialyzed against phosphate buffer in the cold to produce a volume of approximately 15 ml. The volume of this extract solution was noted and an aliquot taken for protein analysis, and the remainder was fractionated to obtain the protein fraction having a pK range between 1 and 4.

[000284] The preferred method of fractionation is chromatography as follows: The extract solution is fractionated in the cold room (4 degrees C) on a DEAE cellulose (Cellex-D) column 2.5 x 11.0 cm, which has been equilibrated with 0.005M phosphate buffer. Stepwise eluting solvent changes are made with the following solutions:
Solution 1 - 4.04 g. NaH2PO4 and 0.5 g. NaH2PO4 are dissolved in 15 litres of distilled water (0.005 molar, pH 7);
Solution 2 - 8.57 g. NaH2PO4 is dissolved in 2,480 ml. of distilled water;
Solution 3 - 17.1 g. of NaH2PO4 is dissolved in 2480 ml of distilled water (0.05 molar, pH 4.7);
Solution 4 - 59.65 g. of NaH2PO4 is dissolved in 2470 ml distilled water (0.175 molar);
Solution 5 - 101.6 g. of NaH2PO4 is dissolved in 2455 ml distilled water (pH 4.3);
Solution 6 - 340.2 g. of NaH2PO4 is dissolved in 2465 ml of distilled water (1.0 molar, pH 4.1);
Solution 7 - 283.63 g. of 80% phosphoric acid (H3PO4) is made up in 2460 ml of distilled water (1.0 molar, pH 1.0).

[000285] The extract solution, in 6 to 10 ml volume, is passed onto the column and overlaid with Solution 1, and a reservoir of 300 ml of Solution 1 is attached and allowed to drip by gravity onto the column.
Three ml aliquots of eluant are collected and analyzed for protein content at OD 280 until all of the protein to be removed with Solution 1 has been removed from the column. Solution 2 is then applied to the column, followed in succession by Solutions 3, 4, 5, 6 and 7 until all of the protein which can be removed with each Solution is removed from the column. The eluates from Solution 7 are combined, dialyzed against phosphate buffer, the protein content determined of both dialysand and dialyzate, and both analyzed by gel electrophoresis. One or two bands of peptide or protein of molecular weight between 3,000 and 25,000 Daltons are obtained in Solution 7. For example, the algae Caulerpa mexicana, Laurencia obtusa, Cladophora prolifera, Sargassum natans, Caulerpa verticillata, Halimeda tuna, and Penicillus capitatus, after extraction and treatment as above, all demonstrated in Solution 7 eluates sharp peptide bands in this molecular weight region with no contaminants. These Solution 7 proteins or their eluted bands are hydrolyzed, and the amino acid composition determined. The peptides so obtained which have a lysine composition of 6% or greater are Replikin precursors. These Replikin peptide precursors are then determined for amino acid sequence and the Replikins are determined by hydrolysis and mass spectrometry as detailed in U.S. Patent 6,242,578 B1. Those that fulfill the criteria defined by the "3-point-recognition" method are identified as Replikins. This procedure can also be applied to obtain yeast, bacterial and any plant Replikins.

b) Virus

[000286] Using the same extraction and column chromatography separation methods as above in a) for algae, Replikins in virus-infected cells are isolated and identified.
c) Tumor cells in vivo and in vitro tissue culture

[000287] Using the same extraction and column chromatography separation methods as above in a) for algae, Replikins in tumor cells are isolated and identified. For example, Replikin precursors of Astrocytin isolated from malignant brain tumors, Malignin (Aglyco 10B) isolated from glioblastoma tumor cells in tissue culture, MCF7 mammary carcinoma cells in tissue culture, and P3J Lymphoma cells in tissue culture, each treated as above in a), yielded Replikin precursors with lysine content of 9.1%, 6.7%, 6.7%, and 6.5% respectively. Hydrolysis and mass spectrometry of Aglyco 10B as described in Example 10 of U.S. 6,242,578 B1 produced the amino acid sequence ykagraflhkkndiide (SEQ ID NO: 471), the 16-mer Replikin.

EXAMPLE 2

[000288] As an example of diagnostic use of Replikins: Aglyco 10B or the 16-mer Replikin may be used as antigen to capture and quantify the amount of its corresponding antibody present in serum for diagnostic purposes, as shown in Figures 2, 3, 4 and 7 of U.S. 6,242,578 B1.

[000289] As an example of the production of agents to attach to Replikins for labeling, nutritional or destructive purposes: Injection of the 16-mer Replikin into rabbits to produce the specific antibody to the 16-mer Replikin is shown in Example 6 and Figures 9A and 9B of U.S. 6,242,578 B1.

[000290] As an example of the use of agents to label Replikins: The use of antibodies to the 16-mer Replikin to label specific cells which contain this Replikin is shown in Figure 5 and Example 6 of U.S. 6,242,578 B1.

[000291] As an example of the use of agents to destroy Replikins: The use of antibodies to the 16-mer Replikin to inhibit or destroy specific cells which contain this Replikin is shown in Figure 6 of U.S. 6,242,578 B1.
EXAMPLE 3

[000292] Analysis of sequence data of isolates of influenza virus hemagglutinin protein or neuraminidase protein for the presence and concentration of Replikins is carried out by visual scanning of sequences or through use of a computer program based on the 3-point recognition system described herein. Isolates of influenza virus are obtained and the amino acid sequence of the influenza hemagglutinin and/or neuraminidase protein is obtained by any art known method, such as by sequencing the hemagglutinin or neuraminidase gene and deriving the protein sequence therefrom. Sequences are scanned for the presence of new Replikins, conservation of Replikins over time and concentration of Replikins in each isolate. Comparison of the Replikin sequences and concentrations to the amino acid sequences obtained from isolates at an earlier time, such as about six months to about three years earlier, provides data that are used to predict the emergence of strains that are most likely to be the cause of influenza in upcoming flu seasons, and that form the basis for seasonal influenza peptide vaccines or nucleic acid based vaccines. Observation of an increase in concentration, particularly a stepwise increase in concentration, of Replikins in a given strain of influenza virus for a period of about six months to about three years or more is a predictor of emergence of the strain as a likely cause of influenza epidemic or pandemic in the future.

[000293] Peptide vaccines or nucleic acid-based vaccines based on the Replikins observed in the emerging strain are generated. An emerging strain is identified as the strain of influenza virus having the highest increase in concentration of Replikin sequences within the hemagglutinin and/or neuraminidase sequence during the time period.
Preferably, the peptide or nucleic acid vaccine is based on or includes any Replikin sequences that are observed to be conserved in the emerging strain. Conserved Replikins are preferably those Replikin sequences that are present in the hemagglutinin or neuraminidase protein sequence for about two years and preferably longer. The vaccines may include any combination of Replikin sequences identified in the emerging strain.

[000294] For vaccine production, the Replikin peptide or peptides identified as useful for an effective vaccine are synthesized by any method, including chemical synthesis and molecular biology techniques, including cloning, expression in a host cell and purification therefrom. The peptides are preferably admixed with a pharmaceutically acceptable carrier in an amount determined to induce a therapeutic antibody reaction thereto. Generally, the dosage is about 0.1 mg to about 10 mg.

[000295] The influenza vaccine is preferably administered to a patient in need thereof prior to the onset of "flu season." Flu season generally begins in late October and lasts through late April. However, the vaccine may be administered at any time during the year. Preferably, the influenza vaccine is administered once yearly, and is based on Replikin sequences observed to be present, and preferably conserved, in the emerging strain of influenza virus. Another preferred Replikin for inclusion in an influenza vaccine is a Replikin demonstrated to have re-emerged in a strain of influenza after an absence of one or more years.

EXAMPLE 4

[000296] Analysis of sequence data of isolates of coronavirus nucleocapsid, or spike, or envelope, or other protein for the presence and concentration of Replikins is carried out by visual scanning of sequences or through use of a computer program based on the 3-point recognition method described herein.
Isolates of coronavirus are obtained and the amino acid sequence of the coronavirus protein is obtained by any method known in the art, such as by sequencing the protein's gene and deriving the protein sequence therefrom. Sequences are scanned for the presence of new Replikins, conservation of Replikins over time and concentration of Replikins in each isolate. Comparison of the Replikin sequences and concentrations to the amino acid sequences obtained from isolates at an earlier time, such as about six months to about three years earlier, provides data that are used to predict the emergence of strains that are most likely to be the cause of an outbreak or pandemic, and that form the basis for coronavirus peptide vaccines or nucleic acid based vaccines. Observation of an increase in concentration, particularly a stepwise increase in concentration, of Replikins in a given class, or strain, of coronavirus for a period of about six months to about three years or more is a predictor of emergence of the strain as a likely cause of an epidemic or pandemic, such as SARS, in the future.

[000297] Peptide vaccines or nucleic acid-based vaccines based on the Replikins observed in the emerging strain of coronaviruses are generated. An emerging strain is identified as the strain of coronavirus having the highest increase in concentration of Replikin sequences within the nucleocapsid sequence during the time period. Preferably, the peptide or nucleic acid vaccine is based on or includes any Replikin sequences that are observed to be conserved in the strain. Conserved Replikins are preferably those Replikin sequences which are present in the nucleocapsid protein sequence for about two years and preferably longer. The vaccines may include any combination of Replikin sequences identified in the emerging strain.
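The selection rules in this example (the emerging strain is the one with the highest increase in Replikin concentration over the period, and conserved Replikins are those present for about two years or longer) can be sketched in Python. This is a minimal sketch under stated assumptions, not a procedure given in the specification. The function names, the data layout (dictionaries keyed by strain and by year), and all values and labels in the usage lines are hypothetical placeholders. "Increase" is assumed to mean the last measured concentration minus the first, and "conserved" is assumed to mean present in every sampled year of a period spanning at least two years.

```python
def emerging_strain(concentration_by_strain):
    """Return the strain whose Replikin concentration rose most between its
    first and last sampled year. Input: {strain: {year: concentration}}."""
    def increase(series):
        years = sorted(series)
        return series[years[-1]] - series[years[0]]
    return max(concentration_by_strain,
               key=lambda strain: increase(concentration_by_strain[strain]))


def conserved_replikins(replikins_by_year, min_span_years=2):
    """Return the Replikins seen in every sampled year, provided the sampled
    years span at least min_span_years. Input: {year: iterable of Replikins}."""
    years = sorted(replikins_by_year)
    if not years or years[-1] - years[0] < min_span_years:
        return set()
    return set.intersection(*(set(replikins_by_year[y]) for y in years))


# Hypothetical placeholder data, for illustration only.
counts = {
    "strain_A": {2001: 1.0, 2002: 1.2, 2003: 1.1},
    "strain_B": {2001: 0.8, 2002: 2.0, 2003: 3.5},
}
seen = {
    2001: {"replikin_1", "replikin_2"},
    2002: {"replikin_1"},
    2003: {"replikin_1", "replikin_3"},
}
print(emerging_strain(counts))       # strain_B
print(conserved_replikins(seen))     # {'replikin_1'}
```

A stepwise increase, which the text singles out as a particularly strong predictor, could be tested by additionally requiring that the concentration rise in each successive sampled year.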
[000298] For vaccine production, the Replikin peptide or peptides identified as useful for an effective vaccine are synthesized by any method, including chemical synthesis and molecular biology techniques, including cloning, expression in a host cell and purification therefrom. The peptides are preferably admixed with a pharmaceutically acceptable carrier in an amount determined to induce a therapeutic antibody reaction thereto. Generally, the dosage is about 0.1 mg to about 10 mg.

[000299] The coronavirus vaccine may be administered to a patient at any time of the year. Preferably, the coronavirus vaccine is administered once and is based on Replikin sequences observed to be present, and preferably conserved, in the classes of coronavirus.

EXAMPLE 5

[000300] Analysis of sequence data of isolates of Plasmodium falciparum antigens for the presence and concentration of Replikins is carried out by visual scanning of sequences or through use of a computer program based on the 3-point recognition method described herein. Isolates of Plasmodium falciparum are obtained and the amino acid sequence of the protein is obtained by any art known method, such as by sequencing the gene and deriving the protein sequence therefrom. Sequences are scanned for the presence of Replikins, conservation of Replikins over time and concentration of Replikins in each isolate. This information provides data that are used to form the basis for anti-malarial peptide vaccines or nucleic acid based vaccines.

[000301] Peptide vaccines or nucleic acid-based vaccines based on the Replikins observed in the malaria causing organism are generated. Preferably, the peptide or nucleic acid vaccine is based on or includes any Replikin sequences that are observed to be present on a surface antigen of the organism. The vaccines may include any combination of Replikin sequences identified in the malaria causing strain.
[000302] For vaccine production, the Replikin peptide or peptides identified as useful for an effective vaccine are synthesized by any method, including chemical synthesis and molecular biology techniques, including cloning, expression in a host cell and purification therefrom. The peptides are preferably admixed with a pharmaceutically acceptable carrier in an amount determined to induce a therapeutic antibody reaction thereto. Generally, the dosage is about 0.1 mg to about 10 mg.

[000303] The malaria vaccine is preferably administered to a patient in need thereof at any time during the year, and particularly prior to travel to a tropical environment.

[000304] Another aspect includes an antisense nucleic acid molecule complementary to the coding strand of the gene or the mRNA encoding organism for the replikins in organisms including, but not limited to, viruses, trypanosomes, bacteria, fungi, algae, amoeba, and plants, wherein said antisense nucleic acid molecule is complementary to a nucleotide sequence of a replikin containing organism.

EXAMPLE 6

[000305] Amino acid sequences of five short SARS Replikins found in nucleocapsid, spike, and envelope proteins of the SARS coronavirus were synthesized and tested on rabbits to test immune response to Replikin sequences in the SARS coronavirus. The following Replikin sequences were tested: (1) 2003 Human SARS nucleocapsid (SEQ ID NO: 303); (2) 2003 Human SARS spike protein (SEQ ID NO: 304); (3) 2003 Human SARS spike protein (SEQ ID NO: 305); 2003 Human SARS spike protein (SEQ ID NO: 306); (4) 2003 SARS envelope protein (SEQ ID NO: 307); and (5) 2003 Human SARS nucleocapsid protein (SEQ ID NO: 308). Each synthesized peptide was injected subcutaneously into a rabbit. The tested rabbits produced measurable specific antibody to each of the five sequences that bound at dilutions of greater than 1 in 10,0000. The 21 amino acid SARS nucleocapsid replikin antibody (SEQ. ID NO: 33) was demonstrated to bind at dilutions greater than 1 in 204,800. Because of previous unsuccessful attempts by others to achieve with various small peptides a strong immune response without the unwanted side effects obtained with a whole protein or the thousands of proteins or nucleic acids as in smallpox vaccine, the ability of small synthetic replikin antigens to achieve strong immune responses was shown to be significant for the efficacy of SARS vaccines.

EXAMPLE 7

[000306] A 41 amino acid replikin sequence KKNSTYPTIKRSYNNTNQEDLLVLWGHKKKKHKKKKKHK (SEQ ID NO: 16) -KLH with the addition of a key limpet hemocyanin adjuvant on the C-terminal end (denoted as -KLH) was designated Vaccine V120304U2. The vaccine was designed by the inventors from the 29 amino acid Replikin Scaffold of H5N1 "Bird Flu" Influenza Replikins labeled "2004 H5N1 Vietnam, highly pathogenic" in Table 8 with the addition of two UTOPE units (KKKKHK) (SEQ ID NO: 459) on the C-terminal end of the H5N1 scaffold and an additional adjuvant (key limpet hemocyanin (sequence KLH)) covalently linked on the C-terminal end of the two UTOPE units. 100 µg of Vaccine V120304U2 was injected subcutaneously into rabbits and chickens. The antibody response was measured before vaccination and from one week after injection to eight weeks after injection. An antibody response was noted at one week and reached a peak in the third to fourth week after vaccination. Peak antibody responses ranged from a dilution of 1:120,000 to a dilution of greater than 1:240,000. Antibody titers were determined with an enzyme linked immunosorbent assay (ELISA) with Peptide-GGG (goat gamma globulin) bound in solid phase (0.1 µg/100 µl/well) on high binding 96 well plates. The serum was first diluted 50 fold and then further diluted in 2-fold serial dilutions.
The ELISA titer result was determined from the estimated dilution factor that resulted from an optical density at 405 nm of 0.2 and derived from nonlinear regression analysis of the serial dilution curve. Detection was obtained using a horseradish peroxidase conjugated secondary antibody and ABTS substrate (ABTS is a registered trademark of Boehringer Mannheim GmbH). Results from tests on two chickens and two rabbits are provided in Table 11. Individual well results from the test on rabbit D4500 are provided in Table 12. In combination with the results reported in Example 6, in a total of six tests of Replikin sequences for antibody responses in rabbit or chicken, all six sequences provided a measurable antibody response and have proved antigenic.

The reference in this specification to any prior publication (or information derived from it), or to any matter which is known, is not, and should not be taken as an acknowledgment or admission or any form of suggestion that that prior publication (or information derived from it) or known matter forms part of the common general knowledge in the field of endeavour to which this specification relates.

Throughout this specification and the claims which follow, unless the context requires otherwise, the word "comprise", and variations such as "comprises" and "comprising", will be understood to imply the inclusion of a stated integer or step or group of integers or steps but not the exclusion of any other integer or step or group of integers or steps.

Table 11

Chickens injected with 100 µg V120304U2 on day 1. ELISA titer of antibody production on day 18.

Animal   Bleed Day                                      ELISA Titer
U0682    Prior to administration of vaccine (Control)   <50
U0682    18 days after administration                   >204,800
U0683    Prior to administration of vaccine (Control)   <50
U0683    18 days after administration                   >204,800

Rabbits injected with 100 µg V120304U2 on day 1. ELISA titer of antibody production on day 20.

Animal   Bleed Day                                      ELISA Titer
D4500    Prior to vaccine administration (Control)      <50
D4500    20 days after administration                   >204,800
D4501    Prior to vaccine administration (Control)      100
D4501    20 days after administration                   >204,800

Table 12

Rabbits injected with 100 µg V120304U2 on day 1. OD450 results for titers on days 7, 20 and 28 in individual wells.

Animal   Test Day   Well 1   Well 2   Well 3   Well 4   Well 5   Well 6
D4500    Day 7      0.11     0.10     0.09     0.08     0.07     0.07
         Day 20     0.49     0.38     0.23     0.19     0.22     0.17
         Day 28     2.77     1.41     0.92     0.56     0.43     0.42

Animal   Test Day   Well 7   Well 8   Well 9   Well 10  Well 11  Well 12
D4500    Day 7      0.06     0.06     0.06     0.06     0.6      0.6
         Day 20     0.02     0.16     0.17     0.15     0.19     0.28
         Day 28     0.17     0.14     0.12     0.11     0.11     0.10

Animal   Test Day   Well 1   Well 2   Well 3   Well 4   Well 5   Well 6
D4501    Day 7      0.25     0.18     0.15     0.11     0.09     0.08
         Day 20     0.50     0.23     0.20     0.16     0.18     0.18
         Day 28     1.75     0.84     0.61     0.50     0.34     0.35

Animal   Test Day   Well 7   Well 8   Well 9   Well 10  Well 11  Well 12
D4501    Day 7      0.07     0.07     0.07     0.06     0.06     0.06
         Day 20     0.16     0.18     0.16     0.17     0.17     0.25
         Day 28     0.20     0.14     0.12     0.12     0.11     0.13

Appendix: "Genome Explorer" Tcl Application with Replikin recognizer.

####################################################################
# GENOME-EXPLORER
####################################################################
# Copyright (C) 2003 by Samuel Bogoch and Elenore Bogoch
# All rights reserved.

# The basic design style is agglutinative. Originally a batch tool to do the search,
# with a Tk gui tacked on, and then various parts, such as being able to stop a
# search, additional criteria, etc.
set Contents [file dir [file dir [lindex $tcl_libPath 0]]]
set env(PATH) $env(PATH):$Contents/Resources/bin
lappend env(TCLLIBPATH) $Contents/Resources/lib
set versionDate 2004-01-24

# Install any dropped .tcl files as upgrades, after asking the user.
proc ::tk::mac::OpenDocument {args} {
    global Contents
    set files {}
    set paths {}
    foreach document $args {
        set path [exec osascript -e "get POSIX path of file \"$document\""]
        if {[string equal [file extension $path] .tcl]} {
            lappend files [file tail $path]
            lappend paths $path
        }
    }
    if {[string equal [tk_messageBox -message "Install upgrade: [join $files {, }]?" -type yesno -title "Install Upgrade"] yes]} {
        foreach path $paths {
            file copy -force $path $Contents/Resources/Scripts/[file tail $path]
        }
        exit
    }
}

# Load the compiled ASN parser, falling back to the Tcl implementation.
proc loadParser {} {
    global env Contents
    if {[catch {load [lindex $env(TCLLIBPATH) end]/ncbiasn.dylib} rs]} {
        log "load ncbiasn.dylib failed: $rs"
        uplevel #0 source "$Contents/Resources/Scripts/asn-parser.tcl"
        log "sourced asn-parser.tcl"
    } else {
        log "loaded ncbiasn.dylib"
    }
}
after idle loadParser

set logFile "~/Library/Preferences/com.omyx.genome.log"

# Write a time-stamped message to stdout, the log file and the log window.
proc log message {
    global log logFile
    puts $message
    if {![info exists log]} {set log [open $logFile w]}
    set message "[clock format [clock seconds] -format {%Y/%m/%d %H:%M:%S}] . $message"
    puts $log $message
    flush $log
    catch {
        .log.t.t insert end $message\n
        .log.t.t see end
    }
}

####################################################################
# PREFERENCES                                                      #
####################################################################

# The directory part of this path is illegible in the source; it is
# reconstructed here by analogy with logFile above.
set prefsFile "~/Library/Preferences/com.omyx.genome.prefs.tcl"
array set preference {debuggingMode 0 outputDirectory ~/Desktop openHTML 0}
catch {source $prefsFile}
if {[catch {cd $preference(outputDirectory)}]} {
    cd ~/Desktop
    set preference(outputDirectory) ~/Desktop
}
set ch [open $prefsFile w]
puts $ch "array set preference {"
foreach {p v} [array get preference] {
    puts $ch "[list $p] [list $v]"
}
puts $ch "}"
close $ch
foreach p [lsort [array names preference]] {
    log "preference $p = $preference($p)"
}

proc ::tk::mac::ShowPreferences {} {
    global preference preferences
    toplevel .preferences
    wm title .preferences Preferences
    wm geometry .preferences 400x150+100+100
    array set ::preferences [array get ::preference]
    place [
        label .preferences.lwd -text "Working directory:" -anchor e
    ] -x 0 -y 0 -relwidth 0.45 -height 25
    place [
        button .preferences.bwd -textvar preferences(outputDirectory) -anchor w -command {
            set T [tk_chooseDirectory -initialdir $preferences(outputDirectory) \
                -title "Output Directory" -mustexist 0]
            if {[string length $T]} {
                set preferences(outputDirectory) $T
            }
        }
    ] -relx .50 -y 0 -relwidth 0.45 -height 25
    place [
        label .preferences.ldebug -text "Save fetched pages:" -anchor e
    ] -x 0 -y 30 -relwidth 0.45 -height 25
    place [
        checkbutton .preferences.cdebug -text " " -variable preferences(debuggingMode)
    ] -relx .50 -y 30 -relwidth 0.45 -height 25
    place [
        label .preferences.lcache -text "Purge query cache:" -anchor e
    ] -x 0 -y 55 -relwidth 0.45 -height 25
    set cachesize [llength [glob -nocomplain ".cache/*"]]
    place [
        button .preferences.bcache -text "Purge $cachesize pages" -command {
            catch {
                foreach fn [glob -nocomplain ".cache/*" ".cache/.query" {.cache/.[A-Z]*}] {file delete -force $fn}
            }
            .preferences.bcache configure -state disabled
        }
    ] -relx .50 -y 55 -relwidth 0.45 -height 25
    if {!$cachesize} {
        .preferences.bcache configure -state disabled
    }
    place [
        label .preferences.lopen -text "Open results in browser:" -anchor e
    ] -x 0 -y 80 -relwidth 0.45 -height 25
    place [
        checkbutton .preferences.copen -text " " -variable preferences(openHTML)
    ] -relx .50 -y 80 -relwidth 0.45 -height 25
    place [
        button .preferences.cancel -text Cancel -command {
            destroy .preferences
            unset -nocomplain preferences
        }
    ] -x 0 -rely 1.0 -y -35 -width 100
    place [
        button .preferences.reset -text Reset -command {
            array set preferences [array get preference]
        }
    ] -relx 1.0 -x -210 -rely 1.0 -y -35 -width 100
    place [
        button .preferences.okay -text Okay -command {
            array set preference [array get preferences]
            file mkdir $preference(outputDirectory)
            cd $preference(outputDirectory)
            set ch [open $prefsFile w]
            puts $ch "array set preference {"
            foreach {p v} [array get preference] {
                puts $ch "[list $p] [list $v]"
            }
            puts $ch "}"
            close $ch
            destroy .preferences
            unset -nocomplain preferences
            foreach p [lsort [array names preference]] {
                log "preference $p = $preference($p)"
            }
        }
    ] -relx 1.0 -x -100 -rely 1.0 -y -35 -width 100
    place [
        button .preferences.crap -text " " -command {}
    ] -relx 1.0 -x -100 -rely 1.0 -y -35 -width 100
    lower .preferences.crap
}

####################################################################
# INITIALISE GLOBALS                                               #
####################################################################

set seen 0; # Number of entries examined.
set pass 0; # Number of those passing the criteria.
set fail 0; # Number of those failing the criteria.
set stop 0; # If the search was terminated,
array set replikinRef {}; # All references to each specific replikin.
# Codon to one-letter amino acid code (lower case); * marks a stop codon.
array set coding {
    GCT a GCC a GCA a GCG a
    TGT c TGC c
    GAT d GAC d
    GAA e GAG e
    TTT f TTC f
    GGT g GGC g GGA g GGG g
    CAT h CAC h
    ATT i ATC i ATA i
    AAA k AAG k
    TTG l TTA l CTT l CTC l CTA l CTG l
    ATG m
    AAT n AAC n
    CCT p CCC p CCA p CCG p
    CAA q CAG q
    CGT r CGC r CGA r CGG r AGA r AGG r
    TCT s TCC s TCA s TCG s AGT s AGC s
    ACT t ACC t ACA t ACG t
    GTT v GTC v GTA v GTG v
    TGG w
    TAT y TAC y
    TAA * TAG * TGA *
}
set db "NCBI"; # Data base searched, NCBI or LANL.
set query ""; # The query string.
set passpercent ""; # 100 * pass/seen.
set failpercent ""; # 100 * fail/seen.

####################################################################
# DATABASE QUERY                                                   #
####################################################################

# The query uses a hidden cache, <outputfolder>/.cache, that
# contains, variously, returned HTML pages, parsed sequence data,
# and the query term that fetched all these pages.

# Drive the query. If the query term has changed from that cached
# term or there is no cache, the cache is cleared and new pages are
# fetched from NCBI or LANL as requested. The term is sent to the
# query page and the initial response parsed. If there are more than 20
# returns, the query is repeated with all the returns requested.
# Each return is a short paragraph which is parsed for the accession
# number and sequence data page url. The page is fetched and saved.

# The second pass then parses the fetched pages. NCBI and LANL
# pages have different parsers.

# In the third pass if the cache had to be filled, or the first
# pass if the cache was okay and no pages were fetched, the parsed
# sequence data is checked against the criteria.

# Matching goes through three steps. (1) The query term is used
# to fetch sequence data from NCBI or LANL. Only pages fetched
# contribute to the 'seen' count. (2) The peptide sequence is
# compared to replikin criteria. (3) Those matching can then be
# passed through the additional criteria, if any.
proc GenomalEnquirer {database term additionalCriteria) global seen pass fail stop failing coding replikinRef global kmin kmax hmax percent log "GenomalEnquirer database=$database term='$term' additionalCriteria='$additionalCriterial" set seen 0 set pass 0 set fail 0 set stop 0 status "" 0 0 array unset replikinRef * set failing [open fail.htm w] puts $failing {<html><head><title>Failing Sequences</title></head><body>} puts $failing "<p><b>query:</b> $term" regsub -all {\s+} $term {+} term set queryerror 0 if {[catch {open .cache/.query r} fd]{ set fetching 1 else { foreach {olddatabase oldterm) [read $fd] break close $fd set fetching [expr {![string equal $olddatabase $database] | ![string equal $oldterm $term])}] if {$fetching) set previousTerm $term package require http file mkdir .cache foreach fn [glob -nocomplain ".cache/*" ".cache/.query"] {file delete -force $fn} set rc [catch while 1 { status "Fetching from $database" 0 0 update; if {$stop} {status "Stopped." 0 0; return} switch $database NCBI { set url http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?cmd=Search&db=Protein&term=$term&doptcmdl=Doc Sum LANL{ set url http://www.flu.lanl.gov/search2/resultNhtml?search=l&field=ALL&num=20&hspecies=Any&seg=any&nu corpro=nuc&orderby=dateasc&text=$term } set token [::http::geturl $url] set queryPage [;:http::data $token] ::http::cleanup $token log "got [string length $queryPage] bytes" 116 WO 2006/088962 PCT/US2006/005343 ;~d{I: /p eYe~ie ('deuggingMode) ){ set fn [open query.htm w] puts $fn $queryPage close $fn if ([regexp {Your request could not be processed due to a problem) $queryPage]) status "Server busy: retrying in 5 seconds.. 
1 0 0 after 5000 continue switch $database NCBI { regexp {<td align="center" width="50%"><div class="?(?:pagerimedium2)"?>Items 1-[0-9]+ of ([0-9]+)</div></td>) $queryPage all numentries LANL{ regexp {Got <B>([0-9]+)</B> hits total) $queryPage all numentries } if {![info exists numentries] |1 [string equal $numentries ""]| $numentries==0) { status "Nothing matched the query." 0 0 return } if {$numentries>20) log "Fetching $numentries entries from $database" update; if {$stop) (status "Stopped." 0 0; return) status "Fetching $numentries queries from $database." 0 0 switch $database NCBI { set token [::http::geturl "http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?SUBMIT=y" \ -query [::http::formatQuery \ db Protein \ term [string map {+ { }} $term] \ dispmax $numentries \ ]] LANL set token [::http::geturl "http://www.flu.lanl.gov/search2/resultN.html?search=l&field=ALL&num=$numentries&hspecies=Any& seg=any&nucorpro=nuc&orderby=dateasc&text=$term"] } set queryPage [::http::data $token] ::http::cleanup $token log "got [string length $queryPage] bytes" if {$::preference(debuggingMode)) set fn [open query.htm w] puts $fn $queryPage close $fn if {[regexp {Your request could not be processed due to a problem) $queryPage]) status "Server busy: retrying in 5 seconds..." 0 0 after 5000 continue break set work {} set queryPage [string map {"\"" '} $queryPage] file mkdir ,cache/.$database switch $database NCBI { foreach {uselesslink relativeurl accessionnumber) [regexp -all -inline (<a href='(/entrez/viewer.fcgi[?]db=protein&val=[^<>']*) '>([^<]*)/a>} $queryPage] \ { set accessionurl http://www.ncbi.nlm.nih.gov:80$relativeurl set accessionlink "<a href='$accessionurl'>$accessionnumber</a>" lappend work $accessionlink $accessionurl $accessionnumber } 117 WO 2006/088962 PCT/US2006/005343 foreach (accessionlink accessionurl accessionnumber) \ [regexp -all -inline {<A HREF='(viewrecord.html[?] 
[^<>]*)'[^<>]*>([^<]*)</A>) $queryPage] \ set accessionurl "http://www.flu.lanl.gov/search2/$accessionurl" set accessionlink [string map ("HREF='" "HREF='http://www.flu.lanl.gov/search2/") $accessionlink] lappend work $accessionlink $accessionurl $accessionnumber } while ([llength $work]>0) set working $work set work {} set i 0; set n [expr {[llength $working]/3)] foreach {accessionlink accessionurl accessionnumber) $working if {$stop) {status "Stopped." 0 0; return) if {![info exists repeat($accessionnumber)]) set repeat($accessionnumber) 0 incr i; status "Fetching $accessionnumber" $i $n set accessionurl [string map {& &) $accessionurl] set token [::http::geturl [string map (query.fcgi viewer.fcgi) $accessionurl]&view=asn] set page [::http::data $token] ::http::cleanup $token if {$::preference(debuggingMode)) set fn [open page.htm w] puts $fn $page close $fn } if ([regexp {try again later) $page]) lappend work $accessionlink $accessionurl $accessionnumber log "server busy: $accessionnumber" continue } set ch [open .cache/.$database/$accessionnumber w] puts $ch [list $accessionlink $accessionurl $page] close $ch if {[llength $work]>0) status "Retrying [expr lengthgh $work]/3)] queries in 5 seconds..." 0 0 after 5000 } set fd [open .cache/.query w] puts $fd [list $database $term] close $fd rs] set queryerror $rc if {$rc && ![string equal {couldn't open socket: host is unreachable) $rs]) error $rs set paths [glob -nocomplain -join .cache {.[A-Z]*) *] set i 0; set n [llength $paths] foreach path $paths { if {$stop) {status "Stopped." 
0 0; return) set database [file tail [file dirname $path]] set accessionnumber [file tail $path] incr i; status "Parsing $accessionnumber" $i $n set ch [open $path r] set rc [catch {foreach {accessionlink accessionurl page) [read $ch] break)] close $ch if {$rc) file delete $path continue } unset -nocomplain PAGE set PAGE(%accession) $accessionnumber set PAGE(%link) $accessionlink set PAGE(%uri) $accessionurl switch $databasel .NCBI 118 WO 2006/088962 PCT/US2006/005343 t'drN ~ ~ ~ 1 921"is <pre>Seq- entry :="$page] if {$starts>=0} { incr starts 5 set stops [string first "</pre>" $page $starts] if {$stops>o) {incr stops -1) else (set stops end) set page [string range $page $starts $stops] array set PAGE [ncbiasn $page] } set PAGE(%seqdata) {} foreach (p seqdata) [array get PAGE *.seq-data.ncbieaa.*] set q [string map {inst.seq-data.ncbieaa id.0.ddbj.accession) $p] if {[info exists PAGE($q)] && ![string equal $PAGE($q) $accessionnumberl} continue regsub -all {\s+) $seqdata {} seqdata lappend PAGE(%-seqdata) [string tolower $seqdata] if ([llength $PAGE(%-seqdata)]==0) { log "sequence-data not found: $accessionnumber" continue } set PAGE(%definition) {} foreach (p v) [array get PAGE *.descr.*.article.title.*.name.>] {lappend PAGE(%definition) $v) set PAGE(%definition) [join $PAGE(%definition) " "] set PAGE(%source) unknown foreach {p v} [array get PAGE *.source.org.taxname.>] set PAGE(%source) $v set PAGE(%strain) unknown foreach (p v} [array get PAGE *.subtype.>] if {[string equal $v strain]) { set PAGE(%strain) $PAGE([string map (.subtype.> .subname.>) $p]) if ([string equal $PAGE(%strain) unknown] && [regexp {$([^()]*(\([^)]*$)?)\)\s*$) $PAGE(%source) -> strain] set PAGE(%strain) $strain set slash [string last / $PAGE(%strain)] if {$slash>=0 && [regexp -start $slash {/([0-9]+)) $PAGE(%strain) -> y]) if {$y<30} {incr y 2000) elseif {$y<100) {incr y 1900) set PAGE(%year) $y else { set PAGE(%year) unknown } foreach (p x) [array get PAGE *.subtype.>] if { [string equal 
$x isolate]} { set p [string map {.subtype.> .subname.>) $p] if ([info exists PAGE($p)]} { set dates [regexp -inline -all (/(\d+)) $PAGE($p)] if lengthgh $dates]>0) { set y [lindex $dates end] regexp {'[^/]+) $PAGE($p) PAGE(%country) if {$y<30} {incr y 2000) elseif {$y<100) {incr y 1900) if ($PAGE(%year) eq "unknown" 11 $PAGE(%year)>$y} (set PAGE(%year) $y) } foreach (p y} [array get PAGE *.year.>] if {$y<30} {incr y 2000) elseif {$y<100} {incr y 1900) if ($PAGE(%year) eq "unknown" 1| $PAGE(%year)>$y) {set PAGE(%year) $y) if {[regexp {$([^)]+)$\s*$} $PAGE(%strain) -> s]} set PAGE(%serotype) $s elseif {[regexp {$(^)+)$\)\s*$} $PAGE(%source) -> s]) set PAGE(%serotype) $s else { set PAGE(%serotype) unknown foreach (p x) [array get PAGE *.subtype.>] J if ([string equal $x serotype]) { set p [string map (.subtype.> .subname.>) $p] if ([info exists PAGE($p)]} 119 WO 2006/088962 PCT/US2006/005343 "'He AGE(%serotype) $PAGE($p) break } } } } LANL{ foreach (all attribute value} \ [regexp -all -inline {<tr><td><span class="bold">([^<>]+)</span></td>\s*<td>((?:[^<] 1<[^/] ]</[^t] ]</t[^d] ]</td[^>])*)</td>\s*</tr> $page] \ regsub -all {<[^<>]*>) $value () value set value [string trim [string map (  " " & & < < > > ' &quot "\""$} value]] regsub -all {\s+} $value { } value set PAGE([string map {" " -} $attribute]) $value switch $attribute { Strain (set PAGE(%strain) $value) Definition {set PAGE(%definition) $value) Source (set PAGE(%source) $value) "Collection Year" (set PAGE(%year) $value) Serotype (set PAGE(%serotype) $value) "Raw Sequence" { regsub -all {\s+) $value ( value set PAGE(%-seqdata) "'" for {set c 0) ($c<[string length $value]} (incr c 3) set codon [string range $value $c [expr {$c+2}]] if ([info exists coding($codon)]} { append PAGE(%seqdata) $coding($codon) } else { append PAGE(%seqdata) ? 
} I} }} set ch [open .cache/$accessionnumber w] foreach (p v) [array get PAGE] { puts $ch "[list $p] [list $vj" } close $ch file delete $path set paths [glob -nocomplain ".cache/*"] set i 0; set n [llength $paths] if {$n==0 && $queryerror} { error "could not connect to $database" foreach path $paths set accessionnumber [file tail $path] if {$stop) (status "Stopped." 0 0; return) incr i status "Searching $accessionnumber" $i $n unset -nocomplain PAGE set ch [open .cache/$accessionnumber r] array set PAGE [read $ch] close $ch if (I[info exists PAGE(%kmin)] $PAGE(%kmin)!=$kmin $PAGE(%kmax) !=$kmax $PAGE(%hmax) !=$hmax $PAGE(%percent) I=$percent set results {} set sequence {} foreach sequence $PAGE(%seqdata) set result [match $sequence] if ([llength $result]) { lappend results [list [llength $result] $result $sequence] } 120 WO 2006/088962 PCT/US2006/005343 set results sortt -integer -index 0 $results] set PAGE(%result) [lindex $results end 1] set PAGE(%sequence) [lindex $results end 2] } else { set PAGE(%result) {} set PAGE(%sequence) {} array set PAGE [list \ $kmin $kmin \ %kmax $kmax \ %hmax $hmax \ %percent $percent \ incr seen set PAGE(%accept) [expr {[llength $PAGE(%result)]>0)] if ($PAGE(%accept)) { foreach {kind descr string) $additionalCriteria if {![string equal $kind =)} continue catch (unset exists) foreach {p extent) [array get PAGE $descr] regsub -all {[^-[:alnum:]@V_+=/]+} $extent { } extent foreach word [split [string tolower [string trim $extent]] set exists($word) 1 regsub -all {{^-[:alnum:]@%_+=/]+) $string { } string set any 0 foreach word [concat [split [string tolower [string trim $string]] unknown] set any [info exists exists($word)] if {$any} break if {!$any} {set PAGE(%accept) 0; break) if ($PAGE(%accept)) foreach (kind descr string) $additionalCriteria if ([string equal $kind =]} continue set PAGE(taccept) 1 catch (unset exists) foreach (p extent) [array get PAGE $descr] regsub -all {[^-[:alnum:]_]+} $extent { } extent foreach word [split 
[string tolower [string trim $extent]] "] set exists($word) 1 regsub -all ([^-[:alnum:]_]+) $string { } string foreach word [split [string tolower [string trim $string]] '] if $kind[info exists exists($word)] set PAGE(%accept) 0; break if {$PAGE(laccept)) break set ch [open .cache/$accessionnumber w] foreach {p v) [array get PAGE] { puts $ch ",[list $p] [list $v]" close $ch if ($PAGE(%accept)) reportStep [pass] incr pass else { fail incr fail update aasequenceHistoryReport replikinReport close $failing status "Completed." 1 1 A-i 121 WO 2006/088962 PCT/US2006/005343 proc reportStep (r) upvar 1 PAGE PAGE upvar 1 mindate mindate upvar 1 maxdate maxdate upvar 1 nrep nrep upvar 1 sumrep sumrep upvar 1 sumsqrep sumsqrep upvar 1 reference reference if {! [string equal $PAGE [%year) unknown] } if (![info exists mindate] || $PAGE(%year)<$mindate} {set mindate $PAGE(%year)} if {![info exists maxdate] || $PAGE(%year)>$maxdate} {set maxdate $PAGE(%year)} if ([info exists nrep($PAGE(%year))]} set nrep($PAGE(%year)) 0 set sumrep($PAGE(year)) 0.0 set sumsqrep($PAGE(%year)) 0.0 set reference($PAGE(%year)) "" incr nrep($PAGE(%year)) 1 set sumrep($PAGE(%year)) [expr ($sumrep($PAGE(%year)) + $r}] set sumsqrep($PAGE (%year)) [expr {$sumsqrep($PAGE(%year)) + $r*$r}] append reference($PAGE (%year)) " \n$PAGE(%link) <a href='pass/$PAGE(%accession).htm >[llength $PAGE(%result)]</a>" proc aasequenceHistoryRepcrt {} global replikinRef file mkdir replikin-ref foreach fn [glob -nocomplain "replikin-ref/*"] if {[catch {file delete $fn} rs]) {puts $rs} foreach aasequence [lsort [array names replikinRef *.fid]] set f [open $replikinRef($aasequence) w] set aasequence [lindex [split $aasequence .1 0] log "Writing sequence history: $aasequence" puts $f "<html><head><title>Replikin History</title></head><bcdy> <H1>Replikins, Inc.</H1> <H2>Sequence History by Year</H2> <H2>$aasequence</H2> <p>All occurences of the sequence by year:</p> <dl> foreach x sortt [array names replikinRef $aasequence:*]] 
puts $f "<dt>[lindex [split $x :) l1</dt><dd>[join $replikinRef($x) {, }].</dd>" puts $f "</dl>" close $f proc replikinReport {{ upvar 1 term term upvar 1 mindate mindate upvar 1 maxdate maxdate upvar 1 nrep nrep upvar 1 sumrep sumrep upvar 1 sumsqrep sumsqrep upvar 1 reference reference if {[info exists mindate]} catch (file delete -force $term,htm} log "Writing report $term.htm" set stats [open $term.htm w] puts $stats "<html><head><title>Replikin Analysis</title></head><body> <H1>Replikins, Inc.</Hl> <H2>Replikin Count by Year</H2> <H3> [string map (+ { }} $term]</H3> <TABLE> <TR> <TH align='center' valign='top'>Year</TH> <TH align='center' valign=top'>PubMed Accession Number-Replikin Count</TH> <TH align='center' valign='top'>No. of Isolates per year</TH> <TH align=center' valign='top'>Mean Replikin Count per year</TH> 122 WO 2006/088962 PCT/US2006/005343 CmTH alglzel'teitet' valign='top'>S.D.</TH> </TR> set Y {} if ([info exists nrep(unknown)]} {lappend Y unknown) for (set y $mindate) ($y<=$maxdate) (incr y} {lappend Y $y) foreach y $Y ( if ([info exists nrep($y)]} set mean [expr ($sumrep($y)/$nrep($y)}] # var = (sum (x - m)^2)/(n-1) # (sum (x^2 - 2xm + m^2))/(n-1) # = (sum x^2 - 2m sum x + m^ 2 sum l)/(n-l) # = (sumsq - 2*m*sum + n m^2)/(n-1) if ($nrep($y)==l} ( set sd 0.0 else { if ([catch {expr (sqrt(abs(($sumsqrep($y) + $nrep($y)*$mean*$mean 2*$mean*$sumrep($y))/($nrep($y)-l)))}} sd]) set sd 0.0 puts $stats "<TR> <TD align='center' valign='top'>$y</TD> <TD align='left' valign='top'>$reference($y)</TD> <TD align='right' valign='top'>$nrep($y)</TD> <TD align='right' valign='top'>{format %.lf $mean]</TD> <TD align='right' valign='top'>{format %.lf $sd]</TD> </TR>" else { puts $stats "<TR><TD align='center' valign='top'>$y</TD><TD></TD><TD></TD><TD></TD><TD></TD></TR>" puts $stats "</TABLE>" close $stats if ($::preference(openHTML)} { log "open location \"file://127.0.0.l[file join [pwd] $term.htm]\"" exec osascript -e "open location \"file;//127.0.0.1[file 
join [pwd] $term.htm]\""
        }
    }
}

####################################################################
#                                                                  #
# MATCH                                                            #
#                                                                  #
####################################################################

# Check the sequence data against the primary match criteria:
#
# Discover a subsequence h...k...k, k...h...k, or k...k...h such that
# (1) The distance between ks is in the range kmin..kmax.
# (2) The distance between an h and the farthest k is in the range kmin+1..hmax.
# (3) The fraction of k in the subsequence is percent or larger.
#
# The sequence is searched for all possible subsequences that match,
# and all these subsequences are returned.

set kmin 6; set kmax 10
set hmax 50
set percent 6

proc match {sequence} {
    global kmin kmax
    global hmax
    global percent
    set pos 0
    set L {}
    array set F {}
    # Collect the position of every k. (The body of this loop is lost at a
    # page break in the source and is reconstructed from how L is used below.)
    foreach e [regexp -all -indices -inline k $sequence] {
        lappend L [lindex $e 0]
    }
    for {set i 1} {$i<[llength $L]} {incr i} {
        set k0 [lindex $L [expr {$i-1}]]
        # rule 1.
        for {set j $i; set wideenough 0} {!$wideenough && $j<[llength $L]} {incr j} {
            set k1 [lindex $L $j]
            if {$k1-$k0<$kmin} continue
            if {$k1-$k0>$kmax} break
            # rule 2.
            set offset [expr {$k1-$hmax}]
            if {$offset<0} {set offset 0}
            while 1 {
                set h [string first h $sequence $offset]
                if {$h<0 || $h>$k0+$hmax} break
                if {$h<$k0} {
                    set b $h
                } else {
                    set b $k0
                }
                if {$h>$k1} {
                    set e $h
                } else {
                    set e $k1
                }
                # rule 3.
                set subsequence [string range $sequence $b $e]
                set nk [regexp -all k $subsequence]
                if {double($nk)/double([string length $subsequence])*100>=$percent} {
                    set "F($b $e)" 1
                }
                incr offset
            }
        }
    }
    # Return the matching {begin end} pairs ordered by start position.
    lsort -integer -index 0 [array names F]
}

####################################################################
#                                                                  #
# PASS                                                             #
#                                                                  #
####################################################################

# Handle a passing page. The sequence is formatted on its own
# formatted html page.

# Some attempt is made to color code separate matched subsequences
# and overlaps. Perhaps less clear than intended, but it does look
# pretty, doesn't it?
proc pass {} {
    global replikinRef
    upvar 1 PAGE PAGE
    file mkdir pass
    set passing [open pass/$PAGE(%accession).htm w]
    puts $passing {<html><head><title>Replikin Analysis</title></head><body>}
    puts $passing {<H1>Replikins, Inc.</H1>}
    puts $passing {<H2>Replikin Analysis</H2>}
    puts $passing "<dl>"
    puts $passing "<dt>PubMed Code:<dd>$PAGE(%link)"
    puts $passing "<dt>Description:<dd>$PAGE(%definition)"
    puts $passing "<dt>Isolated:<dd>$PAGE(%year)"
    if {[info exists PAGE(%country)]} {puts $passing "in $PAGE(%country)"}
    if {![string equal $PAGE(%source) unknown]} {puts $passing "<dt>Source:<dd>$PAGE(%source)"}
    if {![string equal $PAGE(%strain) unknown]} {puts $passing "<dt>Strain:<dd>$PAGE(%strain)"}
    if {![string equal $PAGE(%serotype) unknown]} {puts $passing "<dt>Serotype:<dd>$PAGE(%serotype)"}
    puts $passing "</dl>"

    # Per-residue bookkeeping: C = formatted residue, S = matches covering it,
    # B and E = matches beginning and ending at it.
    set n 0
    array set where {Amino-terminal {} Mid-molecule {} Carboxy-terminal {}}
    foreach ch [split $PAGE(%sequence) ""] {
        set C($n) " <b>$ch</b><SUP>[expr {$n+1}]</SUP>"
        set S($n) {}
        set B($n) {}
        set E($n) {}
        incr n
    }

    # Classify each match by the third of the molecule it starts in and
    # record which matches overlap (graph G).
    set s 0
    foreach pq $PAGE(%result) {
        foreach {p q} $pq break
        if {$p<=$n/3} {
            lappend where(Amino-terminal) $p $q
        } elseif {$p<=(2*$n)/3} {
            lappend where(Mid-molecule) $p $q
        } else {
            lappend where(Carboxy-terminal) $p $q
        }
        set B($p) [linsert $B($p) 0 $s]
        lappend E($q) $s
        for {set i $p} {$i<=$q} {incr i} {
            foreach t $S($i) {
                set G($s,$t) 1
                set G($t,$s) 1
            }
            lappend S($i) $s
        }
        set G($s) {}
        incr s
    }

    # Give overlapping matches different colour indices where possible.
    set W {}
    for {set i 0} {$i<$s} {incr i} {
        lappend W [list $i [llength [array names G $i,*]]]
    }
    foreach w [lsort -index 1 -integer $W] {
        set i [lindex $w 0]
        array unset A *
        foreach ij [array names G $i,*] {
            set j [lindex [split $ij ,] end]
            if {[info exists K($j)]} {set A($K($j)) 1}
        }
        for {set k 0} {$k<$s} {incr k} {
            if {![info exists A($k)]} break
        }
        set K($i) [expr {$k%4}]
    }
    array set colour {
        0 #000000 1 #00A000 2 #000000 3 #0080A0
        4 #C00000 5 #80A000 6 #A000A0 7 #4040A0
        8 #C08000 9 #C0C000 10 #00A080 11 #608060
        12 #F08000 13 #C0A040 14 #A04080 15 #808080
    }
    for {set i 0} {$i<$n} {incr i} {
        set NS 0
        foreach j $S($i) {
            set NS [expr {$NS | (1<<$K($j))}]
        }
        set S($i) $colour($NS)
    }

    # Write the colour-coded sequence.
    puts $passing "<p>"
    set string {}
    for {set i 0} {$i<$n} {incr i} {
        #foreach j $B($i) {lappend string "<" $colour([expr {1<<$K($j)}])}
        lappend string $C($i) $S($i)
        #foreach j $E($i) {lappend string ">" $colour([expr {1<<$K($j)}])}
    }
    set wascol #000000
    puts -nonewline $passing "<font color='#000000'>"
    set i 1
    foreach {ch col} $string {
        if {![string equal $col $wascol]} {
            puts $passing "</font>"
            puts -nonewline $passing "<font color='$col'>"
            set wascol $col
        }
        puts -nonewline $passing " $ch"
        incr i
    }
    puts $passing "</font></p><dl>"

    # List the matches by region and record each one in replikinRef.
    set aatag 1
    foreach w {Amino-terminal Mid-molecule Carboxy-terminal} {
        puts $passing <dt>$w</dt><dd>
        if {[llength $where($w)]} {
            foreach {p q} $where($w) {
                set aasequence [string range $PAGE(%sequence) $p $q]
                if {![info exists replikinRef($aasequence.fid)]} {
                    if {![info exists replikinRef(+fid)]} {
                        set replikinRef(+fid) 1
                    }
                    set replikinRef($aasequence.fid) replikin-ref/[format %05d $replikinRef(+fid)].htm
                    incr replikinRef(+fid)
                }
                lappend replikinRef($aasequence:$PAGE(%year)) \
                    "<a href='../pass/$PAGE(%accession).htm#$aatag'>$PAGE(%accession) position [expr {$p+1}]</a>"
                puts $passing "<p><a name='$aatag'><a href=../$replikinRef($aasequence.fid)>"
                for {set i $p} {$i<=$q} {incr i} {
                    puts $passing $C($i)
                }
                puts $passing "</a></a></p>"
                incr aatag
            }
            puts $passing </dd>
        } else {
            puts $passing "Zero replikins."
        }
    }
    puts $passing </dl>
    set replikincount [expr {double(100*[llength $PAGE(%result)])/double($n)}]
    puts $passing "<p>Replikin Count = Number of Replikins per 100 amino acids = [llength $PAGE(%result)]/$n = [format %.1f $replikincount]</p>"
    close $passing
    return $replikincount
}

####################################################################
#                                                                  #
# FAIL                                                             #
#                                                                  #
####################################################################

# Add the sequence to the failing page.
proc fail {} {
    upvar 1 PAGE PAGE
    global failing kmax
    # Open the failing page the first time a sequence fails.
    if {![info exists failing]} {
        set failing [open fail.htm w]
        puts $failing {<html><head><title>Failing Sequences</title></head><body>}
    }
    puts $failing "<p>$PAGE(%link)"
    set d $kmax
    set i 0
    foreach e [split $PAGE(%sequence) ""] {
        switch $e {
            h {
                # Subscript each h with its distance to the previous and next k.
                set j [string last k $PAGE(%sequence) $i]
                if {$j<0} {
                    set j ""
                } else {
                    set j <sub>[expr {$i-$j}]</sub>
                }
                set k [string first k $PAGE(%sequence) $i]
                if {$k<0} {
                    set k ""
                } else {
                    set k <sub>[expr {$k-$i}]</sub>
                }
                incr d
                if {$d<=$kmax} {
                    puts -nonewline $failing " <sup>$d</sup><i>$j$e$k</i>"
                } else {
                    puts -nonewline $failing " <i>$j$e$k</i>"
                }
            }
            k {
                puts -nonewline $failing " <b>$e</b>"
                set d 0
            }
            default {
                incr d
                if {$d<=$kmax} {
                    puts -nonewline $failing " <sup>$d</sup>$e"
                } else {
                    puts -nonewline $failing " $e"
                }
            }
        }
        incr i
    }
    puts $failing ""
}

#####################################################################
#                                                                   #
#   SEARCH CONTROL GUI                                              #
#                                                                   #
#####################################################################
wm title . {Genome Explorer}
. configure -menu .menubar
menu .menubar
menu .menubar.file -tearoff 0
.menubar add cascade -label File -menu .menubar.file
.menubar.file add command -label "Show Spy" -command {activitylog}
.menubar.file add command -label Quit -command {exit 0} -accel Command-Q
bind . <Command-q> {exit 0}
menu .menubar.edit -tearoff 0
.menubar add cascade -label Edit -menu .menubar.edit
.menubar.edit add command -label Cut -command {event generate %W <<Cut>>} -accel Command-X
.menubar.edit add command -label Copy -command {event generate %W <<Copy>>} -accel Command-C
.menubar.edit add command -label Paste -command {event generate %W <<Paste>>} -accel Command-V
.menubar.edit add command -label Clear -command {event generate %W <<Delete>>}
menu .menubar.help -tearoff 0
.menubar add cascade -menu .menubar.help
.menubar.help add command -label "Genome Explorer Help" -command {
    exec open $Contents/Resources/Documents/Instructions.rtfd
}
.menubar.help add command -label Manifest.rtf -command {
    exec open $Contents/Resources/Documents/Manifest.rtf
}
wm resizable . 1 0
bind . <Command-k> {console show}
bind . <Command-l> {console hide}

pack [frame .query] -side top -fill x
pack [menubutton .query.db -textvariable db -menu .query.db.m] -side left
menu .query.db.m -tearoff 0
.query.db.m add command -label www.ncbi.nlm.nih.gov -command {set db NCBI}
.query.db.m add command -label www.flu.lanl.gov -command {set db LANL}
pack [label .query.label -text "query: "] -side left
pack [frame .query.padd -width 20] -side right
pack [entry .query.text -textvariable query] -side left -fill x -expand 1

pack [frame .k] -side top -fill x
pack [label .k.1 -text "k distance: "] -side left
pack [entry .k.min -textvariable kmin -width 5] -side left
pack [label .k.2 -text "min "] -side left
pack [entry .k.max -textvariable kmax -width 5] -side left
pack [label .k.3 -text "max"] -side left

pack [frame .h] -side top -fill x
pack [label .h.1 -text "h distance: "] -side left
pack [entry .h.max -textvariable hmax -width 5] -side left
pack [label .h.3 -text "max"] -side left

pack [frame .percent] -side top -fill x
pack [label .percent.label -text "percent: %"] -side left
pack [entry .percent.text -textvariable percent -width 5] -side left

pack [frame .count] -fill x -side top
pack [label .count.1 -text "pass: "] -side left
pack [label .count.2 -textvariable pass -width 5] -side left
pack [label .count.3 -textvariable passpercent -width 6] -side left
pack [label .count.4 -text " fail: "] -side left
pack [label .count.5 -textvariable fail -width 5] -side left
pack [label .count.6 -textvariable failpercent -width 6] -side left
pack [label .count.7 -text " total: "] -side left
pack [label .count.8 -textvariable seen -width 5] -side left

pack [canvas .status -height 20 -bd 0 -background white] -side top -fill x

trace variable pass w changed
trace variable fail w changed
proc changed {name1 name2 op} {
    global pass passpercent
    global fail failpercent
    global seen
    if {$seen==0} {
        set passpercent ""
        set failpercent ""
    } else {
        set passpercent ([expr {(100*$pass)/$seen}]%)
        set failpercent ([expr {(100*$fail)/$seen}]%)
    }
}

.status create rectangle 0 0 0 19 -fill red -tags bar
.status create text 3 19 -anchor sw -fill black -tags text
proc status {message i n} {
    # A leading + shows the message without logging it.
    if {[string length $message] && ![string match +* $message]} {log $message}
    if {[string match +* $message]} {set message [string range $message 1 end]}
    .status dchars text 0 end
    .status insert text 0 $message
    if {$n==0} {
        .status coords bar 0 0 0 19
    } else {
        .status coords bar 0 0 [expr {([winfo width .status]*$i)/$n}] 19
    }
    update
}

pack [frame .button] -side top -fill x
pack [button .button.quit -text Quit -command {
    exit 0
}] -side left
pack [button .button.search -text Search -command {
    .button.stop configure -state normal
    .button.search configure -state disabled
    log "Searching..."
    GenomalEnquirer $db $query [collectCritera]
    .button.stop configure -state disabled
    .button.search configure -state normal
}] -side right
pack [button .button.stop -text Stop -command {
    set stop 1
    status "Stopping... " 0 0
}] -side right

#####################################################################
#                                                                   #
#   ADDITIONAL CRITERIA                                             #
#                                                                   #
#####################################################################
# Additional criteria can be added to the search.
# These are substrings that must be included in or excluded from the
# sequence data description, either in any description or in a specific
# description. These criteria are checked in order: the first that
# fails, fails everything. If all pass, the page passes.
#
# It might seem you would want a way to do nested includes
# and excludes, rather than conjoining them. However, after doing the
# boolean algebra, it turns out that would give the same result.
#
# This is encoded with timers so that if anything is changed, a
# pass is made through the cached data after a few seconds.

image create photo praised -data {
    [GIF image data (base64) for this icon is not legible in the source]
}
image create photo xraised -data {
    [GIF image data (base64) for this icon is not legible in the source]
}

pack [frame .0] -side bottom -fill x
# The end of the next line is lost at a page break in the source;
# the closing bracket is certain, "-side left" is assumed.
pack [button .0.more -image praised -bd 0 -highlightthickness 0 -command addCriteria] -side left
pack [label .0.version -text $versionDate -anchor s -font {Helvetica 9}] -fill x -side bottom
pack [label .0.text -text "More criteria..."] -side left

proc triggerSearch {} {
    global timedSearchToken
    if {[info exists timedSearchToken]} {after cancel $timedSearchToken}
    set timedSearchToken [after 5000 timedSearch]
}

proc triggerLogChanged {k} {
    global timedLogToken
    if {[info exists timedLogToken($k)]} {after cancel $timedLogToken($k)}
    set timedLogToken($k) [after 5000 "logToken $k"]
}

proc timedSearch {} {
    global timedSearchToken timedLogToken query db stop
    if {[string equal [.button.search cget -state] disabled]} {
        # A search is still running: ask it to stop and try again shortly.
        set stop 1
        after 500 timedSearch
    } else {
        foreach k [array names timedLogToken] {logToken $k}
        unset -nocomplain timedSearchToken
        .button.stop configure -state normal
        .button.search configure -state disabled
        GenomalEnquirer $db $query [collectCritera]
        .button.stop configure -state disabled
        .button.search configure -state normal
    }
}

proc logToken {k} {
    global timedLogToken criteria
    if {[info exists timedLogToken($k)]} {
        after cancel $timedLogToken($k)
        unset timedLogToken($k)
        log "Changed criteria $k words to '$criteria($k.words)'"
    }
}

array set criteria {count 0}

proc addCriteria {} {
    global criteria
    incr criteria(count)
    set k $criteria(count)
    log "Adding criteria $k"
    frame .$k
    pack [button .$k.more -image xraised -bd 0 -highlightthickness 0 -command "
        log {Deleting criteria $k}
        destroy .$k
        array unset criteria $k.*
    "] -side left

    pack [menubutton .$k.clude -textvariable criteria($k.cludelabel) -menu .$k.clude.m] -side left
    menu .$k.clude.m -tearoff 0
    .$k.clude.m add command -label include -command "
        set criteria($k.cludelabel) include
        set criteria($k.clude) {!}
        log {Change criteria $k cludelabel to 'include'}
        triggerSearch
    "
    .$k.clude.m add command -label exclude -command "
        set criteria($k.cludelabel) exclude
        set criteria($k.clude) {}
        log {Change criteria $k cludelabel to 'exclude'}
        triggerSearch
    "
    .$k.clude.m add command -label "must have" -command "
        set criteria($k.cludelabel) {must have}
        set criteria($k.clude) =
        log {Change criteria $k cludelabel to 'must have'}
        triggerSearch
    "
    set criteria($k.cludelabel) exclude
    set criteria($k.clude) {}

    pack [entry .$k.words -textvariable criteria($k.words)] -side left -fill x -expand 1
    # The event pattern of this binding is not legible in the source.
    bind .$k.words <...> "triggerSearch; triggerLogChanged $k"
    set criteria($k.words) ""

    pack [menubutton .$k.extent -textvariable criteria($k.extentlabel) -menu .$k.extent.m] -side left
    menu .$k.extent.m -tearoff 0
    .$k.extent.m add command -label "in all lines" -command "
        set criteria($k.extentlabel) {in all lines}
        set criteria($k.extent) *
        log {Change criteria $k extentlabel to '*'}
        triggerSearch
    "
    set criteria($k.extentlabel) {in all lines}
    set criteria($k.extent) *
    .$k.extent.m add command -label "in definition" -command "
        set criteria($k.extentlabel) {in definition}
        set criteria($k.extent) %definition
        log {Change criteria $k extentlabel to '%definition'}
        triggerSearch
    "
    .$k.extent.m add command -label "in source" -command "
        set criteria($k.extentlabel) {in source}
        set criteria($k.extent) %source
        log {Change criteria $k extentlabel to '%source'}
        triggerSearch
    "
    .$k.extent.m add command -label "in strain" -command "
        set criteria($k.extentlabel) {in strain}
        set criteria($k.extent) %strain
        log {Change criteria $k extentlabel to '%strain'}
        triggerSearch
    "
    .$k.extent.m add command -label "in serotype" -command "
        set criteria($k.extentlabel) {in serotype}
        set criteria($k.extent) %serotype
        log {Change criteria $k extentlabel to '%serotype'}
        triggerSearch
    "
    .$k.extent.m add command -label "in publications" -command "
        set criteria($k.extentlabel) {in publications}
        set criteria($k.extent) *.pub.*
        log {Change criteria $k extentlabel to '*.pub.*'}
        triggerSearch
    "
    .$k.extent.m add command -label "in annotations" -command "
        set criteria($k.extentlabel) {in annotations}
        set criteria($k.extent) *.annot.*
        log {Change criteria $k extentlabel to '*.annot.*'}
        triggerSearch
    "
    .$k.extent.m add command -label "in keywords" -command "
        set criteria($k.extentlabel) {in keywords}
        set criteria($k.extent) *.keywords.*
        log {Change criteria $k extentlabel to '*.keywords.*'}
        triggerSearch
    "
    pack forget .0
    pack .$k .0 -side top -fill x
    focus .$k.words
}

proc collectCritera {} {
    global criteria
    set sort {}
    set collected {}
    # Criteria frames are the numbered children .1, .2, ... of the main window.
    foreach w [winfo children .] {
        if {[string match {.[123456789]*} $w]} {
            lappend sort [lindex [split $w .] 1]
        }
    }
    foreach i [lsort -unique -integer $sort] {
        lappend collected $criteria($i.clude) $criteria($i.extent) $criteria($i.words)
    }
    return $collected
}

#####################################################################
#                                                                   #
#   ACTIVITY LOG                                                    #
#                                                                   #
#####################################################################
# Log user activity and http responses.
# This is intended to spy
# on the program user (hence the menu to show this is called Spy),
# but nonmaliciously. The intent is to retain information to help
# naive users to use the program. Nothing is transmitted from the
# user's machine without her knowledge.

proc activitylog {} {
    global logFile
    if {![winfo exists .log]} {
        set rc [catch {
            destroy .log
            destroy .logmenu
            toplevel .log -menu .logmenu
            wm title .log {Activity Log}
            menu .logmenu
            menu .logmenu.file -tearoff 0
            .logmenu add cascade -label File -menu .logmenu.file
            .logmenu.file add command -label Quit -command {exit 0} -accel Command-Q
            bind .log <Command-q> {exit 0}
            menu .logmenu.edit -tearoff 0
            .logmenu add cascade -label Edit -menu .logmenu.edit
            .logmenu.edit add command -label Cut -command {event generate %W <<Cut>>} -accel Command-X
            .logmenu.edit add command -label Copy -command {event generate %W <<Copy>>} -accel Command-C
            .logmenu.edit add command -label Paste -command {event generate %W <<Paste>>} -accel Command-V
            .logmenu.edit add command -label Clear -command {event generate %W <<Delete>>}
            menu .logmenu.help -tearoff 0
            .logmenu add cascade -menu .logmenu.help
            .logmenu.help add command -label "Genome Explorer Help" -command {
                exec open $Contents/Resources/Documents/Instructions.rtfd
            }
            .logmenu.help add command -label Manifest.rtf -command {
                exec open $Contents/Resources/Documents/Manifest.rtf
            }
            wm geometry .log 500x400
            wm resizable .log 1 1
            bind .log <Command-k> {console show}
            bind .log <Command-l> {console hide}
            place [frame .log.t] -x 0 -relwidth 1.0 -relheight 1 -height -35
            pack [scrollbar .log.t.y -orient vertical -command ".log.t.t yview"] -side right -fill y
            pack [text .log.t.t -yscrollcommand ".log.t.y set"] -side top -fill both -expand 1
            catch {
                set ch [open $logFile r]
                .log.t.t insert 1.0 [read $ch]
            }
            catch {close $ch}
            place [button .log.clear -text "Remove Log" -command {
                if {[info exists log]} {
                    close $log
                    unset log
                }
                file delete $logFile
                destroy .log
            }] -x 25 -width 150 -rely 1.0 -y -30 -height 25
            # The widget name of this button is lost at a page break in the
            # source; .log.email is a placeholder.
            place [button .log.email -text "Email Log" -command emailOmyxLog] \
                -relx 1.0 -x -175 -width 150 -rely 1.0 -y -30 -height 25
        } rs]
        if {$rc} {log $rs}
    }
    wm deiconify .log
    raise .log
}

proc emailOmyxLog {} {
    global pr
    mail -from support@omyx.com -to wyrmwif@rawbw.com -host smtp.rawbw.com \
        "Subject: Transmittal of Genome-Explorer Log\n\n[.log.t.t get 1.0 end]"
}

log "Start Genome-Explorer $versionDate"

# Send a mail message with SMTP.
#   mail [-from|-sender user@host] [-to|-receiver user@host]
#        [-host emailMailer] [-port smtpport]
#        [-m|--] message
# The sender, receiver, and message must be specified. The
# default smtp host is the sender's host, and the default port
# is the smtp port.
# The message includes any headers and the body. No headers,
# even From: and To:, are set by this proc.
proc mail {args} {
    set host {}
    set message {}
    set port 25
    set receiver {}
    set sender {}
    while {[llength $args]>0} {
        set p [lindex $args 0]; set args [lrange $args 1 end]
        switch -glob -- $p {
            -[FfSs]* {set sender [lindex $args 0]; set args [lrange $args 1 end]}
            -[TtRr]* {lappend receiver [lindex $args 0]; set args [lrange $args 1 end]}
            -[Hh]* {set host [lindex $args 0]; set args [lrange $args 1 end]}
            -[Pp]* {set port [lindex $args 0]; set args [lrange $args 1 end]}
            -[Mm]* {set message [lindex $args 0]; set args [lrange $args 1 end]}
            -- {
                # The pattern of this branch is not legible in the source; "--" is
                # taken from the usage comment above. It is tested before the -*
                # catch-all so that it can match.
                if {[llength $args]!=1} {
                    error "too many parameters: only the one message is expected."
                }
                set message [lindex $args 0]; set args [lrange $args 1 end]
            }
            -* {error "unknown option mail option: $p"}
            default {
                if {[llength $args]!=0} {
                    error "too many parameters: only the one message is expected."
                }
                set message $p
            }
        }
    }

    # Split the receiver list into individual addresses in R.
    set receiver [join $receiver ,]
    set R {}; set r ""; set s 0
    foreach c [split $receiver ""] {
        switch -glob $s:$c {
            0:< {
                set r ""; set s m
            }
            0:, {
                set r [string trim $r]
                if {[string length $r]} {lappend R $r}
            }
            0:* {
                append r $c
            }
            0:( {
                # Body lost at a page break in the source.
            }
            [1-9]*:( {
                incr s
            }
            [1-9]*:) {
                decr s
            }
            [1-9]* {
                # The source is garbled from here to the end of the switch: the
                # pattern labels that separated the following statements are
                # missing, so the statements are kept in their original order.
                set r [string trim $r]
                if {[string length $r]} {lappend R $r}
                set s e
                append r $c
            }
            e:, {
                set s 0
            }
        }
    }
    set r [string trim $r]
    if {[string length $r]} {lappend R $r}

    # Default the smtp host to the host part of the sender address.
    if {[string equal $host ""] && [regexp {@([^>]+)(>?)} $sender all host1]} {
        set host $host1
    }
    set channel [socket $host $port]
    fconfigure $channel -translation {crlf crlf}
    smtpCommandResponse $channel -none
    puts $channel "HELO $host"
    while 1 {
        flush $channel
        set response [smtpCommandResponse $channel -none]
        if {[string compare [string range $response 0 2] 250] == 0} {break}
    }
    smtpCommandResponse $channel "MAIL FROM:<$sender>"
    foreach r $R {
        smtpCommandResponse $channel "RCPT TO:<$r>"
    }
    smtpCommandResponse $channel "DATA"
    puts $channel "Date: [clock format [clock seconds] -format {%e %b %y %H:%M %Z}]"
    puts $channel "From: $sender"
    puts $channel "To: $receiver"
    foreach line [split $message \n] {
        # Double a leading dot so the line is not taken as end of data.
        if {[string match .* $line]} {set line .$line}
        puts $channel $line
    }
    smtpCommandResponse $channel "."
    smtpCommandResponse $channel "QUIT"
    close $channel
    return
}

proc smtpCommandResponse {channel command} {
    if {![string equal $command -none]} {
        puts $channel $command; flush $channel
    }
    set line {}
    set rc [gets $channel line]
    if {$rc < 0} {
        close $channel
        error "Connection broken."
} switch -glob -- $line 1* - 2* - 3* return $line 4* - 5* { close $channel; error $line 134 WO 2006/088962 PCT/US2006/005343 close $channel; error "SMTP: unrecognised response: $line" proc bgerror args set ei $::errorlnfo set message [join $args "1 puts $message\n$ei status $message 0 1 log $ei .button.stop configure -state disabled .button.search configure -state normal # FINAL LOOK AND FEEL #################################################################f### catch {console hide} bind . <Control-t> set db NCBI set query {influenza H5N1 20031 .button.stop configure -state normal .button.search configure -state disabled GenomalEnquirer $db $query EcollectCritera] .button.stop configure -state disabled .button.search configure -state normal proc tkAboutDialog {} set rc [catch destroy .about catch (image delete aboutImage} toplevel .about wm geometry .about +50+50 wm overrideredirect .about 1 pack [canvas .about.image -width 576 -height 160 -highlightthickness 0] image create photo aboutImage -data { R01GODlhQAKgAOYAAP////f///f3//f39+/39+/v9+/v7+fv9+fv7+fe79/f397n79bn79be79be58/P zB7e587W58bW57+/v73W3r303rXO3rXG3rXGlq+vr63G3q3Glq29lq29zqW9lqW9zp+fn5yl2pSlzpSt zpStxo+Pj4ytxoylxoSlxoScvX9/f3ucvXOcvXOUvXOUtXBwcGuUtWuMtWOMtWOErW~gYFqErVKErVJ7 rVJ7pVBQUEp7pUpzpUJzpUJznEBAQDlrpTlrnDFrnDFjnDFjlDAwMCljnCljlClalCFalCFajCFSlCFS jCAgIBhaj]hSjBBSjBBKjBBKhBAQEAhKjAhKhACUzgAA AA aCwAAAAAQAKgAAAH/4BVgoOEhYaHiImK i4yNjotQjgIAlJWWl5iZmpucnZ6foKGioGSlpqeoqaqrrK2upZGxsrOOtbaKk6+6u7y9vr/AwcLDxJy3 x8jJyrO5xc7PONHS09TVpMvY2drKzdbe3+Dh4uPe2+bn6Izd5Ozt7u/w8QDp9PXo6/L5+vv8/af2AAMm w+evoMGDCNsJXMgwFsGEECNKnNjrkZCLGDNq3MixoePIEOKHEnyYqHFFOq

X

MkyO6MhQWDKjElzps2a OG/qzMmT586fPoMCHVrzZMujSJNOfBTk4hCMT51CnSqlqpCoV61mnTrESNchQ4D00DGjxgwdW9NiXauV rVW3av/bCjGqtK7du/AswrwqtC/RmUCAdE2CBAiOFBokRGiw4EC3Ckf8Sv4 7OSddvJgza4 7GlG/ct3JB VwXbtQYJCANAVTAi+rNruLBDp728ubbt265eNnla+a+QmECMOEFSIOQElJtW86bMvLdQ2rijS5/eqfPy 17LZAj 8iAOIqyL+fxm49Hjv5ueqoql+/Xu/u33yb64SCYwOBVZBjwpfvnP8Q60wFKCBelronU3ah~eBd K+Dpd+B5EJonIYADVmjhSrplRVN4/SUBQwW6KMchh/715xeFF6ao4kHWhZeVeAg6hQNyqDQII4wRlgej XCiu6OOP8binoWeTAZHECr6sNuT/XiaWOFSPQEYpJTgFvuigeUE44cEv+Vl5Y4wS7mgVlFOWaeYzGS7X 1IY+hYeEA6kIgEAEFWwQAgotlKDDDkIMpx+bTDrZpGXpnWnooWg68t6VX2r3FJKlVICCDUDoxxppLwKh YaMvgilmmOgtQiOipJaqipC8ERnohuKFIoAEJMggRBRPIIGVVLvVtN9+a4Z1RKV7CeofmaYWa2woVYrH aKd7JcHAJwgIdlVUlo0aq4sHZhtEDSY4Ue2n4GpF7LHklntJmiMC+qdTN3jiQAlowNfriFs5aO+qLgqB BA6wdqXpoMMWau7ABGPS4rI4kigECZ2QcISaMDKqK77Mxqff/xEkCDCDEOZoGm6OU4lbsC8GlECDDz7k UIIBCCmgwsswx~zAKiCnCKqvJKo3OUxXbDJAjogsWuqUeWKa8UzOassEjocQMAJRlwK8GQij7wLCFJY ofXWUoBw0ARbh731BKv4oPXNinqmbMI2XUWBJgIccamqFgc7sbpSrT3vEEgwfMAKUUgL8uCh4mLlMyBw jXLYGRgEtthhk62K2Vag3Uiw~bZd9FMcaLKC1Nji5NR7Snu57ol/HlFDAAAMoIO/wj4n8OHCPKB15UY QIPWTDiutQq9UG4SI4t+obrCQ6CQSQBhYRsfV71ma3G+9tJLOxE7gAiABDNAoSmoIFdN+/8qulvR+CVM WMHEzJSUjPIL7FPystcloOz1JRO8gLIK8QNQM/AgyIEPVuY/AaqAZZYIoA9oILlMPA54mijByx7QPpix bH7uy8H9KCG83JVAgDTYoPxUAIIH5CAHFASAAvQ3QASq8GUTyMACXVgQnEVvYkNKAj4WOLFqhS5pel1S U5hlLePJBAouqEQFePCwgelEfONLhdaIkAkFOPABWQvbBqdIhLC9wBL102IlKBdGKwgwbFRsXxfDRgNN PHATJZgiJXKgtTYCgltsHOPZKIFFsRHBhVPMYuMSFzYppPBxlLMCDf2RLM/UK1BXMUISLyEAukpX2q7 20OwmTTRhcf/CO2iRAAqkAROekpcs4tiL2xnhS92wgBZ6loGliiSrREhA4RUJCVeoLUcxHCN560cLLNo ERVMAJgc/NOE6FhMB/IOZdDOgSsBQDkVZIB3CNwaEzIwS60Fc48KiCUIJqACOVICgp43C2xZgUpsOxx 2kwIutqEurOgYQOYWMERXCQxbTlyiGxpW94yoiwhGoEICwIAAmSgN2FBUZWleOMYo+kDChLyfLCOgh23 135 WO 2006/088962 PCT/US2006/005343 hotSw 4 .54.iD3NzE .r~s tjG iD3NZE+1qa~G2Rwalqgg~eru QsrXC/rQGP bOVD4008Bk+yNdAgB3ML3qSyBOlZwitq/lLamupmuqTV7U+bzloTOlejAtxgCd8DXlweCtFRSPScYkOq liZAl2OqD64 OBcBQyUZIEZZvZh3U6lwp8UZClqCufSXE4hWayqVa4ElzEmJLkrBp5RIHOxVKEe4 /j 
SzViBCXR/XRkRKBGc6QdpugOCED~xiAUMjYsSOFsSAAp~nt~onEJBgg]kuEwCtOZGtbQ2GkFoCmumz AtkSCTm8CnWwe7VH~dA~bXQC8cagyfSphf+cJymO2p3klsU87aNzLTWoGmsfUssETGvnmmj tlq6ZV84asE GyDAEgUAQmQyN1+1fXUoX82 ZfpbAgkskgAesIRxWhltcTIKyxf/IbbP8 lY2Yz 8D5XuYLF8IWXu8 cMS±66 daQw/xTLXU~wr~rLmDW8HyXveS6Q3vTllgoiZWMKIzKirSIPeU5 zQAkSMwAacQhO /a8 sVqlxJxOQGQggw tJQOHprLBriDkHy8RVFSaIX4TiF9 6o/u4aQJgj ZiNslbfuINPgCIriW3MagyQ3 JtWgqOVyFoaV~wlF64U gjDulkwNMAG11ti9jbgXQOWrM3 ztNghkpQQKkvAvfF2pn+pq23w3 SDyYHOE+lwDyKaXCYCh7YolMOCws k4vhx/UJAHFO qYXFLLnOGZISbHQldQdrXe~mlAksY2UaO +wDERUTEuV7QaqnnFON~iLVEOwgxe3 /aADL sg/GQSUsATjZuPYixCKxlR7bcsw3HBygEhuwlUCFTLTbGq~iRa5 tU4 4WAkWC4L9Ue7KnWdFsmaaOEsxk 7O2y2WJ'WbzekayT2rMdc4lRLwQdZlOS/7Q2 8leJavK68nfrW6M4wi~exWdxiv9uXXCKskabWZpGiCoo3 nvB3L3KDQQIoOQAbMLrkmNtkpCcN3 50YIQaZ+olAI9T'peXPCACogdTtpAGErhxSnW6YlAB7AXBq4coAf rj E7eXc+Etu7mBnVcNbJFtKpEwGngdUrMbeZ4ksYool+VroNHaG5HI+7UlAZohN~kBoAkxCU3 C~dkNGd N3NfJAkiyMQCnCAY/SrLTlQ+9 OU~s 8wJukKYuHRtxQMmgoZdGJfPld/E5B//c7ou~ivMWQuhTQ4onYlo bVgEAAfkvnh+os/RnUwVPXul3wJoYgmcok/PE58iGE8J223fdqNldOlLA2HJ'BJjB3KTHq9x7KS3xFcsu U58JJlz+L7vnvYV8L6XQ72zmsM9kWDVVSnyOALi4rd7,JdVXoG7olCUvehLidDsfsa39A31/SS3DL97OX D/oR8xRHACfToil+t22vNxXLYgQLwAkF4Aowd3 iGc3 8UKB3eF3uiE2mbIi8zsTdBEAUlQAEsICOeuCpA oTBgtQQ~kAKeYAHz53 ZrJW8VOIME4gigEVDoxv8qcLdzoXMJ/AG9dJ/cqEsghEcRADMAnoA\zpgd+ RSGDNBiFSXGBQXFylxQEJudPTgEESgAs 1XNflF4MnEESPArMxACDQgKGNBE SbaDCwaFt~hiHGMJT2Rf Z2 OLIuQoiH2klt8HFJWxEYBo3ETkAYHcMtFeAYo0AAlgeGlocoiCeHkIgUV~qiEjghWO 5Mrlhh7JFKEuxE1 TxEYPIAflM5ACHAcqABbE2bbht 9heirsgowJd3 23 YERLSBfrlpgcE3 ThAFhFcDJ7ABErAACPAAddcI( DDBbDdWIMtGKr~ iMVDLyymh4l1 ExmOMgHbh4QgAEPNfOBAfJCAfmPYLAkD/AoTnTzuTdyYBh8 64 jvqw f/T3GWDVT7YoGE5AeCtQAQ~wAAkwKq/gAgj 2T8kIezHBj OxYkNDfQGU74ExfZEOBDrQAj 4TDRAgA1Mw N+znSDj 4JQRpkBW5DHqxc3HhKFwlfDNwB~QAxeQhtGwAj zwFfyHfYzuRORvZkTTpC8lShdYyLyPyFDDA AeHoDATgACKwMfvlQlulObdlh4WTCPxYko4 SDf tnNCEJd7eoHOYCBCTwbdBwABxQAOFwBGx4IOf4 fAGJ NzPS1Gi5Cq8XaRkoe~q4M0YQf8UgAAVAASOgBxSRjTzj SPGlgaa~c/+hjmkSmMVgER44bo821vrvAx9Q 
/4zAsAAWEAlnIAM9gARLOETadmTnuHcn+BRnSZigKQofWYnUw3xgAUrAAIAowAJoASeKC9DjfI o SYM+Ra7 8ZmhuZvVwXYIQsaqEgRAAAWB9woLAAEX8A~o4AIloF+DyBpDdluZeYcZqVqptRe~yzvaaTDQ SlkMGRVIOAGmEABoiQANsAGToidO8ASX6Rv7OZt72jt5dTJXr7kZ3bmz/ZYINulSyXivQFjgoSEAxw cAM9YIRJs2ARICBOAGJ3 lRGiyOIQRep3p+Ij GCQB8pnDw3EHkNRoUyj5BCBJArcgwRJAJOYQWj 1xIHWGYEx GY~lQylC~aaHbqTtcAOGW9/9~ZmVBONloJCnBqZrc7i+NomNCjvaRUmtVAJZAIPsA+NCoFXl ZFzJ'QDRqo+ SKqkEAYCPiBCUEHsb'3 SRABnkAAv4IJErOExlRZ3 1GM6JfLIDYGGmS 1adnflmSe4mfFGg2MlY+O+oKKcMMJ OooJWYMJzfYCCGQALOBslaBn~dNT4XRe7TQzlyQFXZRGu8MEWZOnl4CoZqRCStqoAPCokWoJfWSpj JQ2 fc~fQoAflnTAAsqKKk8RoG+hVCRMEF6BfGNiiIFp6Q3YVdKps 1ORsvVoJDzBBwNpn5EQ2mFcJkZdOCpAB K8Oh/mNMyipYxkRC/ 8Z4 1UADX5SkuPMCdsRHd1VclRP/Ra/ES5JFR43TRRRER8u~cZhgO7ODrqj mUuQG bebzZVaAU2 skqlv3wkdhhRHKZCQ3wleX2epuZSQTbNklQz4RVgAotWZFMZp8NnSTE~gdu5oylUP2VkR2aT SCBQXJeldKElWGsErOBlS8 lUTmvkUmGzo4pkO5PnTsVlCQpAQcXV03TkVLn~p4IlbcemdlVEs3 dls7XW tUzpbTtLGS2aj r/xwk7Za3E9SHCRfnvdyiR9qBCqikLQAWBQAUvQMfBmqiaPdgpME9pNhrawqgAF3U OBtrAI9ashQUR14zVC8APGfj rmlbqYK~a7ZzWY9jrdYlUioQXtSkobsDPGZbayKl/Od92rOEuwmHalI4 u7g/ladg8zI+u6WXk6s3CBVFEJGY4AnxEo~mBKcJKJZLcgT4hAu97ATWj FSObFMqZ+JiwnmSgnXlFNm Cz~dFEMulUUtXKwViuxEvA7h7GGEbiE6wMKtztzVmotxrhqlz/GlgnQmOyTmlMRdkBMQEWe202 mImT4VqZoGnTCG+aKJ+aNCQ7MkUMClhe6vxGzk6TzSgkKwARtVLtglKm7hrsD7CG75nO 7h9 irw/ pbznZAAuowEwiGPApl~ilLj /y73Seqn3ygm7l6SSq7gYrKPlxFia9aSj2gi22KBYMQozcACRUYOYuYpH Insa2P8VDEgJJiI4E+yQ4EJuD6ia7iNCuNQk2UsA+dNRGuyNtidO/xuwaFMRLu3M+Blw iXNqSZvBGMzA QPVFWLOt3 dqzPrqz8Yq4G3yzDoAEJdW7cCS83ESZZcUTG6apc2DVTgMYluXqCNYEEKlkaB+C1MYeTDy 6 J~uj DQ2 QEygMABfABHGOw8yuPFUqxRtSrj YzXRAwC7ERBZiMFGcBLJQVSUvCxWMy9GUUflOaC3 CgxXGfBU gTqohUpDLRU2 ICsFKpAlcRtavPSxiUpHJcXAr7w1sTzLGxHRHDLlxC+Ire5pOu6QZQJbFoBgencaGZ g8yHOj WE4dgCAfi6rmcTu3p/ILD/RihUCVOnyclUPq9wwdcbynvoAomDrSerZntkZYRLoOGGSKkGy/GK YnNOO5LDzryTQgx8 z7 6czlbAs 8wkBdplzJprx~kYYGwyLTKACSGQigebkJj YUFSpJlcCppSwAPo.Lf~b RLfVzZHorJKbThqceUNaAilkvToweZ9HWBraCdEKqn2GPGoMypsQo8MaO 6dlg9cRuznWFEkQAusAhKdz 
gEO4KAYYYD4knE7wrybQA0YgnUJYmOB8CEJ8oXqkXSAwqZ/gyS/wAlnDeePAZlqqHoYsjbaJggbFA3 sM ACyAV10li ZT2xEHTd/4lHj dnCQIQAl~gnALZdvRCouuoP5XH/8kqnQndpD5VBw9mPRlRmVXoeItUQAJK qGi/ctf92XcBI(GioG9RJYwQ64JgLgAP7917IEx+GzdVpudYEmzk/BB9GoCCWcALecoV4 7JZxJ3 rZPJZP UduY4AK/UokCVXqt7dpP+ZFJPXzBCSOslTR83AOErcxbAat4w9YlBwoyIAJCg4DAKR7jrdwlCdtfCEkx 137UcgQuoIQBsAOFr4Z8sarXa5l974xl~jAkpUNW7DaHj LQ+OR94WMtmrjYyrjUleIQGVsALgvXc~qNu7 Qt9 9WJKHfAkHKq3R+9 /EoNCfkAHE5~Ar4+rQsKPcf qk9avcMGGoMXZ9CHOWFyhtditwOHii //Clng8OPR9 J+gEN9Bb2xMcXkifElHPebKk3 /AcTfYOJGoCZbY3c9lsQMvS zWUMDQZfOvzDi6oMytOQLKP4KKn~e0YTJ GlMZkMElXONqlb4KT77Qi/CqOLSTzyMaYXGaEcA6NfAwq+hfrkpylsiEfIMBLRw.ptsoGj 4MOkpO9+Os ?4GOJKNSSlZABwqpCWeYyXlMyI7Z0N32suEtCCGTINurSmV4C/VMzWQa4j g7qoPoy7PO/LkNA7foyVRd5 NRM/q0 6y5pUTJtasA~oRTsF7pn45lffZrBlAzj 93Th~plNdOxHsQ/iY6x6Dpb7pxWToWHepSkxzVeP faiCDVgBVAf/5OLX2/f9oNoSGJvQsFQrnOsZxC2BsvTRuhKCVnjynUo7GUOtG1NaV804AlvTTFTOoT vI~nauGdXr/W2NfCTR1OEPAWvNfdj dPUKNpRI(WeKcR+CI'rPTR/cuwhCRZMTNIWUhaPoSribxDnVpRYJK amnGsvfob70tHTK9W8FKQPmSD8e2UQilbxCK7NSZ8woxANGfqepONgNPj BDMQAWbVQ7C7KRdJ5LS4OReB BAmFCXb+mxne5P5QTkW8qA+QOJTJ1o1 8Ma2 zEMvsrUnEkuOozs9xFR2S8vVUcr2 1k5WJzaIDMZ5 +M'9uuz qMAz~xBcxO7KZ3flOA/AZht8XO7ESnAl1/XTEfj 2uVhQ1MCfK+CPzModsVGX/cl/wIPkMTOqq~uXtDo 7MYhGodSuljk00WgT0WiLlhv/+99u3aBxvSSVpZiGVZ8YZ12EASkNr9dR2jEasO8fI~deWEACji5Rg a78ymhLlVLxYP7K8NOzGO7iUpQAIG~nnEW3GJOJGrtFMoxiLO8QswDkuvwoBTzjT65vTFjbsfhh njubnLyV20xlhqFdHk2yply2IloTAAgoViUAVlYACpWIgBSVgaIL40PVg±IjhMTJVY6±j4+VkQ+PkRW Ug8PoZ61DGAZiKUTqUSrTWEwAGomLiaewcLDXxGx8j JAFXMzf/OzOJ7BOUNCINRCldPR2djXldLe lt/dQUNJOxQ94tj 14N9D~tzZ8PDT5fNATivIDU5A~vfECZz3rCAzAcosKz8JgiRrOYCVLAR1qwByTI IKJTYyQAoKOys+Ij 4 xBJULkmiATw0RCwh8jglkplERiihzJh2owoOspcksxxCZMEOqZNnSYpDdsZDKVK 14YYHfW1E6knqlaAfhKl~aFOOn~snoETkdCVZXGoDyWTasKlbhQbj zpPWj tG~gHXrj rNrj y+Qakda8AB3 b5w7u4j jrQvyNwkQGRWShThC13C9dvawxX2G8K3nzwOVOUAkiFAGK4JeDB3MOGhYurwbArjULTC+z/6ZGC Xu2E6UttbwAKgXkJQgCEl5nGQ~nbCU4atlvFIANmstTKiYaj dL25tUhj 8KRXWYcNTgvAAOhKSS4HUDw9 136 WO 2006/088962 
PCT/US2006/005343 ga81htoCCa4zGbPODMPNtzcVY881LgTOIMQOlgOQJk96E43 GeY1xD9OHBFCQgcAcdiDAWEOBIPNdKbg jDMqMshxVsxXUSXDPEWDCqAwAcAmRIi1ooEk3 fJVKSOdVuQt Bv7GGxOj+ODIaIoQkYEiTBhgwJU50BKlllx6KYUUIGwiSHT1mWdKBkSVxJtEIhkpHiVVjgICdQCUUsJp UkAylZhbgiRoUpLQ/zBBSNVtJcoohuwJyldKAqCSFC8QgUlQlCKnkpaskeQejaQmBKMzmImol4WJiVPY XX3FIyGr7+CF2D+ALZHEEDe4MMICC8FgRK2ZpXoZNadWIWOpzLYlWkjNeSKJkMSA4IghTFhiALQOwIck TA84IoUiQBElEiQ98QaVFDSgZy4RvDwQEg2SQPQuLxlce4qlZrUJwC2h3PLKoLqUsghblGDFyFjyGoKm nKLeC7EuLalGH1ZMQOSRSzmgZ+24Y21sSMfqhoJubwg3q7IlnyTIDYjgZakMYiJm5eGHMlxRmGVOjjtjN EEcgkcQRI17ZgQQHLKjRCEuOw6LRAySa98tTKTP8wXySTGJMJLzdrVBs6AUTG9duje2JAiXwUq8nZguT CUOoKGQlaA9MEPYxbRsjndcK7eOTCPPFcvbcYstC9eHItFyFqqlqOziHH6pa6+PVDIEEEkHoUAMMKYzg QQQIBIDgCY6Rw5exPMcTNeKsJzTBC921fnh+L8Ceo+y45+7JRkLQq3uwBej+DoUslPhhDlX+DKtIWpD M2YqYlDAAFIn6AIS7xTf4lwvnlp98IgDuif4zGZgMBOvkK++yipA6wPZ6wOveOqtSsi9/acrL+L9DmYD RAHNcgAPjvAP1OUvddpYXfwWyMAGOvCBEESc4rgBMxDBCnkyY9HNlue8bLT/gzEIKNUFbDAsbzjtZU+r hgIjyMIWuvCFMGTg/PbCOMftDy/Ok9xh9ue/A8yoABgwAhJsVo9j5aVYElphDJfIxCY68YkMGZ7MKjir 5x3PVhhaXgUzAwQfIugANQDCESwTM+alkB5KhKIal8jGNjpwgvi7oUBWFQ6ebUOHGnIQNYAQwrcQAAIm OEEU9DfH4+3sgAnOnhsXuUjfBQMsOzGGvx5pFrGpoFExTJkTJ2jID3lQeRiOFRZN2CJPwgMIXXSLBGrA AGANG2ZG90AJz5hGRtqSiV8yRKMgWclivB9WuslIm6BSRhqsolwNGTjRKmhHCbviDOjoYP4qJADQCAE /zVwAhS8MasizuV+sYqlEGp5y3K6kHfPQQQvtYKIsT1ABQqoG3omoAIQmOonClBBCSDxAD9pqwT65Jsu ZPFO7wCUawYAaEaCsSV7bsRqGShBJtDjNwAONGz530cxmSjFKkbois/jXvaIx79PciOVxQhAAiTwARfs oBpGIOI2YHbBDgOEhWhUpDl3+kJXnGmeK4mkOkHiiAyMBUehCCpRDUGLluDnWqaAnOp2lIEJQJUQCoDq v-v7lki7lwmDtYU+OAGYo8lzLYG5MpuMmZONloiGDkpMlrMqBUgEQ4AAWOMEMgoAEJzRBCa+MhzPZqhdl EguP49QpTxf7wOAIKazrdP/byEYzFkxsbCRPOc9w+jmIdzGnBOOSLCdGM5xt5UgOQwrtcTqxCSwZ4gXw VIUBTGEf5LSWq6qJUxs5mcFQgnJmtHKRPDS4DnGggQrmAEPhIiEI5SQMWVVo7H2YjwLNe2mLCInY7cL PkmoxklDxawwLxWWTyHiNvwCyliOmhUVwI9f+plFJkgDIx+4d5iDyARtFdGJsyjgOKqShUTJawhIzPZO TlQrM+PozVi+KjEzpV8Q9CKEV35TjOiOI4ekSd36hVNliuWuiMEHLahoJLKUlEpZgILi9XJFBVCmE/M 
BhSoGCjGQcGKqGpbgltopCUucc9Gl9hRkH7UptEVKUn/hUtKCLGKjsbDKRkxXFPMaI9C5dDuiLd8OEBV KSSqQfGMVYyLAPVCvOoVCsMy4lnxTIc8M+Enm2OkVt3wIllSAAVQcqMLXsTuO2lt2RRLisW3TlfCx+rL hztYGANCjpAXejKE96dlLlu6WQI+G23FHN6wBMUHtkPzWXIBChUUqkhqkqxW/AQoA2RJEyCxaCgmAKUd A4Ao43MSrXOhCB/YKdDJMmkOO9nhM25QgOfGmUhn2kHiYZh4mPktzWbZPRh979LYptGB7laKqiqVnen1 9GU3IWo+nQYYgnCYVCvZMFOML9lRxVpWTgaRjdBWWi7xgYENRufdCprBUD7dzpwJ/OlieRi4OvOk4xDI wwYv3IjICnG2J+7CS55XmMoqHSKeQx5BmPjwsjblwz3SIGuschlnqJgmazk4W7YsDijlYWu4ewzVlfh 04YZhypN8Z7Hbl7ti5bPIajgQtpvcophOKSXuXQrJ7qTEFc6/Sxo9IOrUOJDz/r6HgCwoGr9jYImtm+R zHKWx4wcsySuwZO8velFO8ISV3Zirf3lutv97iz7tlsR4zxETnfRCZ/wpMmIdkjHKsN932H9WVtBlOb 75CPvIhR7vQov73tIyOerNLO5MIyG7uY54uRt7iYuTte8qhP/eTlHlOoD9zQEB90M+OKPAObEJriNCkR V+U4nv+r/vfA32TYe6sYlYfe2MLOvLKl++gs6vxpdhx2iFxldboH//rYTzDrF2zIm8MegdTFoQ5fP8cP Tzr2zOeh532f/fa7X32Ux+Bgjl9KV1ORuylvfs3jftMHF99OHyUN7Pd+BFiArFNOJXV+UodEd9R9cwVh h3RID3d7fLd74dQNA2iAGriBpMJb9MciliAQITgPI9giJQiCJIiCH3iCK9h2jbcZj8eBMjiDCKI4PNAD PHCDOoiDPLiDPqiDP4iDQZiDPdiDQGiER5iERLiEReiDTQiEUCiEUciEPvgDGUiDWJiFiaM4XNiFXviF YBiGYqgsWliGZhhFY5iGariGbJj/hjF4hnDYcQaTA/DzS63hET4wPogAAnmSA+lzaWOYiII4iIT4hnF4 iGJzJrVzblvxCLImJVkyCpsCiIRYiZZ4iYpjiIiIiCphMSUATJ/gCrdWar5RHadxSfBOE3tCaz7QLpbi XjngA3+oRphYi7Z4i5q4iXF4YD~wUPQxCqpBBOmiCANWHKlACMf4JUSQKL2QMY5wN9p3i9I4jW2Yi7oI h9biEjJGFvZFBLOlEvXWC1BhMUxAC5LgHQqAXoqwJ4rAcchEj fAYj2Bojdd4hgaQATTgCHrIErLYHI8A iTciBd7YFY9ABJ2wMY4wEu3YC+7IUfL4kBBpffU4kerOFRcX/46jMFtEQAsAmQvoBRwHsyebQAhHIhQL eXIRmZIqeRAU2ZKzUFWCE4p9YgWqOZFnMT5W4oiboALkwpA+iZIrGZTySI8tSYOboI2/eGtWMDDheJJZ FSildYd6FiauZpJcQYtCmZXUSJRFOYOxYTcOchyzyFhaWZa4 2JVoOSNWUh6rZ5ZuWYlcmZZyGRP7tGVv eZeDGJdzuZeUiJd+qYZ6yZeC2ZZ/WZjzOJhPlApVQgMmBzytsI8LASStQQwP8AqPiW2GmZlfGJiIGTxT RQqxFj+X2RZnwiMX4QujaWmauZqZ2JlNBI7MkQnulYsWFYuEcF4+8BXwlE9cNxp8aF97CP8CRtUuQAIR qGhfBkADsggcFnecrpgIoLYlwnCKNOkJl/QCOdBrgGNxrEgDvNCdkEBPXLecbMSa5imRrglDsClW9GQK i3AaOeBd/LKMunQpdclJtyApAikIUpADpbAnphCLpmA7OkEV/XkLqsFfACMMgmAARPA77plUTPAC4Ege 
MBYoysiMXAJjjgiU5/mhzsCZ6Yk7nlkKtMBf+6EfPjYR7DlV6PEA+eQLoCAoaIVtKASrqVfUkEtK8EE 1BISwiCQSskLhmAJsAmonwIChCMf6hgtJ4mVIBqlZDii6gkSwHgyWEW7QgKvKGMxESYhIUvyBeUkGm DAmbZrES/QL/HrehAqkxpkPFCLAJb/qGkCWZC08KRVIqpSJKpayzntbJFVJALYLAdY4oCV7KVXUjoyah VGXKL1HipWk6EkI6k8EAb9gCp5C6qagAAlAykoWgkFbZkES2pyDap3560IDapVjzJ3kWlfmCC+spMLXG paG6qY56plQBFCtRGkcZDJU6pLfKL1+BpHUiCWmyCD25kHkajaZqngiaqlOzqj95a45AMkqZZ7LaGw3z AlQipreKpmbajulKpxvzoElxGjLWicPKKbBZAmDKCJdFlcx6lXrrOcZrdLKQBPgUJLwXocjnLCgH8GH r/m6r7fomaqQO11SCtv4ewYLrQh7S+9UcO/Q2DpbYmrZF7Gsga8T+7EyxLGa6bEgW7LkI7Ija7IqW54o a5gku7Iwe4AtW5gvG7M2uzIzS7M3u7MulLN/WbM8G7QJ4rN+CbRCe7RVQbR4abRI27QLobR3ybROO7Vb CLVmKbVUm7XDYLVuibVa+7UAEAgAOw==
    .about.image create image 0 0 -image aboutImage -anchor nw
    .about.image create text 288 160 -anchor s -font {Helvetica 12} -text $::versionDate
    bind .about <Button> {destroy .about; image delete aboutImage}
    bind .about <Key> {destroy .about; image delete aboutImage}
} rs]
if {$rc} {puts "about dialog: $rs"}

Instructions for running the Genome Explorer Application

(1) Move the application Genome (its icon looks like the PubMed double helix) to wherever you want it, or run it off the CD.
(2) Run the application (double-click the icon).
(3) It will present a dialog box with a database selector (NCBI or LANL) and a query: [ ..... ] field at the top. Select the database and enter the same string in [ ..... ] as you would for the PubMed query page 'Search [ Protein ] for [ ..... ]' (NCBI database), or for 'Search for: [ ..... ]' in the Influenza Sequence Database query page (LANL database).
NCBI fetching: ncbi.gif
LANL fetching: lanl.gif
(4) Hit the Search button. The application will now contact the database through your internet connection and fetch the query results. It will then download each protein page to access the amino acid sequence and description. After all the pages have been fetched, they are parsed from the database form into a more convenient local form.
parsing.gif
Finally, the fetched and parsed pages are searched for the matching peptide subsequences. Passing sequences are written as a separate htm page for each protein to the desktop folder 'pass'; all failing proteins are written to the single desktop htm page 'fail'.
searching.gif
The other entries in the dialog box allow some of the match criteria to be adjusted. The values given are those originally requested. Parsing and searching are done off-line: an internet connection is not necessary for these phases. As the proteins are checked, their identifiers are shown at the bottom of the dialog box, and a red progress bar moves across the screen. 'total' counts the proteins checked so far in this query; 'pass' and 'fail' count the proteins that have so far passed or failed the criteria. The progress bar advances separately for the fetch, parse, and search phases.
(5) To stop an active search, hit the Stop button. The progress text will change to 'stopping' and then 'stopped'.
(7) The pass results can be viewed by opening the desktop 'pass' folder and then opening the htm pages in your browser.
NCBI report: ncbi.gif
LANL report: lanl.gif
Each Replikin sequence in the Replikin Analysis page is linked to a history of everywhere that specific sequence is found in the current query. The entries in the Replikin History are linked back to the entries in the Replikin Analysis.
sequence-history.gif
(8) The fail results can be viewed by opening the desktop 'fail.htm' in your browser.

If no additional criteria are given, the search is a full-text search of the database.
completed.gif
Additional criteria can be added to the search with the + icon at the bottom. Exclude criteria remove the pages that contain the given string, while include criteria keep them. Exclude and include criteria can be intermingled to refine the search results.
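The include/exclude refinement described above can be pictured as a filter applied to the already fetched pages. The Tcl sketch below is illustrative only and is not taken from the application: the procedure name filterPages, the flat criteria list, and the rule that criteria are applied in order with the last matching criterion deciding (pages matched by no criterion are kept) are all assumptions, and the sample page strings are made up.

```tcl
# Hypothetical helper, not part of the application source.
# criteria is a flat list: kind string kind string ...
# where kind is "include" or "exclude".
# Assumption: criteria are applied in order, the last matching criterion
# decides, and a page matched by no criterion is kept.
proc filterPages {pages criteria} {
    set kept {}
    foreach page $pages {
        set keep 1
        foreach {kind needle} $criteria {
            # string first returns -1 when needle does not occur in page
            if {[string first $needle $page] >= 0} {
                set keep [expr {$kind eq "include"}]
            }
        }
        if {$keep} {lappend kept $page}
    }
    return $kept
}

# Made-up page descriptions standing in for fetched protein pages.
set pages [list \
    {hemagglutinin [Influenza A virus (H5N1)]} \
    {neuraminidase [Influenza A virus (H3N2)]} \
    {hemagglutinin [Influenza B virus]}]

# Drop every page mentioning "Influenza B" or "neuraminidase".
foreach page [filterPages $pages {exclude "Influenza B" exclude neuraminidase}] {
    puts $page
}
```

Because the filter only reads pages that are already held locally, a sketch like this needs no internet connection, which matches the statement that refining a previously fetched search can be done off-line.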
include exclude.gif
The "must have" criteria indicate strings that must occur in the fields.
must have.gif
The search restarts automatically after additional criteria are edited, or it can be restarted by hitting the Search button. If additional criteria are edited while a search is in progress, the search is restarted with the new criteria. The search also tries to use cached results: as long as the database and query: strings are left unchanged and only the additional criteria are changed, the fetch and parse phases are skipped. An internet connection is not necessary to refine a previously fetched search.
A few preferences can be set with the application preferences.
preferences.gif
The Working directory is where all the files and cache are written. The default is the desktop.
Save fetched pages is intended for debugging. It saves intermediate query pages to the working directory.
Purge query cache removes all pages from the hidden cache. This happens automatically if the database or query string is changed.
Open results in browser opens the Replikin report in the default browser when it is completed.

Appendix D - "Dr. Peptide" Tcl Application with complex amino acid pattern recognizer.
#####################################################################
#                                                                   #
#                            DR PEPTIDE                             #
#                                                                   #
#####################################################################
#
# Copyright (C) 2003 by Samuel Bogoch and Elenore Bogoch
# All rights reserved.
#
# main open-application reopen
# main open-document.tcl path
# main open-document .peptide path
# main top mbox
# main top remove
# main quit
# set q [query tracker path db term esm criteria]
# $q stop
# report path term
# set arrname [manual path]
# manual-okay arrname
# set w [window . . .]
#     file-menu {label command ...}
#     geometry <geometry>
#     resizable {horizontal vertical}
#     title <title>
#     window <w>
# $window quit
# $window zoom w h x y
# $window configure w h x y
# $window configurerequest w h x y
# $S(space)::status message i n
# $S(space)::count seen pass fail
# $S(space)::completed
# peptide new | open path
# manualchecklist arrname title

set Contents [file dir [file dir [lindex $tcl_libPath 0]]]
set env(PATH) $env(PATH):$Contents/Resources/bin
lappend env(TCLLIBPATH) $Contents/Resources/lib
set versionDate 2003-07-25
set openning 1
set opendocs {}
set bootlog {}

proc capture script {
    set rc [catch {uplevel 1 $script} rs]
    if {$rc} {bgerror $rs}
}

proc log {message} {
    if {$::openning} {
        lappend ::bootlog "starting: $message"
    } else {
        log::tell $message
    }
}
log "Dr-Peptide $::versionDate."

#####################################################################
#                          WINDOW DRESSING                          #
#####################################################################

# Generic window decoration: create a toplevel, set up its
# window frame decorations, and initial behaviour.
global unique; set unique 0
proc window {window args} {
    global unique; incr unique
    set window [string map "%U $unique" $window]
    array set S [list \
        window $window \
        title Untitled \
        menubar $window.menubar \
        space [string range $window 1 end] \
        file-menu {} \
    ]
    array set S $args
    if {![winfo exists $window]} {
        toplevel $window
    }
    wm title $window $S(title)
    if {[info exists S(geometry)] && ![string match 1x1* $S(geometry)]} {
        wm geometry $window $S(geometry)
    }
    if {[info exists S(resizable)]} {
        wm resizable $window [lindex $S(resizable) 0] [lindex $S(resizable) 1]
    }
    wm protocol $window WM_DELETE_WINDOW "catch {$S(space)::quit}"
    $window configure -menu [menu $S(menubar)]
    menu $S(menubar).file -tearoff 0
    $S(menubar) add cascade -label File -menu $S(menubar).file
    foreach {label command accel binding} [string map [list %W $window %S $S(space)] $S(file-menu)] {
        if {[string equal $label -]} {
            $S(menubar).file add separator
        } elseif {[string length $accel]} {
            $S(menubar).file add command -label $label -command $command -accel $accel
        } else {
            $S(menubar).file add command -label $label -command $command
        }
        if {[string length $binding]} {
            bind $window <$binding> $command
        }
    }
    $S(menubar).file add command -label Quit -command main::quit -accel Command-Q
    bind $window <Command-q> main::quit
    menu $S(menubar).edit -tearoff 0
    $S(menubar) add cascade -label Edit -menu $S(menubar).edit
    $S(menubar).edit add command -label Cut -command {event generate %W <<Cut>>} -accel Command-X
    $S(menubar).edit add command -label Copy -command {event generate %W <<Copy>>} -accel Command-C
    $S(menubar).edit add command -label Paste -command {event generate %W <<Paste>>} -accel Command-V
    $S(menubar).edit add command -label Clear -command {event generate %W <<Delete>>}
    menu $S(menubar).help -tearoff 0
    $S(menubar) add cascade -menu $S(menubar).help
    $S(menubar).help add command -label "Dr Peptide Help" -command {exec open $Contents/Resources/Documents/Instructions.rtfd}
    $S(menubar).help add command -label Manifest.rtf -command {exec open $Contents/Resources/Documents/Manifest.rtf}
    bind $window <Configure> "
        if {\[string equal %W $window\]} {
            $S(space)::configure %w %h %x %y
        }
    "
    bind $window <ConfigureRequest> "
        if {\[string equal %W $window\]} {
            if {\[string equal %d Zoomed\]} {
                $S(space)::zoom %w %h %x %y
            } else {
                $S(space)::configurerequest %w %h %x %y
            }
        }
    "
    namespace eval $S(space) "
        variable S
        array set S [list [array get S]]
    "
    namespace eval $S(space) {
        proc configure {w h x y} {}
        proc configurerequest {w h x y} {}
        proc quit {} {variable S; destroy $S(window); return 1}
        proc show {} {variable S; wm deiconify $S(window)}
        proc zoom {w h x y} {}
    }
    list $S(space) $window
}

#####################################################################
#                               MAIN                                #
#####################################################################

# Operating system interface to handle open events.
proc ::tk::mac::OpenApplication {} {}
proc ::tk::mac::ReopenApplication {} {
    capture {main::open-application 1}
}
proc ::tk::mac::OpenDocument {args} {
    capture {
        foreach document $args {
            if {$::openning} {
                lappend ::opendocs $document
            } else {
                main::open-document $document
            }
        }
    }
}

catch {console hide}
window . \
    create 0 \
    menubar .menubar \
    title "Dr Peptide" \
    resizeable {0 0} \
    space main \
    file-menu {
        {New} {peptide new} {Command-N} {Command-n}
        {Open...} {peptide open-ask} {Command-O} {Command-o}
        {Show Log} {log::show} {} {}
    }
pack [message .opensesame -text "Initialising Dr Peptide.\n\n$versionDate" -padx 1i -pady .5i]
update
after idle {
    log "Initialisation completed."
    .opensesame configure -text "Use the New or Open... menu commands to get a new query window.\n\n$versionDate"
    set openning 0
    foreach message $bootlog {log::tell $message}
    main::open-application 0
    foreach document $opendocs {main::open-document $document}
    log "All windows openned."
}

namespace eval main {
    proc queries {} {
        set q {}
        foreach w [winfo children .] {
            if {[string match .peptide* $w]} {lappend q $w}
        }
        set q
    }
    proc open-application {reopen} {
        if {!$reopen} {prefs::open-prefs}
        set windows [queries]
        if {[llength $windows]} {
            [string range [lindex $windows end] 1 end]::show
        } elseif {$reopen} {
            peptide new
        }
    }
    proc open-document {document} {
        log "Openning $document"
        if {[string match "/*" $document]} {
            set path $document
        } else {
            set path [exec osascript -e "get POSIX path of file \"$document\""]
        }
        regsub {/$} $path {} path
        switch -- [file extension $path] {
            .tcl {
                set upgrade [tk_messageBox -message "Install upgrade: [file tail $path]?" -type yesno -title "Install Upgrade"]
                if {[string equal $upgrade yes]} {
                    log "upgrade: [file tail $path]"
                    file copy -force $path $::Contents/Resources/Scripts/[file tail $path]
                    main::quit
                }
            }
            .peptide {
                peptide open $path
            }
            default {
                tk_messageBox -icon error -message "Not a peptide query folder: $path" -type ok -title "Error"
            }
        }
    }
    proc quit {} {
        foreach window [queries] {
            if {[catch {[string range $window 1 end]::quit} rs]} {set rs 1}
            if {!$rs} {return}
        }
        prefs::save-immediate
        log::quit
        exit
    }
}

proc bgerror args {
    set ei $::errorInfo
    set message [join $args " "]
    puts stderr "------------------------\n$message\n--------\n$ei"
    log $ei
}
proc ::tk::mac::ShowPreferences {} {
    capture {prefs::show}
}
proc tkAboutDialog {} {
    capture {
        destroy .about
        catch {image delete aboutImage}
        toplevel .about
        wm geometry .about +50+50
        wm overrideredirect .about 1
        pack [canvas .about.image -width 576 -height 160 -highlightthickness 0]
        image create photo aboutImage -data {
R0lGODlhQAI~gAOYAAP/////37/7+/v39/fv7+/r+vn5iff///f3 //f 39/b29VX19fHxBe/3 0+/vO+/v 7+fv9±fv7+fn5+fe79/f397n797e3tbn79be79be59bWls/Pz87e5S7W5870zsbW8bhGzr+/v73W3r30 3r29vbXO3rXG3rXG~rW~ta+vr63G3q3tlq291q29zq2traWslgWszgwlpz~fnS~enpylzpycnjezmZSI zpStzpStxpOTk4+Pj 4ytxoylxoyMj ISlxoScvX9/f3ucvXt7e3OcvXOUvXOUtXNzc3BwcGuUtWutWOM tWOErWNj Y2BgYFqErVKErVJ7rVJ7pVJSUIBQUEp7pUpzpUpX~kJzptJJznEJCQkBAQDa/PzlrpTlrnk5 OTFrnlFjnlFj DExMTAwMCljnClj
lClalCkpKSFalCFajCFslCFSjCEhISAgIBhajBhSjBgYGBBSjBBK jBBKhBAQE~hKjAhKhAgQEAgIC~zAAAAgAAAAAAAAAAAAAA QAAgAAA H4BogoOEhYaHimK i4yNjo+fQj ggAlJTWW15iZinpucnZofoKGiooSlpqeoqagrrK2upZGxsrootbaKk6+Gu7ysvr/AwcLnxTy3 x8jjyro5xc7P0NHS09TVp~vY2drKzdbe3+Dh4uge2+hnGIzd5Ost7u/wsQDp9PXo6/LS~vvB/af2AA~m w+evoMc4DCNsJXMgw~sGEECNKnNj rUZiLGDNq3MixoePIEOK1EnyYqoHFFogXr~kyoyMxYGD~jElzps2a OG/qz~inT586fPoMCHVrzZMuj SJNOfATmohiMTSlCnSglapioV61anSrGTPcxYrxkqcLkCZMqW9NiXaUV rVW3av/bhjoqtK7du/AswrwqtC/RmVG8dFWTxosUlCo+dMBQAUK3EWfSv4 7OSddvLgza4 7GlG/ct3JTB VwXb9UkODg1AjTAj +rNruLBDp728ubbt265eNala+W+YmF7MwEnzh5EYH1JtW8EhMvLdQ2riJ 55/egfPy 17LZAI +shEMqyI,+fxm49Hjv5uaqoql+/Xu/u33yb65wj ZUWDVZBJwpfvnL8Y6OwFKCselrOnU3ahYeFd K+Dpd+B5EJonIYADfVmjhSrpRN4/amRxAiGKmchh/715xeFFoao4kHWhZeVeAgoJQVyqDQI14wRlqej xCiuuN4 eQAYp5B6oDDmkM0F~kgT/tUusp6NlkXqghhC+robmXiSUolaoP1BlpSCleHrnLkpWQuYmZ~xX4 ooPmgQHC7/kt+a'NMUg4o1VbcidnGKwiearaATh~aDYuj XssltaFP4aWRQSoIRNflBCCvQ8EMRTl1h IUjD6bfolVliaVl6eg44qJC1AlqqLoUgCWRdLbJLp3 ZPTVnKCD9A4YV+rjTH2ohcazvpinXfaid4iNJZ6 W6FkLvmqLqc+aamqOz5LLQBelimttkIGu2 1CTfL2JKgbihcK~h/ksEQYd~iRBIZS7VbTfvspGtYZuylq qn95Kmsbs8 6+2qg3 1X7JbbaUlIztt3 SaEmZ~aoon6 7B7qXHB/ycRCHZVBznRK+LB4YMxhM8wN~xsshr 1a+/mwlsrcGZEJwwnwcj rLCZfxJZc58sHvrep/Z661QUnmTwBBzw2TviVg42Ta6LYaQhubpd2btvfyuz innLOl8L7bcLS4uz2GPrPLPZZxdcMNpl~xSrsFjNn0MnozyRKyyzvsoxfHpdOYOCDABhxn~ppzj VFlr jRfXabvacONqh8l2swyTvS21qoLdT7j 1kqj fRTGZsEkO'VaRBr7hRyRsv3 zMtLXEaVUflQQA~m9HrIX4kr hhfjYX+Ss9mZdwo25Wj3Djm3yDusoT8Rx~eW3ijogsAZvY7bt74i733jplelMTcEQtch8f/hsB+Li+sd Lm98q2vDEDhNxkceP+TB7+x+QRka~jeO+rGgiRC2AxlonPxelol~aNvj zRmeOAAAJTKAKVQuV~kiFvujw jn3JT4BTvj ke/ygFveJaDX88aT ZE/B~4~p5y~xo~xdkDYEAwQAthu NNRXQVYtj 31+IyIHCQXCDXYPicpzlks4Jy+bDKspasBHBQjXMQG2bi9WagrFPGZCmczECJUYAnbsdrtR lauIFjyiS ZJYvJgpUTVBQzOClMKfEETJCdTVk2gy7gs ZLIEBfa4KaZgy4EOW2bc~h QPRZEGAEaj BkcRS GQXhW,7vgzdH/cXRc2 /vkN7/SlWlyqKzj SnwWLDIeMGorwIQQzuAiGTZNkGJkixVhmJ7EanuEMCwJA1sJYg MQnGRicXMnvUBKx3sce-Tov32KEI/LqKEVAHaVigot4RXBbtLDtios3nqJlKBg/8g4YAosOGH 
QlwLMpOZkg2S8kx9j OYp79cnPKL~nzyzpiIWaUG47cYLcIDBSpwujI-ibXVglGUuadjLoxkhDTy4BA28 crt5OnO~iOLznllSJT3 +OkomfFGUpoykRflRVwXEG7UhJWAIUIWMIBXojMiIBWw2Joxpydow8hiHCJCWCB NeXDikc/ytSDDUFxGSlnG7ii~zgUwRTJ/4AC33J5SzonysipsomcNwXA0GmDiCRJcalOZGiQSSACqj nBa LoO2P7peFAzrpMQPIGClp7JluSyYkzDKbIz3OcSWs2kVNSGVnpicEUWaWjlIhsXLWUkNApVYwbsqmnOjU TdSEEOOX(ZzxzBhxgAgIFnQxj G8vJx~qIKWP1I~zp52hTgJWEClMAAFPiqzXIBlbzlrddNzKCETGz OqxPa JGuXQW1L0 Waywulsj k7lKSGkBgA5uORU9bfdqvLlq2q4QSYqAfBD~aCb2 SuehsLW6j OlSc7vdtcgRdL ALCAauatpQtly73 z3nI7DtAEGGKndwu98AviiwjKbvdl/flrF4wa/8DrnFC9yQo3tHNKy/Psc9HZiN 78WdchFMYt3B9nOOleoi4RsGYFoylj jgKA~vGV+PBfeRTOmDWTexWQLDxcAlrJoXkJROA4Vm/DC2lPo 8iXVG62TTyfsNghgpwwgFw6Cl~zyfkLvvrxDgeIE+DlbszKAOmd~iCCligsTonGL7zoicbngAETSSg xS 6V54i9zOcLIVioUE4dohhM2SpepQcQZqGgYyMxwQTnnFi4QAM9QXPlVgnQ6zpmEVVocrcmvwFRlU vL~P3~x~oABYCgb~g~mz zvizrp3b0u~ZA~~CmU/NgLH jKlBTK~ilt3IZVlhWzlgHuylz~k3HlxyRyG5q5TAYEEKTAACC4jig4UppilzvBT~n3tekPMEYEG NgLHSGYDBSswYkA2Hcr7hB6s4AViIArtuKCzzUp/yVCb3tTXGDn~jbdb2ospncu~g1GAtZklIScvCB w/4 CATkor6g/l.9lhp7fiMMcMkX38mia4msN+CgUN5hTCCDFRgAsl8hRGOikulT/S/uT 65 OqcYl2 OPxZEh vopXq1AE0UW8DAOUWQ/XMRVld8m3 iSw+7QvDNez~apwcFsMZ3vAEB8Q6GkLAwlco~tVGfgrsYhfGA2QQ hB 1soBcPCMHfN7GB/xA8YBSBH7wm~n+JwlPgEiEwvCUoPh7k3DFdhcs1HyBBY3BkNyMABdTLyoK 16ChE97z/gsZ1EFITuBFCPawBU~sYQ8hwMQWaoBJCmyhDlsgQxlkklkdnGkLhwfA7YEUBEpQwAaAqkMK KBGE4ydfoInwbOoLfcVyBzHIUGBFj wBLLS+kAsz7DRw2y+RASd~d6gAJTB2AIANQL/5umD8Jamfe+X1 vg7NJ3 ghQAb4VwnyJ3 1OsAexJwN7QAUhAH / 14BBkAJ7UAf ltwduEAJTUoIDgoanOMlvi4 jRQYwY7VgwI 4AAiUA~,hQBgtBjgj ZXNaLhccs3 rwtwsJGHuUwP+AFggAKUAG~eg~yhd7rqeAD+E700h7VAAASLAFKeAG QhABW4AElNCEtLcTJNOB8 t7cfOEcGf7d8W6f4yhcED+CDPcAdQCGxieFC6N8e/B3xtd8FUgJGxgC1ReA tNeBj V~mo3VzlbgXOZUFMMBwwFABJUAlPh2AWZA~bMBG0BvwLLdVQJEvINPiDrwB9xlcJ 1cHSLCBvBdS QeCDSQh9j 8eAsXd7rxc~sld78rcHmah8bKinriitKd4zvgAabcBzod3auh8IfB4gwgArOcLsocFG3CHi AFB9 9FcJSIABVhxZAdTM4RCHVk+QLJAAH/ADWKAGaOBuSSNGiNYqb/Ym+xGNiYGCeinwgblH Af74doAyfQBAJ7A+wjMSXgNN3elRweKmohlxICT7YfFkIAPMoPJBXe7tnCSHe5 iwgUn4LKnYj 
8fljNri Btfnj I02AXQ7FVTDhBXMgXg5QA1~xgAjDwA~bwBDinVhKwhRv7WiPzj dQalE55xjui4CkDSf/5HfRQpLfXX jElYBzvIfBJ4hOI~kQKpjmuY2JWwkABABWGQg5HHkJXwAAIYB4 8[-kbQnkXVYCUEAfTiIfYgAXDEYFWnQ AqZwACcYARiwArnCKXAgB4voFS4Ij tkTYv7Ff trOPPt~kzWzCj S4iQBAATJEAh2azITypjmswhv/Ihv3X /49FuZVF+TKEEngNulwUgATNSJA+SAaPB4wGM4 zFyHvlaAkFiYeM0GvdJRNiQF~i8AE~kAR~kAWPpgZs OGodaTVFlm2E1XJxAYnk'FZ82 ZenslwMuYxJ+I5yuAfTBySd+XqPR5xtSAlF2Y/Lqla3d5T9J5mXslkA oIBOuAFWeQn6aHlCaZg~w3 ±3J4 /5lBedFhQlNiI~hwENpoAYriRFOdXNn5lBHB2f8 ZUAkgnQvJT5 hYIY SIdA~rIcMSAYpUHIPuTBbEAryHvFiIGVMJ130KGtSJGRKTxB0JinUMIZfSJTA7MHyXwIC5CJj Nx4AOCIB:A IIEUaIHGRwYQyoEIMXMyOP9VatABntAG5GrE4H+hem/dNWEFYA~ktpaarK3 Lhi cAloKuBgkbtCO /ggkbj B4QJKAGKh4 0LeLFsp7VLgFEsiY140MJy4iUltCVQENX4DWkwy~eRg~1 9PdiOGwgkyGealSRxOxYG UtAJChAu8BaNriRXyAUCJpBTYZZSdh1 fFOtgTvgkpUBSYPihlact9YeTyTmpmUBSnZnAy~gMj gd5kmaA 156 WO 2006/088962 PCT/US2006/005343 lYgnAlV2 2 4cj YlCCmYABZKV9MvRdgRQKzgCIWAFbBBwvafl51u5phokCoPYB3MCyO g~nBCTQaClHkpP EeOe~eFhmMABcimCiapNsfVZnSMFAnACbEA4qfWee3r/S3vxqMHKC2DDoJqqCVpopSj KrldSS4wGFwvg dZiQAUjTVZgEk2S~nlZyBrFOAruq7SZHtruVKflQWgAA~gACVnEd82GQqvcYmlcajWX+AqbOJiBzMg AhsTrv2pN/Ni sAf 7DQvAAAagXn1-X/FGEUyQCRAQGYOmOglbpN4 iZi 1ZZTAA2FAS+MzsKFFWMc~rAIS eWDIqYflZi5OQeZxgtBChtKJgtExLqsu(CfoHD4V3kf7gs~rlrDKmmSXCRflgrymwiV3QI{1s2BkBioE AA4AA4ozYhcbUxnmctWmnlhO~gzrZfFupQSSVt~iwhOn6DuvJlRS~t4SbmZyAnncI/7juUQYyQjXsZvS JTUaZYKQ15FVoGxeOeLaoaj CMVWRKxquNej OiCxFflqpyHa5RLT26SXYLgViqb5MLh+67qqqwnVx~aOSw65 m7X49m2/JVNgsQSYQAPvtl/j KBSmZOJdxSY6SgkVkFMa87EyGVHthwjv~x3Qt4FfuZWO~wNXmJTAJWX+3 5ZvbeK4XVt4QYSHxISHpkQHwyAH2aiLXe672 1FSVEmIPwiwTrJFwSvN6VmyoRo6A~bO~kDflz 924 5UKYHV 17fHCL9AqLqOu7f8i4HT974YiATruwftCwAWHLB1+r3 1u3wPLMAE/HgT7L9KCMDNR4UAyJWv~wQ+yLrM 84wI9P9 ZEINFNIAiVbDnLP/NkNRFYIBQrMoDWWAGLQls4gigdxshLlgMeMu~cEq4rd7z5KZp7 C~wEylgH9ci+Mj B7BBiQrVuBtEcBr9eKxirGnOi~xTrB+-Wj GQbCMSCd/T/nGrzd9eKw+yHp7sSfFmbmV e5wCFuwGGAzGbEzGaWrGA5zGG2isd~yYFDrIr8iML~rG~njJZCJ D+6C~yWuX4mIGWPClAEAE7/SbehpR BuJrnMARILOJc ioAAEAHQxxxTLoOpYsQNwgAONeOgF 
zGbtCZrOKBxugPWtyAAKnOZdrnARRgEg~mTwVy7 /He7zUx8FciP14nM~F5±EDCiFl9z/nXdlj zzlx~y~zsest1Pst9/cj 3u7zPsomM88kdflczrjnkxzaj +uc zftIpgTKhuoJu7ybkfBgw8o7LHeQA5OmV/iym9M1JomV+jXE~ljBlUgiBUgBbREc9iH718E~nzjzj4 y4tcgK+yt+wMpQlplBQIJGTwanLhk4wmNeOhpMf7PnhIP5Ko~bmE3ZhjeNoj dpjEqZCTJABWR4zrV7 ufl+dO ORCz7lHOy2NCfQcOOFwnbxsgRtYeBgllESyoSgKfbNnNru7SnnouyomgkOysEAllgMhzJtXpI QGvxV9cxI~aKMCUaAL4FlYSzTOQVBOOHixRCczkldonBGDviX/9JImQJliswj LZOavHtCyION7YAVKNng i6bLCJ5rfNmYLZFDXaLbbXxTHcH~mImYj dRkStl~koSPzYZkgNnq24 ZIONnAh9rnrbaeYPaGvy~n~kLI 3KvQmBTJIFTtgmwtLqqUVU7st7XAmwSYcAOmO 9BiRTF+7Q8RiqKvR3wi/dJrSMw6qdqKnXt8RwF79y17 m4CX6HjlfdS7YMeUmnuOG5p7UMcUSpCRd4CUwNuakJT7mPNiBzlzwjd+559Qc3KHuXcb4p977jbTObd8N Gnjyj ZSiKlyGxGKe~dvS4Mn/NUj +hkJXcQZGM~kHYAWFkcqgvMoaGbPGGzWkDAB3AgMRku/82 18OPBUm. CZiE3e23 ZfoAWGynOqzMCkisWcmAflfrSmZgCKfB6RP~gRg7TM4qQsCunneiHUQ71ZJjT~yekEPXiHO6GG Vh~h~e4ETR4CT/ 7aSb7kFM'~ckE3mYvGYGhwCxufF1Sej Wqrh8kBkNnfDrfSSGOEVHIAcQkDd+TYVqGyy pyMOlb62 lyCbN~qrTOx+KqKD~rJStli4hG24 zxKlTuj 1KK200yiGb3qGC14JQ~iBXPCAO/53 gms~q62T xUyQb~qnspelhp2ZiquJTGHjp6Bzgzdfqdnp4exvgwFGBk2q4pO ~isDGVs3 54 jnu6VyrrL'B3DNdq7dT3 c ddOXnfIF~bX/Q8FhangpX7HlgYQqQLCMCSrAiMfLqDVuG/qnok/rtMfarpRH7OjLCQx7CV~rj JOKBPGM 31g7Cvi1+tPLuCON/CQN/CVXb4C1FwxCHevAmGmEflFmbQAQ3OBHYTbzOlY434ntwXcCfgskij r3DT7rVB ole8B/9ODgKQ76PQgwiGPjhy~eROvbYOEH2EwZY2AncQ7pjnXrOaQCAbGJuQq7rZnVCNdLTyFAIHN ElMrriHqufgMrblrxSnfABB3QTlzE5+IuVkrz4Z TBmt9CRk/MTFp8iO79vPggdXq IzCYnyxdlDQAR+Q OXJ77TcfgDEFvSeCQdwr4K2qPgh9mx//7AOO2Xxgjpt4WDEJhNqYAUigNxp4VvMDfYbHRZwYCubgAEq F6SQaPiHH6wpu9fot3 7i+mYtqHZFgAWGJkGoT72p42JesATdtAkOoFOgs~d~hwhLP/rAnxs+Y2v4qrn7 Srninf62 85PhyXRMI~VagAAG7rrFJPrBL/Xtmcrau2Aolof htOdCv2 7RewZsoAZiBAV~gAN/vwlJYDvS qzTWf/O1+WexwcMoj 3YPJWhC~imoBghhYWJiZ2lqZ2JeRSUOCAC~kZKTkjhqg2CCYoocms6dg3gioSj j SSoqaqrrK2ur7CxsrOOtba3uLm~u7y~vpKlwaRgYsTGxcjEYcrJyP~dmZiby83Rx9afhGlpYFVPSUA4 LxOB7o~al~altPQxeyCwsGnv/Tl9vf4+fr7/P378c1+aRLEadq7gsoyPcNGEJPBge+8gDnhIMESX~bS YBIliqEnYgB7XERlisgTJkyhTqwZK2SpdteOGfzoj jkxgzCdO 
ZTJyYsDexmwnPGSj F3RmOg3uRQlkgXT plCj Sp2 GcumogtlKckroc~umaAu1PnvoFQwYLxHmY~iRizlgR47ww1Kta7du3j z5rUgqtrGazMB23wY c~lCnpu8Q~j 14ISZNDvduSucVOFcvZgza97MuRSfPV9DE5SWOB3CODbfwlUGMeJiXRCeennj kLX/V1Bw O4JcSrez79/Ag+v~jDNy6cOileEOWnJ7S3GnDgUYX05kbAzOfzD3MZegw4sfbSK(4aLAuZoW3Sx7 lo4Os;xbzorjWhydYCrVlpvzmddVxifEdeQQWaOCflsxB3HDXsCUYyadQYtl4lzkD3CgQcOPAEHHNkVZp/ Y+2OHUzw8IbgiSimeOBn56OnylcQgaUTf+9FdpsO~aVywAQfwGCEFYOYYSN~t4FlE3y5ITogikw26SRm CrL3V4NUkshVYVstROyOCDQAQQk9MAFGGnC8scZ+Cj O4pVE3 /jWiQUs+KeecaKZk3nluqucXjWHv2Gdu 9PTgBBNYPJbGGW3N/4 fdjNlJ~qKj fiHZUZx1VvokBTsEsQM~FqGwQggPRPJpCJSMOmqolglQZWDtpSaj 1RCus2cyPOOXlnI-VLaqQZIMZx9O63pmY~rBD~yrDHscciQckWxwYBSQj ItcIssnXlkCqLxe366EFSYsPn aO4l5KG3NUGYZGOOFVIWawDu5 lJvxMYr~gNIlEFqCHXsscEkzO~xBSQ7RflvisTt~isQedaBaS GoOypTa O7leGeG~ziVFQMOyzqilTrpO3B18oYM~rT/AryFtZwuOUdkFDRr77kArJSEEsYXMNqcAQBBurOxZ qDLssQOz1YJ~p57p/Scj f~n6+dFp4upG2m2Sqf8T16NGdiUfbqEI(/LX4GlwrBMyo8Lswfu~cfAeMMtM wbEUG0sEJD1TAGOdNu8xdxBj OwG3cAz7yioOELaD9VFu~kj 4rO9X7HFOrlmq+FHOgAz25ZtRMSOVZUfC bAp7bLoHSGwPvAcZW6xcxwwDPLCHGwCTQYANIZMK+r98 zwOAGTEflzlemYLLsdNuGUldsPzaaQOHAV4 2VLBa+Vtso~ibvlvmjjR87E~pxyzGOGA/nbpkhyLer8oa75BwDtAlgMTVvPu78x7OAsBs574FjtXDERa+ cUzneRO~isQ4WOt~kacihTNeqdYoGdoYCmiuZ52JGBbXB71htcx/CNOgDzcn/QAc2eJ2m5Me3 .it3PdlZR V5SQkyQ~cQxcylGNBK3(Kb2ZEESR2lrVLOfAlkIld6KSn/dCEDAyUAEAGflSdzKBVotcxiwoecAEXRgeA 2 S3PhLOLjv4YxL±IdedNGj PLxAYYQ8Mox4BdTOD'kiCEgr/nwjVEZXxBCkALeoWyCn2oWEskXCZo9K4kcA ONwerDUAzYWPd7ij HyROgMXfCQ9qIHle8cDFFUn1SSb/kZQkcegidiGEeiGBFxxHeZIUbA9h9RviHkkF yD5mcXwyM9YeULWBfNXhYG54QAkXmXgbHFNkUOJF4+DMQFOaHAj +tVB2jQ5M3gChGSMZkk8hb~f±021 /SF4A~hQdDS11RCThgzQ8iTEn+G5Zz~fkl44DZfflOeBggaGUpjxPFATedc9JWxTMuISpzPhoR4HKbFgy M3 ZARgGGTUpx4 zwXKp~kzOlbMIJaDSX5J4px7U9PS~cObfig6L3wJt~kqEhHqo9ftudxEAvodY7zv6IU LqNrf~wXjbMdTYSUpDjNKS/uRPeAbP+EUH6iAABtSM9RWo4mWbUn+pM9T81L704imF2YKaSPj u tfiChSxgoatgsapYwopWsjbVq2f9Gljllala2tvWtao3rWsk6V7PaFal3lStZu+DUp2KmX8hywlTvwb 5EtfqggaylpJp8849rGQj f+sZCdLWab4NTzMllsIV~aXgUKCdHtQVip2SbtgyqmyqE2taleb~llFeNjPM 
qh8TAaDY1AHgAU7IFxnuGYRB IQEJW9hX61~gL9 zmyw3 1A64MehZc5oZqAlv4LcJScLCiAOC4 +lJuzwQL CVP~a2h3jITfNAc7SEAXCXxb7uuUZVsAbCB+VQDUznpWh5/RVmhEC+9vWmvf/vr3v659rvsi2lOM8ilf /LcrZb3tnLLzFj SUIixnfyMCzQdrvdAu+Je/mxkQkCDI(6fzshOYD7utglssCphtTrSJWvd7tWOpV2x~v W68gRJB2qMzs/PTmtz3 I15f/DbKQh/yZAAsYLwe8ZE/+eYGToltd1Xcwx~teVS/6pG2QeDUwZzlufkp i2Q4hvKxQiXi~j HLWiu+bmghwbsUO7bEfh~tE3GmyBlDq8k7clbiopxI3fEotJshsqAHTWgj H9kugPVZ qEhL2hkD~oSAfMD~tsfl3uOSzGCe8YVOVWaZYZCRjW7ZmkFX3kxfcWZsoBBh1YaqS8UvkVgEtGYITeta 9 9EQh5YKgSfB6Do7mnyQJp/fhOa3SpPqovLLdBl7vWlUN/vU2ZWsu41RGgncQo5PVujYXcgM~al7LO j K3HTe7J4j rXUtmIJEgLOmw+eXdUpFfvAllMNRv7lJgW4rLNzG9Ps+lgolX3m5Eb/4R66g3HEiQtBvf s 3clSOYqkLex~yO3xijMQ3ZsROCQa3brdglNHCAXowlcmXrxZIIF/3Rva/lFO6ZjMSg2lLLY01wTvRvpll pubbyWZMGhR4F4JG+/i~nwlkixv9Ho4qsafonFou/eUN1+wE4IdCVmCT~rNVj nCndlsZflwaulsodiTG N9gVp8DUG2CuptcCY/ faEpe~rLPE84 fOulNcGUvfjGfNuOOA9GywkrAbSeWMX7±lzxfa7Ptt~b~iuzve lnjP+28mnLfysuRuQejxj+f5+M4XWvIqegAsbBlfqoxgezcbqedxP+TIg/71PmS97P3retjb3nqzz/1q a3/73s+LTPfAryzvfU/8YQX/+JEdfvGX31jkO58vyme+9Jv0/OpffPrYB~vlty+M6Gf/+wXivvhNAf7y x2v84/e++ddfdPRbX/3sj3+g3b99+Mv//nuh//vxz39u6r/69td/Aggv/weAA3iA41GAzxeACNiAsaGA yMeADjiB/ACBEUiBGDhrFhh8EpiBHvgLG8iBHziCdRGCwNeBJJiCtmCCuoeCKviCLcGCs+eCMFiDrCCD M2iDOkgSOCh7NLiDQAgJPch6PxiEOziEqleERmiDgQAAOw==
        }
        .about.image create image 0 0 -image aboutImage -anchor nw
        .about.image create text 288 160 -anchor s -font {Helvetica 12} -text $::versionDate
        bind .about <Button> {destroy .about; image delete aboutImage}
        bind .about <Key> {destroy .about; image delete aboutImage}
    }
}

#####################################################################
#                          PEPTIDE WINDOW                           #
#####################################################################

# <path>
# <path>/config
#     criteria    include|exclude|must-have descriptors|* string ...
#     db          NCBI|LANL
#     geometry    =<w>x<h>+<x>+<y>
#     modified    0|1
#     path        <path>
#     query       active query
#     seqpattern  +<aminos>|-<aminos>|:<mingap>:<maxgap>...
#     term        query term
#     title       <title>
#     zoom        =<w>x<h>+<x>+<y>
#     zoomed      0|1

# Create the peptide search window. This fills out the peptide
# search window and creates the behaviour for it.

if {[catch {load [lindex $env(TCLLIBPATH) end]/ncbiasn.dylib} rs]} {
    log "load ncbiasn.dylib failed: $rs"
    uplevel #0 source "$Contents/Resources/Scripts/asn-parser.tcl"
    log "sourced asn-parser.tcl"
} else {
    log "loaded ncbiasn.dylib"
}
package require http

proc peptide {op {path .}} {
    switch $op {
        new {
            set k [clock seconds]
            while {[file exists /tmp/$k.peptide]} {incr k}
            array set S [list path /tmp/$k.peptide title "Untitled Query"]
        }
        open-ask {
            set S(path) [tk_chooseDirectory -message {Select the folder which is the peptide query.}]
            if {![string length $S(path)]} return
            set S(title) [file root [file tail $S(path)]]
        }
        open {
            set path [file root $path].peptide
            if {![file exists $path]} {bell; return}
            array set S [list path $path title [file root [file tail $path]]]
        }
    }
    array set S {
        file-menu {
            {New} {peptide new} {Command-N} {Command-n}
            {Open...} {peptide open-ask} {Command-O} {Command-o}
            - - {} {}
            {Close} {%S::quit} {Command-W} {Command-w}
            - - {} {}
            {Save} {%S::save} {Command-S} {Command-s}
            {Save as...} {%S::save-as} {Command-Shift-S} {Command-S}
            {Revert} {%S::revert} {} {}
            - - {} {}
            {Query} {%S::query} {Command-?} {Command-question}
            {Manually Check} {%S::manual} {Command-M} {Command-m}
            {Report} {%S::report} {} {}
            - - {} {}
            {Show Log} {log::show} {} {}
        }
        db NCBI
        term {}
        seqpattern {}
        criteria {}
        focus {}
    }
    if {[file exists $S(path)/.peptide-configure]} {
        set path $S(path)
        set f [open $S(path)/.peptide-configure]
        array set S [read $f]
        set S(path) $path
        close $f
    } else {
        array set S {zoomed 0}
    }
    array set S {seqnum 1 critnum 1 modified 0}
    foreach {space w} [eval window .peptide%U [array get S]] break
    wm withdraw .
    log "Open peptide query $S(path) in $w."
    pack [frame $w.aminoacids] -side right -fill y -expand 1
    foreach code {
        A B C D E F G H I K L M N P Q R S T V W Y Z
    } name {
        Ala Asx Cys Asp Glu Phe Gly His Ile Lys Leu Met Asn Pro Gln Arg Ser Thr Val Trp Tyr Glx
    } {
        pack [button $w.aminoacids.[string tolower $code] -font {Helvetica 10} -text "$name ($code)" -command "
            ${space}::seqpattern-add-acid $code
        "] -side top -fill x
    }
    namespace eval $space {
        proc show {} {
            variable S
            wm deiconify $S(window)
        }
        proc save {} {
            variable S
            array set Z [array get S]
            foreach x {file-menu focus modified window space menubar title} {
                unset -nocomplain Z($x)
            }
            set Z(geometry) [wm geometry $S(window)]
            file mkdir $S(path)
            set f [open $S(path)/.peptide-configure w]
            log "Save $S(window) to $S(path)/.peptide-configure: db=$Z(db) term=$S(term) seqpattern=$Z(seqpattern) criteria=$Z(criteria)."
            foreach {p v} [array get Z] {
                puts $f "[list $p] [list $v]"
            }
            close $f
            set S(modified) 0
        }
        proc save-as {} {
            variable S
            set path [tk_chooseDirectory -mustexist 0 -message {Select the folder which is the peptide query.}]
            if {![string length $path]} return
            if {![string equal [file extension $path] .peptide]} {
                set path1 [file root $path].peptide
                if {[file exists $path]} {
                    if {[file exists $path/.peptide-configure]} {
                    } elseif {[llength [glob -nocomplain "$path/*"]]} {
                        if {[string equal [
                            tk_messageBox -parent $S(window) -default no -icon warning -title "Overwrite?" -type yesno \
                                -message "Overwrite existing [file tail $path]?"
                        ] no]} return
                    }
                    file rename $path $path1
                }
                set path $path1
            }
            file mkdir $path
            if {![string equal $path $S(path)]} {
                set fns [glob -nocomplain "$S(path)/*"]
                if {[llength $fns]} {eval file copy -force -- $fns [list $path]}
                set S(path) $path
            }
            log "Save as $S(window) to $S(path)."
            array set S [list path $path title [file root [file tail $S(path)]]]
            wm title $S(window) $S(title)
            save
        }
        proc quit {} {
            variable S
            if {$S(modified)} {
                switch [
                    tk_messageBox -parent $S(window) -default yes -icon question -title "Save?" -type yesnocancel \
                        -message "Save $S(title) before closing?"
                ] {
                    yes {$S(space)::save}
                    cancel {return 0}
                }
            }
            log "Close $S(window)."
            destroy $S(window)
            after idle [list namespace delete $S(space)]
            if {![llength [main::queries]]} {wm deiconify .}
            return 1
        }
        proc manual {} {
            variable S
            manualchecklist $S(space) $S(path) $S(title)
        }
        proc revert {} {
            variable S
            log "Revert $S(window) from $S(path)."
            catch {query-stop}
            foreach s {seqpattern criteria} {
                catch {eval destroy [$S(window).$s.t window names]}
                $S(window).$s.t delete 1.0 end
            }
            if {[file exists $S(path)/.peptide-configure]} {
                set f [open $S(path)/.peptide-configure]
                array set S [read $f]
                close $f
            }
            initialise
            set S(modified) 0
        }
        proc initialise {} {
            variable S
            log "Initialise $S(window): db=$S(db) term=$S(term) seqpattern=$S(seqpattern) criteria=$S(criteria)."
            set w $S(window)
            $w.query.db configure -text $S(db)
            $w.query.text insert 0 $S(term)
            $w.count.pass configure -text 0
            $w.count.passpercent configure -text ""
            $w.count.fail configure -text 0
            $w.count.failpercent configure -text ""
            $w.count.seen configure -text 0
            $w.status dchars text 0 end
            $w.status coords bar 0 0 0 19
            $w.button.search configure -state normal
            $w.button.stop configure -state disabled
            foreach piece $S(seqpattern) {
                seqpattern-add init $piece
            }
            seqpattern-add init bottom
            foreach {test desc string} $S(criteria) {
                criteria-add init $test $desc $string
            }
            criteria-add init bottom bottom bottom
        }
        proc zoom {w h x y} {
            variable S
            set currgeom [wm geometry $S(window)]
            if {$S(zoomed) && [string equal $currgeom $S(zoom)]} {
                set S(zoomed) 0
                wm geometry $S(window) $S(geometry)
            } else {
                set S(geometry) $currgeom
                if {![regexp {^=?(\d+)x(\d+)[+](\d+)[+](\d+)$} $currgeom - w h x y]} {
                    foreach {w h x y} {0 0 0 0} break
                }
                if {$w>[winfo screenwidth $S(window)]} {set w [winfo screenwidth $S(window)]}
                set S(zoom) ${w}x[expr {[winfo screenheight $S(window)]-50}]+$x+20
                set S(zoomed) 1
                wm geometry $S(window) $S(zoom)
            }
        }
    }
    pack [frame $w.query] -side top -fill x
    pack [menubutton $w.query.db -text $S(db) -menu $w.query.db.m] -side left
    menu $w.query.db.m -tearoff 0
    $w.query.db.m add command -label www.ncbi.nlm.nih.gov -command "${space}::database NCBI"
    $w.query.db.m add command -label www.flu.lanl.gov -command "${space}::database LANL"
    pack [label $w.query.label -text "query: "] -side left
    pack [frame $w.query.pad -width 20] -side right
    pack [entry $w.query.text -validate key -validatecommand "
        after idle ${space}::term; expr 1
    "] -side left -fill x -expand 1
    focus $w.query.text
    namespace eval $space {
        proc database db {
            variable S
            set S(db) $db
            $S(window).query.db configure -text $db
            set S(modified) 1
        }
        proc term {} {
            variable S
            set newterm [$S(window).query.text get]
            if {![string equal $S(term) $newterm]} {
                set S(term) $newterm
                set S(modified) 1
            }
        }
    }
    pack [frame $w.count] -fill x -side top
    pack [label $w.count.1 -text "pass: "] -side left
    pack [label $w.count.pass -text 0 -width 5] -side left
    pack [label $w.count.passpercent -text "" -width 6] -side left
    pack [label $w.count.2 -text " fail: "] -side left
    pack [label $w.count.fail -text 0 -width 5] -side left
    pack [label $w.count.failpercent -text "" -width 6] -side left
    pack [label $w.count.3 -text " total: "] -side left
    pack [label $w.count.seen -text 0 -width 5] -side left
    pack [canvas $w.status -height 20 -bd 0 -background white] -side top -fill x
    $w.status create rectangle 0 0 0 19 -fill red -tags bar
    $w.status create text 3 19 -anchor sw -fill black -tags text
    namespace eval $space {
        proc status {message i n} {
            variable S
            $S(window).status dchars text 0 end
            $S(window).status insert text 0 $message
            if {$n==0} {
                $S(window).status coords bar 0 0 0 19
            } else {
                $S(window).status coords bar 0 0 [expr {([winfo width $S(window).status]*$i)/$n}] 19
            }
            update
        }
        proc count {seen pass fail} {
            variable S
            if {$seen==0} {
                $S(window).count.pass configure -text 0
                $S(window).count.passpercent configure -text ""
                $S(window).count.fail configure -text 0
                $S(window).count.failpercent configure -text ""
            $S(window).count.seen configure -text 0
        } else {
            $S(window).count.pass configure -text $pass
            $S(window).count.passpercent configure -text ([expr {(100*$pass)/$seen}]%)
            $S(window).count.fail configure -text $fail
            $S(window).count.failpercent configure -text ([expr {(100*$fail)/$seen}]%)
            $S(window).count.seen configure -text $seen
        }
        update
    }
    proc completed {} {
        variable S
        $S(window).button.search configure -state normal
        $S(window).button.stop configure -state disabled
        update
    }
}
pack [frame $w.button] -side top -fill x
pack [button $w.button.search -text Search -state normal -command "${space}::query"] -side right
pack [button $w.button.stop -text Stop -state disabled -command "${space}::query-stop"] -side right
bind $w <Command-.> ${space}::query-stop
bind $w <Control-c> ${space}::query-stop
bind $w <Escape> ${space}::query-stop
namespace eval $space {
    proc query {} {
        variable S
        $S(window).button.search configure -state disabled
        $S(window).button.stop configure -state normal
        query-start
    }
}
pack [labelframe $w.seqpattern -text "Peptide Pattern"] -side top -fill both -expand 1
pack [
    scrollbar $w.seqpattern.y -orient vertical -command "$w.seqpattern.t yview"
] -side right -fill y
pack [
    text $w.seqpattern.t \
        -width 75 -height 15 \
        -state disabled \
        -highlightthickness 0 \
        -yscrollcommand "$w.seqpattern.y set"
] -side top -fill y -expand 1
namespace eval $space {
    proc windowPositionInText {s} {
        # s = ...t.nnn
        # Returns the text index in ...t of where ...t.n is.
        set w [winfo parent $s]; set k 0
        foreach {key value index} [$w dump -window 1.0 end] {
            if {[string equal $key window] && [string equal $s $value]} {
                return [list $k $index]
            }
            incr k
        }
        error "window not found: $s"
    }
    proc seqpattern-add {s token} {
        variable S
        $S(window).seqpattern.t configure -state normal
        if {[string equal $s init]} {
            set pos end
        } else {
            foreach {i pos} [windowPositionInText $s] break
            set S(modified) 1
            set S(seqpattern) [linsert $S(seqpattern) $i $token]
        }
        set w $S(window).seqpattern.t
        set s $w.$S(seqnum); incr S(seqnum)
        $w window create $pos -window [frame $s]
        pack [button $s.insertgap -text "^Gap" -font {Helvetica 10} -command "$S(space)::seqpattern-add $s :"] -side left
        pack [button $s.insertamino -text "^Match" -font {Helvetica 10} -command "$S(space)::seqpattern-add $s +"] -side left
        if {![string equal $token bottom]} {
            pack [button $s.delete -text Delete -font {Helvetica 10} -command "$S(space)::seqpattern-delete $s"] -side left
            if {[string match :* $token]} {
                foreach {- mingap maxgap} [split $token :] break
                if {![string length $mingap]} {set mingap 0}
                if {![string length $maxgap]} {set maxgap 0}
                pack [
                    label $s.mingaplabel -text "A gap of at least " -font {Helvetica 10}
                ] -side left
                pack [
                    entry $s.mingap -width 5 -justify right -validate key -validatecommand "after idle $S(space)::seqpattern-gap $s; expr 1"
                ] -side left -fill x -expand 1
                pack [
                    label $s.maxgaplabel -text " but no more than " -font {Helvetica 10}
                ] -side left
                pack [
                    entry $s.maxgap -width 5 -justify right -validate key -validatecommand "after idle $S(space)::seqpattern-gap $s; expr 1"
                ] -side left -fill x -expand 1
                $s.mingap insert 0 $mingap
                $s.maxgap insert 0 $maxgap
                pack [
                    label $s.endgaplabel -text " amino acids." \
                        -font {Helvetica 10}
                ] -side left
                focus $s.mingap
                $s.mingap select range 0 end
            } else {
                set quantifier [string index $token 0]
                pack [
                    label $s.beginaminoslabel -text "One amino acid which is " -font {Helvetica 10}
                ] -side left
                pack [
                    menubutton $s.quantifier -width 9 -menu $s.quantifier.m -font {Helvetica 10}
                ] -side left
                menu $s.quantifier.m -tearoff 0
                $s.quantifier.m add command -label "any of" -command "$S(space)::seqpattern-quantifier $s +"
                $s.quantifier.m add command -label "none of" -command "$S(space)::seqpattern-quantifier $s -"
                seqpattern-menu-label $s [string index $token 0]
                pack [
                    entry $s.aminos -width 20 -validate key -validatecommand "after idle $S(space)::seqpattern-aminos $s; expr 1"
                ] -side left -fill x -expand 1
                bind $s.aminos <FocusIn> "$S(space)::seqpattern-focus-in %W"
                bind $s.aminos <FocusOut> "$S(space)::seqpattern-focus-out %W"
                $s.aminos insert 0 [string range $token 1 end]
                focus $s.aminos
            }
        }
        $S(window).seqpattern.t configure -state disabled
    }
    proc seqpattern-add-acid code {
        variable S
        if {[winfo exists $S(focus)]} {
            $S(focus) insert end $code
        }
    }
    proc seqpattern-focus-in w {
        variable S
        set S(focus) $w
    }
    proc seqpattern-focus-out w {
        variable S
        if {[string equal $S(focus) $w]} {
            set S(focus) {}
        }
    }
    proc seqpattern-delete {s} {
        variable S
        set t $S(window).seqpattern.t
        $t configure -state normal
        foreach {i pos} [windowPositionInText $s] break
        set S(seqpattern) [lreplace $S(seqpattern) $i $i]
        $t delete $pos
        $t configure -state disabled
        set S(modified) 1
    }
    proc seqpattern-gap {s} {
        variable S
        set i [lindex [windowPositionInText $s] 0]
        set mingap [$s.mingap get]
        set maxgap [$s.maxgap get]
        set S(seqpattern) [lreplace $S(seqpattern) $i $i :$mingap:$maxgap]
        set S(modified) 1
    }
    proc seqpattern-quantifier {s quantifier} {
        variable S
        set i [lindex [windowPositionInText $s] 0]
        set e [lindex $S(seqpattern) $i]
        seqpattern-menu-label $s $quantifier
        set aminos [string range $e 1 end]
        set S(seqpattern) [lreplace $S(seqpattern) $i $i $quantifier$aminos]
        set S(modified) 1
    }
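    # Illustrative sketch only; not part of the original listing.
    # seqpattern-describe is a hypothetical helper that spells out the
    # three token forms the seqpattern procedures read and write:
    #   +<aminos>            one amino acid which is any of <aminos>
    #   -<aminos>            one amino acid which is none of <aminos>
    #   :<mingap>:<maxgap>   a gap of mingap..maxgap amino acids
    # Empty gap bounds are assumed to default to 0, as in seqpattern-add.
    proc seqpattern-describe {token} {
        if {[regexp {^:(\d*):(\d*)$} $token - mingap maxgap]} {
            if {![string length $mingap]} {set mingap 0}
            if {![string length $maxgap]} {set maxgap 0}
            return "a gap of at least $mingap but no more than $maxgap amino acids"
        } elseif {[regexp {^([+-])(.*)$} $token - quantifier aminos]} {
            if {[string equal $quantifier +]} {
                return "one amino acid which is any of $aminos"
            } else {
                return "one amino acid which is none of $aminos"
            }
        }
        error "bad seqpattern element: $token"
    }
    # For example, [seqpattern-describe +kh] returns
    # "one amino acid which is any of kh".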
    proc seqpattern-aminos {s} {
        variable S
        set i [lindex [windowPositionInText $s] 0]
        set e [lindex $S(seqpattern) $i]
        set quantifier [string index $e 0]
        set aminos [$s.aminos get]
        set S(seqpattern) [lreplace $S(seqpattern) $i $i $quantifier$aminos]
        set S(modified) 1
    }
    proc seqpattern-menu-label {s token} {
        if {[string equal $token +]} {
            $s.quantifier configure -text "any of"
        } else {
            $s.quantifier configure -text "none of"
        }
    }
}
pack [labelframe $w.criteria -text "Additional Criteria"] -side top -fill x -expand 0
pack [
    scrollbar $w.criteria.y -orient vertical -command "$w.criteria.t yview"
] -side right -fill y
pack [
    text $w.criteria.t \
        -width 75 -height 10 \
        -state disabled \
        -highlightthickness 0 \
        -yscrollcommand "$w.criteria.y set"
] -side top -fill y -expand 0
namespace eval $space {
    proc criteria-add {s test desc string} {
        variable S
        $S(window).criteria.t configure -state normal
        if {[string equal $s init]} {
            set pos end
        } else {
            foreach {i pos} [windowPositionInText $s] break
            set i [expr {3*$i}]
            set S(modified) 1
            set S(criteria) [linsert $S(criteria) $i $test $desc $string]
        }
        set w $S(window).criteria.t
        set s $w.$S(critnum); incr S(critnum)
        $w window create $pos -window [frame $s]
        if {[string equal $string bottom]} {
            pack [button $s.insertgap -text "New Criteria" -font {Helvetica 10} -command "$S(space)::criteria-add $s include * {}"] -side left
        } else {
            pack [button $s.delete -text Delete -font {Helvetica 10} -command "$S(space)::criteria-delete $s"] -side left
            pack [
                menubutton $s.test -width 11 -menu $s.test.m -font {Helvetica 10}
            ] -side left
            menu $s.test.m -tearoff 0
            $s.test.m add command -label "Include" -command "$S(space)::criteria-menu $s 0 include"
            $s.test.m add command -label "Exclude" -command "$S(space)::criteria-menu $s 0 exclude"
            $s.test.m add command -label "Must have" -command "$S(space)::criteria-menu $s 0 must-have"
            pack [
                entry $s.string -validate key -validatecommand "after idle $S(space)::criteria-string $s; expr 1"
            ] -side left -fill x \
                -expand 1
            focus $s.string
            pack [
                menubutton $s.desc -width 30 -menu $s.desc.m -font {Helvetica 10}
            ] -side left
            menu $s.desc.m -tearoff 0
            $s.desc.m add command -label "in any lines." -command "$S(space)::criteria-menu $s 1 *"
            $s.desc.m add command -label "in definition." -command "$S(space)::criteria-menu $s 1 %definition"
            $s.desc.m add command -label "in source." -command "$S(space)::criteria-menu $s 1 %source"
            $s.desc.m add command -label "in strain." -command "$S(space)::criteria-menu $s 1 %strain"
            $s.desc.m add command -label "in serotype." -command "$S(space)::criteria-menu $s 1 %serotype"
            $s.desc.m add command -label "in publications." -command "$S(space)::criteria-menu $s 1 *.pub.*"
            $s.desc.m add command -label "in annotations." -command "$S(space)::criteria-menu $s 1 *.annot.*"
            $s.desc.m add command -label "in keywords." -command "$S(space)::criteria-menu $s 1 *.keywords.*"
            criteria-menu-label $s 0 $test
            criteria-menu-label $s 1 $desc
            $s.string insert 0 $string
        }
        $S(window).criteria.t configure -state disabled
    }
    proc criteria-delete {s} {
        variable S
        set t $S(window).criteria.t
        $t configure -state normal
        foreach {i pos} [windowPositionInText $s] break
        set i [expr {3*$i}]
        set j [expr {$i+2}]
        set S(criteria) [lreplace $S(criteria) $i $j]
        $t delete $pos
        $t configure -state disabled
        set S(modified) 1
    }
    proc criteria-menu {s k what} {
        variable S
        set i [lindex [windowPositionInText $s] 0]
        set i [expr {3*$i+$k}]
        criteria-menu-label $s $k $what
        set S(criteria) [lreplace $S(criteria) $i $i $what]
        set S(modified) 1
    }
    proc criteria-string {s} {
        variable S
        set i [lindex [windowPositionInText $s] 0]
        set i [expr {3*$i+2}]
        set S(criteria) [lreplace $S(criteria) $i $i [$s.string get]]
        set S(modified) 1
    }
    proc criteria-menu-label {s k what} {
        switch $k$what {
            0include {$s.test configure -text "Include"}
            0exclude {$s.test configure -text "Exclude"}
            0must-have {$s.test configure -text "Must have"}
            1* {$s.desc configure -text "in any lines."}
            1%definition {$s.desc configure \
                -text "in definition."}
            1%source {$s.desc configure -text "in source."}
            1%strain {$s.desc configure -text "in strain."}
            1%serotype {$s.desc configure -text "in serotype."}
            1*.pub.* {$s.desc configure -text "in publications."}
            1*.annot.* {$s.desc configure -text "in annotations."}
            1*.keywords.* {$s.desc configure -text "in keywords."}
        }
    }
}
${space}::initialise
pack [label $w.version -text $::versionDate -anchor s -font {Helvetica 9}] -fill x -side bottom

########################################################################
# QUERY
########################################################################
# STATE
#   cache: cache/query
#   criteria: include|exclude|must-have %descriptor|* string ...
#   db: database to search
#   failing: fail.htm channel.
#   fetching: accessionlink accessionurl accessionnumber ...
#   seqpattern: +<aminos>|-<aminos>|:<mingap>:<maxgap>...
#   offset: next fetching
#   path: *.peptide folder path
#   retry: fetching pending retry
#   stop: stop requested
#   term: query text search term
#   token: pending http token
#   tracker: mailbox to send status to.
#     tracker status <message> <i> <n>
#     tracker count <seen> <pass> <fail>
#     tracker completed
# <path>
# <path>/cache
# <path>/cache/<accession>.prop
#   %accession: accession number
#   %definition
#   %link: accession link
#   %pass: if passes all matched criteria (can be set manually)
#   %reason: reason why %pass is set as it is
#   %seqdata: amino acid sequence
#   %serotype
#   %source
#   %strain
#   %subsequences matched-subsequence bounds
#   %uri: accession url
#   %year
#   <...>: ASN entry
# <path>/cache/query
#   term: query term
#   LANL: if LANL searched
#   NCBI: if NCBI searched
# Code adapted from GenomeExplorer to query a remote database
# (NCBI or LANL) and parse the returned page.
namespace eval $space {
    proc query-start {} {capture {
        variable S
        log "Query $S(term)."
        file mkdir $S(path)/cache
        file mkdir $S(path)/pass
        foreach fn [glob -nocomplain "$S(path)/pass/*.htm"] {file delete $fn}
        file delete $S(path)/index.htm
        file delete $S(path)/fail.htm
        regsub -all {\s+} $S(term) + S(termns)
        if {[catch {
            set f [open $S(path)/cache/query r]
            array set cache [read $f]
            set cached [expr {[string equal $cache(term) $S(termns)] && $cache($S(db))}]
            catch {close $f}
        } rs]} {
            set cached 0
        }
        array set cache [list term $S(termns) $S(db) 1]
        array set S [list \
            cache [array get cache] \
            fail 0 \
            pass 0 \
            seen 0 \
        ]
        status "" 0 0
        count 0 0 0
        if {$cached} {
            match
        } else {
            foreach fn [glob -nocomplain "$S(path)/cache/*"] {file delete $fn}
            fetch-list-$S(db) 20
        }
    }}
    proc query-stop {} {capture {
        variable S
        catch {http::wait $S(token); http::cleanup $S(token)}
        log "Query stopped."
        status "Stopped." 0 0
        completed
    }}
    proc debugging-dump {file text} {
        if {[pref debuggingMode default 0]} {
            variable S
            file mkdir $S(path)/cache
            set f [open $S(path)/cache/$file w]
            puts $f $text
            close $f
        }
    }
    proc fetch-list-LANL {n} {capture {
        variable S
        log "Querying for $n results from LANL."
        status "Querying for $n results from LANL." 0 0
        set S(token) [http::geturl \
            http://www.flu.lanl.gov/search2/result.html?search=1&field=ALL&num=$n&hspecies=Any&seg=any&nucorpro=nuc&orderby=dateasc&text=$S(termns) \
            -command [list $S(space)::parse-list-LANL $n]]
    }}
    proc fetch-list-NCBI {n} {capture {
        variable S
        log "Querying for $n results from NCBI."
        status "Querying for $n results from NCBI." \
            0 0
        set S(token) [http::geturl \
            http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?cmd=Search&db=Protein&term=$S(termns)&dispmax=$n&doptcmdl=DocSum \
            -command [list $S(space)::parse-list-NCBI $n]]
    }}
    proc parse-list-LANL {n token} {capture {
        variable S
        unset S(token)
        set queryPage [::http::data $token]
        debugging-dump debug-list.htm $queryPage
        if {[regexp {Your request could not be processed due to a problem} $queryPage]} {
            log "Server busy: retrying in 5 seconds..."
            status "Server busy: retrying in 5 seconds..." 0 0
            after 5000 $S(space)::fetch-list-LANL $n
        } elseif {![regexp {Got <B>([0-9]+)</B> hits total} $queryPage all numentries] || ![info exists numentries] || [string equal $numentries ""] || $numentries==0} {
            log "Nothing matched the query."
            status "Nothing matched the query." 0 0
        } elseif {$numentries>$n} {
            fetch-list-LANL $numentries
        } else {
            array set S {fetching {} retry {} offset 0}
            set queryPage [string map {"\"" '} $queryPage]
            foreach {accessionlink accessionurl accessionnumber} \
                [regexp -all -inline {<A HREF='(viewrecord.html[?][^<>]*)'[^<>]*>([^<]*)</A>} $queryPage] \
            {
                set accessionurl "http://www.flu.lanl.gov/search2/$accessionurl"
                set accessionlink [string map {"HREF='" "HREF='http://www.flu.lanl.gov/search2/"} $accessionlink]
                lappend S(fetching) $accessionlink $accessionurl $accessionnumber
            }
            fetch-page-LANL
        }
        http::cleanup $token
    }}
    proc parse-list-NCBI {n token} {capture {
        variable S
        unset S(token)
        set queryPage [::http::data $token]
        debugging-dump debug-list.htm $queryPage
        if {[regexp {Your request could not be processed due to a problem} $queryPage]} {
            log "Server busy: retrying in 5 seconds..."
            status "Server busy: retrying in 5 seconds..." 0 0
            after 5000 $S(space)::fetch-list-NCBI $n
        } elseif {![regexp {Items 1-[0-9]+ of ([0-9]+)} $queryPage all numentries] || ![info exists numentries] || [string equal $numentries ""] || $numentries==0} {
            log "Nothing matched the query"
            status "Nothing matched the query." \
                0 0
        } elseif {$numentries>$n} {
            fetch-list-NCBI $numentries
        } else {
            array set S {fetching {} retry {} offset 0}
            set queryPage [string map {"\"" '} $queryPage]
            foreach {accessionlink accessionurl accessionnumber} \
                [regexp -all -inline {<a href='([^?<>]*[?]cmd=Retrieve[^<>]*)'>([^<]*)</a>} $queryPage] \
            {
                lappend S(fetching) $accessionlink $accessionurl $accessionnumber
            }
            fetch-page-NCBI
        }
        http::cleanup $token
    }}
    proc fetch-page-LANL {} {capture {
        variable S
        if {$S(offset)<[llength $S(fetching)]} {
            set accessionlink [lindex $S(fetching) $S(offset)]; incr S(offset)
            set accessionurl [lindex $S(fetching) $S(offset)]; incr S(offset)
            set accessionnumber [lindex $S(fetching) $S(offset)]; incr S(offset)
            set S(token) [http::geturl $accessionurl \
                -command [list $S(space)::parse-page-LANL $accessionlink $accessionurl $accessionnumber]]
        } elseif {[llength $S(retry)]} {
            array set S [list fetching $S(retry) retry {} offset 0]
            log "Retrying [expr {[llength $S(fetching)]/3}] entries in 5 seconds..."
            status "Retrying [expr {[llength $S(fetching)]/3}] entries in 5 seconds..." 0 0
            after 5000 $S(space)::fetch-page-LANL
        } else {
            fetch-complete
            match
        }
    }}
    proc fetch-page-NCBI {} {capture {
        variable S
        if {$S(offset)<[llength $S(fetching)]} {
            set accessionlink [lindex $S(fetching) $S(offset)]; incr S(offset)
            set accessionurl [lindex $S(fetching) $S(offset)]; incr S(offset)
            set accessionnumber [lindex $S(fetching) $S(offset)]; incr S(offset)
            set S(token) [http::geturl [string map {query.fcgi viewer.fcgi} $accessionurl]&view=asn \
                -command [list $S(space)::parse-page-NCBI $accessionlink $accessionurl $accessionnumber]]
        } elseif {[llength $S(retry)]} {
            array set S [list fetching $S(retry) retry {} offset 0]
            log "Retrying [expr {[llength $S(fetching)]/3}] entries in 5 seconds..."
            status "Retrying [expr {[llength $S(fetching)]/3}] entries in 5 seconds..." \
                0 0
            after 5000 $S(space)::fetch-page-NCBI
        } else {
            fetch-complete
            match
        }
    }}
    proc parse-page-LANL {accessionlink accessionurl accessionnumber token} {capture {
        variable S
        unset S(token)
        log "Fetching $accessionnumber"
        status "Fetching $accessionnumber" [expr {$S(offset)+1}] [llength $S(fetching)]
        set page [::http::data $token]
        debugging-dump debug-page.htm $page
        if {[regexp {try again later} $page]} {
            lappend S(retry) $accessionlink $accessionurl $accessionnumber
        } else {
            array set PAGE [list \
                %accession $accessionnumber \
                %link $accessionlink \
                %uri $accessionurl \
                %pass 1 \
                %reason "Fetched page from $S(db)." \
            ]
            catch {
                set f [open $S(path)/cache/$accessionnumber.prop r]
                array set PAGE [read $f]
                close $f
            }
            # Standard genetic code: DNA codon -> one-letter amino acid (* = stop).
            array set coding {
                GCT a GCC a GCA a GCG a
                TGT c TGC c
                GAT d GAC d
                GAA e GAG e
                TTT f TTC f
                GGT g GGC g GGA g GGG g
                CAT h CAC h
                ATT i ATC i ATA i
                AAA k AAG k
                TTG l TTA l CTT l CTC l CTA l CTG l
                ATG m
                AAT n AAC n
                CCT p CCC p CCA p CCG p
                CAA q CAG q
                CGT r CGC r CGA r CGG r AGA r AGG r
                TCT s TCC s TCA s TCG s AGT s AGC s
                ACT t ACC t ACA t ACG t
                GTT v GTC v GTA v GTG v
                TGG w
                TAT y TAC y
                TAA * TAG * TGA *
            }
            foreach {all attribute value} \
                [regexp -all -inline {<tr><td><span class="bold">([^<>]+)</span></td>\s*<td>((?:[^<]|<[^/]|</[^t]|</t[^d]|</td[^>])*)</td>\s*</tr>} $page] \
            {
                regsub -all {<[^<>]*>} $value {} value
                set value [string trim [string map {&nbsp; " " &amp; & &lt; < &gt; > &apos; ' &quot; "\""} $value]]
                regsub -all {\s+} $value { } value
                set PAGE([string map {" " -} $attribute]) $value
                switch $attribute {
                    Strain {set PAGE(%strain) $value}
                    Definition {set PAGE(%definition) $value}
                    Source {set PAGE(%source) $value}
                    "Collection Year" {set PAGE(%year) $value}
                    Serotype {set PAGE(%serotype) $value}
                    "Raw Sequence" {
                        regsub -all {\s+} $value {} value
                        set PAGE(%seqdata) ""
                        for {set c 0} {$c<[string length $value]} {incr c 3} {
                            set codon [string range $value $c [expr {$c+2}]]
                            if {[info exists coding($codon)]} {
                                append PAGE(%seqdata) $coding($codon)
                            } else {
                                append \
                                    PAGE(%seqdata) ?
                            }
                        }
                    }
                }
            }
            file mkdir $S(path)/cache
            set f [open $S(path)/cache/$accessionnumber.prop w]
            foreach {p v} [array get PAGE] {puts $f "[list $p] [list $v]"}
            close $f
        }
        http::cleanup $token
        fetch-page-LANL
    }}
    proc parse-page-NCBI {accessionlink accessionurl accessionnumber token} {capture {
        variable S
        unset S(token)
        log "Fetching $accessionnumber"
        status "Fetching $accessionnumber" [expr {$S(offset)+1}] [llength $S(fetching)]
        set page [::http::data $token]
        debugging-dump debug-page.htm $page
        if {[regexp {try again later} $page]} {
            lappend S(retry) $accessionlink $accessionurl $accessionnumber
        } else {
            array set PAGE [list \
                %accession $accessionnumber \
                %link $accessionlink \
                %uri $accessionurl \
                %pass 1 \
                %reason "Fetched page from $S(db)." \
            ]
            catch {
                set f [open $S(path)/cache/$accessionnumber.prop r]
                array set PAGE [read $f]
                close $f
            }
            set starts [string first "<pre>Seq-entry " $page]
            if {$starts>=0} {
                incr starts 5
                set stops [string first "</pre>" $page $starts]
                if {$stops>0} {incr stops -1} else {set stops end}
                set page [string range $page $starts $stops]
                array set PAGE [ncbiasn $page]
                set PAGE(%seqdata) {}
                foreach {p seqdata} [array get PAGE *.seq-data.ncbieaa.*] {
                    set q [string map {inst.seq-data.ncbieaa id.0.ddbj.accession} $p]
                    if {[info exists PAGE($q)] && ![string equal $PAGE($q) $accessionnumber]} continue
                    regsub -all {\s+} $seqdata {} seqdata
                    lappend PAGE(%seqdata) [string tolower $seqdata]
                }
                if {[llength $PAGE(%seqdata)]>0} {
                    set PAGE(%definition) {}
                    foreach {p v} [array get PAGE *.descr.*.article.title.*.name.>] {lappend PAGE(%definition) $v}
                    set PAGE(%definition) [join $PAGE(%definition) " "]
                    set PAGE(%source) unknown
                    foreach {p v} [array get PAGE *.source.org.taxname.>] {
                        set PAGE(%source) $v
                    }
                    set PAGE(%strain) unknown
                    foreach {p v} [array get PAGE *.subtype.>] {
                        if {[string equal $v strain]} {
                            set PAGE(%strain) $PAGE([string map {.subtype.> .subname.>} $p])
                        }
                    }
                    if {[string equal $PAGE(%strain) unknown] && [regexp {\(([^)]*(\([^)]*\))?)\)\s*$} \
                            $PAGE(%source) -> strain]} {
                        set PAGE(%strain) $strain
                    }
                    set slash [string last / $PAGE(%strain)]
                    if {$slash>=0 && [regexp -start $slash {/([0-9]+)} $PAGE(%strain) -> y]} {
                        if {$y<30} {incr y 2000} elseif {$y<100} {incr y 1900}
                        set PAGE(%year) $y
                    } else {
                        set PAGE(%year) unknown
                    }
                    foreach {p y} [array get PAGE *.year.>] {
                        if {$y<30} {incr y 2000} elseif {$y<100} {incr y 1900}
                        if {$PAGE(%year) eq "unknown" || $PAGE(%year)>$y} {set PAGE(%year) $y}
                    }
                    if {[regexp {\(([^()]+)\)\s*$} $PAGE(%strain) -> s]} {
                        set PAGE(%serotype) $s
                    } elseif {[regexp {\(([^()]+)\)\s*$} $PAGE(%source) -> s]} {
                        set PAGE(%serotype) $s
                    } else {
                        set PAGE(%serotype) unknown
                    }
                    file mkdir $S(path)/cache
                    set f [open $S(path)/cache/$accessionnumber.prop w]
                    foreach {p v} [array get PAGE] {puts $f "[list $p] [list $v]"}
                    close $f
                }
            }
        }
        http::cleanup $token
        fetch-page-NCBI
    }}
    proc fetch-complete {} {capture {
        variable S
        file mkdir $S(path)/cache
        set f [open $S(path)/cache/query w]
        foreach {p v} $S(cache) {puts $f "[list $p] [list $v]"}
        close $f
    }}
    # This is the new code that traverses the amino sequence data and compares
    # against the peptide pattern in the peptide search window.
    proc match {} {capture {
        variable S
        unset -nocomplain S(fetching)
        unset -nocomplain S(retry)
        set S(props) [lsort -dictionary [glob -nocomplain "$S(path)/cache/*.prop"]]
        log "Matching [llength $S(props)] entries."
        for {set q 0} {$q<[llength $S(props)]} {incr q} {
            set fn [lindex $S(props) $q]
            set accessionnumber [file tail [file root $fn]]
            status $accessionnumber [expr {$q+1}] [llength $S(props)]
            set f [open $fn r]
            array set PAGE [read $f]
            close $f
            set results {}
            foreach sequence $PAGE(%seqdata) {
                set sequence [string tolower $sequence]
                array unset F *
                # Each machine entry is {state ll ul gap}: the pattern element
                # being tried, the lower and upper sequence bounds of the
                # candidate match, and the gap length consumed so far.
                set machine {}
                for {set i 0; set j -1} {$i<[string length $sequence]} {incr i; incr j} {
                    lappend machine 0 $i $j 0
                }
                while {[llength $machine]} {
                    foreach {state ll ul gap} $machine break; set machine [lrange $machine 4 end]
                    incr ul
                    if {$state>=[llength $S(seqpattern)]} {
                        set "F($ll [expr {$ul-1}])" 1
                    } elseif {[regexp {^[+](.*)$} [lindex $S(seqpattern) $state] - aminos]} {
                        set aminos [string tolower $aminos]
                        if {[string first [string index $sequence $ul] $aminos]>=0} {
                            lappend machine [expr {$state+1}] $ll $ul 0
                        }
                    } elseif {[regexp {^-(.*)$} [lindex $S(seqpattern) $state] - aminos]} {
                        set aminos [string tolower $aminos]
                        if {[string first [string index $sequence $ul] $aminos]<0} {
                            lappend machine [expr {$state+1}] $ll $ul 0
                        }
                    } elseif {[regexp {^:(\d+):(\d+)$} [lindex $S(seqpattern) $state] - mingap maxgap]} {
                        if {$gap<$mingap} {
                            lappend machine $state $ll $ul [expr {$gap+1}]
                        } else {
                            if {$gap<$maxgap} {
                                lappend machine $state $ll $ul [expr {$gap+1}]
                            }
                            lappend machine [expr {$state+1}] $ll [expr {$ul-1}] 0
                        }
                    } else {
                        error "bad seqpattern element: [lindex $S(seqpattern) $state]"
                    }
                }
                set result [lsort -integer -index 0 [array names F]]
                if {[llength $result]} {
                    lappend results [list [llength $result] $result $sequence]
                }
            }
            if {[llength $results]} {
                set results [lsort -integer -index 0 $results]
                set PAGE(%subsequences) [lindex $results end 1]
                set PAGE(%sequence) [lindex $results end 2]
            } else {
                set PAGE(%subsequences) {}
                set PAGE(%sequence) [lindex $PAGE(%seqdata) 0]
            }
            if {[llength $PAGE(%subsequences)]} {
                array set PAGE {%pass 1 %reason "Matched amino acid pattern."}
            } else {
                array set PAGE {%pass 0 %reason "Did not match amino acid\
pattern."}
            }
            # Must-have criteria: every one of them has to be satisfied.
            if {$PAGE(%pass)} {
                foreach {kind descriptor string} $S(criteria) {
                    if {![string equal $kind must-have]} continue
                    catch {unset exists}
                    foreach {p extent} [array get PAGE $descriptor] {
                        regsub -all {[^-[:alnum:]@%_+=/]+} $extent { } extent
                        foreach word [split [string tolower [string trim $extent]] " "] {
                            set exists($word) 1
                        }
                    }
                    regsub -all {[^-[:alnum:]@%_+=/]+} $string { } string
                    set any 0
                    foreach word [concat [split [string tolower [string trim $string]] " "] unknown] {
                        set any [info exists exists($word)]
                        if {$any} break
                    }
                    if {!$any} {
                        array set PAGE [list %pass 0 %reason "Missing must have word: $string"]
                        break
                    }
                }
            }
            # Include/exclude criteria: the first one that is satisfied ends the scan.
            if {$PAGE(%pass)} {
                foreach {kind descriptor string} $S(criteria) {
                    if {[string equal $kind must-have]} continue
                    array set PAGE {%pass 1 %reason "Matched amino acid pattern."}
                    catch {unset exists}
                    foreach {p extent} [array get PAGE $descriptor] {
                        regsub -all {[^-[:alnum:]_]+} $extent { } extent
                        foreach word [split [string tolower [string trim $extent]] " "] {
                            set exists($word) 1
                        }
                    }
                    regsub -all {[^-[:alnum:]_]+} $string { } string
                    foreach word [split [string tolower [string trim $string]] " "] {
                        if {[string equal $kind exclude]==[info exists exists($word)]} {
                            array set PAGE [list %pass 0 %reason "$kind word: $word"]
                            break
                        }
                    }
                    if {$PAGE(%pass)} break
                }
            }
            if {$PAGE(%pass)} {
                log "Match pass $accessionnumber ([llength $PAGE(%subsequences)] matches): $PAGE(%reason)"
            } else {
                log "Match fail $accessionnumber: $PAGE(%reason)"
            }
            file mkdir $S(path)/cache
            set f [open $S(path)/cache/$accessionnumber.prop w]
            foreach {p v} [array get PAGE] {puts $f "[list $p] [list $v]"}
            close $f
            if {$PAGE(%pass)} {incr S(pass)} else {incr S(fail)}
            incr S(seen); count $S(seen) $S(pass) $S(fail)
            update
        }
        ::report $S(path) $S(term)
        status "Completed." 0 0
        completed
    }}
    proc report {} {
        variable S
        toplevel .reportmessage
        wm overrideredirect .reportmessage 1
        pack [label .reportmessage.1 -text "Creating pass/fail report."] -padx 20 -pady 10
        update
        ::report $S(path) $S(term)
        update
        destroy \
            .reportmessage
    }
}

########################################################################
# REPORT
########################################################################
# Format the search results into a set of pages. This reporting is
# adapted from the report generation of GenomeExplorer.
proc report {path term} {capture {
    set fns [lsort -dictionary [glob -nocomplain "$path/cache/*.prop"]]
    eval file delete [list "$path/fail.htm"] [glob -nocomplain "$path/pass/*.htm"]
    foreach fn $fns {
        array unset PAGE *
        set f [open $fn r]
        array set PAGE [read $f]
        close $f
        if {$PAGE(%pass)} {
            set passing [open $path/pass/$PAGE(%accession).htm w]
            puts $passing "<html><head><title>$PAGE(%accession)</title></head><body>"
            puts $passing "<H1>$PAGE(%accession)</H1>"
            puts $passing "<dl>"
            puts $passing "<dt>PubMed Code:<dd>$PAGE(%link)"
            puts $passing "<dt>Description:<dd>$PAGE(%definition)"
            puts $passing "<dt>Isolated:<dd>$PAGE(%year)"
            if {![string equal $PAGE(%source) unknown]} {puts $passing "<dt>Source:<dd>$PAGE(%source)"}
            if {![string equal $PAGE(%strain) unknown]} {puts $passing "<dt>Strain:<dd>$PAGE(%strain)"}
            if {![string equal $PAGE(%serotype) unknown]} {puts $passing "<dt>Serotype:<dd>$PAGE(%serotype)"}
            puts $passing "</dl>"
            set n 0
            array set where {Amino-terminal {} Mid-molecule {} Carboxy-terminal {}}
            foreach ch [split $PAGE(%sequence) ""] {
                set C($n) " <b>$ch</b><SUP>[expr {$n+1}]</SUP>"
                set S($n) 0
                incr n
            }
            foreach pq $PAGE(%subsequences) {
                foreach {p q} $pq break
                if {$p<=$n/3} {
                    lappend where(Amino-terminal) $p $q
                } elseif {$p<=(2*$n)/3} {
                    lappend where(Mid-molecule) $p $q
                } else {
                    lappend where(Carboxy-terminal) $p $q
                }
                for {set i $p} {$i<=$q} {incr i} {incr S($i)}
            }
            set r [llength $PAGE(%subsequences)]
            puts $passing "<p>"
            set wascol black
            puts -nonewline $passing "<font color='black'>"
            for {set i 0} {$i<$n} {incr i} {
                if {$S($i)} {set iscol red} else {set iscol black}
                if {![string equal $iscol $wascol]} {
                    puts $passing "</font>"
                    puts -nonewline $passing "<font color='$iscol'>"
                    set wascol $iscol
                }
                puts -nonewline $passing $C($i)
            }
            puts $passing "</font></p><dl>"
            foreach w {Amino-terminal Mid-molecule Carboxy-terminal} {
                puts $passing "<dt>$w</dt><dd>"
                if {[llength $where($w)]} {
                    foreach {p q} $where($w) {
                        puts $passing "<p>"
                        for {set i $p} {$i<=$q} {incr i} {
                            puts -nonewline $passing $C($i)
                        }
                        puts $passing "</p>"
                    }
                    puts $passing "</dd>"
                } else {
                    puts $passing "Zero subsequences."
                }
            }
            puts $passing "</dl>"
            close $passing
            if {![string equal $PAGE(%year) unknown]} {
                if {![info exists mindate] || $PAGE(%year)<$mindate} {set mindate $PAGE(%year)}
                if {![info exists maxdate] || $PAGE(%year)>$maxdate} {set maxdate $PAGE(%year)}
            }
            if {![info exists nsubs($PAGE(%year))]} {
                set nsubs($PAGE(%year)) 0
                set sum($PAGE(%year)) 0.0
                set sumsq($PAGE(%year)) 0.0
                set reference($PAGE(%year)) ""
            }
            incr nsubs($PAGE(%year)) 1
            set sum($PAGE(%year)) [expr {$sum($PAGE(%year)) + $r}]
            set sumsq($PAGE(%year)) [expr {$sumsq($PAGE(%year)) + $r*$r}]
            append reference($PAGE(%year)) "\n$PAGE(%link) <a href='pass/$PAGE(%accession).htm'>$r</a>"
        } else {
            if {![info exists failing]} {
                set failing [open $path/fail.htm w]
                puts $failing {<html><head><title>Failing Sequences</title></head><body>}
                puts $failing "<p><b>query:</b> $term"
            }
            puts $failing "<p>$PAGE(%link): $PAGE(%reason)"
            set i 1
            foreach e [split $PAGE(%sequence) ""] {
                puts -nonewline $failing " <b>$e</b><SUP>$i</SUP>"
                incr i
            }
            puts $failing ""
        }
    }
    catch {close $failing}
    set stats [open $path/report.htm w]
    puts $stats "<html><head><title>Subsequences Analysis</title></head><body>
<H1>Subsequences Count by Year</H1>
<H2>$term</H2>
<TABLE>
<TR>
<TH align='center' valign='top'>Year</TH>
<TH align='center' valign='top'>PubMed Accession Number-Subsequences Count</TH>
<TH align='center' valign='top'>No.
of Isolates per year</TH>
<TH align='center' valign='top'>Mean Subsequences Count per year</TH>
<TH align='center' valign='top'>S.D.</TH>
</TR>"
    set Y {}
    if {[info exists nsubs(unknown)]} {lappend Y unknown}
    if {![info exists mindate]} {set mindate 2000}
    if {![info exists maxdate]} {set maxdate 2000}
    for {set y $mindate} {$y<=$maxdate} {incr y} {lappend Y $y}
    foreach y $Y {
        if {[info exists nsubs($y)]} {
            set mean [expr {$sum($y)/$nsubs($y)}]
            # var = (sum (x - m)^2)/(n-1)
            #     = (sum (x^2 - 2xm + m^2))/(n-1)
            #     = (sum x^2 - 2m sum x + m^2 sum 1)/(n-1)
            #     = (sumsq - 2*m*sum + n m^2)/(n-1)
            if {$nsubs($y)==1} {
                set sd 0.0
            } else {
                set sd [expr {sqrt(($sumsq($y) + $nsubs($y)*$mean*$mean - 2*$mean*$sum($y))/($nsubs($y)-1))}]
            }
            puts $stats "<TR>
<TD align='center' valign='top'>$y</TD>
<TD align='left' valign='top'>$reference($y)</TD>
<TD align='right' valign='top'>$nsubs($y)</TD>
<TD align='right' valign='top'>[format %.1f $mean]</TD>
<TD align='right' valign='top'>[format %.1f $sd]</TD>
</TR>"
        } else {
            puts $stats "<TR><TD align='center' valign='top'>$y</TD><TD></TD><TD></TD><TD></TD><TD></TD></TR>"
        }
    }
    puts $stats "</TABLE>"
    close $stats
    set index [open $path/index.htm w]
    puts $index "<html><head><title>Index</title></head><body><dl>"
    set fns [glob -nocomplain "$path/pass/*"]
    if {[llength $fns]} {
        puts $index "<dt>Pass:<dd>"
        foreach fn $fns {puts $index "<a href='pass/[file tail $fn]'>[file tail [file root $fn]]</a>"}
        puts $index "</dd>"
    }
    puts $index "<dt>Report:<dd><a href='report.htm'>Subsequences by year</a></dd>"
    if {[file exists $path/fail.htm]} {
        puts $index "<dt>Fail:<dd><a href='fail.htm'>Failing sequences</a></dd>"
    }
    close $index
    if {[pref openHTML default 0]} {
        exec osascript -e "open location \"file://127.0.0.1[file join $path/index.htm]\""
    }
}}

########################################################################
# MANUAL CHECKLIST WINDOW
########################################################################
# The database query started in query-start
# extracts information
# about the protein, and the match function assigns a pass/fail
# whether the protein matches the pattern. This window allows
# manual editing of the query information presented in the report
# and altering the pass/fail result.
proc manualchecklist {sspace path title} {capture {
    array set S [list \
        file-menu {
            {Save and Close} {%S::save} {Command-S} {Command-s}
            {Cancel and Close} {%S::quit} {Command-W} {Command-w}
        } \
        title "$title Manual Checklist" \
        zoomed 0 \
        path $path \
        sspace $sspace \
    ]
    foreach {space w} [eval window .manual%U [array get S]] break
    namespace eval $space {
        variable A; array set A {accessions {}}
        proc load-array {} {
            variable S
            variable A
            set fns [lsort -dictionary [glob -nocomplain "$S(path)/cache/*.prop"]]
            foreach fn $fns {
                array unset PAGE *
                set f [open $fn r]
                array set PAGE [read $f]
                close $f
                lappend A(accessions) $PAGE(%accession)
                array set A [list \
                    $PAGE(%accession):pass $PAGE(%pass) \
                    $PAGE(%accession):reason $PAGE(%reason) \
                    $PAGE(%accession):uri $PAGE(%uri) \
                    $PAGE(%accession):year $PAGE(%year) \
                    $PAGE(%accession):passo $PAGE(%pass) \
                    $PAGE(%accession):yearo $PAGE(%year) \
                    $PAGE(%accession):path $fn \
                ]
            }
            trace add variable $S(space)::A write $S(space)::manualChanged
        }
        proc manualChanged {arrname subscript op} {
            variable S
            variable A
            foreach {accession what} [split $subscript :] break
            if {[string equal $what pass]} {
                set A($accession:reason) "Manually set."
            }
        }
        proc manual-okay {} {
            variable S
            variable A
            foreach accession $A(accessions) {
                if {$A($accession:pass) != $A($accession:passo) || $A($accession:year) != $A($accession:yearo)} {
                    array unset PAGE *
                    set f [open $A($accession:path) r]
                    array set PAGE [read $f]
                    close $f
                    if {$PAGE(%pass) != $A($accession:pass)} {
                        if {$A($accession:pass)} {
                            set PAGE(%reason) "Manually set to pass."
                        } else {
                            set PAGE(%reason) "Manually set to fail."
                        }
                    }
                    set PAGE(%pass) $A($accession:pass)
                    set PAGE(%year) $A($accession:year)
                    set f [open $A($accession:path) w]
                    foreach {p v} [array get PAGE] {
                        puts $f "[list $p] [list $v]"
                    }
                    close $f
                }
            }
        }
        proc save {} {
            variable S
            manual-okay
            [set S(sspace)]::report
            quit
        }
        proc quit {} {
            variable S
            destroy $S(window)
            after idle [list namespace delete $S(space)]
        }
        proc zoom {w h x y} {
            variable S
            set currgeom [wm geometry $S(window)]
            if {$S(zoomed) && [string equal $currgeom $S(zoom)]} {
                set S(zoomed) 0
                wm geometry $S(window) $S(geometry)
            } else {
                set S(geometry) $currgeom
                if {![regexp {^=?(\d+)x(\d+)[+](\d+)[+](\d+)$} $currgeom - w h x y]} {
                    foreach {w h x y} {0 0 0 0} break
                }
                if {$w>[winfo screenwidth $S(window)]} {set w [winfo screenwidth $S(window)]}
                set S(zoom) ${w}x[expr {[winfo screenheight $S(window)]-50}]+$x+20
                set S(zoomed) 1
                wm geometry $S(window) $S(zoom)
            }
        }
    }
    pack [frame $w.buttons] -side bottom -fill x -expand 1 -pady 5
    pack [button $w.buttons.okay -text Okay -command "${space}::save"] -side right -padx 20
    pack [button $w.buttons.cancel -text Cancel -command "${space}::quit"] -side left
    ${space}::load-array
    upvar #0 ${space}::A A
    pack [scrollbar $w.y -orient vertical -command "$w.t yview"] -side right -fill y
    pack [text $w.t -yscrollcommand "$w.y set" -tabs {.7i .9i 1.1i} -width 80] -side top -fill both -expand 1
    set num 0
    foreach accession $A(accessions) {
        $w.t insert end "$accession: \t"
        $w.t window create end -window [checkbutton $w.t.pass$num -variable ${space}::A($accession:pass) -text Pass]
        $w.t insert end "\t Year: "
        $w.t window create end -window [entry $w.t.year$num -textvariable ${space}::A($accession:year) -width 4]
        $w.t window create end -window [button $w.t.uri$num -text "URL" -command "exec open $A($accession:uri)"]
        $w.t insert end "\t$A($accession:reason)\n"
        incr num
    }
    $w.t configure -state disabled
}
# LOG #
##################################################################
# Spyware to track program/user interaction.
window .log \
    title {Activity Log} \
    file-menu {
        {Save as} {log save} {Command-S} {Command-s}
        {Email} {log send-email} {} {}
        {Close} {log hide} {Command-W} {Command-w}
    } \
    zoomed 0
wm withdraw .log
pack [scrollbar .log.y -orient vertical -command ".log.t yview"] -side right -fill y
pack [text .log.t -yscrollcommand ".log.y set"] -side top -fill both -expand 1
wm protocol .log WM_DELETE_WINDOW "log::hide"
namespace eval log {
    proc save {} {
        # S is needed here as in the sibling procs; the listing omits the declaration.
        variable S
        set path [tk_getSaveFile -message "Where to save the log" -title "Save Log"]
        if {[string length $path]} {
            set f [open $path w]
            puts $f [string trim [$S(window).t get 1.0 end]]
            $S(window).t show end
            close $f
        }
    }
    proc quit {} {
        variable S
        destroy $S(window)
    }
    proc show {} {
        variable S
        wm deiconify $S(window)
        raise $S(window)
    }
    proc hide {} {
        variable S
        wm withdraw $S(window)
    }
    proc zoom {w h x y} {
        variable S
        set currgeom [wm geometry $S(window)]
        if {$S(zoomed) && [string equal $currgeom $S(zoom)]} {
            set S(zoomed) 0
            wm geometry $S(window) $S(geometry)
        } else {
            set S(geometry) $currgeom
            if {![regexp {^=?(\d+)x(\d+)[+](\d+)[+](\d+)$} $currgeom - w h x y]} {
                foreach {w h x y} {0 0 0 0} break
            }
            if {$w>[winfo screenwidth $S(window)]} {set w [winfo screenwidth $S(window)]}
            set S(zoom) ${w}x[expr {[winfo screenheight $S(window)]-50}]+$x+20
            set S(zoomed) 1
            wm geometry $S(window) $S(zoom)
        }
    }
    proc send-email {} {
        variable S
        exec mail -s {Log file} wyrmwif@rawbw.com << [string trim [$S(window).t get 1.0 end]]
    }
    proc tell {text} {
        variable S
        set message "[clock format [clock seconds] -format %y/%m/%d.%H:%S:%S] .$text"
        puts stderr $message
        $S(window).t insert end $message\n
    }
}
# PREFERENCES #
##################################################################
# Preference panel.
proc pref {var args} {
    global preference
    if {!
[info exists preference]} prefs::open-prefs
    switch [llength $args] {
        0 {
            if {[info exists preference($var)]} {
                return $preference($var)
            } else {
                return
            }
        }
        1 {
            set preference($var) [lindex $args 0]
            prefs save
        }
        2 {
            if {![string equal [lindex $args 0] default]} {
                error "usage: pref var default default-value"
            }
            if {[info exists preference($var)]} {
                return $preference($var)
            } else {
                return [lindex $args 1]
            }
        }
        default {
            error "usage: pref var ..."
        }
    }
}
namespace eval prefs {
    variable S
    array set S {
        path "~/Library/Preferences/com.omyx.drpepper.prefa.tcl"
        showing 0
        factoryDefaults {
            debuggingMode 0
            outputDirectory ~/Desktop
            openHTML 0
        }
    }
    proc open-prefs {} {
        variable S
        array set ::preference $S(factoryDefaults)
        catch {source $S(path)}
    }
    proc save {} {
        variable S
        catch {cancel $S(pending)}
        set S(pending) [after 5000 prefs::save-immediate]
    }
    proc save-immediate {} {
        variable S
        catch {cancel $S(pending)}
        set f [open $S(path) w]
        puts $f "array set ::preference {"
        foreach {p v} [array get ::preference] {
            puts $f "[list $p] [list $v]"
        }
        puts $f "}"
        # Close the file so the preferences are flushed to disk.
        close $f
    }
    proc show {} {
        variable S
        if {!$S(showing)} {
            global preference preferences
            array set preference1 [array get preference]
            toplevel .preferences
            wm title .preferences Preferences
            wm geometry .preferences 400x90+100+100
            place [
                label .preferences.ldebug -text "Save for debugging:" -anchor e
            ] -x 0 -y 0 -relwidth 0.45 -height 25
            place [
                checkbutton .preferences.cdebug -text " " -variable preference1(debuggingMode)
            ] -relx .50 -y 0 -relwidth 0.45 -height 25
            place [
                label .preferences.lopen -text "Open results in browser:" -anchor e
            ] -x 0 -y 30 -relwidth 0.45 -height 25
            place [
                checkbutton .preferences.copen -text " " -variable preference(openHTML)
            ] -relx .50 -y 30 -relwidth 0.45 -height 25
            place [
                button .preferences.cancel -text Cancel -command {
                    destroy .preferences
                    unset -nocomplain preferences
                }
            ] -x 0 -rely 1.0 -y -35 -width 100
            place [
                button .preferences.reset -text Reset -command {
                    array set preferences [array get
preference]
                }
            ] -relx 1.0 -x -210 -rely 1.0 -y -35 -width 100
            place [
                button .preferences.okay -text Okay -command {
                    array set preference [array get preferences]
                    file mkdir $preference(outputDirectory)
                    cd $preference(outputDirectory)
                    set ch [open $prefs::S(path) w]
                    puts $ch "array set preference {"
                    foreach {p v} [array get preference] {
                        puts $ch "[list $p] [list $v]"
                    }
                    puts $ch "}"
                    close $ch
                    destroy .preferences
                    unset -nocomplain preferences
                    foreach p [lsort [array names preference]] {
                        log "preference $p = $preference($p)"
                    }
                }
            ] -relx 1.0 -x -100 -rely 1.0 -y -35 -width 100
            place [
                button .preferences.crap -text " " -command {}
            ] -relx 1.0 -x -100 -rely 1.0 -y -35 -width 100
            lower .preferences.crap
        }
    }
}
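The report code in the listing derives the per-year standard deviation from three running totals (the count, the sum, and the sum of squares) rather than from a second pass over the data. The following Python sketch restates that identity for illustration only; the function name `mean_and_sd` and its list-based interface are assumptions made here and do not appear in the listing.

```python
import math


def mean_and_sd(values):
    """Return (mean, sample standard deviation) from running totals.

    Uses the identity quoted in the listing's comments:
        var = (sumsq - 2*m*sum + n*m^2) / (n - 1)
    The name and signature are hypothetical, chosen for this sketch.
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    total = float(sum(values))
    sumsq = float(sum(v * v for v in values))
    mean = total / n
    if n == 1:
        # The listing reports an S.D. of 0.0 for a single isolate.
        return mean, 0.0
    var = (sumsq - 2 * mean * total + n * mean * mean) / (n - 1)
    # Guard against tiny negative values caused by rounding.
    return mean, math.sqrt(max(var, 0.0))
```

For example, `mean_and_sd([2, 4, 4, 4, 5, 5, 7, 9])` gives a mean of 5.0 and a standard deviation equal to the square root of 32/7.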

Claims

1. A method of identifying a Replikin Scaffold in a virus or organism comprising identifying a series of Replikin Scaffold peptides comprising about 16 to about 30 amino acids comprising (1) a terminal lysine and a lysine immediately adjacent to said terminal lysine; (2) a terminal histidine and a histidine immediately adjacent to said terminal histidine, (3) a lysine within about 6 to about 10 amino acids from another lysine; and (4) at least 6% lysines, wherein the method is used for diagnosis and treatment of diseases related to the virus or organism.

2. The method of claim 1 further comprising identifying an individual member or plurality of members of said series of Replikin Scaffold peptides.

3. The method of claim 1 further comprising identifying an Exoskeleton Scaffold wherein said series of Replikin Scaffold peptides is identified in a first series of virus, strain of virus, or organism and said Exoskeleton Scaffold is identified in a later-arising virus, strain of virus, or organism as compared to said first series of virus, strain of virus, or organism wherein said Exoskeleton Scaffold comprises an amino acid sequence comprising about the same number of amino acids as said Replikin Scaffold and further comprises (1) a terminal lysine and a lysine immediately adjacent to said terminal lysine, (2) a terminal histidine and a histidine immediately adjacent to said terminal histidine, and (3) no lysine within about 6 to about 10 amino acids from another lysine.

4. The method of identifying said Replikin Scaffold peptide of claim 1 further comprising identifying a second Replikin Scaffold peptide of claim 1, comparing said Replikin Scaffold peptide to said second Replikin Scaffold peptide and if said second Replikin Scaffold peptide is unchanged from said second Replikin Scaffold peptide, choosing either Replikin Scaffold peptide as a vaccine.

5. An isolated or synthesized influenza virus peptide consisting of from 7 to about 50 amino acids comprising the amino acid sequence KKNSTYPTIKRSYNNTNQEDLLVLWGIHH (SEQ ID NO: 15).

6. A preventive or therapeutic virus vaccine comprising at least one isolated or synthesized peptide of claim 5.

7. A preventive or therapeutic virus vaccine comprising a peptide consisting of the amino acid sequence KKNSTYPTIKRSYNNTNQEDLLVLWGIHH (SEQ ID NO: 15).

8. The preventive or therapeutic virus vaccine of claim 6 or claim 7 further comprising any of SEQ ID NO: 459, SEQ ID NO: 460; SEQ ID NO: 461; SEQ ID NO: 462; SEQ ID NO: 463; SEQ ID NO: 464; SEQ ID NO: 465; SEQ ID NO: 466; SEQ ID NO: 467; SEQ ID NO: 468; SEQ ID NO: 469.

9. The preventive or therapeutic virus vaccine of claim 7 further comprising SEQ ID NO: 469.

10. The preventive or therapeutic virus vaccine of claim 6, 7, 8, or 9 further comprising a pharmaceutically acceptable carrier and/or adjuvant.

11. The preventive or therapeutic virus vaccine of claim 7 further comprising Vaccine V120304U2.

12. A method of stimulating an immune system of a subject to produce antibodies to influenza virus comprising administering an effective amount of at least one isolated or synthesized influenza virus Replikin peptide of claim 5.

13. The method of claim 12 further comprising a pharmaceutically acceptable carrier and/or adjuvant and further preventing or treating an influenza infection.

14. The method of claim 12 wherein said isolated or synthesized influenza virus peptide is present in an emerging virus.

15. The method of claim 12 or the isolated or synthesized influenza virus peptide of claim 5 wherein said isolated or synthesized influenza virus peptide consists of an amino acid sequence KKNSTYPTIKRSYNNTNQEDLLVLWGIHH (SEQ ID NO: 15).

16. A method of making a preventive or therapeutic virus vaccine comprising identifying a Replikin Scaffold comprising a plurality of Replikin Scaffold peptides comprising about 16 to about 30 amino acids and isolating or synthesizing at least one of said Replikin Scaffold peptides as a preventive or therapeutic virus vaccine wherein said Replikin Scaffold peptides comprise: (1) a terminal lysine and a lysine immediately adjacent to said terminal lysine; (2) a terminal histidine and a histidine immediately adjacent to said terminal histidine; (3) a lysine within about 6 to about 10 amino acids from another lysine; and (4) at least 6% lysines.

17. The method of claim 16 wherein said Replikin Scaffold peptide is present in an influenza virus.

18. An isolated functional derivative of the isolated or synthesized influenza peptide of claim 5 or claim 15.

19. A preventive or therapeutic virus vaccine comprising the functional derivative of claim 18.

20. A method of stimulating the immune system of a subject to produce antibodies to influenza virus comprising administering an effective amount of the functional derivative of claim 18.

21. An isolated antibody or antibody fragment that specifically binds to any one of the peptides of claim 5 or claim 15 or that binds to a functional derivative of claim 18.

22. A method according to any one of claims 1 to 4, or 12 to 17, or 20, or any isolated or synthesized influenza virus peptide according to any one of claims 5 or 15, or a preventive or therapeutic virus vaccine according to any one of claims 6 to 11, or 19, or an isolated functional derivative according to claim 18, or an isolated antibody or antibody fragment according to claim 21, substantially as hereinbefore defined.
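The structural criteria recited in claims 1 and 3 can be read as a simple sequence test. The Python sketch below is one possible reading, offered for illustration only and not as the applicants' implementation. Several points are assumptions made here: the function names are hypothetical, "about 16 to about 30" is treated as a hard range, the lysine pair and histidine pair may sit at either terminus so long as they are at opposite ends, and "within about 6 to about 10 amino acids" is taken as a difference of 6 to 10 in residue position.

```python
from itertools import combinations


def _ends_ok(seq):
    """KK at one terminus and HH at the other (orientation is an assumption)."""
    return (seq.startswith("KK") and seq.endswith("HH")) or (
        seq.startswith("HH") and seq.endswith("KK"))


def _has_spaced_lysines(seq, lo=6, hi=10):
    """True if any two lysines are lo..hi positions apart (assumed counting)."""
    positions = [i for i, aa in enumerate(seq) if aa == "K"]
    return any(lo <= b - a <= hi for a, b in combinations(positions, 2))


def is_replikin_scaffold_peptide(seq, min_len=16, max_len=30):
    """Check the four criteria of claim 1 under the stated assumptions."""
    seq = seq.upper()
    return (min_len <= len(seq) <= max_len
            and _ends_ok(seq)
            and _has_spaced_lysines(seq)
            and seq.count("K") / len(seq) >= 0.06)


def is_exoskeleton_scaffold_peptide(seq, min_len=16, max_len=30):
    """Check the criteria of claim 3: same termini, no spaced lysine pair."""
    seq = seq.upper()
    return (min_len <= len(seq) <= max_len
            and _ends_ok(seq)
            and not _has_spaced_lysines(seq))
```

Under these assumptions, SEQ ID NO: 15 (KKNSTYPTIKRSYNNTNQEDLLVLWGIHH, 29 residues with lysines at positions 1, 2 and 10) satisfies `is_replikin_scaffold_peptide`. A hypothetical variant constructed here by replacing the lysine at position 10 with asparagine (KKNSTYPTINRSYNNTNQEDLLVLWGIHH, not a sequence from the specification) satisfies `is_exoskeleton_scaffold_peptide` instead.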