IE990955A1

IE990955A1 - Peptides capable of being recognized by antibodies induced against retro-viruses of human immunodeficiency (HIV viruses), their uses for the diagnosis of infections due to some of these viruses and, where appropriate, for vaccination against AIDS

Info

Publication number: IE990955A1
Application number: IE19990955A
Authority: IE
Inventors: Marc Alizon; Luc Montagnier; Denise Guetard; Francois Clavel; Mireille Guyader; Pierre Sonigo; Ronald Desrosiers; Pierre Tiollais; Lisa Chakrabarti
Original assignee: Pasteur Institut
Priority date: 1987-02-11
Filing date: 1988-02-11
Publication date: 2000-12-13
Also published as: ATE417108T1; DE3856595D1; EP0750041B1; NZ223494A; OA08716A; CA1341634C; CA1341520C; EP0750041A3; KR890700603A; KR0135885B1; EP0750041A2; MX169441B

Abstract

The invention relates to peptides having immunological properties in common with those of the peptide backbone of the HIV-2 class of viruses, in particular with the HIV-2 envelope glycoprotein, characterised in that they also have a peptide structure in common with the peptide backbone of SIV peptides, in particular with the SIV envelope glycoprotein. The invention also relates to compositions capable of detecting an infection due to HIV-2 and to compositions for vaccines.

Description

The present invention relates to peptides having immunological, even immunogenic, properties in common with antigens which can be obtained in a purified form from viruses capable of causing lymphadenopathies which, in man, may subsequently degenerate into the acquired immunodeficiency syndrome (AIDS).

The invention relates in particular to antigenic peptides capable of being recognized by antibodies induced in man by viruses designated by the abbreviation HIV according to the nomenclature defined in NATURE. The invention also relates to peptides having immunogenic properties or which are capable of being rendered immunogenic in vivo, this immunogenicity manifesting itself by the induction in vivo of antibodies recognizing antigens characteristic of the viruses HIV-2 and even, at least in the case of certain of these peptides, antigens derived from HIV-1.

The invention relates in addition to uses of these peptides for the mass production of compositions for the in vitro diagnosis in man of the latent existence of certain forms of AIDS and, in the case of some of the AIDS viruses, for the production of immunogenic compositions and of compositions to be used as vaccines against the HIV retroviruses.

Similarly, the invention relates to the uses for the same purpose of the antibodies capable of being induced in vivo by the immunogenic peptides or peptides made immunogenic and, in the case of some of these antibodies, to their uses for the production of medicinal active principles against these human AIDS.

The invention also relates to the utilization of some of these peptides in procedures for the in vitro diagnosis in man of some forms of AIDS as well as their use for making up diagnostic kits.

The first retrovirus designated LAV-1 or HIV-1 was isolated and described in the patent application GB.83/24.800 and in the application EP.84/401.834 of 14/09/84. This virus has also been described by F. Barre Sinoussi et al. in Science, 220, No. 45-99, pages 868-871. Variants of this HIV-1 virus designated as LAV ELI and LAV MAL have also been isolated, characterized and described in the patent application EP.84/401.834.

The HIV-1 viruses and their variants possess the following properties:
- their preferred targets are the human Leu3 cells (or T4 lymphocytes) and the immortalized cells derived from them,
- they have a reverse transcriptase activity requiring the presence of Mg2+ ions and exhibit a marked activity toward poly(adenylate)-oligo(deoxythymidylate) (poly(A)-oligo(dT)12-18),
- they have a density of 1.16 to 1.17 on a sucrose gradient,
- they have an average diameter of 139 nanometers and a nucleus with an average diameter of 41 nanometers,
- the lysates of these viruses contain a protein p25 (core protein) which immunologically does not cross-react with protein p24 of HTLV-1,
- they contain a protein p42 as a constituent of their envelope,
- they also contain a glycoprotein gp110 of molecular weight 110,000 in their envelope.

The isolation and characterization of retroviruses belonging to a distinct class which are immunologically related to those just mentioned only to a limited extent have been described in the European patent application No. 87/400.151.4. These retroviruses which have been grouped together under the designation HIV-2 were isolated from several African patients exhibiting symptoms of a lymphadenopathy or an AIDS.

The retroviruses of the HIV-2 type, like the retroviruses of the HIV-1 type, are characterized by a tropism for the human T4 lymphocytes and by a cytopathogenic effect on these lymphocytes as a result of multiplying within them, thus giving rise either to generalized and persistent polyadenopathies or an AIDS.

More generally, the purified HIV-2 retroviruses possess the following properties:
- the preferred target of the HIV-2 retroviruses is constituted by human Leu3 cells (or T4 lymphocytes) and by immortalized cells derived from these T4 lymphocytes;
- they are cytotoxic for human T4 lymphocytes;
- they have a reverse transcriptase activity requiring the presence of Mg2+ ions and exhibit a marked activity for poly(adenylate)-oligo(deoxythymidylate) (poly(A)-oligo(dT)12-18);
- they have a density of 1.16 in a sucrose gradient;
- they have a mean diameter of 140 nanometers and a nucleus having a mean diameter of 41 nanometers;
- they can be cultured in permanent cell lines of the HUT type or those expressing the T4 protein;
- they do not infect T8 lymphocytes;
- the lysates of these viruses contain a protein p26 which immunologically does not cross-react with the protein p24 of the virus HTLV-I or of the virus HTLV-II;
- in addition, these lysates contain a protein p16 which does not cross-react immunologically with the protein p19 of HTLV-I or of HTLV-II in radioimmunoprecipitation assays;
- in addition, they contain a glycoprotein having a molecular weight of the order of 130,000-140,000 in their envelope which immunologically does not cross-react with the gp110 of the HIV-1, but which, in contrast, does cross-react immunologically with the glycoprotein gp140 of the envelope of STLV-III (a virus isolated from monkeys);
- these lysates also contain antigens which can be labelled with 35S-cysteine, the molecular weights of which range between 32,000 and 42,000-45,000: they include in particular an antigen having a molecular weight of the order of 36,000 and an antigen having a molecular weight of the order of 42,000, one of these antigens (p36 or p42) probably constituting a transmembrane glycoprotein of the virus HIV-2;
- the genomic RNA of the HIV-2 does not hybridize with the genomic RNA of HIV-1 under stringent conditions;
- under non-stringent conditions, the genomic RNA of HIV-2 hybridizes neither with the env gene and the LTR which is next to it of HIV-1 nor with the sequences of the pol region of the genome of HIV-1;
- under non-stringent conditions, it hybridizes weakly with nucleotide sequences of the region of HIV-1.

Another retrovirus designated SIV-1, this designation replacing the earlier one of STLV-III, was isolated from the rhesus macaque monkey (M. D. Daniel et al., Science 228, 1201 (1985); N. L. Letvin et al., Science 230, 71 (1985) under the name STLV-IIImac).

Another retrovirus, designated STLV-III AGM (or SIV AGM), was isolated from wild green monkeys. However, in contrast to the virus present in the rhesus macaque monkeys, the presence of STLV-III AGM does not seem to induce a disease of the AIDS type in the African green monkey.

A strain of the SIV-1mac retrovirus was deposited with the CNCM on 7th February 1986 under the No. I-521. Studies have shown that the retrovirus SIV-1 contains some proteins which are immunologically related to structural proteins or glycoproteins which can be isolated from HIV-2 under similar conditions. This retrovirus SIV-1, the infectious nature of which has been observed in monkeys, has been designated as STLV-III by the research scientists who have isolated it (literature references cited above).

For semantic convenience, these viruses will be referred to in the following discussion only by the expression SIV (the expression SIV is an abbreviation for Simian Immunodeficiency Virus) which may be followed by an abbreviation indicating the species of monkey from which it was derived, for example MAC (or mac) for the macaque or AGM for the African green monkey.

Using the same methods as those mentioned above, it has been observed that the SIV-1 mac also contains:
- a major nuclear protein p27 having a molecular weight of the order of 27 kilodaltons,
- a major glycoprotein, gp140, in its envelope,
- a protein p32, probably transmembrane, which is hardly observed in RIPA when the virus has been labelled beforehand with 35S-cysteine, but which can be observed in the form of broad bands on Western blots.

More precise studies have been carried out on the HIV-2 and SIV viruses. The continuation of the study of the HIV-2 retroviruses has also led to the preparation of complementary DNA sequences (cDNA) of the RNAs of their genomes. The complete nucleotide sequence of the cDNA of a representative retrovirus of the HIV-2 class (HIV-2 ROD) was deposited on 21/02/1986 with the CNCM under No. I-522 and the reference LAV-II ROD.

This nucleotide sequence and the open-reading frames which it contains are indicated in the Figure 1A.

Furthermore, continuation of the study of other retroviruses has also made it possible to obtain their complete nucleotide sequences. This is the case in particular for the cDNA derived from the genomic RNA of SIV.

The cloning and sequencing of the SIV-1mac virus which led to the elucidation of its nucleotide sequence were carried out under the following conditions: The DNA of HUT 78 cells infected with the SIV virus (isolate STLV-III mac 142-83 described by Daniel et al. (1985) Science, 228, p. 1201-1204), partially digested by the restriction enzyme Sau3A, was cloned at the BamHI site of the bacteriophage vector lambda EMBL3 in order to constitute a genomic library. The 2 million recombinant phages of the genomic library thus constituted were screened in situ under P3 conditions of security with the aid of sequences of the HIV-2 virus, derived from clones of lambda-ROD4, lambda-ROD35 and E2 (Clavel et al., 1986, Nature, 324, p. 691) and nick-translated.

Hybridization was carried out in 5 x SSC at 50°C and washings in 2 x SSC at 50°C. A single clone containing all of the viral sequences was obtained. This clone was designated as lambda-SIV-1. The insert of the phage lambda-SIV-1 measures a total of 16.5 kb and includes an integrated provirus which lacks only the first 250 bases of the left-side LTR, whereas the right-side LTR is complete.

The integrated provirus was sequenced by the dideoxynucleotide method after subcloning of random fragments in the phage M13mp8. 300 subclones were analyzed.

Fragments of cDNA derived from the clone lambda-SIV-1 inserted into plasmids pSIV-1.1 and pSIV-1.2 were deposited with the CNCM on 15th April 1987 under the numbers I-658 (pSIV-1.1) and I-659 (pSIV-1.2).

The results are mentioned in the figures described below.

Figure 1B presents the nucleotide sequence of the viral genome of SIV and the sequences which have been deduced from it for the viral proteins corresponding to the gene products gag, pol, env, Q, X, R, tat, art, F.

Figures 3 to 11 and Figure 1C present comparisons between HIV-2 and SIVmac of the theoretical products of the viral genes and the LTR (SIV-1).

The invention also relates to fragments of cDNA deduced from the cDNA derived from the entire genome of SIV-1, these fragments containing one or more sequences derived from the complete sequence of cDNA and which code for interesting peptides of the invention. These sequences are indicated in Figure 1B and that relating to the LTR sequence of the virus in Figure 1C.

The nucleotide sequences of the cDNA of SIV were matched with the nucleotide sequences of the virus HIV-2 ROD with respect to the LTR sequence (Figure 1C). This type of match, which is made for the entire genome by comparing Figure 1B with Figures 3 to 11, makes it possible to locate or deduce the nucleotide sequences of the two viruses which have common essential structural elements.

The invention naturally also relates to the use of the cDNAs derived from SIV or their fragments (or recombinants containing them) as probes for diagnosing the presence or absence of HIV-2 viruses in serum samples or other liquids or biological tissues obtained from patients suspected to be carriers of the HIV-2 virus. These probes are preferably also labelled (radioactive, enzymatic, fluorescent markers, etc.). Probes of particular interest for the application of the diagnostic procedure for the HIV-2 virus or a variant of HIV-2 may be characterized in that they contain the totality or a fraction of the cDNA complementary to the gene of the SIV virus or particularly the recombinant fragments contained in various clones.

The probes used in this diagnostic procedure for the HIV-2 virus and the diagnostic kits are in no way limited to the probes described above. On the contrary, they comprise all of the nucleotide sequences derived from the genome of the SIV virus, of a variant of SIV or of a virus closely related in structure as long as they make it possible to detect in the biological fluids of persons suspected of developing an AIDS, antibodies directed against a HIV-2 or a virus which is closely related to it.

The detection may be carried out by any known method. It may comprise the placing in contact of these probes with the nucleic acids obtained from cells contained in these sera or other biological fluids, for example cerebrospinal fluid, saliva, etc. It may also involve placing these probes in contact with these fluids themselves once their nucleic acids have been made accessible to hybridization with these probes under conditions permitting hybridization between these probes and these nucleic acids to occur. The final step of in vitro diagnosis thus comprises the detection of the hybridization which may have occurred. The above-mentioned diagnostic procedure involving hybridization reactions may also be carried out with the aid of mixtures of probes derived from a HIV-2 and a SIV-1 or a HIV-1, a HIV-2 and a SIV, respectively, if it is not necessary to distinguish between the type of virus searched.

Generally speaking, the diagnostic procedure for the presence or absence of the HIV-2 virus or a variant in serum samples or other liquids or tissues obtained from patients suspected of being carriers of the HIV-2 virus comprises the following steps:
1) at least one hybridization step conducted under stringent conditions by placing in contact the DNA of the cells of the sample taken from the suspected patient with one of the above-mentioned labelled probes on an appropriate membrane,
2) the washing of the said membrane with a solution which maintains these stringent conditions of hybridization,
3) the detection of the presence or absence of the HIV-2 virus by a method of immunodetection.

In another preferred embodiment of the procedure according to the invention, the above-mentioned hybridization is conducted under non-stringent conditions and the washing of the membrane is carried out under conditions suited to those of the hybridization.

It will be obvious that the invention relates to the nucleic acids corresponding to sequences located in analogous regions of variants of SIV as well as to all of the nucleic acids in which modification would make it possible to take advantage of the degeneracy of the genetic code.

The comparative studies which have also enabled results to be obtained relating to the core proteins, hereafter termed gag proteins, and to proteins of the envelope, hereafter referred to as env proteins, have also been reported in the European patent application No. 87/400.151.4 already mentioned. These results show that the core proteins (gag proteins) in HIV-2 exhibit less marked differences from those of the HIV-1 viruses than the proteins of the envelope (env proteins). Overall, the env proteins of HIV-2 have been found to be immunologically related with the corresponding env proteins of the HIV-1 viruses to only a very slight degree or not at all.

On the other hand, comparative studies on the structures of the cDNA sequences of the HIV-2 and SIV viruses have demonstrated certain common properties which make their appearance at the level of the proteins.

Overall, the proteins of HIV-2 and SIV-1 show a considerable degree of immunological relatedness.

The major glycoprotein of the envelope of HIV-2 has been shown to be more closely related immunologically to the major glycoprotein of the envelope of SIV than to the major glycoprotein of the envelope of HIV-1.

These observations can be affirmed not only with regard to the molecular weights (130-140 kilodaltons for the major glycoproteins of HIV-2 and SIV compared with about 110 kilodaltons for the major glycoprotein of the envelope of HIV-1), but also with respect to their immunological properties, since sera taken from patients infected with HIV-2, and more particularly antibodies formed against the gp140 of HIV-2, recognize the gp140 of SIV-1mac, whereas in similar assays the same sera and the same antibodies of HIV-2 do not recognize the gp110 of HIV-1. However, the anti-HIV-1 sera which have never reacted with the gp140 of HIV-2 precipitate a protein of 26 kilodaltons labelled with 35S-cysteine contained in the extracts of HIV-2.

The major core protein of HIV-2 seems to possess an average molecular weight (about 26,000) intermediate between that of the p25 of HIV-1 and the p27 of SIV.

These observations derive from assays carried out with viral extracts obtained from a HIV-2 isolated from one of the patients mentioned above. Similar results have been obtained with viral extracts of a HIV-2 isolated from another patient.

More extensive studies have led the inventors to recognize an initial class of peptides having sequences of amino acids which are either identical or very similar to sequences contained in the interior of the structures of the gag and env proteins of HIV-2 or SIV, or even HIV-1. These peptides are particularly useful for the diagnosis of an infection by the HIV-2 virus or one of its variants in man.

In this respect, the present invention also relates to diagnostic procedures and compositions for the in vitro detection of antibodies directed against a HIV-2 virus or its variants, more particularly in biological samples, particularly in the sera of patients who have become infected with the HIV-2 virus, since some of these peptides make possible a particularly fine distinction between the infections due to HIV-2 viruses and those due to HIV-1 viruses.

These more exhaustive studies have also led to the possibility of synthesizing immunogenic peptides or peptides capable of being made immunogenic which exhibit structural characteristics enabling them to induce in vivo the production of antibodies capable of recognizing env proteins both of HIV-1 and HIV-2 and, at least in the case of some of these peptides, of binding both to HIV-1 viruses and to HIV-2 viruses, and resulting more particularly in their neutralization. The use of these latter types of peptides is thus particularly indicated for the production of active principles of vaccines against the HIV viruses, and hence against the AIDS.

In order to specify in the subsequent discussion the amino acid residues which constitute the peptides according to the invention, resort will be had to the code shown in the table which follows for those amino acids having an unequivocal meaning in the international nomenclature which designates each naturally occurring amino acid by a single letter (capital letter):

M  Methionine
L  Leucine
I  Isoleucine
V  Valine
F  Phenylalanine
S  Serine
P  Proline
T  Threonine
A  Alanine
Y  Tyrosine
H  Histidine
Q  Glutamine
N  Asparagine
K  Lysine
D  Aspartic acid
E  Glutamic acid
C  Cysteine
W  Tryptophan
R  Arginine
G  Glycine

When, on account of its position within the chain of amino acids characteristic of a specific peptide, an amino acid can assume several meanings, it will be designated either by a dash if it can be any amino acid, or by a lower case letter when this amino acid can assume a limited number of preferred meanings, this number being, however, always higher than one. In this latter case, the possible meanings of this lower case letter will always be specified in relation to the peptide to which it belongs.
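The notation just described can be read as a simple pattern language. The following Python fragment is only an illustrative sketch and forms no part of the disclosure: the function names `pattern_to_regex` and `matches` are hypothetical, the terminal groups X and Z are assumed to have been stripped from the pattern, and lower case letters are assumed to stand only for amino acids (not for an absent residue).

```python
import re

# The twenty one-letter codes listed in the table above.
AMINO_ACIDS = "MLIVFSPTAYHQNKDECWRG"


def pattern_to_regex(pattern, choices=None):
    """Translate a peptide definition into a compiled regular expression.

    A capital letter must match itself, a dash matches any amino acid,
    and a lower case letter matches one of its listed meanings, supplied
    in `choices` (hypothetical argument, e.g. {"a": "MT"}).
    """
    choices = choices or {}
    parts = []
    for ch in pattern:
        if ch == "-":
            parts.append("[" + AMINO_ACIDS + "]")
        elif ch.islower():
            allowed = "".join(re.escape(c) for c in choices[ch])
            parts.append("[" + allowed + "]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def matches(pattern, sequence, choices=None):
    """Return True if the whole sequence fits the pattern."""
    return pattern_to_regex(pattern, choices).fullmatch(sequence) is not None


# env 1 definition against one of the env 1 sequences given in the text.
print(matches("RV-AIEKYL-DQA-LN-WGCAFRQVC", "RVTAIEKYLQDQARLNSWGCAFRQVC"))  # True
# gag 1 definition, in which the letter a specifies M or T.
print(matches("DCKLVLKGLGaNPTLEEMLTA", "DCKLVLKGLGTNPTLEEMLTA", {"a": "MT"}))  # True
```

A whole-string match is used so that a sequence longer or shorter than the definition is rejected rather than matched in part.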

In order to facilitate reading, these peptides will be designated by an abbreviation env or gag followed by a number referring to sequences of amino acids contained either in the env proteins or in the gag proteins of certain HIV-1, HIV-2 or SIV. Reference will also be made to them in the ensuing discussion.

Finally, in the definitions which follow:
- the X groups represent either a free NH2 group or one amidated, in particular by one or two alkyl groups containing from 1 to 5 carbon atoms, or a peptide group comprising from 1 to 5 amino acids, the N-terminal amino acid of which itself possesses a free NH2 group or one amidated as previously mentioned, and
- the Z groups represent either a free -OH group or an alkoxy group containing an alkyl group comprising from 1 to 5 carbon atoms, or a peptide group comprising from 1 to 5 amino acids, the C-terminal amino acid of which itself possesses a free -OH group or an alkoxy group as previously indicated,

the groups of from 1 to 5 amino acids contained, as appropriate, in X or Z or in both at once being such that their presence is for the most part not incompatible with the preservation of the immunological, or even immunogenic, properties of the peptides which do not contain them.

The peptides according to the invention which have immunological properties in common with antigens of HIV-2 and, in certain cases, also with antigens of HIV-1 or its variants are characterized in that they also have a peptide structure in common with the antigens of SIV. In an advantageous manner, these peptides normally contain a maximum of 40 amino acid residues.

The following are preferred peptides:

env1   XRV-AIEKYL-DQA-LN-WGCAFRQVCZ
env2   X-LE-AQI-QQEKNMYELQKLNZ
env3   XELGDYKLVEITPIG-APT--KR-----Z
env4   X----VTV-YGVP-WK-AT--LFCA-Z
env5   X---QE--L-NVTE-F--W-NZ
env6   XL S-KPCVKLTPLCV--Z
env7   X---N-S-IT--C-K----Z
env8   X-I---YC-P-G-A-L-C-N-TZ
env9   X------A-C------W--Z
env10  X-G-DPE------NC-GEF-YCN-----NZ
env11  C-IKQ-I------G---YZ

More particularly, the invention relates to the following peptides:

env1   XRV-AIEKYL-DQA-LN-WGCAFRQVCZ
env2   X-LE-AQIQQEKNMYELQKLNZ
env3   XELGDYKLVEITPIG-APT--KR-----Z
env4   X----VTV-YGVP-W--AT--LFCA-Z
env5   X----E--L-NVTE-F--W-NZ
env6   XL---S-KPCVKL-PLC---Z
env7   X---N-S-I---C-K----Z
env8   X-I---YC-P-G-A-L-C-N-TZ
env9   X------A-C------W--Z
env10  X-G-DPE------NC-GEF-YC------NZ
env11  X-----C-I-Q-I------G---YZ

Useful peptides corresponding to those just cited above are presented in the formulas which follow:

env1   XRVTAIEKYLQDQARLNSWGCAFRQVCZ, or XRVTAIEKYLKDQAQLNAWGCAFRQVCZ
env2   XSLEQAQIQQEKNMYELQKLNSWZ, or XLLEEAQIQQEKNMYELQKLNSWZ
env3   XELGDYKLVEITPIGFAPTKEKRYSSAHZ, or XELGDYKLVEITPIGLAPTNVKRYTTG-Z

(It will be noticed that the peptides env1, env2, env3 testify to the very close relationship between HIV-2 and SIV-1. In fact, the first peptide is included in the genome of HIV-2 and the second in that of SIV-1.)

env4   XabcdVTVeYGVPfWogAThiLFCAjZ, in which the letters from a to j may have the following meanings:
a is C, E or D
b is T, K, D, N
c is Q or L
d is Y or W
e is F or Y
f is T, V or A
g is N or E
h is I or T
i is P or T
j is T or S
o is K or R

env5   XabcoEdeLfNVTEgFhiWjNZ, in which the letters from a to j may have the following meanings:
a is D or P
b is D or N
c is Y or P
d is I, V, I or L
e is T, V, E or A
f is V, G or E or -
g is A, N, G or S
h is D or N
i is A or M
j is N, K or E
o is Q or S

env6   XLabcSdKPCVKLoPLCuefKZ, in which the letters from a to f may have the following meanings:
a is F or W
b is E or D
c is T or Q
d is I or L
e is A, S or
f is M or L
o is T or S
u is V or I

env7   XabCNxSyIocdCeKfghiZ, in which the letters from a to i and x and y may have the following meanings:
a is N or T or I
b is H or S or N
c is E or Q
d is S, A or C
e is D or P
f is H, V or D
g is Y or S
h is W or F
i is D or E
x is T or R
y is V or A
o is T or Q

env8   XaIbcdYCxPeGfAgLhCiNjTZ, in which the letters from a to k and x may have the following meanings:
a is A or P
b is R or P
c is F, I or
d is R or H
e is P or A
f is Y or F
g is L or I
h is R or K
i is - or N
j is D or K
x is A or T

env9   XwabcxyAdCefghizWjkZ, in which the letters from a to k and x to z may have the following meanings:
a is K or - or E
b is R or -
c is P or M or I
d is W or H or Y
e is W or N or T or R
f is F or I
g is K or S or N or G
h is G or R or E
i is - or A or T
j is K or N or D or S
k is D or A or N or K or
w is N, D or I
x is R or G or K
y is Q or K or R
z is K or E or Q or N

env10  XaGbDPEcdefghNCiGEFjYCokxlmnNZ, in which the letters from a to n and x may have the following meanings:
a is K or - or G
b is S or G or -
c is V or I
d is A or V or T
e is Y or T or M or
f is M or H
g is W or S
h is T or F
i is R or G
j is L or F
o is N or K
k is M or S
l is W or Q or K
m is F or L
n is L or F
x is T or S or N

env11  XabcdwCeIoQfIxgyhizGjklYZ, in which the letters from a to l and w to z may have the following meanings:
a is R or T or S or N
b is N or I
c is Y or T
d is A or L or V
e is H or R
f is I or F
g is T or M
h is H or Q or A
i is K or E
j is R or K
k is N or A
l is V or I
w is P or Q
x is N or K
y is W or V
z is V or T or K
o is K or R

The structure of the antigenic peptide coded by the gene gag and designated by gag1 is also shown below:

gag1   XDCKLVLKGLGaNPTLEEMLTAZ, in which the letter a specifies M or T.
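By way of illustration only, a definition containing lower case letters can be expanded into every concrete sequence it covers. The sketch below does this for the gag 1 definition given above; the function name `expand` is hypothetical, the terminal groups X and Z are assumed to have been stripped, and each lower case letter is assumed to stand for one of the listed amino acids.

```python
from itertools import product


def expand(pattern, choices):
    """List every concrete sequence covered by a peptide definition.

    `choices` maps each lower case letter to a string of its possible
    meanings; every other character of the pattern is kept as it is.
    """
    slots = [choices[ch] if ch in choices else ch for ch in pattern]
    return ["".join(combo) for combo in product(*slots)]


# gag 1, in which the letter a specifies M or T.
for sequence in expand("DCKLVLKGLGaNPTLEEMLTA", {"a": "MT"}):
    print(sequence)
# DCKLVLKGLGMNPTLEEMLTA
# DCKLVLKGLGTNPTLEEMLTA
```

The number of sequences grows as the product of the numbers of meanings of the variable positions, so for the env definitions with ten or more variable positions a membership test is more practical than full enumeration.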

It will be noticed that, generally speaking, the amino acids having an unequivocal meaning (hence represented by a capital letter in accordance with the international nomenclature) which appear in the definitions which precede of the peptides according to the invention are found to correspond to identical amino acids placed in the same order in the corresponding env or gag sequences of the env or gag protein of at least one of the HIV or of SIV-1. The positions of the sequences are underlined and located within the sequences of amino acids of the env proteins of HIV-2 ROD (CNCM No. I-532) and HIV-1 BRU (CNCM No. I-232) shown in Figure 2. Moreover, the alignments of the amino acids of the env and gag proteins of SIV-1mac (CNCM No. I-521) and of HIV-2 ROD are presented in Figures 3 and 4.

The continuous lines which replace individual dashes at certain locations in these sequences are intended to underline the fact that certain amino acids contained in these sequences have been deliberately omitted from the figure in order to allow identical amino acids to be aligned (in that case marked with an asterisk) or for two points to be aligned vertically in the sequences of the corresponding proteins of HIV-1 and HIV-2 on the one hand, and SIV and HIV-2 on the other.

In addition to the peptides already mentioned, the invention also relates to peptides modified by insertion and/or deletion and/or substitution of one or more amino acids provided that the antigenic or immunogenic properties of the said peptides are not modified and that the recognition properties of the antigen and the antibody for these peptides are not substantially modified.

In a particularly preferred embodiment the invention relates to peptides having immunological properties in common with the peptide skeleton of the glycoprotein of the envelope of the viruses of the class HIV-2, these peptides containing a number of amino acid residues not exceeding 40.

These preferred peptides according to the invention have the following sequences (a shorter sequence shown on the same line is a fragment contained in the longer one):

env1
RVTAIEKYLQDQARLNSWGCAFRQVC    AIEKYLQDQ
RVSAIEKYLKDQAQLNAWGCAFRQVC    AIEKYLKDQ

env2
SLEQAQIQQEKNMYELQKLNSW    QIQQEKN
LLEEAQIQQEKNMYELQKLNSW

env3
ELGDYKLVEITPIGFAPTKEKRYSSAH    YKLVEITPIGFAPTKEK
ELGDYKLVEITPIGLAPTNVKRYTTG    YKLVEITPIGLAPTNVK

env4
CTQYVTVFYGVPTWKNATIPLFCAT    VTVFYGVPTWKNAT
CIQYVTVFYGVPAWRNATIPLFCAT    VTVFYGVPAWRNAT
EKLWVTVYYGVPVWKEATTTLFCAS    VTVYYGVPVWKEAT
EDLWVTVYYGVPVWKEATTTLFCAS    VTVYYGVPVWKEAT
DNLWVTVYYGVPVWKEATTTLFCAS    VTVYYGVPVWKEAT

env5
DDYQEITL-NVTEAFDAWNN    L-NVTEAF
DDYSELAL-NVTESFDAWEN    L-NVTESF
PNPQEVVLVNVTENFNMWKN    LVNVTENF
PNPQEIELENVTEGFNMWKN    LENVTEGF
PNPQEIALENVTENFNMWKN    LENVTENF

env6
ETSIKPCVKLTPLCVAMK
ETSIKPCVKLSELCITMR
DQSLKPCVKLTPLCVSLK
DQSLKPCVKLTPLCVTLN    PCVKLTPLCV

env7
NHCNTSVITESCD    NTSVIT
NHCNTSVIQECCD    NTSVIQ
TSCNTSVITQACP    NTSVIT
INCNTSVITQACP    NTSVIT
INCNTSAITQACP    NTSAIT

env8
YCAPPGYALLRC-NDT
YCAPAGFAILKCNNKT
YCAPAGFAILKCNDKK
YCAPAGFAILKCRDKK

env9
NKRPRQAWCWFKG-KWKD
NERPKQAWCRFGG-NWKE
N--MRQAHCNISRAKWNA
D--IRRAYCTINETEWDK
I--IGQAHCNISRAQWSK

env10
KGSDPEVAYMWTNCRGEFLYCNMTWFLN    NCRGEFLYCN
GG-DPEVTFMWTNCRGEFLYCKMNWFLN    NCRGEFLYCK
-GGDPEIVTHSFNCGGEFFYCNSTQLFN    NCGGEFFYCN
-GGDPEITTHSFNCRGEFFYCNTSKLFN    NCRGEFFYCN
-GGDPEITTHSFNCGGEFFYCNTSGLFN    NCGGEFFYCN

env11
RNYAPCHIKQIINTWHKVGRNVY    CHIKQII
RNYVPCHIRQIINTWHKVGKNVY    CHIRQII
TITLPCRIKQFINMWQEVGKAMY    CRIKQFI
SITLPCRIKQIINMWQKTCKAMY    CRIKQII
NITLQCRIKQIIKMVAGR-KAIY    CRIKQII

gag1
DCKLVLKGLGTNPTLEEMLTA

The peptides according to the invention may also advantageously be prepared by the standard methods applied in the field of peptide synthesis. This synthesis may be carried out in homogeneous solution or on a solid phase. For example, use can be made of the methods of synthesis in homogeneous solution described by HOUBEN-WEYL in the monograph entitled Methoden der Organischen Chemie, edited by E. Wunsch, vol. 15-I and II, THIEME, Stuttgart, 1974.

These methods of synthesis consist of condensing successive amino acids two at a time in the required order, or of condensing amino acids with fragments already formed and already containing several amino acids in the appropriate order or, again, of condensing several fragments previously prepared in this way, it being understood that precautions will be taken to protect beforehand all of the reactive functions present on these amino acids or fragments with the exception of the amino function of one and the carboxyl function of the other, which must be free to enter into the formation of the peptide bond, particularly after activation of the carboxyl function according to the well-known methods of peptide synthesis. As an alternative, recourse may be had to coupling reactions making use of standard coupling reagents of the carbodiimide type such as, for example, 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide.

When the amino acid used possesses an additional acid function (particularly in the case of glutamic acid), such functions must be protected, by t-butyl ester groups for example.

In the case of step-wise synthesis in which one amino acid is added at a time, the synthesis starts preferably with the condensation of the C-terminal amino acid with the amino acid which corresponds to the neighbouring aminoacyl in the desired sequence and it continues in this manner until the N-terminal amino acid is reached. According to another preferred method of the invention, use may be made of the procedure described by R. B. MERRIFIELD in the article entitled Solid phase peptide synthesis (J. Am. Chem. Soc., 85, 2149-2154).

For the preparation of a peptide chain according to the procedure of MERRIFIELD it is necessary to make use of a very porous polymeric resin to which is attached the first amino acid, the C-terminal residue of the chain. This amino acid is attached to the resin through its carboxyl group and its amino function is protected by the t-butoxycarbonyl group, for example.

When the C-terminal amino acid is attached to the resin as the first amino acid, the protecting group of the amine function is removed by washing the resin with an acid. In the case in which the protecting group of the amine function is the t-butoxycarbonyl group, it can be removed by treatment of the resin with trifluoroacetic acid.

Subsequently, the second amino acid, which furnishes the second aminoacyl of the desired sequence counting from the C-terminal aminoacyl residue, is coupled to the deprotected amine function of the C-terminal amino acid, the first amino acid attached to the resin. The carboxyl function of this second amino acid is activated, preferably with dicyclohexylcarbodiimide for example, and its amine function is protected, for example by means of the t-butoxycarbonyl group.

In this way, the first part of the desired peptide chain is obtained consisting of two amino acids, the terminal amine function of which is protected. The amine function is deprotected as previously described and subsequently the third aminoacyl is coupled under conditions analogous to those used for the addition of the amino acid next to the C-terminal amino acid. In this way, one after the other, each of the amino acids which will constitute the peptide chain is coupled to the deprotected amine group of the portion of the peptide chain already formed and which is attached to the resin.

When the desired peptide chain has been assembled in its entirety, the protecting groups of the different amino acids constituting the peptide chain are removed and the peptide is cleaved from the resin by means of hydrogen fluoride, for example.
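By way of illustration only, and without implying any restriction, the order of operations in the solid-phase procedure described above can be sketched as follows. The function name `merrifield_steps` and the tripeptide `GAK` are hypothetical and serve only as an example; the reagents named in the step texts are those cited above as examples (t-butoxycarbonyl protection, trifluoroacetic acid, dicyclohexylcarbodiimide, hydrogen fluoride).

```python
def merrifield_steps(sequence):
    """List the step-wise operations for assembling a peptide on a resin.

    `sequence` is written in the usual N-terminal to C-terminal order using
    one-letter codes. Assembly proceeds in the opposite direction: the
    C-terminal residue is attached first and the N-terminal residue last.
    This is an illustrative sketch, not a laboratory protocol.
    """
    if not sequence:
        raise ValueError("sequence must contain at least one residue")

    residues = list(sequence)[::-1]  # C-terminal residue first
    steps = [f"attach Boc-{residues[0]} to the resin through its carboxyl group"]
    for residue in residues[1:]:
        # One cycle per added residue: deprotect, then couple.
        steps.append("remove the Boc group with trifluoroacetic acid")
        steps.append(
            f"couple Boc-{residue} (carboxyl activated with dicyclohexylcarbodiimide)"
        )
    steps.append(
        "remove the protecting groups and cleave the peptide from the resin "
        "with hydrogen fluoride"
    )
    return steps


# Hypothetical tripeptide, N-terminal G, C-terminal K.
for number, step in enumerate(merrifield_steps("GAK"), start=1):
    print(number, step)
```

Each residue after the first contributes one deprotection and one coupling, so a chain of n residues gives 2n steps in this simplified listing.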

The invention also relates to water-soluble oligomers of the abovementioned peptide monomers. Oligomerization can bring about an increase of the immunogenicity of the peptide monomers according to the invention. It may be mentioned that these oligomers may, for example, contain from two to ten monomeric units without implying that this number is to be considered as limiting.

The monomeric units forming this oligomer are constituted either all by the polypeptide of sequence 1, or all by the polypeptide of sequence 2, or by both of these polypeptides.

For the preparation of the oligomer use may be made of any method of polymerization commonly used in the field of peptides, this polymerization reaction being continued until an oligomer or a polymer is obtained which contains the number of monomeric units required for the acquisition of the desired immunogenicity.

One method of oligomerization or polymerization of the monomer consists in allowing the latter to react with a cross-linking agent such as glutaraldehyde.

Use may also be made of other methods of oligomerization or coupling, for example that involving successive coupling of monomeric units through their terminal carboxyl and amine functions in the presence of homo- or hetero-bifunctional coupling agents.

For the production of molecules containing one or more sequences of 17 amino acids such as those defined above, use may also be made of genetic engineering techniques using micro-organisms transformed by a specific nucleic acid containing the corresponding, appropriate nucleotide sequences.

The invention also relates to the nucleic acids containing one or more sequences derived from the sequence of the cDNA of the virus HIV-2 ROD.

These sequences, located by the numbers shown on the sequence previously described, code for some interesting peptides of the invention.

- sequence coding for env1: nucleotides 7850 to 7927
- sequence coding for env2: nucleotides 8030 to 8095
- sequence coding for env3: nucleotides 7601 to 7636
- sequence coding for env4: nucleotides 6170 to 6247
- sequence coding for env5: nucleotides 6294 to 6349
- sequence coding for env6: nucleotides 6392 to 6445
- sequence coding for env7: nucleotides 6724 to 6763
- sequence coding for env8: nucleotides 6794 to 6838
- sequence coding for env9: nucleotides 7112 to 7162
- sequence coding for env10: nucleotides 7253 to 7336
- sequence coding for env11: nucleotides 7358 to 7426
- sequence coding for gag1: nucleotides 1535 to 1597

Finally, the invention relates to nucleic acids corresponding to the SIV virus containing one or more sequences derived from the cDNA of the virus SIV-1. These sequences coding for the peptides env 1 to env 11 and gag 1 can be located on Figure 3 by comparison with the corresponding sequences described for HIV-2.
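As a simple worked example, the length of each of the HIV-2 ROD regions listed above can be computed from its coordinates. The sketch below assumes that both the first and the last nucleotide numbers are included in the region; this convention, and the names `REGIONS` and `region_length`, are assumptions made for illustration.

```python
# Coordinates of the regions coding for the peptides, as listed in the text.
REGIONS = {
    "env1": (7850, 7927),
    "env2": (8030, 8095),
    "env3": (7601, 7636),
    "env4": (6170, 6247),
    "env5": (6294, 6349),
    "env6": (6392, 6445),
    "env7": (6724, 6763),
    "env8": (6794, 6838),
    "env9": (7112, 7162),
    "env10": (7253, 7336),
    "env11": (7358, 7426),
    "gag1": (1535, 1597),
}


def region_length(start, end):
    """Number of nucleotides from `start` to `end`, both ends included."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


for name, (start, end) in REGIONS.items():
    length = region_length(start, end)
    # Whole codons contained in the region, plus any leftover nucleotides.
    codons, remainder = divmod(length, 3)
    print(f"{name}: {length} nucleotides, {codons} codons, {remainder} left over")
```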

It will be obvious that the invention relates to the nucleic acid corresponding to sequences located in analogous regions of the cDNA derived from variants of HIV-2 ROD or SIV as well as all of the other nucleic acids representing modifications of those already described which result from exploiting the degeneracy of the genetic code.
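The degeneracy of the genetic code referred to above can be illustrated by counting how many distinct nucleotide sequences code for one and the same peptide. The sketch below uses the standard genetic code; the hexapeptide `AIEKYL` is an arbitrary example chosen for illustration, and the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Reverse table: amino acid -> list of synonymous codons.
SYNONYMS = {}
for codon, amino_acid in CODON_TABLE.items():
    SYNONYMS.setdefault(amino_acid, []).append(codon)


def count_encodings(peptide):
    """Number of distinct nucleotide sequences coding for `peptide`."""
    total = 1
    for amino_acid in peptide:
        total *= len(SYNONYMS[amino_acid])
    return total


def translate(nucleotides):
    """Translate a nucleotide string whose length is a multiple of three."""
    return "".join(
        CODON_TABLE[nucleotides[i:i + 3]] for i in range(0, len(nucleotides), 3)
    )


print(count_encodings("AIEKYL"))  # 4 * 3 * 2 * 2 * 2 * 6 = 576
```

Every one of these sequences is translated into the same peptide, which is why the nucleic acids are defined as including all such modified sequences.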

The invention also concerns the conjugates obtained by covalently coupling the peptides according to the invention (or the above-mentioned oligomers) to carrier molecules (naturally occurring or synthetic), physiologically acceptable and non-toxic, through the intermediary of complementary reactive groupings situated on the carrier molecule and the peptide. Examples of appropriate groupings are illustrated in what follows:

As examples of carrier molecules or macromolecular supports constituting part of the conjugates according to the invention, mention will be made of naturally occurring proteins such as tetanus anatoxin, ovalbumin, serum albumins, hemocyanins, etc.

As examples of synthetic macromolecular supports mention may be made of the polylysines or poly(D,L-alanine)-poly(L-lysine).

Other types of macromolecular supports which can be used are mentioned in the literature; usually they have a molecular weight higher than 20,000.

In order to synthesize the conjugates according to the invention use may be made of known procedures such as that described by FRANTZ and ROBERTSON in Infect. and Immunity, 33, 193-198 (1981) or that described in Applied and Environmental Microbiology, (October 1981), vol. 42, No. 4, 611-614 by P. E. KAUFFMAN by using the peptide and the appropriate carrier molecule.

In practice, and without implying any restriction on the use of others, it is advantageous to use the following compounds as coupling agents: glutaraldehyde, ethyl chloroformate, water-soluble carbodiimides (N-ethyl-N'-(3-dimethylaminopropyl)carbodiimide, HCl), diisocyanates, bis-diazobenzidine, di- and trichloro-s-triazines, cyanogen bromide as well as the coupling agents mentioned in Scand. J. Immunol., (1978), vol. 8, p. 7-23 (AVRAMEAS, TERNYNCK, GUESDON).

It is possible to use any other coupling procedure which causes one or more reactive functions of the peptide, on the one hand, to react with one or more reactive functions of the support molecules, on the other. Advantageously, these reactive functions are carboxyl and amine functions which can undergo a coupling reaction in the presence of a coupling agent of the type used in the synthesis of proteins, for example, 1-ethyl-3-(3-dimethylaminopropyl)-carbodiimide, N-hydroxybenzotriazole, etc.

Use may again be made of glutaraldehyde, particularly when it is required to link amino groups to each other which are situated on the peptide and the support molecule respectively.

The peptides according to the invention possess antigenic properties. They may thus be used in diagnostic procedures for the detection of an infection by the HIV-2 virus.

As has already been mentioned, studies have made it possible to distinguish between two groups of peptides which can be used in procedures for the detection of antibodies to the HIV-2 virus in a human biological fluid, particularly serum or cerebrospinal fluid.

The first group (I) includes the gag 1 peptides. These peptides are recognized by anti-HIV-2 antibodies and are thus capable of detecting infection by HIV-2. To a certain extent they are also recognized by anti-HIV-1 antibodies.

The second group (II) includes peptides which correspond more especially to those which are located in the transmembrane portion and at the terminus of the external part of the protein of the envelope.

These peptides are those previously designated by env 1, env 2 and env 3. They make possible the specific recognition of the presence of antibodies to HIV-2 and thus make it possible to distinguish in a given individual past or present infections due to a HIV, more particularly between those caused by a HIV-2 and those caused by a HIV-1.

The invention also relates to a composition containing at least one of the above-mentioned peptides or at least one oligomer of this peptide, characterized in that it has the capacity to be recognized by sera of human origin containing antibodies against the HIV-2 virus.

The invention relates to an in vitro diagnostic procedure involving one or more peptides according to the invention for the detection of antibodies to HIV-2 in biological fluids, in particular in human sera.

Generally speaking, the above in vitro diagnostic procedure comprises the following steps:
- the placing in contact of this biological fluid with the said peptides,
- the detection of the possible presence of an antibody-peptide complex by physical or chemical methods in the said biological fluid.

In a preferred embodiment of the invention, the detection of the antibody-antigen complex is carried out by means of immunoenzymatic assays (of the ELISA type), immunofluorescence assays (of the IFA type), radioimmunological assays (of the RIA type) or radioimmunoprecipitation assays (of the RIPA type).

In this sense, the invention also relates to any peptide according to the invention labelled by means of an appropriate marker of the enzymatic, fluorescence, radioactive, etc ... type.

Such methods comprise for example the following steps:
- deposition of defined amounts of a peptide composition according to the invention in the wells of a microtitre plate,
- introduction into the said wells of increasing dilutions of the serum requiring diagnosis,
- incubation of the microtitre plate,
- repeated rinsing of the microtitre plate,
- introduction into the wells of the microtitre plate of labelled antibodies against immunoglobulins of the blood, the antibodies having been labelled with an enzyme chosen from among those which are capable of hydrolyzing a substrate, thus leading to a change in the absorption spectrum of the latter at at least one specific wavelength,
- detection of the amount of substrate hydrolyzed in comparison with a control.
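The final step above, the comparison with a control, can be sketched as follows for a series of increasing dilutions. The cut-off rule used here (mean absorbance of the negative controls plus three standard deviations) is a common convention and is an assumption, not a rule stated in the present description; the function names and all absorbance values are hypothetical.

```python
from statistics import mean, stdev


def cutoff_from_controls(control_absorbances, n_sd=3.0):
    """Cut-off absorbance: mean of the negative controls plus n_sd standard
    deviations (an assumed convention, requires at least two controls)."""
    return mean(control_absorbances) + n_sd * stdev(control_absorbances)


def endpoint_titer(readings, cutoff):
    """Highest dilution factor whose absorbance still exceeds the cut-off.

    `readings` is a list of (dilution_factor, absorbance) pairs, for example
    (400, 0.30) for a serum diluted 1/400. Returns None when no dilution
    exceeds the cut-off.
    """
    positive = [dilution for dilution, absorbance in readings if absorbance > cutoff]
    return max(positive) if positive else None


# Hypothetical plate readings.
controls = [0.05, 0.07, 0.06]
serum = [(100, 1.20), (200, 0.80), (400, 0.30), (800, 0.08)]

cutoff = cutoff_from_controls(controls)
print(round(cutoff, 3), endpoint_titer(serum, cutoff))
```

A serum with no reading above the cut-off returns `None`, which in this sketch corresponds to the absence of a detectable antibody-peptide complex.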

The invention also relates to kits for the in vitro diagnosis of the presence of antibodies to the HIV-2 viruses and, in certain cases, the HIV-1 viruses in a biological fluid. The kits contain:
- a peptide composition according to the invention,
- the reagents necessary for constituting a medium appropriate for performing the immunological reaction,
- the reagents used to detect the antibody-antigen complex produced by the immunological reaction. These reagents may also comprise a marker or be capable of being recognized in turn by a labelled reagent, more particularly in the case in which the above-mentioned polypeptide composition is not labelled,
- a reference biological tissue fluid not containing antibodies recognized by the above-mentioned polypeptide composition.

The invention relates to the antibodies themselves formed against the peptides of the invention. It will be obvious that these antibodies are not limited to polyclonal antibodies. The invention also relates to any monoclonal antibody produced by any hybridoma capable of being formed by standard methods from the spleen cells of an animal, in particular of a mouse or a rat immunized against one of the peptides of the invention, on the one hand, and the cells of an appropriate myeloma cell line on the other, and of being selected by its capacity to produce monoclonal antibodies recognizing the peptide initially used for the immunization of the animals.

The invention also relates to immunogenic compositions for the production of vaccines, the active principle of which is constituted by at least one peptide according to the invention, or an oligomer of this peptide, or a peptide in the form of a conjugate with a carrier molecule, in combination with a pharmaceutically acceptable vehicle, characterized in that they induce the production of antibodies against the above-mentioned peptides in amounts sufficient to inhibit the proteins of the HIV-2 retrovirus, and even the HIV-2 retrovirus itself.

The immunogenic compositions for the production of vaccines advantageously contain, more particularly, at least one of the peptides previously designated as env4, env5, env6, env7, env8, env9, env10, env11, or mixtures of them.

Of the peptides suitable for constituting active principles of vaccines some are particularly preferred because they possess a basic amino acid structure corresponding to regions of the glycoproteins of the envelope which exhibit a considerable degree of conservation, not only in the HIV-2 and SIV viruses, but also in the HIV-1 viruses. These particularly preferred peptides are the peptides designated as env4, and certain peptides of env5, env6 and env10.

In a preferred embodiment of the invention the immunogenic peptides (or fragments of these peptides), suitable for constituting active principles of vaccines are chosen from among those, the formulas of which correspond to sequences which, in the glycoproteins of the envelopes of HIV-2, SIV and HIV-1, exhibit an amino acid sequence homology greater than 50%, belong to the external part of the envelope of the virus, show no or almost no deletions and contain cysteine residues favourable to the stabilization of bonds and to the constitution of bonding loops.

The following peptides belong to this class of preferred peptides.

- env4: XVTV-YGVP-W--ATZ
- env5: XL-NVTE-FZ
- env6: XKPCVKL-PLC-Z
- env7: XN-S-I-Z
- env10: XNC-GEF-YC-Z
- env11: XC-I-Q-IZ

Advantageous pharmaceutical compositions are constituted by solutions, suspensions or injectable liposomes containing an efficacious dose of at least one product according to the invention. These solutions, suspensions and liposomes are preferably prepared in a sterilized isotonic aqueous phase, preferably saline or a glucose solution. The invention relates more particularly to those suspensions, solutions and liposomes which are suitable for administration by intradermal, intramuscular and subcutaneous injections or even by scarifications.

The invention also relates to pharmaceutical compositions which can be administered by other routes, in particular by the oral route.

The pharmaceutical compositions according to the invention which can be used as vaccines to stimulate the production of antibodies against the HIV-2 virus can, for example, be administered at doses varying between 10 and 500 µg/kg, and preferably between 50 and 100 µg/kg, of peptide according to the invention.

These doses are cited as examples but do not imply any restriction on the dose which may be used.

As has already been indicated above the various peptides which have been specified may contain modifications which do not have the effect of modifying fundamentally their immunological properties. The peptide equivalents which result form part of the claims which follow.

As examples of peptide equivalents, mention will be made of those with structures corresponding to equivalent regions in the cDNAs of other variants of HIV-2, SIV or HIV-1 when these regions are aligned under conditions similar to those which were mentioned above with regard to HIV-2 ROD, SIV and HIV-1 BRU. As other examples of these peptides mention will be made of those with structures corresponding to equivalent regions in the cDNAs which were deposited with the CNCM, in particular those under the numbers I-502, I-642 (HIV-2 IRMO), I-643 (HIV-2 EHO) as well as, in appropriate cases, variants of HIV-1 which were deposited with the CNCM under the numbers I-232, I-240, I-241, I-550, I-551.

The peptides according to the invention can also be defined by the following formulae (in which X, Z and the dashes have the meanings indicated above):

XRV-AIEKYL-DQA-LN-WGCAFRQVCZ
XAIEKYL-DZ
X-LE-AQIQQEKNMYELQKLNSWZ
XQIQQEKNZ
XELGDYKLVEITPIG-APT--KR-----Z
XYKLVEITPIG-APT--KRZ
X----VTV-YGVP-W--AT--LFCA-Z
XVTV-YGVP-W--ATZ
X----E--L-NVTE-F--W-NZ
XL-NVTE-FZ
XL---S-KPCVKL-PLC----Z
XKPCVKL-PLC-Z
XS-KPCVKL-PLC-Z
X---N-S-I-—r-C-Z
XN-S-I-Z
XYC-P-G-A-L-C-N-TZ
X------A-C------W--Z
NKRPRQAWCWFKG-KWKD
X-G-DPE------NC-GEF-YC------NZ
X-----C-I-Q-I------G---yz

The invention also relates, in addition to the peptides of SIV already described, to the proteins coded by the cDNA of the SIV virus.

It also relates to the proteins of any virus immunologically closely related to SIV-1mac, in particular any virus, the proteins and the glycoproteins of the envelope of which cross-react immunologically and the cDNAs of which exhibit a sequence homology of at least 95% and preferably at least 98%.
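For illustration, a degree of sequence homology such as the 95% and 98% thresholds mentioned above can be estimated on two sequences that have already been aligned. The sketch below computes the percentage of identical positions and ignores positions at which either sequence carries a gap; this particular definition is an assumption, since several conventions exist, and the sequences and function names are hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percentage of identical positions between two aligned sequences.

    Both sequences must have the same aligned length. Columns in which
    either sequence has a gap are left out of the comparison.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * identical / compared


def meets_threshold(seq_a, seq_b, threshold):
    """True when the percent identity is at least `threshold`."""
    return percent_identity(seq_a, seq_b) >= threshold


# Hypothetical aligned fragments of 20 nucleotides differing at one position.
reference = "ATGGGCGCGAGAAACTCCGT"
variant = "ATGGGCGCGAGAAACTCCGA"
print(percent_identity(reference, variant))      # 19 of 20 positions identical
print(meets_threshold(reference, variant, 95))   # True
print(meets_threshold(reference, variant, 98))   # False
```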

In particular, the invention relates to:
1) the proteins and glycoproteins of the envelope encoded in the env gene and shown in Figure 3,
2) the protein GAG shown in Figure 4,
3) the protein POL shown in Figure 5,
4) the protein Q shown in Figure 6,
5) the protein R shown in Figure 7,
6) the protein X shown in Figure 8,
7) the protein F shown in Figure 9,
8) the protein TAT shown in Figure 10.

The amino acids of the previously mentioned proteins of SIV have been presented in alignment with the sequences of the amino acids of the corresponding proteins of the HIV-2 virus; the points aligned vertically between the two sequences indicate amino acids common to the proteins of the two viruses.
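The presentation just described, with points placed between two aligned sequences wherever the amino acids are common, can be reproduced by a few lines of code. The following is a minimal sketch; the two short sequences are hypothetical examples and the function name `match_line` is an assumption.

```python
def match_line(seq_a, seq_b, mark=".", blank=" ", gap="-"):
    """Return a line carrying `mark` under each position where the two
    aligned sequences have the same residue, and `blank` elsewhere."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    return "".join(
        mark if a == b and a != gap else blank
        for a, b in zip(seq_a, seq_b)
    )


# Hypothetical aligned fragments (one-letter amino acid codes).
first = "MGARNSVL"
second = "MGAKNSIL"

print(first)
print(match_line(first, second))
print(second)
```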

The cDNA sequences coding for the previously mentioned proteins appear in Figure 1B. In addition to the nucleic acid sequences previously mentioned, the invention relates to any modified nucleic acid sequence which also codes for the proteins of the retrovirus SIV or one of its variants.

These cDNA sequences indicated by numbers shown on the sequences previously described (Figure 1B) are the following:

- sequence coding for GAG: nucleotides 551 to 2068
- sequence coding for POL: nucleotides 1726 to 4893
- sequence coding for Q: nucleotides 4826 to 5467
- sequence coding for X: nucleotides 5298 to 5633
- sequence coding for R: nucleotides 5637 to 5939
- sequence coding for F: nucleotides 8569 to 9354
- sequence coding for TAT-1: nucleotides 5788 to 6084
- sequence coding for ART-1: nucleotides 6014 to 6130
- sequence coding for TAT-2: nucleotides 8296 to 8391
- sequence coding for ART-2: nucleotides 8294 to 8548
- sequence coding for ENV: nucleotides 6090 to 8732

Thus, the invention quite naturally relates to the proteins previously described when they are isolated from the SIV virus or when they are prepared by a method of synthesis, in particular by one of the methods already cited in connection with the synthesis of shorter peptides.
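Several of the coding sequences listed above share nucleotides with their neighbours. Purely as an illustration, the overlaps can be computed from the coordinates. The sketch below takes a subset of the coordinates given above, assumes that both end positions are included, and uses hypothetical names (`GENES`, `overlap_length`, `overlapping_pairs`).

```python
from itertools import combinations

# Subset of the coordinates listed in the text (start, end), ends included.
GENES = {
    "POL": (1726, 4893),
    "Q": (4826, 5467),
    "X": (5298, 5633),
    "R": (5637, 5939),
    "TAT-1": (5788, 6084),
    "ART-1": (6014, 6130),
    "ENV": (6090, 8732),
}


def overlap_length(a, b):
    """Number of nucleotides shared by two (start, end) intervals."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


def overlapping_pairs(genes):
    """Map each overlapping pair of names to the length of the overlap."""
    pairs = {}
    for (name_a, coords_a), (name_b, coords_b) in combinations(genes.items(), 2):
        shared = overlap_length(coords_a, coords_b)
        if shared > 0:
            pairs[(name_a, name_b)] = shared
    return pairs


for (name_a, name_b), shared in overlapping_pairs(GENES).items():
    print(f"{name_a} / {name_b}: {shared} nucleotides in common")
```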

The invention also relates to the use of the preceding proteins for the diagnosis of the possible presence of antibodies directed against the proteins of HIV-2, and even against the whole HIV-2 virus, and, in some cases, to their use for the purposes of diagnosis of an infection due to one of the HIV viruses. Thus, the peptide GAG encoded by the corresponding gene can be used to locate the possible presence of anti-HIV-1 or anti-HIV-2 antibodies. The env proteins are used preferably for the specific diagnosis of an infection due to HIV-2 or one of its variants, and sometimes for the diagnosis of an infection due to HIV-2 or HIV-1.

Thus, the invention also relates to an in vitro diagnostic procedure for the detection of antibodies against HIV-2 and possibly against HIV-1 in biological fluids and in particular in human sera.

Such procedures entailing the use of the proteins of SIV previously mentioned for diagnostic purposes have already been described in the present invention.

The invention also relates to kits for the in vitro diagnosis of the presence of antibodies against the HIV-2 virus and, in certain cases, against HIV-1 in a biological fluid. Such kits employing the peptides previously mentioned have also been described in the present invention.

The invention also relates to immunogenic compositions for the production of vaccines, the active principle of which is constituted advantageously by at least a part of the ENV protein of the SIV virus and this protein may exist as a conjugate formed with a carrier molecule. These immunogenic compositions induce the production of antibodies against the above-mentioned peptide in sufficient amounts to inhibit the proteins of the retrovirus HIV-2, and even the retrovirus HIV-2 itself.

However, the use for diagnostic purposes of the proteins of SIV is in no way limited only to the proteins ENV and GAG. Other proteins among those described may be considered for the preparation of compositions for diagnosis and even as vaccines.

Claims

1. Nucleotide sequence characterized in that it corresponds to the nucleotide sequence shown in Figure 1B or Figure 1C, or in that it contains the nucleotide sequence shown in Figure 1B or Figure 1C, or in that it is part of the sequence shown in Figure 1B or Figure 1C, said part of sequence coding for a peptide recognized by antibodies present in the serum of a patient infected with a HIV-2 retrovirus or which can be used as a probe for the detection of the presence of a HIV-2 retrovirus in a biological sample.

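2. Nucleotide sequence according to Claim 1, characterized in that it comprises one of the following sequences identified in Figure 1B or Figure 1C:
- GAG extending from nucleotides 550 to 2068
- POL extending from nucleotides 1726 to 4893
- Q extending from nucleotides 4826 to 5467
- X extending from nucleotides 5298 to 5633
- R extending from nucleotides 5637 to 5939
- F extending from nucleotides 8569 to 9354
- TAT-1 extending from nucleotides 5788 to 6084
- ART-1 extending from nucleotides 6014 to 6130
- TAT-2 extending from nucleotides 8296 to 8391
- ART-2 extending from nucleotides 8294 to 8548
- LTR extending from nucleotides 8950 to 9468 and 1 to 316
- ENV extending from nucleotides 6090 to 8732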

3. Nucleotide sequence according to Claim 1 or Claim 2, characterized in that it comprises one of the following sequences identified in the sequence shown in Figure 1A:
- sequence corresponding to nucleotides 7850 to 7927
- sequence corresponding to nucleotides 8030 to 8095
- sequence corresponding to nucleotides 7601 to 7636
- sequence corresponding to nucleotides 6170 to 6247
- sequence corresponding to nucleotides 6294 to 6349
- sequence corresponding to nucleotides 6392 to 6445
- sequence corresponding to nucleotides 6724 to 6763
- sequence corresponding to nucleotides 6794 to 6838
- sequence corresponding to nucleotides 7112 to 7162
- sequence corresponding to nucleotides 7253 to 7336
- sequence corresponding to nucleotides 7358 to 7426
- sequence corresponding to nucleotides 1535 to 1597

4. Nucleotide sequence characterized in that it is a sequence according to Claim 4, modified in accordance with the degeneracy of the genetic code.

5. Nucleotide sequence according to Claim 1, characterized in that it is a sequence contained in the plasmid pSIV-1.1 (CNCM I658) or in the plasmid pSIV-1.2 (CNCM 1-659). 10 6. Nucleotide sequence according to Claim 1, characterized in that it codes for a polypeptide selected from : POLrod ° r POLmaC shown in Figure 5 QROD or QMAC shown in Figure 6 PROD or Rmac shown in Figure 7 15 XROD or XmaC shown in Figure 8 FROD or FmaC shown in Figure 9 TATrod or TATmac shown in Figure 10 ARTrod or ARTmac shown in Figure 11 7. Nucleotide sequence according to any one of the Claims 1 to 6, 20 characterized in that it is labelled. 8. Use of a nucleotide sequence according to any one of the Claims 1 to 7 as probe for the detection of a HIV-2 retroviral infection in a biological sample. 9. Recombinant nucleic acid characterized in that it comprises a 25 nucleotide sequence according to any one of the Claims 1 to 6 inserted into a nucleic acid derived from a vector. 10. Procedure for the diagnosis of the presence or absence of the HIV-2 virus or a variant in samples of sera or other fluids or tissues obtained from patients suspected of being carriers of the 30 HIV-2 virus comprising : - at least one hybridization step conducted under stringent conditions by placing the cellular DNA of the sample taken from the suspect patient in contact with a probe according to Claim 7 on a suitable membrane, 35 - washing of said membrane with a solution which maintains these stringent hybridization conditions, - detection of the presence or absence of the HIV-2 virus by an immunodetection method. 11. Procedure according to Claim 10, characterized in that the hybridization step is conducted under non-stringent conditions and in that the washing of said membrane is performed with a solution which maintains the non-stringent conditions. F. R. KELLY & CO., AGENTS FOR THE APPLICANTS 1/35 FIG. 1. 
A i HIV2.R0D !-> R GTCGCTCTGCGGAGAGGCTGGCAGATTGAGCCCTGGGAGGTTCTCTCCAGCACTAGCAGG ·····« TAGAGCCTGGGTGTTCCCTGCTAGACTCTCACCAGCACTTCGCCGGTGCTGGGCAGACGG . . . 100 R j US CCCCACGCTTGCTTGCTTAAAAACCTCTTAATAAAGCTGCCACTTACAAGCAAGTTAAGT • · · · · ’ · GTCTGCTCCCATCTCTCCTAGTCGCCGCCTGGTCATTCGGTGTTCACCTGAGTAACAAGA • 200 · · « . •dCCTGGTCTGTTAGGACCCTTCTTGCTTTGGGAAACCGACGCAGGAAAATCCCTAGCAGG • · · . . 300 TTGGCGCCTCAACAGGGACTTGAAGAAGACTGAGAAGTCTTGGAACACGGCTGAGTGAAG ····«· GCAGTAAGGGCGGCAGGAACAAACCACGACGGAGTGCTCCTAGAAAGGCGCGGGCCGAGG • · · 400 · . i'ACCAAAGGCAGCGTGTGGAGCGGGAGGAGAAGAGGCCICCGGGTGAAGGTAAGTACCTA caccaaaaactgtagccgaaagggcttgctatcctacctttacacaggtagaagattgtg « 500 « · · · MetGlyAlaArgAsnSerValLeuArgGlyLysLysAlaAspGluLeuGluArglle GGAGATGGGCGCGAGAAACTCCGTCTTGAGAGGGAAAAAAGCAGATGAATTAGAAAGAAT • «.· * · 600 ArgLeuArgProGlyGlyLyaLysLysTyrArgLeuLysHlsIleValTrpAlaAlaAsn CAGGTTACGGCCCGGCGGAAAGAAAAAGTACAGGCTAAAACATATTGTGTGCGCAGCGAA • · · ... LyaLeuA6pArgPheGlyLeuAlaGluSerLeuLeuGiuSerLysCluGlyCysGlnLys TAAATTGGACAGATTCGGATTAGCAGAGACCCTGTTGCAGTCAAAAGAGGGTTCTt/.AAA • · .700 . . IleLeuThrValLeuAepProHetVaIProThrGlySerGluAsaLeuLysSerLeuPhe AATTCTTACAGTTTTAGATCCAATGGTACCGACAGGTTCACAAAATTTAAAAAGTCTTTT • · · · « · As aThrValCysVallleTrpCyalleKisAlaGluGluLysValLyaAspThrGluGly 7AATACTGTCTGCGTCATTTGGTGCATACACGCAGAAGAGAAAGTGAAAGATACiGAAGG • 800 .... AlaLyeGloIle7alArgArgHiaLeuValAlaGluThrGlyThrAlaGluLyeHetPro AGCAAAACAAATACTGCGGAGACATCTAGTGGCAGAAACACGAACTGCACACAAAATGCC FIG. 
1A 2/35 SerThrSerArgProThrAlaProSerSerGluLysGlyGlyAsnTyrProValGlnHis AAGCACAAGTAGACCAACAGCACCATCTAGCGAGAAGCGAGCAAATTACCCAGTCCAACA • ♦ · « · Va lGlyGlyAsnTyrThrHisI leProLeuSerProArgThrLeuAsnAlaTrpValLys TGTAGGCGGCAACTACACCCATATACCGCTGAGTCCCCGAACCCTAAATCCCTGGGTAAA 1000 LeuValGluGluLysLysPheGlyAlaGluValValProGlyPheGlnAlaLeuSerGlu ATTAGTAGAGGAAAAAAAGTTCGGGGCAGAAGTAGTGCCAGGATTTCAGGCACTCTCAGA • · · · · · GlyCysThrProTyrAspIleAsnGlnKetLeuAsnCysValGlyAspHisGlnAlaAla AGGCTGCACGCCCTATGATATCAACCAAATGCTTAATTGTGTGGGCGACCATCAAGCAGC • 1100 · . « · HetGlnllelleArgGluIlelleAsnGluGluAlaAlaGluTrpAspValGlnHisPro CATGCAGATAATCAGGGAGATTATCAATGAGGAAGCAGCAGAATGGGATGTGCAACATCC • * · · · 1200 IleProGlyProLeuProAlaGlyGlnLeuArgGluProArgGlySerAspIleAlaGly AATACCAGGCCCCTTACCAGCGGGGCAGCTTAGAGAGCCAAGGGGATCTGACATAGCAGG • · · ♦ · · ThrThrSerThrValGluGluGlnlleGlnTrpMetPheArgProGlnAsnProValPro GACAACAAGCACAGTAGAAGAACAGATCCAGTGGATGTTTAGGCCACAAAATCCTGTACC * » · 1300 · » ValGlyAsnlleTyrArgArgTrpIleGlnlleGlyLeuGlnLyeCyeValArgMetTyr AGTAGGAAACATCTATAGAAGATGGATCCAGATAGGATTGCAGAAGTGTGTCAGGATGTA • · · · · · AenProThrAenlleLeuAepIleLysGlnGlyProLysGluProPheGlnSerTyrVal CAACCCGACCAACATCCTAGACATAAAACAGGGACCAAAGGAGCCGTTCCAAAGCTATGT • 1400 · · · AspArgTheTyrLys SerLeuArgAlaGluGInThrAepProAlaVaILysAsnTrpMet AGATAGATTCTACAAAAGCTTGAGGGCAGAACAAACAGATCCAGCAGTGAAGAATTGGAT . . . . . 1500 ThrGlnThrLeuLeuValGlnA8nAlaAsnProAspCysLyeLeuValLeuLyeGlyLeu GACCCAAACACTGCTAGTACAAAATGCCAACCCAGACTGTAAATTAGTGCTAAAAGGACT • J- · · · · ♦ GlyMetAsnProThrLeuGluGluMetLeuThrAlaCysGInGlyVaIGlyGlyProGly AGGGATGAACCCTACCTTAGAAGAGATGCTGACCGCCTGTCAGGGGGTAGGTGCGCCAGG ,·. * · 1600 · · GlnLysAleArgLeuMet AlaGluAlaLeiiLy aGluVallleGlyP roAlaPro I lePro CCAGAAAGCTAGATTAATGGCAGAGGCCCTGAAAGAGGTCATAGGACCTGCCCCTATCCC • · « · * · PheAlaAlaAlaGInGInArgLysAlaPheLysCyeTrpAenCyeGlyLyeGluGlyBie ATTCGCAGCAGCCCAGCAGAGAAAGGCATTTAAATGCTGGAACTGTGGAAAGGAAGGGCA * 1700 . 
· · · SerAlaArgGlnCyeArgAlaProArgArgGlnGlyCyeTrpLyeCyeGlyLysProGly CTCGGCAAGACAATGCCGAGCACCTAGAAGGCAGGGCTGCTGGAAGTGTGGTAAGCCAGG . . . . . 1800 TbrGlyArgPhePheArgTbrGlyProLeuGly HisIleMetTbrAenCyeProAepArgGlnAlaGlyPheLeuGlyLeuGlyProTrpGly ACACATCATGACAAACTGCCCAGATAGACAGGCAGGTTTTTTAGGACTGGGCCCTTGGGG • · · · * · Ly8GluAlaProGlnLeuProArgGlyProSerSerAlaGlyAlaAepThrA6nSerThr Lye LyeProArgAenPheProValAlaGlnValProGlnGlyLeuTbrProThrAlaPro AAAGAAGCCCCGCAACTTCCCCGTGGCCCAAGTTCCGCAGGGGCTGACACCAACAGCACC . . . 1900 ProSerGlySerSerSerGlySerThrGlyGluIleTyrAlaAiaArgGluLyeThrGlu ProValAepProAlaValAepLeuLeuGluLyeTyrMetGInGInGlyLyeArgGInArg CCCAGTGGATCCAGCAGTGGATCTACTGGAGAAATATATGCAGCAAGGGAAAAGACAGAG • · · · · · ArgAla.G luArgGluThrIleG InGlySerAspArgGlyLeuTbrAlaProArgAlaGly GluGlnArgGluArgProTyrLyeGluVaIThrGluAspLeuLeuHisLeuGluGInGly AGAGCAGAGAGAGAGACCATACAAGGAAGTGACAGAGGACTTACTGCACCTCGAGCAGGG ( fig. lA-suite 1) 3/35 GlyA6pThrIleGlnGlyAlaThrAsnArgGlyLeuAlaAlaProGlnPheSerLeuTrp GluThrProTyrArgCluProProThrGluAspLeuLeuKisLeuAsnSerLcuPheGl GGAGACACCATACAGGCAGCCACCAACACACCACTTCCTGCACCTCAATTCTCTCTTTGG • · · · · 2100 Ly6ArgProV’alValThrAlaTyrIleGluGlyGlnProValGluValLeuLeuAspThr LysAspG ln AAAAGACCAGTAGTCACAGCATACATTGAGGGTCAGCCAGTAGAAGTCTTGTTAGACACA «···*« GlyAlaAspAspSerlleValAlaGlylleGluLeuGlyAsnAsnTyrSerProLysIle GGGGCTGACGACTCAATAGTAGCACGAATAGAGTTAGGGAACAATTATAGCCCAAAAATA . . . 2200 ValGlyGlylleGlyGlyPhelleAsnThrLysGluTyrLysAenValGlulleGluVal GTAGGGGGAATAGGGGGATTCATAAATACCAAGGAATATAAAAATGTAGAAATAGAAGTT ····*· LeuAsnLysLysValArgAlaThrlleHetThrGlyAspThrProIleAenllePheGly CTAAATAAAAAGGTACGGGCCACCATAATGACAGGCGACACCCCAATCAACATTTTTGGC . 2300 . ArgAsnlleLeuThrAlaLeuGlyMetSerLeuAsnLeuProValAlaLysValGluPro AGAAATATTCTGACAGCCTTAGGCATGTCATTAAATCTACCAGTCGCCAAAGTAGAGCCA . . . . . 
2400 IleLyellelletLeuLysProGlyLysAspGlyProLysLeuArgGlnTrpProLeuThr ATAAAAATAATGCTAAAGCCAGCGAAAGATGGACCAAAACTGAGACAATGGCCCTTAACA • · » · · · LysGluLysIleGluAlaLeuLysGluIleCysGluLyeMetGluLysGluGlyGlnLeu AAAGAAAAAATAGAAGCACTAAAAGAAATCTGTCAAAAAATGGAAAAAGAAGGCCAGCTA • · · 2500 . · GluGluAlaProProThrAsnProTyrAsnThrProThrPheAlalleLysLysLysAsp GAGGAAGCACCTCCAACTAATCCTTATAATACCCCCACATTTGCAATCAAGAAAAAGGAC • · - · · · · LysAsnLysTrpArgMe tLeuI1eAspPheArgGluLeuAeuLysVaIThrGlnAspPhe AAAAACAAATGGAGGATGCTAATAGATTTCAGAGAACTAAACAAGGTAACTCAAGATTTC • 2600 · · « · ThrGluIleGlnLeuGlylleProHieProAlaGlyLeuAlaLyeLysArgArglleThr ACAGAAATTCAGTTAGGAATTCCACACCCAGCAGGGTTGGCCAAGAAGAGAAGAATTACT . . . . . 2700 ValLeuAspValGlyAspAlaTyrPheSerlleProLeuHieGluAspPheArgProTyr GTACTAGATGTAGGGGATGCTTACTTTTCCATACCACTACATGAGGACTTTAGACCATAT • · . · · « * ThrAlaPheThrLeuProSerValAsnAsnAlaGluProGlyLysArgTyrlleTyrLye ACTGCATTTACTCTACCATCAGTGAACAATGCAGAACCAGGAAAAAGATACATATATAAA • · · 2800 « · ValLeuProGlnGlyTrpLyeGlySerProAlallePheGlnHisThrMetArgGlnVal GTCTTGCCACAGGGATGGAAGGGATCACCAGCAATTTTTCAACACACAATGAGACAGGTA • · · · · . · LeuGluProPheArgLysAlaAsnLyeAspValllellelleGlnTyrMetAspAepIle TTAGAACCATTCAGAAAAGCAAACAAGGATGTCATTATCATTCAGTACATGGATGATATC • 2900 · · · * · LeuIIeAlaSerAspArgThrAspLeuGluHisAspArgValValLeuGlnLeuLysGlu TTAATAGCTAGTGACAGGACAGATTTAGAACATGATAGGGTAGTCCTGCAGCTCAAGGAA 3000 LeuLeuAenGlyLeuGlyPheSerThrProAs.pGluLysPheGlnLysAspProProTyr CTTCTAAATGGCCTAGGATTTTCTACCCCAGATGACAAGTTCCAAAAAGACCCTCCATAC • ······ HiaTrpMetGlyTyrGluLeuTrpProThrLysTrpLysLeuGlnLysIleGInLeuPro CACTGGATGGGCTATGAACTATGGCCAACTAAATGGAAGTTGCAGAAAATACAGTTGCCC • « · 3100 · · GlnLysGluIleTrpThrValAsnAspIleGlnLyeLeuValGlyValLeuAsnTrpAla / CAAAAAGAAATATGGACAGTCAATGACATCCAGAAGCTAGTGGGTGTCCTAAATTGGGCA / . - Xfig.lA-suite 2) 4/35 AlaGlnLeuTyrProGlyIleLysThrLyeHieLeuCyeA»gLeuIleArgGlyLyeKet GCACAACTCTACCCACGGATAAAGACCAAACACTTATGTAGGTTAATCAGAGGAAAAATG 3200 . . . . 
TbrLeuThrGluGluValGlnTrpThrGluLeuAlaGluAIaGIuLeuGluGluAenArg ACACTCACAGAAGAAGTACAGTGGACAGAATTAGCAGAAGCAGAGCTAGAAGAAAACAGA • · · ·· · 3300 IlelleLeuSerGlaGluClnCluGlyHisTyrTyrClnGluGluLytGluLeuGluAla ATTATCCTAAGCCAGGAACAAGAGGGACACTATTACCAAGAAGAAAAAGAGCTACAAGCA TSrValClnLysAspGluGluAsnGluTrpThrTyrLysIleHisGlnGluGluLysIle AGAGTCCAAAAGGATCAAGAGAATGAGTGGACATATAAAATACACCAGGAAGAAAAAATT LeuLyeValGlyLysTyrAlaLysValLyeAenThrHieThrAenGlylleArgLeuLeu CTAAAAGTAGGAAAATATGCAAAGGTGAAAAACACCCATACCAATGGAATCAGATTGTTA *··«·· AlaGlnValValGlnLyeIleGlyLyeGluAlaLeuVa1IleTrpGlyArgl1eProLys GCACAGGTAGTTCAGAAAATAGGAAAAGAAGCACTAGTCATTTGGGGACGAATACCAAAA 3500 . PheHieLeuProValGluArgGluIleTrpGLuGlnTrpTrpAepAenTyrTrpGlnVal TTTCACCTACCAGTAGAGAGAGAAATCTGGGAGCAGTGGTGGGATAACTACTGGCAAGTG « « · · · 3600 ThrTrpIleProAspTrpAspPheValSerThrProProLeuValArgLeuAlaPheAen ACATGGATCCCAGACTGGGACTTCGTGTCTACCCCACCACTGGTCAGGTTAGCGTTTAAC • · · · · · LeuValGlyAepProIleProGlyAlaGluThrPheTyrThrAspGlySerCysAanArg CTGGTAGGGGATCCTATACCAGGTGCAGAGACCTTCTACACAGATGGATCCTGCAATAGG . . . 3700 GlnSerLysGluGlyLysAlaGlyTyrVaIThrAepArgGlyLysAspLye ValLysLye CAATCAAAAGAAGGAAAAGCAGGATATGTAACAGATAGAGGGAAAGACAAGCTAAAGAAA ·····« LeuGluGInThrThrAsnGInGInA1aGluLeuGluA1aPheAlaMetAlaLeuThrAsp CTAGAGCAAACTACCAATCAGCAAGCAGAACTAGAAGCCTTTGCGATGGCACTAACAGAC « 3800 « · · · SerGlyProLye VelAenllelleValAepSerGlnTyrValHetGlylleSerAlaSer TCGGGTCCAAAAGTTAATATTATAGTAGACTCACAGTATGTAATGGGGATCAGTGCAAGC . . . . . 3900 GInProThrGluSerGluSerLyeIleValAenGlnllelleGluGluMetIleLyeLye CAACCAACAGAGTCAGAAAGTAAAATAGTGAACCAGATCATAGAACAAATGATAAAAAAG • s e· · · · * GluAlalleTyrValAlaTrpValProAlaHieLyeGlylleGlyGlyAenGlnGluVal GAAGCAATCTATGTTGCATGGGTCCCAGCCCACAAAGGCATAGGGGGAAACCAGGAAGTA . . . 4000 As pHisLeuVaISerGInGlylleArgGlaValLeuPheLeuGluLyeIleGluProAla GATCATTTAGTGAGTCAGGGTATCACACAAGTGTTGTTCC.TGGAAAAAATAGAGCCCGCT • ♦ · « ·. · GlnGluGluHieGluLysTyrHieSerAenValLysGluLeuSerHiaLyePheGlylle CAGGAAGAACATGAAAAATATCATAGCAATGTAAAAGAACTGTCTCATAAATTTGGAATA . 4100 « . 
ProAsnLeuValAlaArgGlnIleValAsnSerCysAlaGlnCysGlnGlnLysGlyGlu
CCCAATTTAGTGGCAAGGCAAATAGTAAACTCATGTGCCCAATGTCAACAGAAAGGGGAA
4200
AlaIleHisGlyGlnValAsnAlaGluLeuGlyThrTrpGlnMetAspCysThrHisLeu
GCTATACATGGGCAAGTAAATGCAGAACTAGGCACTTGGCAAATGGACTGCACACATTTA
GluGlyLysIleIleIleValAlaValHisValAlaSerGlyPheIleGluAlaGluVal
GAAGGAAAGATCATTATAGTAGCAGTAGATGTTGCAAGTGGATTTATAGAAGCAGAAGTC
4300
IleProGlnGluSerGlyArgGlnThrAlaLeuPheLeuLeuLysLeuAlaSerArgTrp
ATCCCACAGGAATCAGGAAGACAAACAGCACTCTTCCTATTGAAACTGGCAAGTAGGTGG
(Fig. 1A, continued 3)
ProIleThrHisLeuHisThrAspAsnGlyAlaAsnPheThrSerGlnGluValLysMet
CCAATAACACACTTGCATACAGATAATGGTGCCAACTTCACTTCACAGGAGGTGAAGATG
4400
ValAlaTrpTrpIleGlyIleGluGlnSerPheGlyValProTyrAsnProGlnSerGln
GTACCATGGTGGATAGGTATAGAACAATCCTTTGGAGTACCTTACAATCCACAGAGCCAA
4500
GlyValValGluAlaMetAsnHisHisLeuLysAsnGlnIleSerGluThrIleValLeu
GGAGTAGTAGAAGCAATGAATCACCATCTAAAAAACCAAATAAGTGAAACAATAGTACTA
MetAlaIleHisCysMetAsnPheLysArgArgGlyGlyIleGlyAspMetThrProSer
ATGGCAATTCATTGCATGAATTTTAAAAGAAGGGGGGGAATAGGGGATATGACTCCATCA
4600
GluArgLeuIleAsnMetIleThrThrGluGlnGluIleGlnPheLeuGlnAlaLysAsn
GAAAGATTAATCAATATGATCACCACAGAACAAGAGATACAATTCCTCCAAGCCAAAAAT
SerLysLeuLysAspPheArgValTyrPheArgGluGlyArgAspGlnLeuTrpLysGly
TCAAAATTAAAAGATTTTCGGGTCTATTTCAGAGAAGGCAGAGATCAGTTGTGGAAAGGA
4700
ProGlyGluLeuLeuTrpLysGlyGluGlyAlaValLeuValLysValGlyThrAspIle
CCTGGGGAACTACTGTGGAAAGGAGAAGGAGCAGTCCTAGTCAAGGTAGGAACAGACATA
4800
LysIleIleProArgArgLysAlaLysIleIleArgAspTyrGlyGlyArgGlnGluMet
MetGluGluAspLysArgTrp
AAAATAATACCAAGAAGGAAAGCCAAGATCATCAGAGACTATGGAGGAAGACAAGAGATG
AspSerGlySerHisLeuGluGlyAlaArgGluAspGlyGluMetAla
IleValValProThrTrpArgValProGlyArgMetGluLysTrpHisSerLeuValLys
GATAGTGGTTCCCACCTGGAGGGTGCCAGGGAGGATGGAGAAATGGCATAGCCTTGTCAA
4900
TyrLeuLysTyrLysThrLysAspLeuGluLysValCysTyrValProHisHisLysVal
GTATCTAAAATACAAAACAAAGGATCTAGAAAAGGTGTGCTATGTTCCCCACCATAAGGT
GlyTrpAlaTrpTrpThrCysSerArgValIlePheProLeuLysGlyAsnSerHisLeu
GGGATGGGCATGGTGGACTTGCAGCAGGGTAATATTCCCATTAAAAGGAAACAGTCATCT
5000
GluIleGlnAlaTyrTrpAsnLeuThrProGluLysGlyTrpLeuSerSerTyrSerVal
AGAGATACAGGCATATTGGAACTTAACACCAGAAAAAGGATGGCTCTCCTCTTATTCAGT
5100
ArgIleThrTrpTyrThrGluLysPheTrpThrAspValThrProAspCysAlaAspVal
AAGAATAACTTGGTACACAGAAAAGTTCTGGACAGATGTTACCCCAGACTGTGCAGATGT
LeuIleHisSerThrTyrPheProCysPheThrAlaGlyGluValArgArgAlaIleArg
CCTAATACATAGCACTTATTTCCCTTGCTTTACAGCAGGTGAAGTAAGAAGAGCCATCAG
5200
GlyGluLysLeuLeuSerCysCysAsnTyrProArgAlaHisArgAlaGlnValProSer
AGGGGAAAAGTTATTGTCCTGCTGCAATTATCCCCGAGCTCATAGAGCCCAGGTACCGTC
LeuGlnPheLeuAlaLeuValValValGlnGlnAsnAspArgProGlnArgAspSerThr
MetThrAspProArgGluThrValPro
ACTTCAATTTCTGGCCTTAGTGGTAGTGCAACAAAATGACAGACCCCAGAGAGACAGTAC
5300
ThrArgLysGlnArgArgArgAspTyrArgArgGlyLeuArgLeuAlaLysGlnAspSer
ProGlyAsnSerGlyGluGluThrIleGlyGluAlaPheAlaTrpLeuAsnArgThrVal
CACCAGGAAACAGCGGCGAAGAGACTATCGGAGAGGCCTTCGCCTGGCTAAACAGGACAG
5400
ArgSerHisLysGlnArgSerSerGluSerProThrProArgThrTyrPheProGlyVal
GluAlaIleAsnArgGluAlaValAsnHisLeuProArgGluLeuIlePheGlnValTrp
TAGAAGCCATAAACAGAGAAGCAGTGAATCACCTACCCCGAGAACTTATTTTCCAGGTGT
(Fig. 1A, continued 4)

AlaGluValLeuGluIleLeuAla
GlnArgSerTrpArgTyrTrpHisAspGluGlnGlyMetSerGluSerTyrThrLysTyr
GGCAGAGGTCCTGGAGATACTGGCATGATGAACAAGGGATGTCAGAAAGTTACACAAAGT
5500
ArgTyrLeuCysIleIleGlnLysAlaValTyrMetHisValArgLysGlyCysThrCys
ATAGATATTTGTGCATAATACAGAAAGCAGTGTACATGCATGTTAGGAAAGGGTGTACTT
LeuGlyArgGlyHisGlyProGlyGlyTrpArgProGlyProProProProProProPro
GCCTGGGGAGGGGACATGGGCCAGGAGGGTGGAGACCAGGGCCTCCTCCTCCTCCCCCTC
5600
MetAlaGluAlaProThrGluLeuProProValAspGlyThrProLeu
GlyLeuVal***
CACGTCTGGTCTAATGGCTGAAGCACCAACAGAGCTCCCCCCGGTGGATGGGACCCCACT
ArgGluProGlyAspGluTrpIleIleGluIleLeuArgGluIleLysGluGluAlaLeu
GAGGGAGCCAGGGGATGAGTGGATAATAGAAATCTTGAGAGAAATAAAAGAAGAAGCTTT
LysHisPheAspProArgLeuLeuIleAlaLeuGlyLysTyrIleTyrThrArgHisGly
MetGlu
AAAGCATTTTGACCCTCGCTTGCTAATTGCTCTTGGCAAATATATCTATACTAGACATGG
5800
AspThrLeuGluGlyAlaArgGluLeuIleLysValLeuGlnArgAlaLeuPheThrHis
ThrProLeuLysAlaProGluSerSerLeuLysSerCysAsnGluProPheSerArgThr
AGACACCCTTGAACGCGCCAGAGAGCTCATTAAAGTCCTGCAACGAGCCCTTTTCACGCA
PheArgAlaGlyCysGlyHisSerArgIleGlyGlnThrArgGlyGlyAsnProLeuSer
SerGluGlnAspValAlaThrGlnGluLeuAlaArgGlnGlyGluGluIleLeuSerGln
CTTCAGAGCAGGATGTGGCCACTCAAGAATTGGCCAGACAAGGGGAGGAAATCCTCTCTC
5900
AlaIleProThrProArgAsnMetGln
LeuTyrArgProLeuGluThrCysAsnAsnSerCysTyrCysLysArgCysCysTyrHis
AGCTATACCGACCCCTAGAAACATGCAATAACTCATGCTATTGTAAGCGATGCTGCTACC
6000
MetAsnGluArgAlaAsp
CysGlnMetCysPheLeuAsnLysGlyLeuGlyIleCysTyrGluArgLysGlyArgArg
ATTGTCAGATGTGTTTTCTAAACAAGGGGCTCGGGATATGTTATGAACGAAAGGGCAGAC
GluGluGlyLeuGlnArgLysLeuArgLeuIleArgLeuLeuHisGlnThrSerGluTyr
Met
ArgArgThrProLysLysThrLysThrHisProSerProThrProAspLys
GAAGAAGGACTCCAAAGAAAACTAAGACTCATCCGTCTCCTACACCAGACAAGTGAGTAT
6100
AspGluSerAlaAlaTyrCysHisPheIleSer
MetAsnGlnLeuLeuIleAlaIleLeuLeuAlaSerAlaCysLeuValTyrCysThrGln
GATGAATCAGCTGCTTATTGCCATTTTATTAGCTAGTGCTTGCTTAGTATATTGCACCCA
TyrValThrValPheTyrGlyValProThrTrpLysAsnAlaThrIleProLeuPheCys
ATATGTAACTGTTTTCTATGGCGTACCCACGTGGAAAAATGCAACCATTCCCCTCTTTTG
6200
AlaThrArgAsnArgAspThrTrpGlyThrIleGlnCysLeuProAspAsnAspAspTyr
TGCAACCAGAAATAGGGATACTTGGGGAACCATACAGTGCTTGCCTGACAATGATGATTA
6300
GlnGluIleThrLeuAsnValThrGluAlaPheAspAlaTrpAsnAsnThrValThrGlu
TCAGGAAATAACTTTGAATGTAACAGAGGCTTTTGATGCATGGAATAATACAGTAACAGA
GlnAlaIleGluAspValTrpHisLeuPheGluThrSerIleLysProCysValLysLeu
ACAAGCAATAGAAGATGTCTGGCATCTATTCGAGACATCAATAAAACCATGTGTCAAACT
6400
(Fig. 1A, continued 5)

ThrProLeuCysValAlaMetLysCysSerSerThrGluSerSerThrGlyAsnAsnThr
AACACCTTTATGTGTAGCAATGAAATGCAGCAGCACAGAGAGCAGCACAGGGAACAACAC
ThrSerLysSerThrSerThrThrThrThrThrProThrAspGlnGluGlnGluIleSer
AACCTCAAAGAGCACAAGCACAACCACAACCACACCCACAGACCAGGAGCAAGAGATAAG
6500
GluAspThrProCysAlaArgAlaAspAsnCysSerGlyLeuGlyGluGluGluThrIle
TGAGGATACTCCATGCGCACGCGCAGACAACTGCTCAGGATTGGGAGAGGAAGAAACGAT
6600
AsnCysGlnPheAsnMetThrGlyLeuGluArgAspLysLysLysGlnTyrAsnGluThr
CAATTGCCAGTTCAATATGACAGGATTAGAAAGAGATAAGAAAAAACAGTATAATGAAAC
TrpTyrSerLysAspValValCysGluThrAsnAsnSerThrAsnGlnThrGlnCysTyr
ATGGTACTCAAAAGATGTGGTTTGTGAGACAAATAATAGCACAAATCAGACCCAGTGTTA
6700
MetAsnHisCysAsnThrSerValIleThrGluSerCysAspLysHisTyrTrpAspAla
CATGAACCATTGCAACACATCAGTCATCACAGAATCATGTGACAAGCACTATTGGGATGC
IleArgPheArgTyrCysAlaProProGlyTyrAlaLeuLeuArgCysAsnAspThrAsn
TATAAGGTTTAGATACTGTGCACCACCGGGTTATGCCCTATTAAGATGTAATGATACCAA
6800
TyrSerGlyPheAlaProAsnCysSerLysValValAlaSerThrCysThrArgMetMet
TTATTCAGGCTTTGCACCCAACTGTTCTAAAGTAGTAGCTTCTACATGCACCAGGATGAT
6900
GluThrGlnThrSerThrTrpPheGlyPheAsnGlyThrArgAlaGluAsnArgThrTyr
GGAAACGCAAACTTCCACATGGTTTGGCTTTAATGGCACTAGAGCAGAGAATAGAACATA
IleTyrTrpHisGlyArgAspAsnArgThrIleIleSerLeuAsnLysTyrTyrAsnLeu
TATCTATTGGCATGGCAGAGATAATAGAACTATCATCAGCTTAAACAAATATTATAATCT
7000
SerLeuHisCysLysArgProGlyAsnLysThrValLysGlnIleMetLeuMetSerGly
CAGTTTGCATTGTAAGAGGCCAGGGAATAAGACAGTGAAACAAATAATGCTTATGTCAGG
HisValPheHisSerHisTyrGlnProIleAsnLysArgProArgGlnAlaTrpCysTrp
ACATGTGTTTCACTCCCACTACCAGCCGATCAATAAAAGACCCAGACAAGCATGGTGCTG
7100
PheLysGlyLysTrpLysAspAlaMetGlnGluValLysGluThrLeuAlaLysHisPro
GTTCAAAGGCAAATGGAAAGACGCCATGCAGGAGGTGAAGGAAACCCTTGCAAAACATCC
7200
ArgTyrArgGlyThrAsnAspThrArgAsnIleSerPheAlaAlaProGlyLysGlySer
CAGGTATAGAGGAACCAATGACACAAGGAATATTAGCTTTGCAGCGCCAGGAAAAGGCTC
AspProGluValAlaTyrMetTrpThrAsnCysArgGlyGluPheLeuTyrCysAsnMet
AGACCCAGAAGTAGCATACATGTGGACTAACTGCAGAGGAGAGTTTCTCTACTGCAACAT
7300
ThrTrpPheLeuAsnTrpIleGluAsnLysThrHisArgAsnTyrAlaProCysHisIle
GACTTGGTTCCTCAATTGGATAGAGAATAAGACACACCGCAATTATGCACCGTGCCATAT
LysGlnIleIleAsnThrTrpHisLysValGlyArgAsnValTyrLeuProProArgGlu
AAAGCAAATAATTAACACATGGCATAAGGTAGGGAGAAATGTATATTTGCCTCCCAGGGA
7400
GlyGluLeuSerCysAsnSerThrValThrSerIleIleAlaAsnIleAspTrpGlnAsn
AGGGGAGCTGTCCTGCAACTCAACAGTAACCAGCATAATTGCTAACATTGACTGGCAAAA
7500
AsnAsnGlnThrAsnIleThrPheSerAlaGluValAlaGluLeuTyrArgLeuGluLeu
CAATAATCAGACAAACATTACCTTTAGTGCAGAGGTGGCAGAACTATACAGATTGGAGTT
GlyAspTyrLysLeuValGluIleThrProIleGlyPheAlaProThrLysGluLysArg
GGGAGATTATAAATTGGTAGAAATAACACCAATTGGCTTCGCACCTACAAAAGAAAAAAG
7600
(Fig. 1A, continued 6)

TyrSerSerAlaHisGlyArgHisThrArgGlyValPheValLeuGlyPheLeuGlyPhe
ATACTCCTCTGCTCACGGGAGACATACAAGAGGTGTGTTCGTGCTAGGGTTCTTGGGTTT
LeuAlaThrAlaGlySerAlaMetGlyAlaAlaSerLeuThrValSerAlaGlnSerArg
TCTCGCAACAGCAGGTTCTGCAATGGGCGCGGCGTCCCTGACCGTGTCGGCTCAGTCCCG
7700
ThrLeuLeuAlaGlyIleValGlnGlnGlnGlnGlnLeuLeuAspValValLysArgGln
GACTTTACTGGCCGGGATAGTGCAGCAACAGCAACAGCTGTTGGACGTGGTCAAGAGACA
7800
GlnGluLeuLeuArgLeuThrValTrpGlyThrLysAsnLeuGlnAlaArgValThrAla
ACAAGAACTGTTGCGACTGACCGTCTGGGGAACGAAAAACCTCCAGGCAAGAGTCACTGC
IleGluLysTyrLeuGlnAspGlnAlaArgLeuAsnSerTrpGlyCysAlaPheArgGln
TATAGAGAAGTACCTACAGGACCAGGCGCGGCTAAATTCATCGGGATGTGCGTTTAGACA
7900
ValCysHisThrThrValProTrpValAsnAspSerLeuAlaProAspTrpAspAsnMet
AGTCTGCCACACTACTGTACCATGGGTTAATGATTCCTTAGCACCTGACTGCGACAATAT
ThrTrpGlnGluTrpGluLysGlnValArgTyrLeuGluAlaAsnIleSerLysSerLeu
GACGTGGCAGGAATGGGAAAAACAAGTCCGCTACCTGGAGGCAAATATCAGTAAAAGTTT
8000
GluGlnAlaGlnIleGlnGlnGluLysAsnMetTyrGluLeuGlnLysLeuAsnSerTrp
AGAACAGGCACAAATTCAGCAAGAGAAAAATATGTATGAACTACAAAAATTAAATAGCTG
8100
AspIlePheGlyAsnTrpPheAspLeuThrSerTrpValLysTyrIleGlnTyrGlyVal
GGATATTTTTGGCAATTGGTTTGACTTAACCTCCTGGGTCAAGTATATTCAATATGGAGT
LeuIleIleValAlaValIleAlaLeuArgIleValIleTyrValValGlnMetLeuSer
Val
GCTTATAATAGTAGCAGTAATAGCTTTAAGAATAGTGATATATGTAGTACAAATGTTAAG
8200
AlaCysPheLeuPheProProArgLeuTyrProThrAsp
ArgLeuArgLysGlyTyrArgProValPheSerSerProProGlyTyrIleGlnGlnIle
GlyLeuGluArgAlaIleGlyLeuPheSerLeuProProProValIleSerAsnArgSer
TAGGCTTAGAAAGGGCTATAGGCCTGTTTTCTCTTCCCCCCCCGGTTATATCCAACAGAT
ProTyrProGlnGlyProGlyThrAlaSerGlnArgArgAsnArgArgArgArgTrpLys
HisIleHisLysAspArgGlyGlnProAlaAsnGluGluThrGluGluAspGlyGlySer
IleSerThrArgThrGlyAspSerGlnProThrLysLysGlnLysLysThrValGluAla
CCATATCCACAAGGACCGGGGACAGCCAGCCAACGAAGAAACAGAAGAAGACGGTGGAAG
8300
GlnArgTrpArgGlnIleLeuAlaLeuAlaAspSerIleTyrThrPheProAspProPro
AsnGlyGlyAspArgTyrTrpProTrpProIleAlaTyrIleHisPheLeuIleArgGln
ThrValGluThrAspThrGlyProGlyArg
CAACGGTGGAGACAGATACTGGCCCTGGCCGATAGCATATATACATTTCCTGATCCGCCA
8400
AlaAspSerProLeuAspGlnThrIleGlnHisLeuGlnGlyLeuThrIleGlnGluLeu
LeuIleArgLeuLeuThrArgLeuTyrSerIleCysArgAspLeuLeuSerArgSerPhe
GCTGATTCGCCTCTTGACCAGACTATACAGCATCTGCAGGGACTTACTATCCAGGAGCTT
ProAspProProThrHisLeuProGluSerGlnArgLeuAlaGluThr
LeuThrLeuGlnLeuIleTyrGlnAsnLeuArgAspTrpLeuArgLeuArgThrAlaPhe
CCTGACCCTCCAACTCATCTACCAGAATCTCAGAGACTGGCTGAGACTTAGAACAGCCTT
8500
LeuGlnTyrGlyCysGluTrpIleGlnGluAlaPheGlnAlaAlaAlaArgAlaThrArg
MetGlyAlaSerGlySerLysLysHisSerArgProProArgGlyLeuGlnGlu
CTTGCAATATGGGTGCGAGTGGATCCAAGAAGCATTCCAGGCCGCCGCGAGGGCTACAAG
(Fig. 1A, continued 7)

GluThrLeuAlaGlyAlaCysArgGlyLeuTrpArgValLeuGluArgIleGlyArgGly
ArgLeuLeuArgAlaArgAlaGlyAlaCysGlyGlyTyrTrpAsnGluSerGlyGlyGlu
AGAGACTCTTGCGGGCGCGTGCAGGGGCTTGTGGAGGGTATTGGAACGAATCGGGAGGGG
8600
IleLeuAlaValProArgArgIleArgGlnGlyAlaGluIleAlaLeuLeu
TyrSerArgPheGlnGluGlySerAspArgGluGlnLysSerProSerCysGluGlyArg
AATACTCGCGGTTCCAAGAAGGATCAGACAGGGAGCAGAAATCGCCCTCCTGTGAGGGAC
8700
GlnTyrGlnGlnGlyAspPheMetAsnThrProTrpLysAspProAlaAlaGluArgGlu
GGCAGTATCAGCAGGGAGACTTTATGAATACTCCATGGAAGGACCCAGCAGCAGAAAGGG
LysAsnLeuTyrArgGlnGlnAsnMetAspAspValAspSerAspAspAspAspGlnVal
AGAAAAATTTGTACAGGCAACAAAATATGGATGATGTAGATTCAGATGATGATGACCAAG
8800
ArgValSerValThrProLysValProLeuArgProMetThrHisArgLeuAlaIleAsp
TAAGAGTTTCTGTCACACCAAAAGTACCACTAAGACCAATGACACATAGATTGGCAATAG
MetSerHisLeuIleLysThrArgGlyGlyLeuGluGlyMetPheTyrSerGluArgArg
ATATGTCACATTTAATAAAAACAAGGGGGGGACTGGAAGGGATGTTTTACAGTGAAAGAA
8900
HisLysIleLeuAsnIleTyrLeuGluLysGluGluGlyIleIleAlaAspTrpGlnAsn
GACATAAAATCTTAAATATATACTTAGAAAAGGAAGAAGGGATAATTGCAGATTGGCAGA
9000
TyrThrHisGlyProGlyValArgTyrProMetPhePheGlyTrpLeuTrpLysLeuVal
ACTACACTCATGGGCCAGGAGTAAGATACCCAATGTTCTTTGGGTGGCTATGGAAGCTAG
ProValAspValProGlnGluGlyGluAspThrGluThrHisCysLeuValHisProAla
TACCAGTAGATGTCCCACAAGAAGGGGAGGACACTGAGACTCACTGCTTAGTACATCCAG
9100
GlnThrSerLysPheAspAspProHisGlyGluThrLeuValTrpGluPheAspProLeu
CACAAACAAGCAAGTTTGATGACCCGCATGGGGAGACACTAGTCTGGGAGTTTGATCCCT
LeuAlaTyrSerTyrGluAlaPheIleArgTyrProGluGluPheGlyHisLysSerGly
TGCTGGCTTATAGTTACGAGGCTTTTATTCGGTACCCAGAGGAATTTGGGCACAAGTCAG
9200
LeuProGluGluGluTrpLysAlaArgLeuLysAlaArgGlyIleProPheSer
GCCTGCCAGAGGAAGAGTGGAAGGCGAGACTGAAAGCAAGAGGAATACCATTTAGTTAAA
9300
GACAGGAACAGCTATACTTGGTCAGGGCAGGAAGTAACTAACAGAAACAGCTGAGACTGC
AGGGACTTTCCAGAAGGGGCTGTAACCAAGGGAGGGACATGGGAGGAGCTGGTGGGGAAC
9400
GCCCTCATATTCTCTGTATAAATATACCCGCTAGCTTGCATTGTACTTCGGTCGCTCTGC
GGAGAGGCTGGCAGATTGAGCCCTGGGAGGTTCTCTCCAGCAGTAGCAGGTAGAGCCTGG
9500
GTGTTCCCTGCTAGACTCTCACCAGCACTTGGCCGGTGCTGGGCAGACGGCCCCACGCTT
9600
GCTTGCTTAAAAACCTCCTTAATAAAGCTGCCAGTTAGAAGCA
(Fig. 1A, continued 8)

10. /35 FIG IB AGTCGCTCTGCGGAGAGGCT GGCAGATTGAGCCCTtGGAGGTTCTCTCCAGCACT AGCAG ·*«««· GTAGAGCCTGGGTGTTCCCTGCTAGACTCTCACCAGCACTTGGCCGGTGCTGGGCAGACT . . * 100 . . GGCTCCACGCTTGCTTGCTTAAAGACCTCTTCAATAAAGCTGCCATTTAGAAGTAACCTA GTGTGTGTTCCCATCTCTCCTAGTCGCCGCCTGCTCAACTCGCTACTCGGTAATAAAAAG • 200 . « « . ACCCTGGTCTGTTACGACCCTGGTCTGTTAGCACCCTTTCTGCTTTGGGAAACCGAAGCA . · . · · 300 CGAAAATCCCTAGCAGATTGGCGCCCGAACAGGGACTTGAAGGAGAGTGAGAGACTCCTG AGTACGGCTGACTGAAGGCAGTAAGGGCGGCAGGAACC A ACCACGACGGAGTGCTCCTAG . . · <.00 · . AAAGGCGCGGGTCGGTACCAGACGGCGTCAGGACCGGGGAGACAAGAGGCCTCCTGGTTG a · a a · a CAGGTAAGTGCAACACAAAAAGGAAATAGCTGTCTTTTATCCAGGAAGGGATAATAAGAT .a 500 a a · a gagdketglyalaarcasnseryalleuserglylyslysalaaspgluleuclu ACAGTGCGAGATGGGCGCGAGAAACTCCGTCTTGTCAGGGAAGAAAGCAGATGAATTAGA a a a a a 600 LYSILEARGLEUARGPROGLYGLYLYSLYSLYSTYRKETLEULYSHISVALVALTRPALA AAAAATTAGACTACGACCCGGCGGAAAGAAAAAGTACATCTTGAAGCATCTAGTATGGGC ALAASNGLULEUA5PARGPKEGLYLEUALAGLUSERLEULEUGLUASNLYSGLUGLYCYS AGCAAATGAATTAGATAGATTTGGATTAGCAGAAAGCCTGTTGGAGAACAAAGAAGGATG a a a 700 a a glnlysileleuservalleualaproleuvalprothrglysergluasnleulysser TCAAAAAATACTTTCGGTCTTAGCTCCATTAGTGCCAACAGGCTCAGAAAATTTAAAAAG tleutyrasnthrvalcysvaliletrpcysilehisalagluglulysvallyshisthr CCTTTATAATACTGTCTGCGTCATCTGGTGCATTCACGCAGAAGACAAAGTGAAACACAC a 800 a a a a GLUGLUALALYSGLNILEVALGLNARGHISLEUVALMETGLUTHRGLYTHRALAGLUTHR TGAGGAAGCAAAACACATAGTGCAGAGACACCTAGTGATGGAAACAGGAACACCAGAA AC a a a a a 900 METPROLYSTHRSERARCPROTHRALAPROPHESERGLYARGGLYGLYASNTYRPROVAL TATGCCAAAAACAAGTAGACCAACAGCACCATTTAGCGGCAGAGGAGGAAATTACCCAGT * GLNGLNILEGLYGLYASNTYRTHRHISLEUPROLEUSERPROARGTHRLEUASNALATRP ACAACAAATAGCTGGTAACTATACCCACCTACCATTAAGCCCGAGAACATTAAATGCCTG • a a 1000 a a YALLYSLEUILEGLUGLULYSLYSPHEGLYALAGLUVALVALSERGLYPHEGLNALALEU GGTAAAATTAATAGAGGAGAAGAAATTTGGAGCAGAAGTAGTGTCAGGATTTCAGGCACT SERGLUGLYCYSLEUPROTYRASPILEASNCLNrtETLEUASNCYSYALGLYASPHISCLN GTCAGAAGGCTGCCTCCCCTATGACATTAATCAGATGTTAAATTGTGTGGGAGACCATCA a 1100 a a a a 
ALAALANETCLNILEILEARGASPILEILEASNGLUGLUALAALAASPTRPASPLEUCLN AGCGGCTATGCAGATCATCAGAGATATTATAAATGAGGAGGCTGCAGATTGGGACTTGCA a a a a a 1200 HISPR0GLNGLNALAPR0GLNGLNGLYGLNLEUARGGLUPR0SERGLYSERASP1LEALA GCACCCACAACAAGCTCCACAACAAGGACAGCTTAGGGAGCCGTCAGGATCAGATATTGC GLYTHRTHRSERTHRVALGLUGLUGLNILEGLNTRPMETTYRARGGLNGLNASNPR01LE AGGAACAACTAGTACAGTAGAAGAACAAATCCAGTGGATGTACAGACAACAGAACCCCAT a a · 1300 a a FIG. ipIE990955

11. /35 PROVALCLYASNILETYRaRGARGTRPILEGLNLEUGLYLEUGLNLY5CYSVALARGHET accagtaggcaacatttacacgagatggatccaactgcggttgcaaaaatgtgtcagaat • · « · « · TYRASNPROTHRASNILELEUASPVALLYSGLNCLYPROL YSGLUPROPHEGLNSERTYR GTAT AACCCAACAAACATTCTAGATGTaaaacaagggcc AAAAGAGCC ATTTCAGAGCTA . 1500 .... VALASPARGPHETYRLYSSERLEUARGALAGLUGLNTHRASPPROALAVALLYSASNTRP TGTAGACAGGTTCTACAAAAGTTTAAGAGCAC AACAAACAGATCCAGCAGTAA AGAATTG • · · . . 1500 metthrglnthrleuleuileglnasnalaasnproaspcyslysleuvalleulysgly GATCACTCAAACACTGCTGATTCAAAATCCTAACCCAGATTGCAAGCTAGTGCTCAAGGG • « .... LEUGLYTHRASNPROTHRLEUGLUGLUMETLEUTKRALACYSGLNGLYVALGLYGLYPRO GCTGGGTACGAATCCCACCCTAGAAGAAATGCTGACGGCCTGTCAAGGAGTAGGGGGGCC • · . 1600 . . GLYGLNLYSALAARGLEUMETALAGLUALALEULYSGLUALALEUALAPROALAPR0ILE AGGACAGAACGCTAGATTAATGGCAGAAGCCCTGAAAGAGGCCCTCGCACCAGCGCCAAT • · · · · · POLYALLEUGLULEUTRP PROPHEALAALAALAGLNGLNLYSGLYPROARGLYSPROILELYSCYSTRPASNCYSGLY CCCTTTTGCAGCAGCCCAACACAAGGGACCAACAAAGCCAATTAAGTGTTGCAATTGTGG * 17 00 .... GLUGLYARGTHRLEUCYSLYSALAHETGLNSERPROLYSLYSTHRGLYMETLEUGLUHET LYSGLUGLYHISSERALAARGGLNCYSARGALAPROARGARGGLNGLYCYSTRPLYSCYS GAAGGAAGGACACTCTGCAAGGCAATGCAGAGCCCCAAGAAGACAGOGATGCTGGAAATG « · . . · 1800 TRPLYSASNGLYPROCYSTYRGLYGLNHETPROLYSGLNTHRGLYGLYPHEPHEARGPRO clylysmetasphisvalmetalalyscysproasnargglnalaclypheleuclyleu TGGAAAAATGGACCATGTTATGGCCAAATGCCCAAACAGACACGCGGGTTTTTTAGGCCT trpproleuglylysglualaproglnpheprohisglyserseralaserglyalaasp glyprotrpglylyslysproargasnpheprohetalaglnvalhisglnglyleuthr TGGCCCTTGGGGAAAGAAGCCCCGCAATTTCCCCATGGCTCAAGTGCATCAGGGGCTGAC . . . 1900 · . ALAASNCYS SERPROARG ARGTHRSERCYSGL YSERAL ALYSGLULEUH1 SALALEUGLY PROTHRALAPROPROGLUGLUPROALAVALASPLEULEULYSASNTYRHETHISLEUGLY gccaactgctcccccagaacaaccagctgtggatctgctaaagaactacatgcacttggg ···... glnalaalagluarclysglnargclualaleuglnglyglyaspargglyphealaala lysglnglnarggluserargglylysprotyrlysgluvalthrgluaspleuleuhis caagcagcagagagaaagcacagggaagccttacaaggagctgacacaggatttgctgca • 2000 . 
· · « proglnpheserleutrparcargprovalvalthralahisilecluclyglnproyal leuasnserleupheglyglyaspglnw cctcaattctctctttggaggagaccagtagtcactgctcatattcaaggacagcctgta • · · · · 2100 gluvalleuleuaspthrglyalaaspaspserileyalthrglyilecluleuglypro gaagtattattagatacaggggctgatgattctattgtaacaggaatagagttaggtcca histyrthrprolysilevalglyglyileglyglypheileasnthrlysclutyrlys CATT AT ACCCC AAAAATAGT AGGAGGAATAGGAGGTTTTATTAATACTAAAGAATACAAA • · · 22 00 · · ASNVALCLUILEGLUVALLEUGLYLYSARCILELYSGLYTHRILEMETTHRGLYASPTHR AATGTAGAAATAGAAGTTTTAGCCAAAAGGATTAAAGGGACAATCATGACAGGGGACACC PROILEASNILEPHE GLYARGASNLEULEUTHRALALEUGLYMETSERLEUASNLEUPRO CCGATTAACATTTTTGGTAGAAATTTACT AAC ACCTCTGGCG ATGTCTCTAAATCTTCCC « 2300 · · · · ILEALALYSVALGLUPROYALLYS SERPROLEULYSPROGLYLYSASPGLYPROLYSLEU ATAGCTAAGGTAGAGCCTGTAAAGTCGCCCTTAAAGCCAGGAAAGGATGGACCAAAATTG • · · · · 2A00 LYSGLNTRPPROLEUSERLYSGLULYSILEVALAtALEUARGGLUILECYSGLULYSHET AAGCAGTGGCCATTATCAAAAGAAAAGATAGTTGCATTAAGAGAAATCTGTGAAAAGATG (fig.lE-suite 1) .

12. /35 . GLULYSASPGLYCLNLEUGLUGLUALAPROPROTHRASNPROTYRASNTHRPROTHRPHE GA AAAAGATCGTCAGTTGGAGGAAGCTCCCCCGACC A AT CCATATAACACCCCC AC AT TT . . . 2500 AL A I LEL YSL YSL YS AS PL Y S A SNL Y ST RP ARCHE T LE U I LE ASPP HE ARGGLUL EUASN GCTATAAAGAAAAAGGATAAAAACAAATGGAGAATGCT GATACATTTT AGGGAACT AA AT ··.·.. ARGVALTHRGLNASPPHE THRGLUVALGLNLEUCLYILEPROHI SPROALAGLYLEUALA AGGG TC AC TC A AG ACTTT ACGG A AGT CC AAT T AG GA AT ACC AC ACCCTGC AGG AC T AGCA • 26 00 · · « · LYSARCLYSARGILETHRVALLEUASPILEGLYASPALATYRPHESERILEPROLEUASP AAAAGGAAAAGGATTACAGTACTGGATATAGGTGACGCATATTTCTCTATACCTCT AGAT • · · · . 2700 GLUGLUPHEARGGLNTYRTHRALAPHETHRLEUPROSE RVALASNASNALAGLUPROGLY GA AG AAT TT AGGCAGT AC ACTGCCTT T ACTTT ACC ATC AGT AAAT AATGC AG AGCC AGG A ··«··« LYSARGTYRILETYRLYSVALLEUPROGLNGLYTRPLYSGLYSERPROALA ILEPHEGLN AAACGATACATTTATAAGGTTCTGCCTCAGGGATCGAAGGGGTCACCAGCCATCTTCCAA • · « 28 00 * · Τ Y RT HRKE T AR GHI S V ALLE UGLUPROPHE ARGLYS ALA ASN PROASP YALTHRLEUVAL TACACT ATGAGACATGTGCT AG AACCCTTCAGGAAGGC AAATCCAGATGTGACCTTAGTC ····.· GLNTYRMETASPASPILELEUILEALASERASPARGTHRASPLEUGLUHI SASPARGVAL C AGT AT ATGG ATG ACATCTT AAT AGCT AGTG ACAGGAC AGACCTGGAAC ATG ACAGCGTA • 2900 · · /♦ · VALLEUGLNLEULYSGLULEULEUASNSER tLEGLYPHESERSERPROGLUGLUL YSPHE GTTTTACAGTTAAAAGAACTCTTAAATAGCATAGCGTTTTCATCCCCAGAAGAGAAATTC • · · · · 3000 GLNLYS ASPPROPROPHEGLNTRPMETGLYTYRGLULEUTRPPROTHRLYSTRPLYSLEU CAAAAAGATCCCCCATTTCAATGGATGGGGTACGAATTGTGGCCGACAAAATGGAAGTTG « · . « · · GLNLYS ILEGLULEUPROGLNARGGLUTHRTRPTHRVALASNASPILEGLNLYSLEUVAL CAAAAGATAGAGTTGCCACAAAGAGAGACCTGGACAGTGAATGATATACAGAAGTTAGTA • « . 
3X00 · · GLYVALLEUASNTRP ALAALAGLNILETYRPROGLYILELYSTHRLYSHISLEUCYSARG GG AGTATTAAATTGGGCAGCTCA AATTTATCC AGGT AT AAAAACCAAACATCTCTGTAGC ····*· LEUILEARGGLYLYSHETTHRLEUTHRGLUGLUVALCLNTRPTHRGLUMETALAGLUALA TTAATTAGAGGAAAAATGACTCTAACAGAGGAAGTTCAGTGGACTGAGATGGCAGAAGCA • 3200 · · · · GLUTYRCLUGLUASNLYS ILE ILELEUSERGLNGLUGLNGLUGLYCYSTYRTYRGLNGLU GAATATGACGAAAATAAAATAATTCTCAGTCAGGAACAAGAAGGATGTTATTACCAAGAA • · · · · 3300 SERLYSPROLEUGLUALATHRVALILELYSSERGLNASPASNGLNTRPSERTYRLYSILE AGCAAGCCATTAGAAGCCACGGTGATAAAGAGTCAGGACAATCAGTGGTCTTATAAAATT ····· HI SGLNGLUASPLYS ILELEULYS VALGLYL YSPHE AL ALYS ILELYSASNTHRHISTHR CACCAAGAAGACAAAATACTGAAAGTAGGAAAATTTGCAAAGATAAAGAATACACATACC • · · 3400 · · ASNGLYVALARGLEULEUALAHISVALILEGLNLYSILECLYLYSGLUALAILEVALILE AATGGAGTTAGACTATTAGCACATGTAATACAGAAAATAGGAAAGGAAGCAATAGTGATC • · · · · . · TRPGLYGLNVALPROLYSPHEHISLEUPROYALGLULYSASPYALTRPGLUGLNTRPTRP TGGGGACAGGTCCCAAAATTCCACTTACCAGTTGAGAAGGATGTATGGGAACAGTGGTGG • 3500 « · · · THRASPTYRTRPGLNVALTHRTRPILEPROGLUTRPASPPHEILESERTHRPRQPROLEU ACAGACTATTGGCAGGTAACCTGGATACCGGAATGGGATTTCATCTCAACACCACCATTA • · · · · 3600 VALARGLEUVALPHEASNLEUVALLYSASPPROILECLUGLYGLUGLUTHRTYRTYRVAL GTAAGATTAGTCTTCAATCTAGTGAAGGACCCTATAGAGCGAGAAGAAACCTATTATGTA • · . · · · · ASPGLYSERCYSSERLYSGLNSERLYSGLUGLYLYSALAGLYTYRILETHRASPARGGLY GATGGATCATGTAGTAAACAGTCAAAAGAAGGAAAAGCAGGATATATCACAGACAGGGGC (fig.IB-suitft 2 )

13. /35 . . . 3700 LYSASPLYSVALLYSYALLE UGLUGLNTHRTHRASNGLNGLNALACLULEUGLUALAPHE AAAGACAAGGT AAAAGTGTT AGAACAGACT ACT AATCAACAAGCACAATTGGAAGCATTT • · · · · · LEUME TALALEUTHRASPSERGLYPROLYSALAASN I LE I LEVALASPSERGLNTYRYAL CTCATGCCATTGACAGACTCAGGGCCAAAGGCAAAT AT T AT AGTAGACTCACAATATGTT « 38 00 « . . , KETGLYlLEILETHRGLYCYSPROTHRGLUSERGLUSE RARGLEUVALASNGLNILEILE ATGGGAAT AATAACAGGATGCCCTAC AG AATCAGAGACC AGGCTACTTAACCAAATAATA • · · · « 3900 GLUGLUHETILELYSLYSTHRGLUILETYRVALALATRPVALPROALAH ISLYSGLYILE gaagaaatgatcaaaaagacagaaatttatctggcatgggtaccagcacacaaaggtata • · · · · · GLYGLYASNGLNGLUILEAS PH ISLEUYALSERGLNGLY ILEARGGLNVALLEUPHELEU GGAGGAAACCAAGAAATACACCACCT AGTTAGTCAAGGGATTAGACAAGTTCTCTTCTTG • · · 4000 · · GLULYSILEGLUPROALAGLNCLUGLUHISSERLYSTY RH I SSERASNILELYSGLULEU GAAAAGATAGAGCCAGCACAAGAAGAACATAGTAAATACCATAGTAACATAAAAGAATTG • · · · « · VALPHELYSPHEGLYLEUPROARGLEUVALALALYSGLN I LEYALASPTHRCYSASPLYS GTATTCAAATTTGGATTACCCAGACTAGTCGCCAAACAGATAGTAGACACATGTGATAAA • 4100 « · · · CYSHISGLNLYSGLYGLUALAILEHISGLYGLNVALASNSERASPLEUGLYTHRTRPGLN TGTCATCAAAAAGGAGAAGCTATACATGGGCAGGTAAATTCAGACCTA^GGACTTGCCAA • · · « · 4200 METASPCYSTHRHISLEUGLUGLYLYSILEYALILEYALALAYALHISVALALASERGLY ATGGATTGTACCCATCTAGAGGGAAAAATAGTCATAGTTGCAGTACATGTAGCTAGTGGA • · · « · · PHEILEGLUALAGLUVALILEPROGLNGLUTHRGLYARGGLNTHRALALEUPHELEULEU TTCATAGAAGCAGAAGTAATTCCACAAGAAACAGGAAGACAGACAGCACTATTTCTGTTA • · « 43 00 · · LYSLEUALASERARGTRPPROILETHRHISLEUHISTHRASPASNGLYALAASNPHEALA AAATTCGCAAGCAGATGGCCTATTACACATCTGCACAC AGATAATGGTGCTAACTTTGCT • · · · · · SERGLNGLUVALLYSMETVALALATRPTRPALAGLYILEGLUHISTHRPHEGLYVALPRO TCGCAAGAAGTAAAGATGCTTGCATGGTGGGCAGCCATAGAGCACACCTTTCGCGTACCA • 44 00 « · · · TYRASNPROGLNSERGLNGLYVALYALGLUALAMETAS NHI SHISLEULYSASNGLNILE TACAATCCACAGAGTCAGGGAGTAGTGGAAGCAATGAATCACCACCTGAAAAATCAAATA • · · « · 4500 aspargileargcluglnalaasnservalgluthrilevalleumetalavalhiscys GATAGAATCAGGGAACAAGCAAATTCAGTAGAAACCATAGTATTAATGGCAGTTCATTGC • · · · « « 
HETASNPHELYSARGARGGLYGLYileglyaspmetthrproalagluargleuileasn ATGAATTTTAAAAGAAGGGGAGGAATAGGGGATATGACTCCAGCAGAAAGATTAATTAAC • · · 4600 · · «ETILETHRTHRGLUGLffGLUILEGLNPHEGLNGLNSERLYSASNSERLYSPHELYSASN ATCATCACTACAGAACAAGAAATACAATTTCAACAATCAAAAAACTCAAAATTTAAAAAT • · · « · · PHEARGVALTYRTYRARGGLUGLYARGASPGLNLEUTRPLYSGLYPROGLYGLULEULEU TTTCCGGTCTATTACAGAGAAGGCAGAGATCAGCTGTGGAAGGGACCCCGTGAGCTATTC • 47 00 · « · · TRPLYSGLYGLUGLYALAVALILELEULYSVALGLYTHRASPILELYSVALVALPROARG TGGAAAGGGGAAGGAGCAGTCATCTTAAAGGTAGGAACAGACATTAAGGTAGTACCCAGG • « · · · 4800 ARGLYSALALYSILEILELYSASPTYRGLYGLYGLYLYSGLUMETASPSERSERSERHIS Q«ETGLUGLUGLULYSARGTRPILEVALVALPROTHR AGAAAGGCTAAAATTATCAAAGAT'TATGGAGGAGGAAAAGAGATGGATAGTAGTTCCCAC HETGLUASPTHRGLYCLUALAARGGLUVALALA TRPARGILEPROGLUARGLEUGLUARGTRPH ISSEJiLEUILELYSTYRLEULYSTYRLYS ATGGAGGATACCGGAGAGGCTAGAGAGGTGGCATAGCCTCATAAAATATTTGAAATATAA « · · 4900 · · (fie.IB-suite 3 )

14. /35 THRLYSASPLEUGLNLYSALACYSTYRVAL PROHISHISLYSYALGLYTRPALATRPTRP AACT AAAGATCTACAAAAGGCT TGCTATGTGCCCCATCATAAGGTCGCATGGGCATCGTG ...... THRCYSSERARGVALILEPHEPROLEUGLNGLUGLYSERHISLEUCLUVALGLNGLYTYR GACCTGCAGCAGAGTAAT CT TCCC AC T AC AG G AAGG A AGCC AT T TAG A AGT AC A AGGGT A . 5000 .... TRPASNLEUTHRPROGLUARGGLYTRPLEUSERTHRTYRALAVAL ARG1LETHRTRPTYR TTGGAATTTGAC ACCAGAAAGACGGTGGCTCAGTACTT ATGCAGTGAGGATAACCTGG T A • · . · . 5100 SERLYSASPPHETRPTHRASPVALTHRPROGLUTYRALAASPILELEULEUHISSERTHR CT CA AAGG ACTT TTGG AC AG ATGT AACACC AG AAT ATGCAGATATT TT AC TGC AT ACC AC TYRPHEPROCYSPHETHRALAGLYGLUVALARGARCALAILEARGGL YGLUARGIEULEU TTATTTCCCTTGCTTTACACCCGGAGAAGTGAGAAGGGCCATCAGGGCAGAACGACTGCT • · · 5200 · · SERCYSCYSARGPHEPROARGALAHISLYSHISGLNVALPROSERLEUGLNTYRLEUALA GTCTTGCTGCAGGTTCCC AAGAGCTCATAAGCACCAGGTACCAAGTCTACAGTACTTAGC LEUARGVAL.VALSERHI SVALARGSERGLNGLYGLUASNPROTHRTRPLYSGLNTRPARG X MET SERASPPROARGGLUARG ILEPROPROGLYASNSERGLYGLU ACTGAGAGTAGT AAGTCATGTCAGATCCCAGGGAGAGAATCCCACCTGGAAACAGTGGAG . 5300 . · · . ARGASPASNARGARGSERLEUARGVALALALYSGLNASNSERARGGLYASPLYSGLNARG GLUTHRILEGLYGLUALAPHEGLUTRPLEU ASNARGTHRVALGLUG^UILE ASNARGGLU AAG AGACAAT AGGAGAAGCCTTCGAGTGCCT AAACAGAAC AGT ACAGG AG AT AAACAGAG • · . · 5 A 00 GLYGLYLYSPROPROTHRGLUGLYALAASNPHEPROGLYLEUALALYSVALLEUGLYILE AL AV ALASNH I SLEUPROARGGLULEU ILE PHEGLN VALTRPGLNARGSERTRPGLUTY R AGGCGGTAAACCACCTACCCAGGGAGCTAATTTTCCAGGTTTGCCAAAGGTCTTGGGAAT ..···· LEUALA TRPHISASPGLUGLNGLYHETSERCLNSERTYRTHRLYSTYRARGT YRLEUCYSLEUILE ACTGGCATGATGAACAAGGGATGTCACAAAGCTATACAAAATACAGATACTTGTGTTTAA . · · 5500 · · GLNL YSALALEUPHEHETHI SCYSLYSLYSGLYCYS ARGCYSLEUGLYGLUGLYHISGLY TACAAAAGGCTTTATTTATGCATTGCAAGAAAGGCTGTAGATGTCTAGGGGAAGGACACG ··«··· ALAGLYGLYTRPARGPROGLYPROPROPROPROPROPROPROGLYLEUALA R METGLU GGGCAGGGGGATGGAGACCAGGACCTCCTCCTCCTCCCCCTCCAGGACTAGCATAAATGG • 5600 · · · * GLUARGPROPROGLUASNGLUGLYPROGLNARGGLUPROTRPASPGLUTRPVALVALGLU AAGAAAGACCTCCAGAAAATGAAGGCCCACAAAGGGAACCATGGGATGAGTGGGTAGTGG « · · . 
· 5700 VALLEULYSGLULEULYSGLUGLUAL ALE ULYSHISPHEASPPROARGLEULEUTHRALA AAGTTCTGAAAGAACTCAAAGAAGAAGCTTTAAAGCATTTTGATCCTCGGCTTCTAACCG • . · « · · TATI KETGLUTHRPROLEUARGGLUGLNGLUASNSER LEUGLYASNHISILETYRASNARGH1SGLYASPTHRLEUGLUGLYALAGLYCLULEUILE CACTTGGTAATCATATCTATAATAGACATGGAGACACCCTTGAGGGAGCAGGAGAACTCA • · · 5800 · · LEUCLUSERSERASNGLUARGSERSERTYRILESERGLUALAALAALAALAILEPROGLU ARGILELEUGLNARGALALEUPHEILEHISPHEARGSERGLYCYSSERHISSERARGILE TTAGAATCCTCCAACGAGCGCTCTTCATACATTTCAGAAGCGGCTGCACCCATTCCAGAA SERALAASNLEUGLYGLUGLUILELEUSERGLNLEUTYRARGPROLEUGLUALACYSTYR GLYGLNPROGLYGLYGLYASNPROLEUSERTHRILE PROPROSERARGSERHETLEU TCGGCCAACCTGGGGGAGGAAATCCTCTCTCAACTATACCGCCCTCTAGAAGCATGCTAT • 5900 · · · · ASNTHRCYSTYRCYSLYSLYSC YSCYSTYRHISCYSGLNPHECYSPHELEUL YSLYSGLY AACACATCCTATTGCAAAAAGTGTTGCTACCATTGCCAGTTTTGTTTTCTTAAAAAGGGC • · * . · 6000 LEUGLYILESERTYRGLULYSSERHISARGARGARGARGTHRPROLYSLYSALALYSALA ARTiflET ARCS ERHISTHRGL YGLUGLUGLULE UARG ARGARGLEUARCLEU (f ig . lB-sui te 4)

15. /35 Τ 1 CC GC A T * AC T T A T C *C A A CT C AC AC AC C A G * A C A AC * AC T C CC A AC A ACC C T A ACCC T « · · « « · ASNTHRSERSERALASERASNCLU 1LΕΗ ISLFULEUH I SGLNTHRSERLYSTYRCLYLEUSERTRPLTSSERALAALATYRARC ENV METGLYCYSLEUGLYASNGLNLEULEUILEALA AATACATCTTCTCCATCAAACGACT AACTATCCCTTCTCTTCCAAATCACCTCCTTATCC • · · 6100 · · HISLEULEU ILECYSSERLYSCYSLEUTRPILEILECYSILECLNTYRVALThRVALPHETYRCLYVAL CCATCTGCTCTAAGTGTCTATGGATTATTTGTATTCAATATCTCACAGTCTTTTATCGTG • · · · . · PROALATRPARGASNALATHRILEPROLEUPHECYSALathrlysasnarcaspthrtrp TACCAGCTTGGAGGAATGCGACAATTCCCCTCTTCTGT GCA ACCAAGA ATAGGGATACTT • 6200 · · « · glythrtkrglncysleuproaspas rasp asptyrsergluleualaleuasnvalthr CGCGAACAACTCAGTGCCTACCAGATAATGATCATTATTCAGAATTGGCCCTTAATGTTA • · · · · 6300 GLUSERPHEAS PAL ATRPGLUASNTHRVALTHRGLUGLNALAlLECLUASPYALTRPGLN CAGAAAGCTTTGATCCTTGGGACAATACAGTCACAGAACAGGCAATAGAGGACGTATCGC ·««.·· LEUPHEGLUTKRSERILELYSPROCYSVALLYSLEUSERPROLEUCYSILETHRHETARG AACTCTTTGAGACCTCAATAAAGCCTTGTCTAAAATTATCCCCATTATGCATTACTATGA • « · 6 A 00 · · cysasnlyssergluthrasplystrpglyleuthrlysserserthrthrthralaser catgcaataaaagtgacacagataaatggggattgacaaaatcatcajcaacaacaccat thrthrthrthrthrthralalysservalgluthrarcaspuevalasngluthrser caACaacaacaacaacaacagcaaaatcagtacagacaagacacatagtcaatcagact a • 6500 · · · « PROCYSVALVALHISASPASNCYSTHRGLYLEUGLUGLNGLUPROKETILESERCYSLYS GTCCTTGTGTAGTTCATGATAATTGCACAGGCTTGGAACAAGAGCCAATGATAAGCTGTA • · · · · 6600 PHEASNHETTHRGLYLEULYSARGASPLYSLYSLYSCLUTYRASNGLUTHRTRPTYRSER AATTCAACATGACAGGGTTAAAAAGAGACAACAAAAAGGAGTACAATCAAACTTGGTACT ALAASPLEUYALCYSGLUGLNCLYASNSERTHRGLYASNCLUSERARGCYSTYRHETASN CTGCACATCTGGTTTGTCAACAAGGGAATAGCACTGGTAATGAAAGTACATGTTACATGA • · · 6700 · · HI SCYSASNTHRSERYALILEGLNGLUCYSCYSASPLYSASPTYRTRPASPALAILEARC ATCACTCTAATACTTCTGTTATCCAAGACTGTTGTGACAAAGATTATTCCGATGCTATTA • · « · « · CYSARGTYRCYSALAPROPROGLYTYRALALEULEUARGCYSASNASPTHRASNTYRSER gatgtagatattgtgcacctccaggttatgctttgcttagatgtaatgacacaaattatt « 6000 · ♦ · · 
glyphehetproasncysserlysvalyalyalsersercysthrarchethetgluthr caggctttatgcctaactcttctaaggtagtcgtctcttcatgcacaagcatgatggaga • · · · « 6900 glnthrserthrtrppheargpheasnglythrarcalagluasnargthrtyriletyr cacagacttctacttggtttcgctttaatggaactacagcagaaaatagaacctatattt ««···· trphisglyargaspasnargthrileileserleuasnlyshistyrasnleuthrret actggcatggtagagataataggactataattagtctaaataaccattataatctaacaa • · · 7000 · · LYSCYSARCARCPROCLYASNLY5THRYALLEUPR0VALTHRILEKETSERALALEUYAL tgaaatctacaacaccacgaaataagacagttttaccactcaccattatctctccaucc ······ phehisserglnprovalasngluargprolysglnalatrpcysarcpheglyglyasn ttttccactcacaaccagtcaatcagaggccaaagcaggcatggtgtaggtttggaggaa • 7100 · · · · TRPLYSGLUALA1LELYSGLUYALLYSGLNTHRILEYALLYSHISPROARCTYRTHRGLY A TTG GA ACG AGGCA AT AA A AGAGGTGAAGCAG ACC ATT GTC A AAC ATCCC AGGT AT ACTC . . . . . 7200 thrasnasnthrasplysileasnleuthralaproargclyglyaspprogluvalthr GAACTAACAATACTGATAAAATCAATTTGACGGCTCCTAGAGCAGCAGATCCGGAACTTA (fig.1B-suite 5)

16. /35 ΡΗΕ ·1£ T T RPTHR AS NC Y S A RGGL YGL UPH E L EUT YRCYSLYSrtE TAS NTRP PHEL E UA SN CCT T C A TGTGGAC A A ATT GC AG AGG AGAG T T TC TCT AC TC T A A A ATG A AT TG GT T TCT AA • · · 7300 · . TRPVALCLUASPARGSERLEUTHRTHRGLNLYSPROLYSGLUARGH ISLYSARGASNTYR ATTGGGTAGAAGAT AGGAGTCT AACTACCCAGAAGCCAAAGGAACGCCAT AAAACCAATT . · · · < · VALPRQCYSHISILEARGGLNILEILEASNTHRTRPΗISLYSVALGLYLYSASNVALTYR acgtaccatgtcatattacacaaataatcaacacttggcataaagtaggcaaaaatgttt • 74 00 · · · . LEUPROPROARGGLUGLYASPLEUTHRCYSASNSERTHRVALTKRSERLEUILEALAASN atttgcctccaagagaggcacacctcacgtgtaactccacagtgaccagtctcatagcaa . · · « · 7500 ileasntrpthraspglyasnglnthrserilethrketseralagluvalalagluleu ACataaattggactgatcgaaaccaaactagtatcaccatgagtgcagaggtggcagaac tyrargleugluleuglyasptyrlysleuvalgluilethrproileglyleualapro tgtatcgattggaattgggagattataaattagtagaaatcactccaattggcttggccc • « · 7600 . · thrasnvallysargtyrthrthrglyglythrserargasnlysargglyvalpheval CCACAAATGTGAAGAGGTAC ACTACTGGTGGCACCTCAAGAAATAAAACAGCGGTCTTTG leuglypheleuglypheleualathralaglyseralametglyalaalaserleuthr tgctagggttcttgggttttctcgcaacggcaggttctgcaatgggcgcggcgtcgttga • 77 00 · « · · valthralaglnserargthrleuleualaclyilevalglnglnglnglnglnleuleu ccgtgaccgctcagtcccggactttattggctgggatagtgcagcaacagcaacagctgt • · · « · 7800 aspvalvallysargglnglngluleuleuargleuthrvaltrpglythrlysasnleu tggacctggtcaagagacaacaagaattgttgcgactgaccgtctggggaacaaacaacc glnthrarcvalseralaileglulystyrleulysaspglnalaglnleuasnalatrp tccagactagggtctctgccatcgagaagtacttaaaggaccaggcgcagctaaatgctt • *- · · 79 00 · · glycysalapheargclnvalcyshisthrthrvalprotrpproasnalaserleuthr gcggatgtgcgtttacacaagtctgtcackctactgtaccatggccaaatgcaagtctaa • · a a a a proasptrpasnasngluthrtrpglnclutrpgluarglysvalasppheleugluala caccagattggaacaatgagacttggcaagagtgggagcggaaggttgacttcttggagg • 8000 a a . 
· asnilethralaleuleugluglualaglnileglnglnglulysasnhettyrgluleu caaatataacggccctcctagaagaggcacaaattcaacaacagaagaacatgtatgaat a a a a a 8100 » glnlysleuasnsertrpaspyalpheglyasntrppheaspleuthrsertrpilelys tacaaaagttgaatagctgggatgtgtttggcaattggtttgaccttacttcttggataa tyrileglntyrglyiletyrileilevalclyvalileleuleuargilevaliletyr agtatatacaatatggaatttatataattgtaggagtaatactgttaagaatagtgatct a a a 8200 a · ileyalglnmetleualaargleuargglnglytyrargproyalpheserserpropro atatagtacaaatcctagctaggttaagacaggggtataggccagtgttctcttccccac tat2arcproileproasnargilearcleucysglnprolyslysala ART2VALASPPR0TYRPR0THRGLYSERGLYSERALAASNGLNARGARGCLN sertyrphegln***tkrhisthrclnglnaspproalaleuprothrlysgluglylys cctcttatttccagtagacccatacccaacaggatccggctctgccaaccaaagaaggca a 8300 a a a LYSLYSGLUTHRVALGLUALAALAVALALATHRALAPROGLYLEUGLYARG >TAT(fin) LYSARGARGARCTRPARCGLNARGTRPGLNGLNLEULEUALALEUALAASPARGILETYR lysglyaspglyglyglyserglyglyasn$ersertrpprotrpglnileglutyrile AAAAAGGAGACGGTGGAGGCAGCGGTGGCAACAGCTCCTGGCCTTGGCAGATAGAATATA a a a a a 8400 ( £ 5.«.lB-suite 6)

17. /35 SCRPHP PR OA S p PR 0 PR 0 T HR AS P ΤΗ RPROL EU A 5 PL € (J AL * I LE GL NGLNLf UGLNASN HI SP HE LEU ILEARGGLNLFUILEARGLEULEUTHRTRPLEUPKESERASNCYSARGTHR TTCATTTCCTGATCCGCCAACTGATACCCCTCTTGACTTGGCTATTCAGCAACTGCAGAA ♦ · · · « · LEUALAI LEGLUSERILEPROASPPROPROTHRASNI LEPR OGLUALALEUCYSASPLFU leuleuserargalatyrglni leleuglnproilepheglnarcleuseralathrtyr CCTTGCT ATCGAGAGCAT ACC AG ATCCT CC A ACC A AT AT T CC AC AGGC TC TCTGCG AC CT . · · 8500 · . F METGLYGLYALA ARGARGILEARCARGSERPROGLNALA · ART2 (fin) GLYGLUPHEGLYGLUVALLEUARGLEUGLULEUTHRTYRLEUGLNTYRGLYTRPSERTYR acggagaattcggagaagtcctcacgcttgaactcacctacctacaatatcgctggagct • · · · « · ileserlyslysargserlysproprogluilecysaspargaspsercysglyargval pheglnclualavalglnalaalaargaspleuargglnargleuleuargalaarggly atttccaacaagcggtccaagccgccagagatctgcgacagagactcttccgggcgcctg • 86 00 · · « · glyargasntyrglyargleuphelysglyvalgluaspglyserserglnserleugly glulysleutrpclualaleuglnarcglyclyargtrpileleualaileproarcarg gggagaaattatgggaggctcttcaaaccggtgcaagatggatcctcgcaatccctagga • · · * 8700 glyleuasplysglyleuserserleusercysgluglyglnlystyrasnglnglyglu ileargglnglyleugluleuthrleuleu · gcattagacaagggcttgagctcactctcttgtgagggccaaaaatacaatcagggagaa • · · · / · « tyrhetasnthrprotrpargasnproalaglugluarglyslysleuprotyrarclys tacatgaatactccatggagaaacccagctgaagagaggaaaaaattaccatacagaaaa • · . 88 00 · · glnasnileaspaspileaspglugluaspaspaspleuvalglyileprovalgluala caaaatatagatgatatagatgaggaagatgatgacttggtagggataccagttgacgcc • · · · · · argvalproleuargthrhetsertyrlysleualaileasphetserhispheilelys agagttcccctaagaacaatgacttacaaattggcaatagatatctctcattttataaaa « 89 00 · · · · GLULYSGLYCLYLEUGLUGLYILETYRTYRSERALAARGARGHISARGILELEUASPILE gaaaaggggggactcgaagcgatttattacagtgcaagaagacatagaatcttagacata . 
· « · · 9000 tyrleuglulysglugluglyileileproasptrpglnilehisserclyproclyile TACTTAGAAAAGGAAGAAGG CATC ATACCAG ATT GGCAGATACACTCCGG ACC AGGAATT • · · « · · ARCTYRLEULYSHETPHEGLYTRPLEUTRPLYSLEUILEPROVALASNVALSERASPGLU AGATACCTAAAGATGTTTGCCTCGCTATGGAAATTAATCCCTGTAAATGTATCAGATGAC • · · 9100 < · ALAGLNGLUASPGLUGLUHI STYRLEUVALHISPROALAGLNTHRSERGLNTRPASPASP GCACAGGAGGATGAGGAGCATTATTTAGTGCACCCAGCTCAAACTTCCCAGTGGGATGAC • · · · · · PROTRPGLYGLUYALLEUALATRPLYSPHEASPPROTHRLEUALATYRTHRTYRGLUALA CCTTGGGGAGAGGTTCTAGCATGGAAGTTTGATCCAACTCTAGCCT ACACTTATGAGGCA • 9200 · · · · TYRILEARGTYRPROGLUGLUPHEGLYSERLYSSERGLYLEUSERGLULYSGLUVALLYS TATATTAGATACCCAGAAGACTTTGGAAGCAAGTCAGGCCTGTCAGAGAAAGAGGTTAAA • « « « · 9300 ARGARGLEUALAALAARGGLYLEULEUGLUMETALAASPARGLYSGLUTHRSER AGAAGGCTAGCCGCAAGAGGCCTTCTTGAAATGGCTGACAGGAAGGAAACTAGCTGAGAC • · · · · · AGCAGGGACTTTCCACAAGGGGATGTCATGGGCAGGTACTGGGGAGGAGCCGGTTCGGAA • · · 9400 · · CACCCACTTTCTTGATGTATAAATATCACTGCATTTCGCTCTGTATTCAGTCGCTCTGCG GAGACGCTCGCAGATTGAGCCCTGGGAGGTTCTCTCCAGCACT AGCAGGTAGAGCCTGGG • 9500 · · · · TGTTCCCTGCTAGACTCTCACCAGCACTTGGCCGGTGCTGGGCAGAGTGGCTCCACGCTT . . . . 9600. (fig.lB-suite 7)

18. /35 FIG. 1C sequence LTR CIVET versus HIV-2 ROD X 8960 8970 8980 8990 9000 9010 TGGAAGGGATTTATTACAGTGCAAGAAGACATAGAATCTTAGACATATACTTAGAAAAGG ·««««! TGGAAGGGATGTTTTACAGTGAAAGAAGACATAAAATCTTAAATATATACTTAGAAAAGG X 8950 8960 8970 8980 89^90 9020 9030 9040 9050 9060 AAGAAGGCATCATACCAGATTGGCAGATACACTCCGGA---CCAGGAATTAGATACCTAA AAGAAGGGATAATTGCAGATTGGCAGAACTACACTCATGGGCCAGGAGTAAGATACCCAA 9010 9020 9030 9040 9050 9080 9090 9100 9110 9120 AGATGTTTGGCTGGCTATGGAAATTAATCCCTGTAAATGTATCAGATGAGGCACAGGAGG • · · · · · · «·········· ·· · · · «·· «··· · · · · · · « · · · • · · · « · · ·······«··· · · « · · · * « ·«·· · · · · · · · · · * TGTTCTTTGGGTGGCTATGGAAGCTAGTACCAGTAGATGTCCCACAAGAAGGGGAGGACA 9070 9080 9090 9100 9110 9140 9150 9160 9170 9180 ATGAGGAGCATTATTTAGTGCACCCAGCTCAAACTTCCCAGTGGGATGACCCTTGGGGAG ·····' CTGAGACTCACTGCTTAGTACATCCAGCACAAACAAGCAAGTTTGATGACCCGCATGGGG 9130 9140 9150 9160 9170 9200 9210 9220 9230 9240 AGGTTCTAGCATGGAAGTTTGATCCAACTCTAGCCTACACTTATGAGGCATATATTAGAT ·····«···« ··**«··«·· AGACACTAGTCTGGGAGTTTGATCCCTTGCTGGCTTATAGTTACGAGGCTTTTATTCGGT 9190 9200 9210 9220 9230 9260 9270 9280 9290 9300 ACCCAGAAGAGTTTGGAAGCAAGTCAGGCCTGTCAGAGAAAGAGGTTAAAAGAAGGCTAG ····«·« · · ···««·· · · ACCCAGAGGAATTTGGGCACAAGTCAGGCCTGCCAGAGGAAGAGTGGAAGGCGAGACTGA 9250 9260 9270 9280 9290 9320 9330 9340 9350 CCGCAAGAGGCCTTCTTGAAATGGCT-GACAGGAAGGAAACT— AAGCAAGAGGAATACCATTTAGTTAAAGACAGGAACAGCTATACTTGGTCAGGGCAGGAA 9310 9320 9330 9340 9350 FIG. 1C

19. /35 9360 9370 9380 9390 AGCTGACACAGCAGGGACTTTCCACAAGGGGATGTCATG--GGGA GTAACTAACAGAAACAGCTGAGACTGCAGGGACTTTCCAGAAGGGGCTGTAACCAAGGGA 9370 9380 9390 9400 9410 9400 9410 9420 9430 9440 9450 GGTACTGGGGAGGAGCCGGTTGGCAACACCCACTTTCTTGATGTATAAATATCACTGCAT GGGACATGGGAGGAGCTGGTGGGGAACGCCCTCATATTCTCTGTATAAATATACCCGCTA 9430 9440 9450 9460 9470 9460 XX 10 20 30 40 TTCGCTCTGTA—TTCTGGAAGGGATTTATTACAGTGCAAGAAGACATAGAATCTTAGAC • *············ · ·····«·· ·«**«······ ··«··«· · • ···········«* « ·······* ···*«····«· ··«···« « GCTTGCATTGTACTTCTGGAAGGGATGTTTTACAGTGAAAGAAGACATAAAATCTTAAAT 9490 XX 10 20 30 40 50 60 70 80 90 ATATACTTAGAAAAGGAAGAAGGCATCATACCAGATTGGCAGATACACTCCGGA---CCA ATATACTTAGAAAAGGAAGAAGGGATAATTGCAGATTGGCAGAACTACACTCATGGGCCA 50 60 70 80 90 100 110 120 130 140 J.50 CGAATTAGATACCTAAAGATGTTTGGCTGGCTATGGAAATTAATCCCTGTAAATGTATCA GGAGTAAGATACCCAATGTTCTTTGGGTGGCTATGGAAGCTACTACCAGTAGATGTCCCA 110 120 130 140 150 160 170 180 190 200 210 GATGAGGCACAGGAGGATGAGGAGCATTATTTAGTGCACCCAGCTCAAACTTCCCAGTGG * CAAGAAGGGGAGGACACTGAGACTCACTGCTTAGTACATCCAGCACAAACAAGCAAGTTT 170 180 190 200 210 220 230 240 250 260 270 GATGACCCTTGGGGAGAGGTTCTAGCATGGAAGTTTGATCCAACTCTAGCCTACACTTAT GATGACCCCCATGGGGAGACACTAGTCTGGGAGTTTCATCCCTTGCTGGCTTATAGTTAC 230 240 250 260 270 280 290 300 310 GAGGCATATATTAGATACCCAGAAGAGTTTGGAAGCA cicccTiTuuccc 290 (fig.lC-sulte 1) FIG. 2

20. /35 ( HIV-2.P 1 versus ( HIV-l.P KIV2------ K1V1------ _ env4 -.! 1 ί Q LL IA WRWGWKWGTM I LA — S AC L v It * LLG1LMICSA YfCTCYVTVr Y ----— gv?t-?k::ati hrvkekyqkl i-it * TEKLVVTVYY * * * κ it * * CVPVWKEATT 60 70 env5 90 100 KIV2------ PLFCATiRKR- -DT-UG T1QCLPDNDD YOEITL-NVT EAFDAWNNTY ★ it it * *· * * 1· * * ·· it it * * * * Ε IV1------ TLFCASDAKA YDTEVEMVWA THACVPTDPN PQEVVLVKVT ENFNMVKKDH 110 [ 120 env6 i3o 140 150 El V2------ TEQA1EDVVH LFETSIKPCV KLTPLCVAKKlCSSTESSTCN N7TSKSTSTT ** ♦ It « * 11 «11« * it It it it HXV1------ VEQMHEDIIS LWDQSLKPCV XLTPLCVSLK CTDL----GK ATNTHSSNTN 160 170 1 80 1 90 200 EIV2------ —TTTPTDQE QEISEDTPCA RADNCSGLGE EETINCQFXM TGLERDKKKQ ★ ttt HIV1------ SSSGEMMHEK GEIK-- -NCSFKIS TSIRGKVQKE YAFFYKLDII 210 220 ^30 env7 240 I 250 HIV2------ Y—NET-WYS KVVCETNNSI NQTQCYHNHC HTSVITESCD KHYWDjAIRFR * * * * ****** ♦ ftlVl------ 11 1 Ο V X A Λ V C 260 env8 270 280 290 300 HIV2------ YCAPPGYALL RC-NDT-NYS CFAPNCSKVV ASTCTRMMET QTSTWF-GFN ttt* * * * ♦ ★ * * * * * *♦ ** it H1V1------ YCAPACFAIL 1 KCNNKTFNGT CP-CTHVS TVQCTHGIRP VVSTQLLL-N 310 320 330 340 350 ΚΙΫ2------ GTRAE-H RTYIYWHGRD K-RTII-SLN KYYKLSLECK RPCNKTVKQI * t* * * it tit * ** * * * * KIV1------ GSLAEEEVVI RSANFT-D KAKTIIVQLH QSVE—IHCT RPKNHTRKSI 360 A env9 380 390 400 BIV2------ MLMS—CHVF KSHYQPINKR PROAVCWFKC -KVKDAHQEV KETLAKHPRY * * * It X* It it KIV1------ RIQRGPGRAF vtigkic::— MRQAHCNISR AKWIiAT-L KQ1ASKLREQ FIG. 2

21. /35 I envlO 410 i 420 430 440 450 Η I V 2------ RCTNDTRNIS F AA P GKCSDP EVAYKWTNCR GEFLYCKtlTU FLU — WI--* ★ * * * * * * if ·* *<.·*««** « * HIVl------ FG11HKT— II FKQSS-GGDP EIVTHSFNCC CEFFYCNSTQ LFNSTVJFNST HIV2------ HIVl------ 460 Jr 470 kthhnyapch envll 480 IKOI1NTVHK 490 VGRKVYA.PPR 500 EGELSCNSTV * * SGQIRCSSNI tr I< WSTEGSNNTE 1 ** CSDT1TLPCR *** ** * IKQF1NMWQE * * * * * VGRAMYAPPI 510 520 530 540 550 HIV2------ TSIIANIDWQ HNNQTNITFS AEVAELYRL- —ELGDYKLV EITPIGFAPT * * *** * ** ** * ♦ ♦ ♦ «·*« HIVl------ TGLLLTRDGG KNHNGSEIFR PGCGDHRDNW RSELYKYKVV K1EPLGVAPT env3 560 570 580 590 600 HIV2------ KEKRYSSAHG RHTRCVFVLG —FLGFLATA GSAHGAAS—· LTVSAQSRTL * ** * *- * * ***** * ** *** * *** * * * HIVl------ KAKRR--VVQ. REKRAVGI-C ALFLGFLGAA GSTMGARSHT LTVQA--RQL 610 620 630 | 640 envl 650 HIV2------ lacivqqqqq LLDVVKRQQE LLRLTVWGTK NLOARVTAIE KYLQDOARLN * ****** ** ** ** ***** * **** * * ** ** * HIVl------ LSGIVQQQNN LLRAIEAQQH LLQLTVWGIK QLQARILAVE RYLKDQQLLG 660 670 680 690 700 HIV2------ SHCCAFROVC HTTVPW-- VXDSLAPDVD HHTWQEWEKQ VRYLEAKISK *** ★ * *** * * * **** ** * HIVl------ IUGCSGKLIC TTAVPWliASt SKKSLEQIWH SHTWHEVDRE XHHYTSL1HS 710 env2. 720 730 7 40 7 50 HIV2------ SLEQAQIQQE KNMYELQKLN SUDIFGNUFD LTSWVKYIQY GVLIIVAVIA * * *★* ** ** * *** * * ** * HIVl------ LIEESQNQQE KNEQELLELD KWASLWNWFH ITNWLWYIKI FIMIVGGLVC (fig.2 - suite 1)

22. /35 760 770 7 80 790 800 HIV2------ LRIVIYVVQM LSRLRKGYRP V-FSSPPGYI QQIHIHKDRC QPANEETEED **x* * ***** * ** * ** HIV1------ LRIV/AVLSI VNRVRQGYSP LSFQT- -HLPTPRG PDRPEG1EEE 810 820 830 840 850 HIV2----- GCSNGGDRYW ** ** PWPIAYIRFL IRQLIRLLT- * * * -LYSIC RDLLSRSFLT **** HIVl----- GGERDRDRSI RLVNGSLA-L IWDOLRSLCL FSYHRL- RDLLLIVTRI 860 870 880 890 900 HIV2----- LQLIYQNLRD WLRLRTA—F LQYGCEWIQE AFQ— —AAA RATRETL- * * * * *** * * * * HIVl----- VELLG—RRG WEALKYWWNL LQYWSQELKH SAVSLLNATA 1AVAEGTDRV 910 920 930 938 BIV2------ -AGACRG LVRVLERIGR GILAVPRRIR QGAEIALL ** ***** ** * ** HIVl------ IEVVQGACRA -IRHIPRRIR QGLERILL (fig. 2 - suite 2) FIG. 3

23. /35 ( ENV-mac ( versus f . SXV-RCC 10 20 30 40 50 MCCLGNQLLIAlC—SKCLW11 CIQYVTVFYGVPAHRNATIPLFCATKNRDTWGTTQCL • ···*··* · · · · ········«· · ··««·««··« ·£·«·«« · * « • ···««·· · · · · «····«···· · ····«««··· «··«««· ··« MM---NOLLI AILLASACLVY-CTOYVTVFYGVPTWKNATIPLFCATRNRDTWGTIQCL 10 20 30 40 50 60 70 80 90 100 110 PONODYSELALNVTESFOAWENTVTEQAIEDVWQLFETSIKPCVKLSPLCITMRCNKSET ·«···· · · « · « · « · · · ·«··«··*···· «·««·«···«· ··· · · · ····«· · ····· · · · · ·····<««··♦· « · «·«····«·· «·· · · · PDNDOYQEITLNVTEAFDAWNNTVTEQAIEOVWKLFETSIKPCVKLTPLCVAMKCSSTES 60 70 80 90 100 110 120 130 140 150 160 170 DKKGLTKSSTTTASTTTTTTAKSVETRDIVNETS--PCVVHONCTGLEQEPM ISCKFNM STGNNTTSKST—STTTTTP-----T-DQEQEISEDTPCARADNCSCLGEEET INCQFNM 120 130 140 150 160 180 190 200 210 220 230 TGLKRDKKKEYNETWYSAOLVCEQGNSTGNESRCYMNHCNTSVIQECCDKDYWDAIRCRY TGLERDKKKQYNETMYSKDVVCETNNST-NQTQCYMNHCNTSVITESCDKHYWDAIRFRY 170 180 > 190 200 210 220 240 250 260 270 280 290 CAPPGYALLRCNDTNYSGFKPNCSKVVVSSCTRMMETQTSTHFRFNGTRAENRTYIYWHG ·····«···«········« ···«««· · ····«········ ················ ·*··*····«·«««·«·« ······· · ···«·· ······ ················ CAPPGYALLRCNDTNYSGFAPNCSKVVASTCTRMMETQTSTWFGFNGTRAENRTYIYWHG 230 240 250 260 270 280 300 310 320 330 340 350 fcONRTIISLNKHYNLTMKCRRPGNKTVLPVTIMSALVFHS—QPVNERPKOAHCRFGGNW RDNRTIISLNKYYNLSLHCKRPGNKTVKQIMLMSGHVFHSHYQPINKRPRQAWCWFKGKW 290 300 310 320 330 340 360 370 380 390 400 KEAIKEVKQTIVKHPRYTGTNNTDKINLTAPRGG-DPEVTFHWTNCRGEFLYCKMNWFLN koahqevketlakhpryrgtndtrnisfaapgkgsopevaymktncrgeflycnmtwfln 350 360 370 380 390 400 FIG

24. /35 420 430 440 450 460 WVEDR5LTT QKPKERHKRNYVPCHIRQIINTWHKVGKNVYLPPREGOLTCNSTVTSLIAN WIEN--------KT-H-RNYAPCHIKQI INTWHKVCRNVYLPPRECELSCNSTYTSI IAN 410 420 430 440 450 480 490 500 510 520 1NKT0CNQTSITMSAEVAELYRLELGOYKLV£ITPI CLAPTNVKRYTTG-GTSRNKRGVF Λ « · · · · · ·······«················ ··· «·· · « ···« ; · ··· · · ····«············«·····* · * · ··· « · «··· IDWONNNQTNITFSAE VAELYRL6LGD YKL VE I TP I GFAPTKE KRYSSAHG—RHTRGVF 460 470 480 490 500 510 540 550 560 570 580 VLGFLGFLATAGSAMGAASLTYTAQSRTLLAGIVQQQQQLLOWKRQQELLRLTVWGTKN <········«·«·········· ·*····«······«····««·······*···«···«« «Λ·····*·············· ·····«·····«·*··········«······««··«« VLGFLGFLATAGSAHCAASLTVSAOSRTLLAGIYQQQQQLLDVVKRQQELLRLTYWGTKN 520 530 540 550 560 570 600 610 620 630 640 LQTRVSA IEKYLKDOAQLNAWGCAFRQVCHTTVPWP NASLTPDWNNETWQEWERKVDFLE LQARVTAIEKYLQDQARLNSWGCAFRQYCHTTVPWVNDSLAPDWDNHTWQEWEKQVRYLE 580 590 600 610 620 630 660 670 680 690 700 AN ITALLEEAQIQQEKNHYELQKLNSWDVFGNWFDLTSWIKYIQYGIY11VGVILLRIVI ANISKSLEQAQIQQEKNHYELOKLNSWDIFGNWFDLTSWVKYIQYGVL11VAYIALRIVI 640 650 660 670 680 690 720 730 740 750 760 YIVQMLARLROGYRPVFSSPPSYFQ*THTQQOPALPTKEGKKGDGGGSGGNSSWPWQIEY ««····«·····«······« · · · · · · · · · · · · · · ··«····«········<··· · · · · · · · · « «···« YVVQHLSRLRKGYRPVFSSPPGYIQQIHIHKDRGQPANEETEEDGGSNGGDRYWPWPIAY 700 710 720 730 740 750 780 790 800 810 820 IHFLIRQLIRLLTWLFSNCRTLLSRAYQILQPIFORLSATYGEFGEVLRLELTYLQYGWS IHFLIRQLIRLLTRLYSICROLLSRSFLTLQLIYONLRDK-------LRLRTAFLQYGCE 760 770 780 790 800 840 850 860 870 880 YFQEAVQAA-RDLRQRLLRA-RGEKLWEALQRGGRWILAIPRR1RQGLELTLL WIQEAFOAAARATRETLAGACRG—LKRVLERIGRGILAVPRRIRQGAEIALL 810 820 830 640 850 (fig. 3-suite 1)

FIG. 4: GAG-mac versus GAG-ROD

GAG-mac  VQHKKEIAYFYPGRDNKIEHEMGARNSVLSGKKAOELEKIRLRPGGKKKYMLKHVVHAAN
GAG-ROD  MGARNSVLRGKKADELERIRLRPGGKKKYRLKHIVHAAN

GAG-mac  ELDRFGLAESLLENKEGCQKILSVLAPLVPTGSENLKSLYNTVCVIHCIHAEEKVKHTEE
GAG-ROD  KLDRFGLAESLLESKEGCQKILTVLDPMVPTGSENLKSLFNTVCVIWCIHAEEKVKDTEG

GAG-mac  AKGIVGRHLVMETGTAETMPKTSRPTAPFSGRCGNYPYQQIGGNYTHLPLSPRTLNAWVK
GAG-ROD  AKQIVRRHLVAETGTAEKMPSTSRPTAPSSEKGGNYPVQHVGGNYTHIPLSPRTLNAWVK

GAG-mac  LIEEKKFGAEVYSGFQALSEGCLPYDINQMLNCYGDHQAAMQIIROIINEEAADHDLQHP
GAG-ROD  LVEEKKFGAEVVPGFQALSEGCTPYDINQMLNCYGDHQAAMQIIREIINEEAAEWDVQHP

GAG-mac  QQAPQO-GQLREPSGSDIAGTTSTVEEQIQHMYRQQNPIPVGNIYRRWIQLGLQKCVRMY
GAG-ROD  IPGPLPAGQLREPRGSDIAGTTSTVEEQIQWMFRPGNPVPVGNIYRRWIQIGLQKCVRMY

GAG-mac  NPTNILDVKQGPKEPFQSYVDRFYKSLRAEQTDPAVKNHMTQTLLIQNANPDCKLVLKGL
GAG-ROD  NPTNILOIKQGPKEPFQSYVDRFYKSLRAEQTDPAVKNWMTQTLLVQNANPDCKLVLKGL

GAG-mac  GTNPTLEEMLTACQGVGGPGQKARLMAEALKEALAPAPIPFAAAQQKGPRKPIKCWNCGK
GAG-ROD  GRNPTLEEMLTACQGVGGPGQKARLMAEALKEVIGPAPIPFAAAQQ--RKAFKCWNCGK

FIG. 4 (continuation 1): GAG-mac versus GAG-ROD

GAG-mac  EGHSAROCRAPRROGCWKCGKMOHVMAKCPNROAGFLGLGPHGKKPRNFPMAQVHQGLTP
GAG-ROD  EGHSAROCRAPRROGCWKCGKPGHIHTNCPDRQAGFLGLGPWGKKPRNFPVAOVPQGLTP

GAG-mac  TAPPEEPAVOLLKNYMHLGKOORESRGKPYKEVTEDLLHL------------------NS
GAG-ROD  TAPPVOPAVOLLEKYMOOGKRQREQRERPYKEVTEDLLHLEOGETPYREPPTEDLLHLNS

27. /35 ( POL-mac F IG . 5 ( versus ( POL-ROD 10 20 30 40 50 vlelwegrtlckahqspkktgmlehkkngpcygqmpkqtggffrpwplgkeapqfphgss • · · · » • ••/baa·· · · · e · · « · · « ·· · · · · a · · TGRFFRTGPLGKEAPQLPRGPS 10 20 70 80 90 100 ASGADANCSPRRTSCGSAKELHALGQAAERKOREALOGGDRGF----------------• · · · a a « · a a a a a a a a a a a a a · a · a · a · « a a a · a a a a · SAGADTNSTPSGSSSGSTGE1YAAREKTERAERETIQGSDRGLTAPRAGGDTIQGATNRG 30 40 50 60 70 80 110 120 130 140 150 160 -AAPQFSLWRRPYYTAHIEGQPYEYLLDTGADDSIYTGIELGPHYTPKIVGGIGGFINTK LAAPQFSLWKRPVVTAYIEGQPVEVLLOTCADDSIVAGIELGNNYSPKIVGGIGGFINTK 90 100 110 120 130 140 170 180 190 200 210 220 EYKNVEIEVLGKRIKGTIMTGOTPINIFGRNLLTALGHSLNLPIAKVEPVKSPLKPGKDG EYKNVEIEYLNKKVRATIMTGDTPINIFGRNILTALGMSLNLPVAKVEPIKIHLKPGKDG 150 160 170 180 190 200 230 240 250 260 270 280 PKLKQWPLSKEKIVALREICEKHEKDGQLEEAPPTNPYNTPTFAIKKKDKNKWRMLIDFR PKLRQHPLTKEKIEALKEICEKMEKEGQLEEAPPTNPYNTPTFAIKKKOKNKHRMLIDFR 210 220 230 240 250 260 290 300 310 320 330 340 ELNRVTODFTEVQLGIPHPAGLAKRKRITVLDIGDAYFSIPLDEEFRQYTAFTLPSVNNA « a a ······· ««a········· a a a a a a aaaaaaaaa · a · !·!!!ί!·ίίίί a a a a a a a a a a aaaaaaaaaaaa a a a a a a aaaaaaaaa a «a aaaaaaaaaaaa ELNKVTQDFTEIQLGIPHPAGLAKKRRITYLOVGDAYFSIPLHEDFRPYTAFTLPSVNNA 270 280 290 300 310 320 350 360 370 380 390 400 EPGKRYIYKVLPOGHKGSPAIFQYTMRHVLEPFRKANPDVTLVQYMOOILIASORTDLEH XXX EPGKRYIYKVLPOGWKGSPAIFQHTHRQVLEPFRKANKBVniQYHDDILlASDRTDLEH 330 340 350 360 370 380 FIG.5 26/35 510 520 530 550 550 560 OR Y VL OLKEL L NS I GE S S PE E KF OKDPP F QKMGYEL wP T K WKLOK I E LPORE T faT VNOI 0 OR V VL QL KE LLNCLGF S T POE KF CKDPP YHWMG YEL WP T K WK LOK ( QLP QKE I KTVNOIQ 390 500 510 520 530 550 570 560 590 500 510 520 KLVGVLNWAAQIYPGIKTKHLCRLIRCKMTLTEEVOWTEHAEAEYEENKIILSQEOEGCY ·······««·· **···«««····««······«·«·*·· a « a · ··· «·····«·· a ··«····«··* ·····*······««·*···*····«·· ·«·· ··· «·····«·· · KL YG YLNWAAOLYPGIKTKHLCRL1RGKHTLTEEVOWTELAEAELEENR11LSQEOECHY 550 560 570 580 590 500 530 540 550 560 570 580 
YQESKPLEATVIKSQDNOWSYKIHQEDKILKVGKFAKIKNTHTNGYRLLAHVI OKIGKEA YOEEKELEATVQKDQENQHTYKIHQEEKILKVGKYAKYKNTHTNGIRLLAQVVOK IGKEA 510 520 530 550 550 560 590 600 610 620 630 640 I V I WGQVPKFHLPVEKDYKEQWWTDYWQVTWlPEWOFISTPPLYRLVFNLVKDPIEGEET ··«« ··«···*· · a a · · ···*·«·· ··« ······«· ···« · ♦ « · a · ···· ···««··« · · a · · ···«···· a a a aaaaaaaa a a a a a a · a ·· LV IWGRIPKFHLPVEREIWEQWWONYWQYTWIPDWOFVSTPPLVRLAFNLYGOPIPGAET 570 580 590 600 610 620 650 660 670 680 690 700 YYYOGSCSK0SKECKAGY1TDRGKDKVKYLEQTTNQOAELEAFLHALTDSGPKAN11 YDS : :::: ::::::::: ::::::::: :::::::::::::: ::::::::: :::::: Fytdgscnrqskegkagyvtdrgkdkvkkleqttnqoaeleafamaltdscpkvniivds 630 640 650 660 670 680 710 720 730 740 750 760 QYVNG11TGCPTESESRLVNQIIΕΕΛΙΚΚΤΕIYVAWVPAHKGIGGNQEIOHLVSQGIRQV >····' QYVMG ISASQPTESESKIVNQIIEEKiKKEAIYVAKVPAHKG IGGNQEYDHLVSQGIRQV 690 700 710 720 730 740 770 780 790 800 810 820 LFLEKIEPAOEEHSKYHSNIKELVFKFCLPRLVAKQIVOTCDKCHQKGEAIHGQVNSDLG aaaaaaaaaaaaa a a a a · a · · a a a a a a a a a a a a ··*««·····* a a aaaaaaaaaaaaa a a a a a a a a a a a a a a a a a · a « aaaaaaaaaaa · · lflekiepaqeehekyhsnvkelshkfgipnlvaroivnscaqcqqkgeaihgqvnaelg 750 760 770 780 790 800 630 840 850 860 870 880 THOMOCTHLEGKIVIYAYHVASGFIEAEVIPQETGROTALFLLKLASRWPITHLHTDNCA • •«••••••«•A* ··«·««······«····«· ···········«···*··«······· aaaaaaaaaaaaa iaaaa«a«aaaaaaaaa·· eeaaaaaaaaaaaaeaaeaeeaaaa* twqhdcthlegkiiiyavhvasgfjeaevipqescrqtalfllklasrwpithlhtonga 810 820 830 840 850 860 890 900 910 920 930 940 nfasqevkmvawwagiehtfgvpynposqcyveahnhhlknqidrireqansyetivlka :: :::::::::: ::: :::::::::::::::::::::::: ::::::: ::::::: nftsqevkhvawwicieosfgvpynpqsqgvveamnhhlknoisrireqantietiylka 870 880 890 900 910 920 950 960 970 980 990 vhchnfkrrggigdktpaerlinhitteqeiqfqosknskfknfrvyyregrdolwkgpg aaaaaaaaaaaaaaaa aaaeaaaaaaaaaaa a a a · a a a a a a ••••••••ίίϊ! 
·····«······<··· <«·*·«····«*··· · · a · · · a · a a ·····«··*««· ihcmnfkrrggigomtpserlinmitteqeiqfloaknsklkdfrvyfregrdqlkkgpc 930 940 950 960 970 980 1010 1020 1030 -1040 1050 ellwkgegavilkvgtqikvyprrkakiikdygggkekdssshhedtgeareva (fig-5-suite 1)

FIG. 5 (continuation 2): POL-mac versus POL-ROD

POL-ROD  ELLHKGEGAVLVKVGTDIKIIPRRKAKIIRDYGGRQEMDSGSHLEGAREDGEMA

FIG. 6: Q.mac versus Q.ROD

Q.mac  MEEEKRWIYYPTWRIPERLERWHSLIKYLKYKTKDLQKACYVPHHKVGWAWWTCSRVIFP
Q.ROD  HEEDKRWIVVPTWRVPGRHEKWHSLYKYLKYKTKDLEKVCYVPHHKVCWAWWTCSRVIFP

Q.mac  LQEGSHLEVOGYWNLTPERGWLSTYAVRITWYSKDFWTDVTPEYAOILLHSTYFPCFTAG
Q.ROD  LKGNSHLEIQAYWNLTPEKGWLSSYSVRITWYTEKFWTDVTPDCADVLIHSTYFPCFTAG

Q.mac  EVRRAIRGERLLSCCRFPRAHKHQVPSLQYLALRVVSHY-RSQGENPTWKQWRRDNRRSL
Q.ROD  EVRRAIRGEKLLSCCNYPRAHRAOVPSLOFLALVWQQNDRPQRDSTTRKQRRRDYRRGL

Q.mac  RVAKQNSRGDKORGGKPPTEGANFPGLAKVLGILA
Q.ROD  RLAKQDSRSHKORSSESPTPRTYFPGVAEVLEILA

FIG. 7: R.mac versus R.ROD

R.mac  ME---ERPPENEGPQREPWOEWVVEVLKELKEEALKHFDPRLLTALGNHIYNRHGDTLE
R.ROD  MAEAPTELPPVOGTPLREPGOEWIIEILREIKEEALKHFDPRLLIALGKYIYTRHGOTLE

R.mac  GAGELIRILQRALFIHFRSGCSHSRIGQPGGGNPLSTIPPSRSML
R.ROD  GARELIKVLQRALFTHFRACCGHSRIGQTRGGNPLSAIPTPRNMQ

FIG. 8: X.mac versus X.ROD

X.mac  MSOPRERIPPGNSGEETIGEAFEWLNRTVEEINREAVNHLPRELIFQVWQRSWEYWHDEQ
X.ROD  HTOPRETVPPGNSGEETIGEAFAWLNRTYEAINREAVNHLPRELIFQVWORSWRYWHDEQ

X.mac  GMSOSYTKYRYLCLIQKALFMHCKKGCRCLGEGHGAGGWRPGPPPPPPPGLA
X.ROD  GMSESYTKYRYLCIIQKAVYHHVRKGCTCLGRGHGPGGWRPGPPPPPPPGLV

FIG. 9: F.mac versus F.ROD

F.mac  MGGAISKKRSKPPEICO-ROSCGRVGRNYGRLFK-GVEOGSSQSLGGLDKGLSSLSCEGQ
F.ROD  MGASGSKKHSRPPRGLQERLLRARAGACGGYWNESGGEYSRFOE--GSDREQKSPSCEGR

F.mac  KYNOGEYHNTPWRNPAEERKKLPYRKQNIOOIOEEDDDLVGIPVEARVPLRTMSYKLAID
F.ROD  QYQQGDFMNTPWKDPAAEREKNLYRQQNMODVDSDDDDGYRYSVTPKYPLRPHTHRLAID

F.mac  MSHFIKEKGGLEGIYYSARRHRILOIYLEKEEGIIPDWQI--HSGPGIRYLKMFGWLWKL
F.ROD  MSHLIKTRGGLEGMFYSERRHKILNIYLEKEEGIIADWQNYTH-GPGVRYPMFFGWLWKL

F.mac  IPVNVSDEAQEDEEHYLVHPAQTSQWDDPWGEVLAWKFDPTLAYTYEAYIRYPEEFGSKS
F.ROD  VPYDVPQEGEDTETHCLVHPAQTSKFDDPHGETLYWEFDPLLAYSYEAFIRYPEEFGHKS

F.mac  GLSEKEVKRRLAARGLLEMAORKETS
F.ROD  GLPEEEWKARLKARGIPFS

FIG. 10: TAT.mac versus TAT.ROD

TAT.mac  METPLREQENSLESSNERSSYISEAAAAIPESANLGEEILSQLYRPLEACYNTCYCKKCC
TAT.ROD  METPLKAPESSLKSCNEPFSRTSEQDVATQELARQGEEILSQLYRPLEfCNNSCYCKRCC

TAT.mac  YHCQFCFLKKGLGISYEKSHRRRRTPKKAKANTSSASNERP---IPNRIRLCOPKKAKKE
TAT.ROD  YHCQMCFLNKGLGICYERKGRRRRTPKKTKTHPSPT----POKSISTRTGDSQPTKKQKK

TAT.mac  TVEAAYATAPGLGR
TAT.ROD  TVEATVETDTGPGR

FIG. 11: ART.mac versus ART.ROD

ART.mac  MRSHTGEEELRRRLRLIHLLHQTSKYGLSKKSAAYRHLLVDPYPTGSGSANQRRQKRRRW
ART.ROD  KNERADEEGLQRKLRLIRLLHQTN-----------------PYPQGPGTASQRRNRRRRW

ART.mac  RQRWQOLLALADRIYSFPOPPTDTPLDLAIQQLQNLAIESIPDPPTNIPEALCDLRRIRR
ART.ROD  KORWROILALAOSIYTFPOPPAOSPLDQTIOHLQGLTIQELPDPPTHLPESORLAET

ART.mac  SPQA