MXPA00009026A - Methods for making character strings, polynucleotides and polypeptides having desired characteristics - Google Patents
Methods for making character strings, polynucleotides and polypeptides having desired characteristics
- Publication number
- MXPA00009026A (MXPA/A/2000/009026A; MX PA00009026 A)
- Authority
- MX
- Mexico
- Prior art keywords
- sequence
- characters
- strings
- nucleic acid
- oligonucleotides
- Prior art date
Links
- 229920000023 polynucleotide Polymers 0.000 title claims description 61
- 239000002157 polynucleotide Substances 0.000 title claims description 61
- 229920001184 polypeptide Polymers 0.000 title claims description 29
- 150000007523 nucleic acids Chemical class 0.000 claims abstract description 256
- 108020004707 nucleic acids Proteins 0.000 claims abstract description 245
- 230000002068 genetic Effects 0.000 claims abstract description 81
- 238000005215 recombination Methods 0.000 claims abstract description 80
- 229920000272 Oligonucleotide Polymers 0.000 claims description 326
- 102000004169 proteins and genes Human genes 0.000 claims description 129
- 108090000623 proteins and genes Proteins 0.000 claims description 129
- 230000002194 synthesizing Effects 0.000 claims description 98
- 230000015572 biosynthetic process Effects 0.000 claims description 93
- 238000003786 synthesis reaction Methods 0.000 claims description 88
- 230000027455 binding Effects 0.000 claims description 56
- 239000000203 mixture Substances 0.000 claims description 55
- 230000035772 mutation Effects 0.000 claims description 55
- 150000001413 amino acids Chemical class 0.000 claims description 51
- 230000000875 corresponding Effects 0.000 claims description 47
- 230000000694 effects Effects 0.000 claims description 44
- 238000004458 analytical method Methods 0.000 claims description 36
- 239000002773 nucleotide Substances 0.000 claims description 35
- 125000003729 nucleotide group Chemical group 0.000 claims description 31
- 239000002253 acid Substances 0.000 claims description 27
- 230000014509 gene expression Effects 0.000 claims description 27
- 150000007513 acids Chemical class 0.000 claims description 24
- 238000004166 bioassay Methods 0.000 claims description 24
- 229920001850 Nucleic acid sequence Polymers 0.000 claims description 22
- 239000011159 matrix material Substances 0.000 claims description 22
- 229920000642 polymer Polymers 0.000 claims description 22
- 238000006062 fragmentation reaction Methods 0.000 claims description 20
- 238000000338 in vitro Methods 0.000 claims description 20
- 230000000295 complement Effects 0.000 claims description 18
- 108091006028 chimera Proteins 0.000 claims description 17
- 239000000758 substrate Substances 0.000 claims description 16
- 239000003446 ligand Substances 0.000 claims description 15
- 239000011248 coating agent Substances 0.000 claims description 14
- 238000000576 coating method Methods 0.000 claims description 14
- 230000004048 modification Effects 0.000 claims description 14
- 238000006011 modification reaction Methods 0.000 claims description 14
- 239000007790 solid phase Substances 0.000 claims description 14
- 238000004364 calculation method Methods 0.000 claims description 13
- 238000003780 insertion Methods 0.000 claims description 12
- 230000036961 partial Effects 0.000 claims description 11
- 230000002103 transcriptional Effects 0.000 claims description 11
- 239000000126 substance Substances 0.000 claims description 10
- 108091005503 Nucleic proteins Proteins 0.000 claims description 9
- 238000009396 hybridization Methods 0.000 claims description 9
- 238000010367 cloning Methods 0.000 claims description 8
- 230000034994 death Effects 0.000 claims description 7
- 230000035897 transcription Effects 0.000 claims description 7
- 101700080605 NUC1 Proteins 0.000 claims description 6
- 230000003899 glycosylation Effects 0.000 claims description 6
- 238000006206 glycosylation reaction Methods 0.000 claims description 6
- NOESYZHRGYRDHS-UHFFFAOYSA-N insulin Chemical compound N1C(=O)C(NC(=O)C(CCC(N)=O)NC(=O)C(CCC(O)=O)NC(=O)C(C(C)C)NC(=O)C(NC(=O)CN)C(C)CC)CSSCC(C(NC(CO)C(=O)NC(CC(C)C)C(=O)NC(CC=2C=CC(O)=CC=2)C(=O)NC(CCC(N)=O)C(=O)NC(CC(C)C)C(=O)NC(CCC(O)=O)C(=O)NC(CC(N)=O)C(=O)NC(CC=2C=CC(O)=CC=2)C(=O)NC(CSSCC(NC(=O)C(C(C)C)NC(=O)C(CC(C)C)NC(=O)C(CC=2C=CC(O)=CC=2)NC(=O)C(CC(C)C)NC(=O)C(C)NC(=O)C(CCC(O)=O)NC(=O)C(C(C)C)NC(=O)C(CC(C)C)NC(=O)C(CC=2NC=NC=2)NC(=O)C(CO)NC(=O)CNC2=O)C(=O)NCC(=O)NC(CCC(O)=O)C(=O)NC(CCCNC(N)=N)C(=O)NCC(=O)NC(CC=3C=CC=CC=3)C(=O)NC(CC=3C=CC=CC=3)C(=O)NC(CC=3C=CC(O)=CC=3)C(=O)NC(C(C)O)C(=O)N3C(CCC3)C(=O)NC(CCCCN)C(=O)NC(C)C(O)=O)C(=O)NC(CC(N)=O)C(O)=O)=O)NC(=O)C(C(C)CC)NC(=O)C(CO)NC(=O)C(C(C)O)NC(=O)C1CSSCC2NC(=O)C(CC(C)C)NC(=O)C(NC(=O)C(CCC(N)=O)NC(=O)C(CC(N)=O)NC(=O)C(NC(=O)C(N)CC=1C=CC=CC=1)C(C)C)CC1=CN=CN1 NOESYZHRGYRDHS-UHFFFAOYSA-N 0.000 claims description 6
- 101700006494 nucA Proteins 0.000 claims description 6
- 230000001419 dependent Effects 0.000 claims description 5
- 238000005304 joining Methods 0.000 claims description 5
- 238000004519 manufacturing process Methods 0.000 claims description 5
- 108091007521 restriction endonucleases Proteins 0.000 claims description 5
- 239000007787 solid Substances 0.000 claims description 5
- 230000001629 suppression Effects 0.000 claims description 5
- 108091005771 Peptidases Proteins 0.000 claims description 4
- 239000004365 Protease Substances 0.000 claims description 4
- 238000001914 filtration Methods 0.000 claims description 4
- 230000002401 inhibitory effect Effects 0.000 claims description 4
- 230000000051 modifying Effects 0.000 claims description 4
- 102000016911 Deoxyribonucleases Human genes 0.000 claims description 3
- 108010053770 Deoxyribonucleases Proteins 0.000 claims description 3
- 102000033147 ERVK-25 Human genes 0.000 claims description 3
- 102000004877 Insulin Human genes 0.000 claims description 3
- 108090001061 Insulin Proteins 0.000 claims description 3
- 108010050904 Interferons Proteins 0.000 claims description 3
- 102000014150 Interferons Human genes 0.000 claims description 3
- 102000003960 Ligases Human genes 0.000 claims description 3
- 108090000364 Ligases Proteins 0.000 claims description 3
- 102000008109 Mixed Function Oxygenases Human genes 0.000 claims description 3
- 108010074633 Mixed Function Oxygenases Proteins 0.000 claims description 3
- 108010038512 Platelet-Derived Growth Factor Proteins 0.000 claims description 3
- 102000010780 Platelet-Derived Growth Factor Human genes 0.000 claims description 3
- 239000003112 inhibitor Substances 0.000 claims description 3
- 102000020504 Collagenase family Human genes 0.000 claims description 2
- 108060005980 Collagenase family Proteins 0.000 claims description 2
- 229940104302 Cytosine Drugs 0.000 claims description 2
- OPTASPLRGRRNAP-UHFFFAOYSA-N Cytosine Chemical compound NC=1C=CNC(=O)N=1 OPTASPLRGRRNAP-UHFFFAOYSA-N 0.000 claims description 2
- 102000003972 Fibroblast Growth Factor 7 Human genes 0.000 claims description 2
- 108090000385 Fibroblast Growth Factor 7 Proteins 0.000 claims description 2
- 108050007372 Fibroblast growth factor family Proteins 0.000 claims description 2
- 102000018233 Fibroblast growth factor family Human genes 0.000 claims description 2
- 102000004269 Granulocyte Colony-Stimulating Factor Human genes 0.000 claims description 2
- 108010017080 Granulocyte Colony-Stimulating Factor Proteins 0.000 claims description 2
- 210000001624 Hip Anatomy 0.000 claims description 2
- 102000014429 Insulin-like growth factor Human genes 0.000 claims description 2
- 108050003490 Insulin-like growth factor Proteins 0.000 claims description 2
- 108010063738 Interleukins Proteins 0.000 claims description 2
- 102000015696 Interleukins Human genes 0.000 claims description 2
- 206010024324 Leukaemias Diseases 0.000 claims description 2
- 239000004367 Lipase Substances 0.000 claims description 2
- 210000000282 Nails Anatomy 0.000 claims description 2
- 102000004140 Oncostatin M Human genes 0.000 claims description 2
- 108090000630 Oncostatin M Proteins 0.000 claims description 2
- 230000000712 assembly Effects 0.000 claims description 2
- 230000001580 bacterial Effects 0.000 claims description 2
- 229960002424 collagenase Drugs 0.000 claims description 2
- 201000010099 disease Diseases 0.000 claims description 2
- 239000003623 enhancer Substances 0.000 claims description 2
- 230000002708 enhancing Effects 0.000 claims description 2
- 230000012010 growth Effects 0.000 claims description 2
- 230000002363 herbicidal Effects 0.000 claims description 2
- 239000004009 herbicide Substances 0.000 claims description 2
- 108090001060 lipase Proteins 0.000 claims description 2
- 102000004882 lipase Human genes 0.000 claims description 2
- 235000019421 lipase Nutrition 0.000 claims description 2
- 239000000813 peptide hormone Substances 0.000 claims description 2
- 239000003375 plant hormone Substances 0.000 claims description 2
- 102000005162 pleiotrophin Human genes 0.000 claims description 2
- 108010056011 pleiotrophin Proteins 0.000 claims description 2
- 102000005969 steroid hormone receptors Human genes 0.000 claims description 2
- 108020003113 steroid hormone receptors Proteins 0.000 claims description 2
- 125000003275 alpha amino acid group Chemical group 0.000 claims 3
- 239000008241 heterogeneous mixture Substances 0.000 claims 3
- 102000003745 Hepatocyte Growth Factor Human genes 0.000 claims 1
- 108090000100 Hepatocyte Growth Factor Proteins 0.000 claims 1
- 229940040461 Lipase Drugs 0.000 claims 1
- 241001367079 Una Species 0.000 claims 1
- 239000010437 gem Substances 0.000 claims 1
- 229910001751 gemstone Inorganic materials 0.000 claims 1
- 229940079322 interferon Drugs 0.000 claims 1
- 230000001172 regenerating Effects 0.000 claims 1
- 230000022983 regulation of cell cycle Effects 0.000 claims 1
- 238000007423 screening assay Methods 0.000 claims 1
- 238000003892 spreading Methods 0.000 claims 1
- 238000000126 in silico method Methods 0.000 abstract description 36
- 238000000034 method Methods 0.000 description 132
- 235000018102 proteins Nutrition 0.000 description 125
- 229920003013 deoxyribonucleic acid Polymers 0.000 description 93
- 108020004705 Codon Proteins 0.000 description 30
- 210000004027 cells Anatomy 0.000 description 30
- 238000004422 calculation algorithm Methods 0.000 description 26
- 238000006243 chemical reaction Methods 0.000 description 24
- 231100000350 mutagenesis Toxicity 0.000 description 22
- 238000002703 mutagenesis Methods 0.000 description 22
- 230000001537 neural Effects 0.000 description 19
- 238000005457 optimization Methods 0.000 description 18
- 102000004190 Enzymes Human genes 0.000 description 17
- 108090000790 Enzymes Proteins 0.000 description 17
- 238000010276 construction Methods 0.000 description 17
- 239000000047 product Substances 0.000 description 17
- 239000000543 intermediate Substances 0.000 description 15
- 102000004196 processed proteins & peptides Human genes 0.000 description 15
- 108090000765 processed proteins & peptides Proteins 0.000 description 15
- 229940110715 ENZYMES FOR TREATMENT OF WOUNDS AND ULCERS Drugs 0.000 description 14
- 229940019336 antithrombotic Enzymes Drugs 0.000 description 14
- 241000349774 Bikinia letestui Species 0.000 description 13
- 241000196324 Embryophyta Species 0.000 description 13
- 230000001404 mediated Effects 0.000 description 13
- 230000001976 improved Effects 0.000 description 12
- 239000000463 material Substances 0.000 description 12
- 229920000160 (ribonucleotides)n+m Polymers 0.000 description 11
- 241000700605 Viruses Species 0.000 description 11
- 238000010586 diagram Methods 0.000 description 11
- 230000018109 developmental process Effects 0.000 description 10
- 238000005516 engineering process Methods 0.000 description 10
- 238000011160 research Methods 0.000 description 10
- 238000009795 derivation Methods 0.000 description 9
- 238000011161 development Methods 0.000 description 9
- 239000007788 liquid Substances 0.000 description 9
- 239000002609 media Substances 0.000 description 9
- ISAKRJDGNUQOIC-UHFFFAOYSA-N Uracil Chemical compound O=C1C=CNC(=O)N1 ISAKRJDGNUQOIC-UHFFFAOYSA-N 0.000 description 8
- 230000013016 learning Effects 0.000 description 8
- 230000003505 mutagenic Effects 0.000 description 8
- 238000001742 protein purification Methods 0.000 description 8
- 101700070228 IFN Proteins 0.000 description 7
- 101700066403 IFNA1 Proteins 0.000 description 7
- 101700023446 IFNT Proteins 0.000 description 7
- 238000007792 addition Methods 0.000 description 7
- 230000002596 correlated Effects 0.000 description 7
- 238000003199 nucleic acid amplification method Methods 0.000 description 7
- 238000002515 oligonucleotide synthesis Methods 0.000 description 7
- 102000004965 antibodies Human genes 0.000 description 6
- 108090001123 antibodies Proteins 0.000 description 6
- 229920001222 biopolymer Polymers 0.000 description 6
- 238000003776 cleavage reaction Methods 0.000 description 6
- 238000009510 drug design Methods 0.000 description 6
- 230000002255 enzymatic Effects 0.000 description 6
- 238000005755 formation reaction Methods 0.000 description 6
- UFWIBTONFRDIAS-UHFFFAOYSA-N naphthalene Chemical compound C1=CC=CC2=CC=CC=C21 UFWIBTONFRDIAS-UHFFFAOYSA-N 0.000 description 6
- 210000001519 tissues Anatomy 0.000 description 6
- 241000894006 Bacteria Species 0.000 description 5
- 102000002004 Cytochrome P-450 Enzyme System Human genes 0.000 description 5
- 108010015742 Cytochrome P-450 Enzyme System Proteins 0.000 description 5
- -1 GROβ Proteins 0.000 description 5
- 230000003321 amplification Effects 0.000 description 5
- 238000000137 annealing Methods 0.000 description 5
- 230000001413 cellular Effects 0.000 description 5
- 239000003153 chemical reaction reagent Substances 0.000 description 5
- 230000001276 controlling effect Effects 0.000 description 5
- 230000001808 coupling Effects 0.000 description 5
- 238000010168 coupling process Methods 0.000 description 5
- 238000005859 coupling reaction Methods 0.000 description 5
- 230000001186 cumulative Effects 0.000 description 5
- 238000005259 measurement Methods 0.000 description 5
- 230000002503 metabolic Effects 0.000 description 5
- 231100000219 mutagenic Toxicity 0.000 description 5
- 238000002360 preparation method Methods 0.000 description 5
- 238000004805 robotic Methods 0.000 description 5
- 238000005070 sampling Methods 0.000 description 5
- 238000010187 selection method Methods 0.000 description 5
- 238000007619 statistical method Methods 0.000 description 5
- 229960005486 vaccines Drugs 0.000 description 5
- 206010008531 Chills Diseases 0.000 description 4
- 229920000453 Consensus sequence Polymers 0.000 description 4
- 108010028143 Dioxygenases Proteins 0.000 description 4
- 102000016680 Dioxygenases Human genes 0.000 description 4
- 108090000787 Subtilisin Proteins 0.000 description 4
- 229940035893 Uracil Drugs 0.000 description 4
- 239000000654 additive Substances 0.000 description 4
- 230000003197 catalytic Effects 0.000 description 4
- 230000003750 conditioning Effects 0.000 description 4
- 238000006073 displacement reaction Methods 0.000 description 4
- 238000009826 distribution Methods 0.000 description 4
- 239000003814 drug Substances 0.000 description 4
- 238000002866 fluorescence resonance energy transfer Methods 0.000 description 4
- 238000009472 formulation Methods 0.000 description 4
- 238000010348 incorporation Methods 0.000 description 4
- XEEYBQQBJWHFJM-UHFFFAOYSA-N iron Chemical compound [Fe] XEEYBQQBJWHFJM-UHFFFAOYSA-N 0.000 description 4
- PXHVJJICTQNCMI-UHFFFAOYSA-N nickel Chemical compound [Ni] PXHVJJICTQNCMI-UHFFFAOYSA-N 0.000 description 4
- 238000010647 peptide synthesis reaction Methods 0.000 description 4
- 230000000750 progressive Effects 0.000 description 4
- 238000003753 real-time PCR Methods 0.000 description 4
- 102000005962 receptors Human genes 0.000 description 4
- 108020003175 receptors Proteins 0.000 description 4
- 230000002829 reduced Effects 0.000 description 4
- 238000002864 sequence alignment Methods 0.000 description 4
- 241000894007 species Species 0.000 description 4
- 108091006090 transcriptional activators Proteins 0.000 description 4
- 101700072041 CXCL1 Proteins 0.000 description 3
- 102100018698 CXCL1 Human genes 0.000 description 3
- 229940088598 Enzyme Drugs 0.000 description 3
- 102000003951 Erythropoietin Human genes 0.000 description 3
- 108090000394 Erythropoietin Proteins 0.000 description 3
- 241000588724 Escherichia coli Species 0.000 description 3
- 241000233866 Fungi Species 0.000 description 3
- 210000004392 Genitalia Anatomy 0.000 description 3
- 102100009534 TNF Human genes 0.000 description 3
- 101710040537 TNF Proteins 0.000 description 3
- 108060008683 Tumor Necrosis Factor Receptors Proteins 0.000 description 3
- 102000003298 Tumor Necrosis Factor Receptors Human genes 0.000 description 3
- 210000004102 animal cell Anatomy 0.000 description 3
- 230000015556 catabolic process Effects 0.000 description 3
- 239000007795 chemical reaction product Substances 0.000 description 3
- 101710027542 codAch2 Proteins 0.000 description 3
- 238000010192 crystallographic characterization Methods 0.000 description 3
- 230000004059 degradation Effects 0.000 description 3
- 238000006731 degradation reaction Methods 0.000 description 3
- 230000000593 degrading Effects 0.000 description 3
- 238000001784 detoxification Methods 0.000 description 3
- 238000003379 elimination reaction Methods 0.000 description 3
- 229940105423 erythropoietin Drugs 0.000 description 3
- 238000003384 imaging method Methods 0.000 description 3
- 230000003993 interaction Effects 0.000 description 3
- 150000002500 ions Chemical class 0.000 description 3
- 230000000670 limiting Effects 0.000 description 3
- 239000003550 marker Substances 0.000 description 3
- 238000010369 molecular cloning Methods 0.000 description 3
- 231100000299 mutagenicity Toxicity 0.000 description 3
- 230000001717 pathogenic Effects 0.000 description 3
- 238000003909 pattern recognition Methods 0.000 description 3
- 230000002093 peripheral Effects 0.000 description 3
- 230000000704 physical effect Effects 0.000 description 3
- 229920001690 polydopamine Polymers 0.000 description 3
- 238000000513 principal component analysis Methods 0.000 description 3
- 238000000746 purification Methods 0.000 description 3
- 230000003252 repetitive Effects 0.000 description 3
- 230000001850 reproductive Effects 0.000 description 3
- 238000003757 reverse transcription PCR Methods 0.000 description 3
- 238000004088 simulation Methods 0.000 description 3
- 238000006467 substitution reaction Methods 0.000 description 3
- 230000001225 therapeutic Effects 0.000 description 3
- 230000003612 virological Effects 0.000 description 3
- 101700023105 3L21 Proteins 0.000 description 2
- 108010011619 6-Phytase Proteins 0.000 description 2
- 102000002281 Adenylate Kinase Human genes 0.000 description 2
- 108020000543 Adenylate Kinase Proteins 0.000 description 2
- 208000000409 Breast Neoplasms Diseases 0.000 description 2
- 102100008428 CCL2 Human genes 0.000 description 2
- 101710040446 CD40 Proteins 0.000 description 2
- 102100013137 CD40 Human genes 0.000 description 2
- 102000033243 CDKN2A Human genes 0.000 description 2
- 101710022338 CDKN2A Proteins 0.000 description 2
- 229920001405 Coding region Polymers 0.000 description 2
- 102000007644 Colony-Stimulating Factors Human genes 0.000 description 2
- 108010071942 Colony-Stimulating Factors Proteins 0.000 description 2
- 102000004127 Cytokines Human genes 0.000 description 2
- 108090000695 Cytokines Proteins 0.000 description 2
- 101700033006 EGF Proteins 0.000 description 2
- 102100010813 EGF Human genes 0.000 description 2
- 102100007405 FGF7 Human genes 0.000 description 2
- 101700033323 FGF7 Proteins 0.000 description 2
- 108091005957 GFP derivatives Proteins 0.000 description 2
- 102000002464 Galactosidases Human genes 0.000 description 2
- 108010093031 Galactosidases Proteins 0.000 description 2
- 102000018997 Growth Hormone Human genes 0.000 description 2
- 108010051696 Growth Hormone Proteins 0.000 description 2
- 229940047124 Interferons Drugs 0.000 description 2
- 108010002352 Interleukin-1 Proteins 0.000 description 2
- 102100001056 KITLG Human genes 0.000 description 2
- 101710028765 KITLG Proteins 0.000 description 2
- 101700028499 LECG Proteins 0.000 description 2
- 241001465754 Metazoa Species 0.000 description 2
- 206010028980 Neoplasm Diseases 0.000 description 2
- 210000000440 Neutrophils Anatomy 0.000 description 2
- 241000842783 Orna Species 0.000 description 2
- LLKYUHGUYSLMPA-UHFFFAOYSA-N Phosphoramidite Chemical compound NP([O-])[O-] LLKYUHGUYSLMPA-UHFFFAOYSA-N 0.000 description 2
- MUMGGOZAMZWBJJ-DYKIIFRCSA-N Testosterone Chemical compound O=C1CC[C@]2(C)[C@H]3CC[C@](C)([C@H](CC4)O)[C@@H]4[C@@H]3CCC2=C1 MUMGGOZAMZWBJJ-DYKIIFRCSA-N 0.000 description 2
- 102000006601 Thymidine Kinase Human genes 0.000 description 2
- 108020004440 Thymidine Kinase Proteins 0.000 description 2
- 231100000765 Toxin Toxicity 0.000 description 2
- 229910052770 Uranium Inorganic materials 0.000 description 2
- HBOMLICNUCNMMY-XLPZGREQSA-N Zidovudine Chemical compound O=C1NC(=O)C(C)=CN1[C@@H]1O[C@H](CO)[C@@H](N=[N+]=[N-])C1 HBOMLICNUCNMMY-XLPZGREQSA-N 0.000 description 2
- 229960002555 Zidovudine Drugs 0.000 description 2
- 102000012086 alpha-L-Fucosidase Human genes 0.000 description 2
- 108010061314 alpha-L-Fucosidase Proteins 0.000 description 2
- 239000011324 bead Substances 0.000 description 2
- 230000033228 biological regulation Effects 0.000 description 2
- 230000001488 breeding Effects 0.000 description 2
- BBBFJLBPOGFECG-VJVYQDLKSA-N calcitonin Chemical compound N([C@H](C(=O)N[C@@H](CC(C)C)C(=O)NCC(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC=1NC=NC=1)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC=1C=CC(O)=CC=1)C(=O)N1[C@@H](CCC1)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H]([C@@H](C)O)C(=O)NCC(=O)N[C@@H](CO)C(=O)NCC(=O)N[C@@H]([C@@H](C)O)C(=O)N1[C@@H](CCC1)C(N)=O)C(C)C)C(=O)[C@@H]1CSSC[C@H](N)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CO)C(=O)N[C@@H]([C@@H](C)O)C(=O)N1 BBBFJLBPOGFECG-VJVYQDLKSA-N 0.000 description 2
- 201000011510 cancer Diseases 0.000 description 2
- 239000000969 carrier Substances 0.000 description 2
- 238000006555 catalytic reaction Methods 0.000 description 2
- 238000004113 cell culture Methods 0.000 description 2
- 239000002975 chemoattractant Substances 0.000 description 2
- 230000001889 chemoattractant Effects 0.000 description 2
- 238000004891 communication Methods 0.000 description 2
- 238000010205 computational analysis Methods 0.000 description 2
- 238000005520 cutting process Methods 0.000 description 2
- 238000003066 decision tree Methods 0.000 description 2
- 230000029087 digestion Effects 0.000 description 2
- 239000000539 dimer Substances 0.000 description 2
- 239000006185 dispersion Substances 0.000 description 2
- 229940079593 drugs Drugs 0.000 description 2
- 241001493065 dsRNA viruses Species 0.000 description 2
- 238000001976 enzyme digestion Methods 0.000 description 2
- 238000011156 evaluation Methods 0.000 description 2
- 230000004927 fusion Effects 0.000 description 2
- PEDCQBHIVMGVHV-UHFFFAOYSA-N glycerine Chemical compound OCC(O)CO PEDCQBHIVMGVHV-UHFFFAOYSA-N 0.000 description 2
- 239000003102 growth factor Substances 0.000 description 2
- 239000000122 growth hormone Substances 0.000 description 2
- 230000002458 infectious Effects 0.000 description 2
- 230000002757 inflammatory Effects 0.000 description 2
- 230000002452 interceptive Effects 0.000 description 2
- 238000009114 investigational therapy Methods 0.000 description 2
- 229910052742 iron Inorganic materials 0.000 description 2
- 238000002955 isolation Methods 0.000 description 2
- 238000007834 ligase chain reaction Methods 0.000 description 2
- 239000012528 membrane Substances 0.000 description 2
- 230000003278 mimic Effects 0.000 description 2
- 238000002156 mixing Methods 0.000 description 2
- 229910052759 nickel Inorganic materials 0.000 description 2
- 230000003287 optical Effects 0.000 description 2
- 210000000056 organs Anatomy 0.000 description 2
- 230000037361 pathway Effects 0.000 description 2
- 238000006366 phosphorylation reaction Methods 0.000 description 2
- 230000000865 phosphorylative Effects 0.000 description 2
- 229940085127 phytase Drugs 0.000 description 2
- 230000001681 protective Effects 0.000 description 2
- 238000011002 quantification Methods 0.000 description 2
- 230000008929 regeneration Effects 0.000 description 2
- 238000011069 regeneration method Methods 0.000 description 2
- 101710022861 rub Proteins 0.000 description 2
- 230000001568 sexual Effects 0.000 description 2
- 230000019491 signal transduction Effects 0.000 description 2
- VYPSYNLAJGMNEJ-UHFFFAOYSA-N silicon dioxide Inorganic materials O=[Si]=O VYPSYNLAJGMNEJ-UHFFFAOYSA-N 0.000 description 2
- 238000010532 solid phase synthesis reaction Methods 0.000 description 2
- 230000003595 spectral Effects 0.000 description 2
- 210000004215 spores Anatomy 0.000 description 2
- 239000007858 starting material Substances 0.000 description 2
- 230000001360 synchronised Effects 0.000 description 2
- 238000001308 synthesis method Methods 0.000 description 2
- 238000010189 synthetic method Methods 0.000 description 2
- 239000003053 toxin Substances 0.000 description 2
- 108020003112 toxins Proteins 0.000 description 2
- BNIFSVVAHBLNTN-XKKUQSFHSA-N (2S)-4-amino-2-[[(2S)-2-[[(2S)-2-[[(2S)-2-[[(2S)-1-[(2S)-4-amino-2-[[2-[[(2S)-2-[[(2S)-2-[[(2S)-1-[(2S)-6-amino-2-[[(2S)-2-[[(2S)-2-[[(2S,3R)-2-amino-3-hydroxybutanoyl]amino]-4-methylsulfanylbutanoyl]amino]-5-(diaminomethylideneamino)pentanoyl]amino]hexan Chemical compound C[C@@H](O)[C@H](N)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](CCCCN)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](CO)C(=O)NCC(=O)N[C@@H](CC(N)=O)C(=O)N1[C@H](C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](C)C(=O)N[C@@H](CC(N)=O)C(O)=O)CCC1 BNIFSVVAHBLNTN-XKKUQSFHSA-N 0.000 description 1
- 101700012833 3S11 Proteins 0.000 description 1
- XRZWVSXEDRYQGC-UHFFFAOYSA-N 4-cyclohexylpyrrolidin-1-ium-2-carboxylate Chemical compound C1NC(C(=O)O)CC1C1CCCCC1 XRZWVSXEDRYQGC-UHFFFAOYSA-N 0.000 description 1
- 241000186046 Actinomyces Species 0.000 description 1
- 108091022082 Acyl transferases Proteins 0.000 description 1
- 102000019632 Acyl transferases Human genes 0.000 description 1
- PQSUYGKTWSAVDQ-ZVIOFETBSA-N Aldosterone Chemical compound C([C@@]1([C@@H](C(=O)CO)CC[C@H]1[C@@H]1CC2)C=O)[C@H](O)[C@@H]1[C@]1(C)C2=CC(=O)CC1 PQSUYGKTWSAVDQ-ZVIOFETBSA-N 0.000 description 1
- 241000004176 Alphacoronavirus Species 0.000 description 1
- 229920002287 Amplicon Polymers 0.000 description 1
- 102400000068 Angiostatin Human genes 0.000 description 1
- 108010079709 Angiostatins Proteins 0.000 description 1
- 102000007592 Apolipoproteins Human genes 0.000 description 1
- 108010071619 Apolipoproteins Proteins 0.000 description 1
- 108010083590 Apoproteins Proteins 0.000 description 1
- 102000006410 Apoproteins Human genes 0.000 description 1
- 241000239290 Araneae Species 0.000 description 1
- 241000712891 Arenavirus Species 0.000 description 1
- DJHGAFSJWGLOIV-UHFFFAOYSA-K Arsenate Chemical compound [O-][As]([O-])([O-])=O DJHGAFSJWGLOIV-UHFFFAOYSA-K 0.000 description 1
- 241000228212 Aspergillus Species 0.000 description 1
- 102000002723 Atrial Natriuretic Factor Human genes 0.000 description 1
- 101800001288 Atrial natriuretic factor Proteins 0.000 description 1
- 101800001866 Atrial natriuretic peptide Proteins 0.000 description 1
- 241000972773 Aulopiformes Species 0.000 description 1
- 241000588807 Bordetella Species 0.000 description 1
- 241000589968 Borrelia Species 0.000 description 1
- 241000589562 Brucella Species 0.000 description 1
- 102000001902 CC Chemokines Human genes 0.000 description 1
- 108010040471 CC Chemokines Proteins 0.000 description 1
- 101700006000 CCL2 Proteins 0.000 description 1
- 102100016449 CCL5 Human genes 0.000 description 1
- 101700078950 CD44 Proteins 0.000 description 1
- 102100003735 CD44 Human genes 0.000 description 1
- 102000019388 CXC chemokine Human genes 0.000 description 1
- 108050006947 CXC chemokine Proteins 0.000 description 1
- 102100009641 CXCL10 Human genes 0.000 description 1
- 101710032181 CXCL10 Proteins 0.000 description 1
- 101700012002 CXCL5 Proteins 0.000 description 1
- 102100009682 CXCL5 Human genes 0.000 description 1
- 102100009685 CXCL6 Human genes 0.000 description 1
- 101700033050 CXCL6 Proteins 0.000 description 1
- 108060001064 Calcitonin Proteins 0.000 description 1
- 102400000113 Calcitonin Human genes 0.000 description 1
- 229960004015 Calcitonin Drugs 0.000 description 1
- 241000222120 Candida <Saccharomycetales> Species 0.000 description 1
- NSQLIUXCMFBZME-MPVJKSABSA-N Carperitide Chemical compound C([C@H]1C(=O)NCC(=O)NCC(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@H](C(NCC(=O)N[C@@H](C)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CO)C(=O)NCC(=O)N[C@@H](CC(C)C)C(=O)NCC(=O)N[C@@H](CSSC[C@@H](C(=O)N1)NC(=O)[C@H](CO)NC(=O)[C@H](CO)NC(=O)[C@H](CCCNC(N)=N)NC(=O)[C@H](CCCNC(N)=N)NC(=O)[C@H](CC(C)C)NC(=O)[C@@H](N)CO)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC=1C=CC=CC=1)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC=1C=CC(O)=CC=1)C(O)=O)=O)[C@@H](C)CC)C1=CC=CC=C1 NSQLIUXCMFBZME-MPVJKSABSA-N 0.000 description 1
- 210000002421 Cell Wall Anatomy 0.000 description 1
- 102000010991 Chaperonin Cpn60 Human genes 0.000 description 1
- 108050001186 Chaperonin Cpn60 Proteins 0.000 description 1
- 108010055292 Chemokine CCL2 Proteins 0.000 description 1
- 108010055166 Chemokine CCL5 Proteins 0.000 description 1
- 241000606161 Chlamydia Species 0.000 description 1
- 241001112696 Clostridia Species 0.000 description 1
- 229920000062 Coding strand Polymers 0.000 description 1
- 102000008186 Collagen Human genes 0.000 description 1
- 108010035532 Collagen Proteins 0.000 description 1
- 102000016917 Complement C1s Human genes 0.000 description 1
- 108010028774 Complement C1s Proteins 0.000 description 1
- 108010078546 Complement C5a Proteins 0.000 description 1
- OMFXVFTZEKFJBZ-HJTSIMOOSA-N Corticosterone Chemical compound O=C1CC[C@]2(C)[C@H]3[C@@H](O)C[C@](C)([C@H](CC4)C(=O)CO)[C@@H]4[C@@H]3CCC2=C1 OMFXVFTZEKFJBZ-HJTSIMOOSA-N 0.000 description 1
- 101710007399 DDS Proteins 0.000 description 1
- 101710007887 DHFR Proteins 0.000 description 1
- 102100005838 DHFR Human genes 0.000 description 1
- 108009000206 DNA Mismatch Repair Proteins 0.000 description 1
- 230000004544 DNA amplification Effects 0.000 description 1
- 108090000626 DNA-directed RNA polymerases Proteins 0.000 description 1
- 102000004163 DNA-directed RNA polymerases Human genes 0.000 description 1
- 102000007260 Deoxyribonuclease I Human genes 0.000 description 1
- 108010008532 Deoxyribonuclease I Proteins 0.000 description 1
- 241000668709 Dipterocarpus costatus Species 0.000 description 1
- 241001661194 Dives Species 0.000 description 1
- 102100006567 EXOC1 Human genes 0.000 description 1
- 108010042407 Endonucleases Proteins 0.000 description 1
- 102000004533 Endonucleases Human genes 0.000 description 1
- 241000224431 Entamoeba Species 0.000 description 1
- 231100000655 Enterotoxin Toxicity 0.000 description 1
- 229940116977 Epidermal Growth Factor Drugs 0.000 description 1
- 241000206602 Eukaryota Species 0.000 description 1
- 231100000776 Exotoxin Toxicity 0.000 description 1
- 102100009906 F7 Human genes 0.000 description 1
- 102100006624 F9 Human genes 0.000 description 1
- 101710003421 FGF Proteins 0.000 description 1
- 102100014166 FGL1 Human genes 0.000 description 1
- 101700043581 FGL1 Proteins 0.000 description 1
- 102100008658 FN1 Human genes 0.000 description 1
- 101700006177 FUT2 Proteins 0.000 description 1
- 102100019331 FUT2 Human genes 0.000 description 1
- 108010076282 Factor IX Proteins 0.000 description 1
- 108010023321 Factor VII Proteins 0.000 description 1
- 229960000301 Factor VIII Drugs 0.000 description 1
- 108010054218 Factor VIII Proteins 0.000 description 1
- 102000001690 Factor VIII Human genes 0.000 description 1
- 108010014173 Factor X Proteins 0.000 description 1
- 229940012952 Fibrinogen Drugs 0.000 description 1
- 108010049003 Fibrinogen Proteins 0.000 description 1
- 102000008946 Fibrinogen Human genes 0.000 description 1
- 229940019698 Fibrinogen containing hemostatics Drugs 0.000 description 1
- 108010067306 Fibronectins Proteins 0.000 description 1
- 241000710831 Flavivirus Species 0.000 description 1
- 108091006011 G proteins Proteins 0.000 description 1
- 101700048391 GCP2 Proteins 0.000 description 1
- 108091000058 GTP-Binding Proteins Proteins 0.000 description 1
- 102000030007 GTP-Binding Proteins Human genes 0.000 description 1
- 241000224466 Giardia Species 0.000 description 1
- 240000001340 Gmelina philippensis Species 0.000 description 1
- 102000006771 Gonadotropins Human genes 0.000 description 1
- 108010086677 Gonadotropins Proteins 0.000 description 1
- 208000001786 Gonorrhea Diseases 0.000 description 1
- 206010018612 Gonorrhoea Diseases 0.000 description 1
- 108010043121 Green Fluorescent Proteins Proteins 0.000 description 1
- 102000004144 Green Fluorescent Proteins Human genes 0.000 description 1
- 240000004282 Grewia occidentalis Species 0.000 description 1
- 101700021054 HCC1 Proteins 0.000 description 1
- 101700042506 HIRUD Proteins 0.000 description 1
- 101700086186 HPS1 Proteins 0.000 description 1
- 241000606790 Haemophilus Species 0.000 description 1
- 241000589989 Helicobacter Species 0.000 description 1
- 102000001554 Hemoglobins Human genes 0.000 description 1
- 108010054147 Hemoglobins Proteins 0.000 description 1
- 241000700721 Hepatitis B virus Species 0.000 description 1
- 210000003494 Hepatocytes Anatomy 0.000 description 1
- 241000238631 Hexapoda Species 0.000 description 1
- 229940006607 Hirudin Drugs 0.000 description 1
- 102000002265 Human Growth Hormone Human genes 0.000 description 1
- 108010000521 Human Growth Hormone Proteins 0.000 description 1
- 239000000854 Human Growth Hormone Substances 0.000 description 1
- 208000006572 Human Influenza Diseases 0.000 description 1
- 102000008100 Human Serum Albumin Human genes 0.000 description 1
- 108091006822 Human Serum Albumin Proteins 0.000 description 1
- 102100004115 ICAM1 Human genes 0.000 description 1
- 102100001475 ITGB2 Human genes 0.000 description 1
- 206010022000 Influenza Diseases 0.000 description 1
- UGQMRVRMYYASKQ-KMPDEGCQSA-N Inosine Natural products O[C@H]1[C@H](O)[C@@H](CO)O[C@@H]1N1C(N=CNC2=O)=C2N=C1 UGQMRVRMYYASKQ-KMPDEGCQSA-N 0.000 description 1
- UGQMRVRMYYASKQ-KQYNXXCUSA-N Inosine Chemical compound O[C@@H]1[C@H](O)[C@@H](CO)O[C@H]1N1C2=NC=NC(O)=C2N=C1 UGQMRVRMYYASKQ-KQYNXXCUSA-N 0.000 description 1
- 102400000022 Insulin-Like Growth Factor II Human genes 0.000 description 1
- 108090001117 Insulin-Like Growth Factor II Proteins 0.000 description 1
- 102000004218 Insulin-like growth factor I Human genes 0.000 description 1
- 108090000723 Insulin-like growth factor I Proteins 0.000 description 1
- 108010008212 Integrin alpha4beta1 Proteins 0.000 description 1
- 108010064593 Intercellular Adhesion Molecule-1 Proteins 0.000 description 1
- 108010002350 Interleukin-2 Proteins 0.000 description 1
- 108090001007 Interleukin-8 Proteins 0.000 description 1
- 229940047122 Interleukins Drugs 0.000 description 1
- 102100012430 LACTB Human genes 0.000 description 1
- 101700051639 LACTB Proteins 0.000 description 1
- 108010001831 LDL receptors Proteins 0.000 description 1
- 102100012475 LDLR Human genes 0.000 description 1
- 102100011875 LTF Human genes 0.000 description 1
- 108010054278 Lac Repressors Proteins 0.000 description 1
- 229940078795 Lactoferrin Drugs 0.000 description 1
- 108010063045 Lactoferrin Proteins 0.000 description 1
- 101800001171 Leader peptide Proteins 0.000 description 1
- 241000589248 Legionella Species 0.000 description 1
- 208000007764 Legionnaires' Disease Diseases 0.000 description 1
- 241000222722 Leishmania <genus> Species 0.000 description 1
- 241000589902 Leptospira Species 0.000 description 1
- 239000005089 Luciferase Substances 0.000 description 1
- 108060001084 Luciferase family Proteins 0.000 description 1
- 108010064548 Lymphocyte Function-Associated Antigen-1 Proteins 0.000 description 1
- 101700075357 MYC Proteins 0.000 description 1
- 229920002521 Macromolecule Polymers 0.000 description 1
- 241000124008 Mammalia Species 0.000 description 1
- 241000203353 Methanococcus Species 0.000 description 1
- 101710017500 MitHPPK/DHPS Proteins 0.000 description 1
- 241000204031 Mycoplasma Species 0.000 description 1
- 101710042084 NAP1L4 Proteins 0.000 description 1
- 101710034254 NCU02305 Proteins 0.000 description 1
- 238000005481 NMR spectroscopy Methods 0.000 description 1
- 241000588653 Neisseria Species 0.000 description 1
- 108010015406 Neurturin Proteins 0.000 description 1
- 102000001839 Neurturin Human genes 0.000 description 1
- 241000187654 Nocardia Species 0.000 description 1
- 102000004316 Oxidoreductases Human genes 0.000 description 1
- 108090000854 Oxidoreductases Proteins 0.000 description 1
- 101710043203 P23p89 Proteins 0.000 description 1
- 101710030036 PPBP Proteins 0.000 description 1
- 102100009687 PPBP Human genes 0.000 description 1
- 208000003154 Papilloma Diseases 0.000 description 1
- 102000003982 Parathyroid hormone Human genes 0.000 description 1
- 108090000445 Parathyroid hormone Proteins 0.000 description 1
- 241000606860 Pasteurella Species 0.000 description 1
- 102000035443 Peptidases Human genes 0.000 description 1
- 108010067902 Peptide Library Proteins 0.000 description 1
- 241000709664 Picornaviridae Species 0.000 description 1
- 208000000474 Poliomyelitis Diseases 0.000 description 1
- 108050006987 Poxvirus Proteins 0.000 description 1
- RJKFOVLPORLFTN-STHVQZNPSA-N Progesterone Natural products O=C(C)[C@@H]1[C@@]2(C)[C@H]([C@H]3[C@@H]([C@]4(C)C(=CC(=O)CC4)CC3)CC2)CC1 RJKFOVLPORLFTN-STHVQZNPSA-N 0.000 description 1
- 241000589516 Pseudomonas Species 0.000 description 1
- 108010066717 Q beta Replicase Proteins 0.000 description 1
- 101700074906 RBM39 Proteins 0.000 description 1
- 108090000103 Relaxin Proteins 0.000 description 1
- 102000003743 Relaxin Human genes 0.000 description 1
- 241000702263 Reovirus sp. Species 0.000 description 1
- 241000724205 Rice stripe tenuivirus Species 0.000 description 1
- 241000606701 Rickettsia Species 0.000 description 1
- 101710023044 SAB0520 Proteins 0.000 description 1
- 102100014131 SARNP Human genes 0.000 description 1
- 101700079654 SARNP Proteins 0.000 description 1
- 108060007362 SEC2 Proteins 0.000 description 1
- 108060007364 SEC3 Proteins 0.000 description 1
- 101710024482 SETBP1 Proteins 0.000 description 1
- 101710043352 SHFL Proteins 0.000 description 1
- 102100000430 SOCS7 Human genes 0.000 description 1
- 101700061005 SOCS7 Proteins 0.000 description 1
- 241000580858 Simian-Human immunodeficiency virus Species 0.000 description 1
- 229920001533 Single-stranded nucleotide Polymers 0.000 description 1
- 206010040844 Skin exfoliation Diseases 0.000 description 1
- 108010026080 Somatomedins Proteins 0.000 description 1
- 102000013275 Somatomedins Human genes 0.000 description 1
- 102000005157 Somatostatin Human genes 0.000 description 1
- 108010056088 Somatostatin Proteins 0.000 description 1
- 229960000553 Somatostatin Drugs 0.000 description 1
- 241000589970 Spirochaetales Species 0.000 description 1
- 241000295644 Staphylococcaceae Species 0.000 description 1
- 229960005202 Streptokinase Drugs 0.000 description 1
- 108010023197 Streptokinase Proteins 0.000 description 1
- 231100000617 Superantigen Toxicity 0.000 description 1
- 102000019197 Superoxide Dismutase Human genes 0.000 description 1
- 108010012715 Superoxide Dismutase Proteins 0.000 description 1
- RJKFOVLPORLFTN-LEKSSAKUSA-N Syngestrets Chemical compound C1CC2=CC(=O)CC[C@]2(C)[C@@H]2[C@@H]1[C@@H]1CC[C@H](C(=O)C)[C@@]1(C)CC2 RJKFOVLPORLFTN-LEKSSAKUSA-N 0.000 description 1
- 101700057439 TOXA Proteins 0.000 description 1
- 101710037438 TST Proteins 0.000 description 1
- 101710037010 TUBGCP2 Proteins 0.000 description 1
- 229960003604 Testosterone Drugs 0.000 description 1
- ZRKFYGHZFMAOKI-QMGMOQQFSA-N Tgfbeta Chemical compound C([C@H](NC(=O)[C@H](C(C)C)NC(=O)CNC(=O)[C@H](CCC(O)=O)NC(=O)[C@H](CCCNC(N)=N)NC(=O)[C@H](CC(N)=O)NC(=O)[C@H](CC(C)C)NC(=O)[C@H]([C@@H](C)O)NC(=O)[C@H](CCC(O)=O)NC(=O)[C@H]([C@@H](C)O)NC(=O)[C@H](CC(C)C)NC(=O)CNC(=O)[C@H](C)NC(=O)[C@H](CO)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@@H](NC(=O)[C@H](C)NC(=O)[C@H](C)NC(=O)[C@@H](NC(=O)[C@H](CC(C)C)NC(=O)[C@@H](N)CCSC)C(C)C)[C@@H](C)CC)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CC=1C=CC=CC=1)C(=O)N[C@@H](C)C(=O)N1[C@@H](CCC1)C(=O)N[C@@H]([C@@H](C)O)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](C)C(=O)N[C@@H](CC=1C=CC=CC=1)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](C)C(=O)N[C@@H](CC(C)C)C(=O)N1[C@@H](CCC1)C(=O)N1[C@@H](CCC1)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(C)C)C(O)=O)C1=CC=C(O)C=C1 ZRKFYGHZFMAOKI-QMGMOQQFSA-N 0.000 description 1
- 108010046075 Thymosin Proteins 0.000 description 1
- 102000007501 Thymosin Human genes 0.000 description 1
- 108090000373 Tissue plasminogen activator Proteins 0.000 description 1
- 102000003978 Tissue plasminogen activator Human genes 0.000 description 1
- 206010044248 Toxic shock syndrome Diseases 0.000 description 1
- 231100000650 Toxic shock syndrome Toxicity 0.000 description 1
- 102000004887 Transforming Growth Factor beta Human genes 0.000 description 1
- 108090001012 Transforming Growth Factor beta Proteins 0.000 description 1
- 241000589886 Treponema Species 0.000 description 1
- 241000224526 Trichomonas Species 0.000 description 1
- 108010001801 Tumor Necrosis Factor-alpha Proteins 0.000 description 1
- 102000000852 Tumor Necrosis Factor-alpha Human genes 0.000 description 1
- 206010054094 Tumour necrosis Diseases 0.000 description 1
- 241000202898 Ureaplasma Species 0.000 description 1
- 229960005356 Urokinase Drugs 0.000 description 1
- 108090000435 Urokinase-type plasminogen activator Proteins 0.000 description 1
- 102000003990 Urokinase-type plasminogen activator Human genes 0.000 description 1
- 102100019577 VCAM1 Human genes 0.000 description 1
- 229940029983 VITAMINS Drugs 0.000 description 1
- 208000007089 Vaccinia Diseases 0.000 description 1
- 206010046865 Vaccinia virus infection Diseases 0.000 description 1
- 108010000134 Vascular Cell Adhesion Molecule-1 Proteins 0.000 description 1
- 241000711975 Vesicular stomatitis virus Species 0.000 description 1
- 241000607598 Vibrio Species 0.000 description 1
- 229940021016 Vitamin IV solution additives Drugs 0.000 description 1
- 208000001877 Whooping Cough Diseases 0.000 description 1
- 241000607734 Yersinia <bacteria> Species 0.000 description 1
- 241000607479 Yersinia pestis Species 0.000 description 1
- 238000010521 absorption reaction Methods 0.000 description 1
- 238000009825 accumulation Methods 0.000 description 1
- 230000004913 activation Effects 0.000 description 1
- 239000012190 activator Substances 0.000 description 1
- 230000000996 additive Effects 0.000 description 1
- 230000002411 adverse Effects 0.000 description 1
- 238000005273 aeration Methods 0.000 description 1
- 238000003314 affinity selection Methods 0.000 description 1
- 229960002478 aldosterone Drugs 0.000 description 1
- 150000001335 aliphatic alkanes Chemical class 0.000 description 1
- 101710037563 alpha-delta-Bgt-2 Proteins 0.000 description 1
- 238000005576 amination reaction Methods 0.000 description 1
- 239000012491 analyte Substances 0.000 description 1
- 230000002587 anti-hemolytic Effects 0.000 description 1
- 108010082685 antiarrhythmic peptide Proteins 0.000 description 1
- 239000000427 antigen Substances 0.000 description 1
- 108091007172 antigens Proteins 0.000 description 1
- 102000038129 antigens Human genes 0.000 description 1
- 229940000489 arsenate Drugs 0.000 description 1
- 238000007845 assembly PCR Methods 0.000 description 1
- 244000052616 bacterial pathogens Species 0.000 description 1
- 238000010364 biochemical engineering Methods 0.000 description 1
- 230000000903 blocking Effects 0.000 description 1
- 239000003633 blood substitute Substances 0.000 description 1
- 229960003773 calcitonin (salmon synthetic) Drugs 0.000 description 1
- 230000024881 catalytic activity Effects 0.000 description 1
- 101710014509 celF Proteins 0.000 description 1
- 239000006143 cell culture media Substances 0.000 description 1
- 230000010261 cell growth Effects 0.000 description 1
- 210000003850 cellular structures Anatomy 0.000 description 1
- 229920002678 cellulose Polymers 0.000 description 1
- 239000001913 cellulose Substances 0.000 description 1
- 238000001311 chemical methods and process Methods 0.000 description 1
- 239000002962 chemical mutagen Substances 0.000 description 1
- 229920001436 collagen Polymers 0.000 description 1
- 229960005188 collagen Drugs 0.000 description 1
- 238000010835 comparative analysis Methods 0.000 description 1
- 239000004074 complement inhibitor Substances 0.000 description 1
- 102000006834 complement receptors Human genes 0.000 description 1
- 108010047295 complement receptors Proteins 0.000 description 1
- 150000001875 compounds Chemical class 0.000 description 1
- 238000011109 contamination Methods 0.000 description 1
- 101700041767 ctxA Proteins 0.000 description 1
- 125000004122 cyclic group Chemical group 0.000 description 1
- 238000007405 data analysis Methods 0.000 description 1
- 230000002950 deficient Effects 0.000 description 1
- 230000002939 deleterious Effects 0.000 description 1
- 238000006477 desulfuration reaction Methods 0.000 description 1
- 230000003009 desulfurizing Effects 0.000 description 1
- 230000001809 detectable Effects 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 239000010432 diamond Substances 0.000 description 1
- 230000004069 differentiation Effects 0.000 description 1
- 238000009792 diffusion process Methods 0.000 description 1
- 238000010790 dilution Methods 0.000 description 1
- 241001492478 dsDNA viruses, no RNA stage Species 0.000 description 1
- 101700070651 entC1 Proteins 0.000 description 1
- 108020002598 entD Proteins 0.000 description 1
- 238000009585 enzyme analysis Methods 0.000 description 1
- 239000000262 estrogen Substances 0.000 description 1
- 101700009135 etxB Proteins 0.000 description 1
- 238000004299 exfoliation Methods 0.000 description 1
- 239000002095 exotoxin Substances 0.000 description 1
- 238000002474 experimental method Methods 0.000 description 1
- 229960004222 factor IX Drugs 0.000 description 1
- 229940012413 factor VII Drugs 0.000 description 1
- 229940012426 factor X Drugs 0.000 description 1
- 101710011878 faeB-hpsB Proteins 0.000 description 1
- 238000000855 fermentation Methods 0.000 description 1
- 230000004151 fermentation Effects 0.000 description 1
- 230000037320 fibronectin Effects 0.000 description 1
- 238000011049 filling Methods 0.000 description 1
- 239000012467 final product Substances 0.000 description 1
- 108091006031 fluorescent proteins Proteins 0.000 description 1
- 102000034387 fluorescent proteins Human genes 0.000 description 1
- RRDQTXGFURAKDI-UHFFFAOYSA-N formaldehyde;naphthalene-2-sulfonic acid Chemical compound O=C.C1=CC=CC2=CC(S(=O)(=O)O)=CC=C21 RRDQTXGFURAKDI-UHFFFAOYSA-N 0.000 description 1
- 238000004508 fractional distillation Methods 0.000 description 1
- 230000005714 functional activity Effects 0.000 description 1
- 238000007429 general method Methods 0.000 description 1
- 239000011521 glass Substances 0.000 description 1
- 150000004676 glycans Polymers 0.000 description 1
- 239000002622 gonadotropin Substances 0.000 description 1
- 239000005090 green fluorescent protein Substances 0.000 description 1
- 239000001963 growth media Substances 0.000 description 1
- 239000003721 gunpowder Substances 0.000 description 1
- 150000008282 halocarbons Chemical class 0.000 description 1
- 210000003702 immature single positive T cell Anatomy 0.000 description 1
- 230000002519 immonomodulatory Effects 0.000 description 1
- 230000003053 immunization Effects 0.000 description 1
- 238000002649 immunization Methods 0.000 description 1
- 230000001939 inductive effect Effects 0.000 description 1
- 239000003999 initiator Substances 0.000 description 1
- 239000002054 inoculum Substances 0.000 description 1
- 229960003786 inosine Drugs 0.000 description 1
- 238000009434 installation Methods 0.000 description 1
- 102000002467 interleukin receptors Human genes 0.000 description 1
- 108010093036 interleukin receptors Proteins 0.000 description 1
- 229960005431 ipriflavone Drugs 0.000 description 1
- 239000004922 lacquer Substances 0.000 description 1
- 235000021242 lactoferrin Nutrition 0.000 description 1
- 150000002605 large molecules Chemical class 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 238000011068 load Methods 0.000 description 1
- 238000010801 machine learning Methods 0.000 description 1
- 238000004949 mass spectrometry Methods 0.000 description 1
- 238000002844 melting Methods 0.000 description 1
- 230000005055 memory storage Effects 0.000 description 1
- 239000002184 metal Substances 0.000 description 1
- 229910052751 metal Inorganic materials 0.000 description 1
- 102000016397 methyltransferase family Human genes 0.000 description 1
- 108060004795 methyltransferase family Proteins 0.000 description 1
- 230000000813 microbial Effects 0.000 description 1
- 230000002906 microbiologic Effects 0.000 description 1
- 238000000386 microscopy Methods 0.000 description 1
- 239000003226 mitogen Substances 0.000 description 1
- 230000037230 mobility Effects 0.000 description 1
- 102000035365 modified proteins Human genes 0.000 description 1
- 108091005569 modified proteins Proteins 0.000 description 1
- 238000000329 molecular dynamics simulation Methods 0.000 description 1
- 238000002887 multiple sequence alignment Methods 0.000 description 1
- 238000000491 multivariate analysis Methods 0.000 description 1
- 108010041042 naphthalene dioxygenase Proteins 0.000 description 1
- 230000002352 nonmutagenic Effects 0.000 description 1
- 238000010606 normalization Methods 0.000 description 1
- 239000002777 nucleoside Substances 0.000 description 1
- 150000003833 nucleoside derivatives Chemical class 0.000 description 1
- 230000000414 obstructive Effects 0.000 description 1
- 230000002246 oncogenic Effects 0.000 description 1
- 231100000590 oncogenic Toxicity 0.000 description 1
- 238000010397 one-hybrid screening Methods 0.000 description 1
- 239000003960 organic solvent Substances 0.000 description 1
- 230000002188 osteogenic Effects 0.000 description 1
- 239000000199 parathyroid hormone Substances 0.000 description 1
- 229960001319 parathyroid hormone Drugs 0.000 description 1
- 239000002245 particle Substances 0.000 description 1
- 244000052769 pathogens Species 0.000 description 1
- 108010012038 peptide 78 Proteins 0.000 description 1
- 201000005702 pertussis Diseases 0.000 description 1
- 239000012071 phase Substances 0.000 description 1
- 125000003012 phosphonothioyl group Chemical group [H]P(*)(*)=S 0.000 description 1
- PTMHPRAIXMAOOB-UHFFFAOYSA-N phosphoramidic acid Chemical compound NP(O)(O)=O PTMHPRAIXMAOOB-UHFFFAOYSA-N 0.000 description 1
- 238000000596 photon cross correlation spectroscopy Methods 0.000 description 1
- 238000000053 physical method Methods 0.000 description 1
- 238000007747 plating Methods 0.000 description 1
- 230000001402 polyadenylating Effects 0.000 description 1
- 238000006116 polymerization reaction Methods 0.000 description 1
- 229920003245 polyoctenamer Polymers 0.000 description 1
- 229920001282 polysaccharide Polymers 0.000 description 1
- 239000005017 polysaccharide Substances 0.000 description 1
- 150000004804 polysaccharides Polymers 0.000 description 1
- 229920002745 polystyrene-block- poly(ethylene /butylene) Polymers 0.000 description 1
- 230000002028 premature Effects 0.000 description 1
- 238000007639 printing Methods 0.000 description 1
- 239000000186 progesterone Substances 0.000 description 1
- 229960003387 progesterone Drugs 0.000 description 1
- 230000001902 propagating Effects 0.000 description 1
- 230000004952 protein activity Effects 0.000 description 1
- 230000036678 protein binding Effects 0.000 description 1
- 238000002818 protein evolution Methods 0.000 description 1
- 238000001814 protein method Methods 0.000 description 1
- 230000001698 pyrogenic Effects 0.000 description 1
- 238000003908 quality control method Methods 0.000 description 1
- 230000000637 radiosensitizating Effects 0.000 description 1
- 230000002787 reinforcement Effects 0.000 description 1
- 230000001718 repressive Effects 0.000 description 1
- 230000004044 response Effects 0.000 description 1
- 230000000717 retained Effects 0.000 description 1
- 201000005404 rubella Diseases 0.000 description 1
- 235000019515 salmon Nutrition 0.000 description 1
- 229940120657 salmon calcitonin Drugs 0.000 description 1
- 108010068072 salmon calcitonin Proteins 0.000 description 1
- 101700068703 sec1 Proteins 0.000 description 1
- 238000009394 selective breeding Methods 0.000 description 1
- 230000035945 sensitivity Effects 0.000 description 1
- 230000011664 signaling Effects 0.000 description 1
- 239000000377 silicon dioxide Substances 0.000 description 1
- 238000003530 single readout Methods 0.000 description 1
- 238000002741 site-directed mutagenesis Methods 0.000 description 1
- 150000003384 small molecules Chemical class 0.000 description 1
- 239000002904 solvent Substances 0.000 description 1
- 238000000638 solvent extraction Methods 0.000 description 1
- NHXLMOGPVYXJNR-ATOGVRKGSA-N somatostatin Chemical compound C([C@H]1C(=O)N[C@H](C(N[C@@H](CO)C(=O)N[C@@H](CSSC[C@@H](C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CC=2C=CC=CC=2)C(=O)N[C@@H](CC=2C=CC=CC=2)C(=O)N[C@@H](CC=2C3=CC=CC=C3NC=2)C(=O)N[C@@H](CCCCN)C(=O)N[C@H](C(=O)N1)[C@@H](C)O)NC(=O)CNC(=O)[C@H](C)N)C(O)=O)=O)[C@H](O)C)C1=CC=CC=C1 NHXLMOGPVYXJNR-ATOGVRKGSA-N 0.000 description 1
- 230000000638 stimulation Effects 0.000 description 1
- 238000003860 storage Methods 0.000 description 1
- 230000004083 survival Effects 0.000 description 1
- 235000007586 terpenes Nutrition 0.000 description 1
- 238000010998 test method Methods 0.000 description 1
- 238000002560 therapeutic procedure Methods 0.000 description 1
- 229960000187 tissue plasminogen activator Drugs 0.000 description 1
- 238000006257 total synthesis reaction Methods 0.000 description 1
- 101700080113 toxB Proteins 0.000 description 1
- 108090000464 transcription factors Proteins 0.000 description 1
- 102000003995 transcription factors Human genes 0.000 description 1
- 230000026683 transduction Effects 0.000 description 1
- 238000010361 transduction Methods 0.000 description 1
- 230000001052 transient Effects 0.000 description 1
- 230000010415 tropism Effects 0.000 description 1
- 201000008827 tuberculosis Diseases 0.000 description 1
- 238000010396 two-hybrid screening Methods 0.000 description 1
- 241001430294 unidentified retrovirus Species 0.000 description 1
- 230000035899 viability Effects 0.000 description 1
- 238000011179 visual inspection Methods 0.000 description 1
- 239000011782 vitamin Substances 0.000 description 1
- 235000013343 vitamin Nutrition 0.000 description 1
- 229930003231 vitamins Natural products 0.000 description 1
- 238000005406 washing Methods 0.000 description 1
- 238000002424 x-ray crystallography Methods 0.000 description 1
Abstract
"In silico" nucleic acid recombination methods, related integrated systems utilizing genetic operators, and libraries made by in silico shuffling methods are provided.
Description
METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES AND POLYPEPTIDES HAVING DESIRED CHARACTERISTICS
CROSS-REFERENCES TO RELATED APPLICATIONS
This application is a continuation-in-part of "METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES AND POLYPEPTIDES HAVING DESIRED CHARACTERISTICS" by Selifonov et al., USSN 09/416,375, filed October 12, 1999, which is a non-provisional of "METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES AND POLYPEPTIDES HAVING DESIRED CHARACTERISTICS" by Selifonov and Stemmer, USSN 60/116,447, filed January 19, 1999, and which is also a non-provisional of "METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES AND POLYPEPTIDES HAVING DESIRED CHARACTERISTICS" by Selifonov and Stemmer, USSN 60/118,854, filed February 5, 1999.
This application is also a continuation-in-part of "OLIGONUCLEOTIDE MEDIATED NUCLEIC ACID RECOMBINATION" by Crameri et al., Attorney Docket Number 02-296-3US, filed herewith, which is a continuation-in-part of "OLIGONUCLEOTIDE MEDIATED NUCLEIC ACID RECOMBINATION" by Crameri et al., USSN 09/408,392, filed September 28, 1999, which is a non-provisional of "OLIGONUCLEOTIDE MEDIATED NUCLEIC ACID RECOMBINATION" by Crameri et al., USSN 60/118,813, filed February 5, 1999, and which is also a non-provisional of "OLIGONUCLEOTIDE MEDIATED NUCLEIC ACID RECOMBINATION" by Crameri et al., USSN 09/141,049, filed June 24, 1999.
This application is also a continuation-in-part of the co-filed application "METHODS OF POPULATING DATA STRUCTURES FOR USE IN EVOLUTIONARY SIMULATIONS" by Selifonov and Stemmer, Attorney Docket Number 3271.002 O0 (filed by Majestic, Parsons, Siebert and Hsue), which is a continuation-in-part of "METHODS OF POPULATING DATA STRUCTURES FOR USE IN EVOLUTIONARY SIMULATIONS" by Selifonov and Stemmer, USSN 09/416,837, filed October 12, 1999.
This application is also related to "USE OF CODON-VARIED OLIGONUCLEOTIDE SYNTHESIS FOR SYNTHETIC SHUFFLING" by Welch et al., USSN 09/408,393, filed September 28, 1999.
The present application claims priority to and the benefit of each of the applications listed in this section, as provided for under 35 U.S.C. §119(e) and/or 35 U.S.C. §120, as appropriate. All of the foregoing applications are incorporated herein by reference.
COPYRIGHT NOTIFICATION
Pursuant to 37 C.F.R. 1.71(e), Applicants note that a portion of this disclosure contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
FIELD OF THE INVENTION
This invention is in the field of genetic algorithms and the application of genetic algorithms to nucleic acid shuffling methods.
BACKGROUND OF THE INVENTION
Recursive nucleic acid recombination ("shuffling") provides for the rapid evolution of nucleic acids, in vitro and in vivo. This rapid evolution provides for the generation of encoded molecules (e.g., nucleic acids and proteins) with new and/or improved properties. Proteins and nucleic acids of industrial, agricultural and therapeutic importance can be created or improved through DNA shuffling procedures.
A number of publications by the inventors and their co-workers describe DNA shuffling. For example, Stemmer et al. (1994) "Rapid Evolution of a Protein" Nature 370:389-391; Stemmer (1994) "DNA Shuffling by Random Fragmentation and Reassembly: In vitro Recombination for Molecular Evolution" Proc. Natl. Acad. Sci. USA 91:10747-10751; Stemmer, U.S. Patent No. 5,603,793 "METHODS FOR IN VITRO RECOMBINATION"; Stemmer et al., U.S. Patent No. 5,830,721 "DNA MUTAGENESIS BY RANDOM FRAGMENTATION AND REASSEMBLY"; and Stemmer et al., U.S. Patent No. 5,811,238 "METHODS FOR GENERATING POLYNUCLEOTIDES HAVING DESIRED CHARACTERISTICS BY ITERATIVE SELECTION AND RECOMBINATION" describe, e.g., a variety of shuffling techniques.
Many applications of DNA shuffling technology have also been developed by the inventors and their co-workers. In addition to the publications noted above, Minshull et al., U.S. Patent No. 5,837,458 "METHODS AND COMPOSITIONS FOR CELLULAR AND METABOLIC ENGINEERING" provides for the evolution of new metabolic pathways and the enhancement of bio-processing through recursive shuffling techniques. Crameri et al. (1996) "Construction and Evolution of Antibody-Phage Libraries by DNA Shuffling" Nature Medicine 2(1):100-103 describes, e.g., antibody shuffling for antibody phage libraries.
Additional details regarding DNA shuffling can also be found in various published applications, such as WO95/22625, WO97/20078, WO96/33207, WO97/339557, WO98/27230, WO97/35966, WO98/31837, WO98/13487, WO98/13485 and WO989/42832.
A number of the publications of the inventors and their co-workers, as well as those of other investigators in the field, also describe techniques which facilitate DNA shuffling, e.g., by providing for the reassembly of genes from small fragments of genes, or even from oligonucleotides encoding gene fragments. In addition to the publications noted above, Stemmer et al. (1998) U.S. Patent No. 5,834,252 "END COMPLEMENTARY POLYMERASE REACTION" describes processes for amplifying and detecting a target sequence (e.g., in a mixture of nucleic acids), as well as for assembling large polynucleotides from fragments.
Review of the foregoing publications reveals that DNA shuffling is an important new technique with many practical applications. Thus, new techniques which facilitate DNA shuffling are highly desirable. In particular, techniques which reduce the number of physical manipulations needed for shuffling procedures would be particularly useful. The present invention provides significant DNA shuffling protocols, as well as other features which will be apparent upon complete review of this disclosure.
SUMMARY OF THE INVENTION
The present invention provides techniques for "in silico" DNA shuffling, in which part or all of a DNA shuffling procedure is performed or modeled in a computer system, avoiding (partly or entirely) the need for physical manipulation of nucleic acids. These approaches are collectively termed Genetic Algorithm Guided Gene Synthesis, or "GAGGS".
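As a purely illustrative sketch of what performing a shuffling step "in a computer system" might look like, the Python below recombines parental character strings at crossover points placed inside regions where the parents are identical. It is not the applicants' implementation: the function names, the `min_len` parameter, the assumption that the parents are already aligned and of equal length, and the choice of the midpoint of an identity region as the crossover point are all assumptions made for this example.

```python
import random


def identity_runs(a, b, min_len=4):
    """Return (start, end) spans where equal-length strings a and b
    are identical for at least min_len consecutive characters."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes pre-aligned parents of equal length")
    runs, start = [], None
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y:
            if start is None:
                start = i          # a new identity run begins here
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    if start is not None and len(a) - start >= min_len:
        runs.append((start, len(a)))
    return runs


def crossover(a, b, rng, min_len=4):
    """Join the head of a to the tail of b at the midpoint of a
    randomly chosen identity run; return a unchanged if none exists."""
    runs = identity_runs(a, b, min_len)
    if not runs:
        return a
    start, end = rng.choice(runs)
    point = (start + end) // 2
    return a[:point] + b[point:]


def shuffle_in_silico(parents, n_children, seed=0, min_len=4):
    """Generate n_children chimeric strings from random parent pairs."""
    rng = random.Random(seed)      # seeded so runs are reproducible
    children = []
    for _ in range(n_children):
        a, b = rng.sample(parents, 2)
        children.append(crossover(a, b, rng, min_len))
    return children


if __name__ == "__main__":
    # Hypothetical toy parents differing at two positions.
    parents = ["ATGGCTAAACCCGGGTTT", "ATGGCTTAACCCGGGTAT"]
    for child in shuffle_in_silico(parents, 4):
        print(child)
```

Each child string could then be passed to downstream steps such as oligonucleotide design and gene synthesis. Restricting crossovers to identity regions is a design choice of this sketch, chosen so that every position of a child comes from one of the two parents.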
In a first aspect, the invention provides methods for obtaining a "chimeric" or "recombinant" polynucleotide or polypeptide (or other bio-polymer) having a desired characteristic. In the methods, at least two strings of parent characters encoding the sequence information for one or more polypeptides and / or for one or more single stranded or double stranded polynucleotides are provided. All or part of the sequences (ie one or more sub-sequence regions) contain identity areas and areas of heterology. A set of character strings of pre-selected or pre-defined length is provided, which encodes single stranded oligonucleotide sequences which include sequence fragment overlays of at least a portion of each of the parent and parent strings. / or at least a part of polynucleotide strands complementary to the parent character strings. In a class of embodiments, the invention provides methods for generating biological polymer files. The method includes generating a diverse population of character strings in a computer where character strings are generated by the modification (recombination, mutagenesis, etc.) of the pre-existing character strings. The population - several chains of characters are then synthesized to understand the biological polymers file (nucleic acids, polypeptides, peptide nucleic acids, etc.). Typically, the members of the biological polymers files are selected for one or more activities. In a recursive aspect of the invention, an additional file or an additional set of character strings is filtered by subtracting the additional file or the additional set of character strings with members of the biological polymer file that display activity below a desired threshold. . In an additional recursive aspect! 
or complementary aspect of the invention, the additional library or additional set of character strings is filtered by biasing the additional library, or the additional set of character strings, with members of the library of biological polymers that display activity above a desired threshold. A set of single-stranded oligonucleotides made according to the set of sequences defined in the character strings is provided. Part or all of the single-stranded oligonucleotides produced are pooled under denaturing or annealing conditions, where at least two of the single-stranded oligonucleotides represent parts of two different parental sequences. The resulting population of single-stranded oligonucleotides is incubated with a polymerase under conditions that result in annealing of the single-stranded fragments at the areas of identity to form pairs of annealed fragments. These areas of identity are sufficient for one member of the pair to prime replication of the other, resulting in an increase in the length of the oligonucleotides. The resulting mixture of double-stranded and single-stranded oligonucleotides is denatured into single-stranded fragments. These steps are repeated, such that at least a part of the resulting mixture of single-stranded chimeric and mutagenized polynucleotides is used in the steps of subsequent cycles. Recombinant polynucleotides that have evolved toward a desired property are selected or screened. In another aspect, the invention provides for the use of genetic operators, for example in a computer. In these methods, sequence strings corresponding to the oligonucleotides noted above are selected by the computer from sequence strings corresponding
to one or more of the following sets of single-stranded oligonucleotides: a) oligonucleotides synthesized to contain randomly or non-randomly pre-selected mutations of the parental sequences, according to modified sequences that include replacement of one or more characters with other characters, or deletion or insertion of one or more characters; b) oligonucleotide sequences synthesized to contain degenerate, mixed or unnatural nucleotides at one or more randomly or non-randomly pre-selected positions; and c) chimeric oligonucleotides synthesized according to artificial sequences of character substrings designed to contain joined partial sequences of at least two parental sequences. In certain embodiments, the oligonucleotides of set (c) contain one or more mutated or degenerate positions defined in sets (a) and (b). The oligonucleotides of set (c) are optionally chimeric oligonucleotides with crossover points selected according to a method that allows identification of a plurality of character substrings displaying pairwise identity (homology) between any or all of the pairs of strings comprising sequences of different parental character strings. The crossover points for making the chimeric oligonucleotide sequences are optionally selected randomly, or approximately in the middle of each, or of a portion, of the areas of identified pairwise identity
(homology), or by any other set of selection criteria. In one aspect, at least one crossover point for at least one chimeric oligonucleotide sequence is selected from those that are not within the detected identity areas. In one aspect, the mixtures of single-stranded oligonucleotides described above are pooled at least once with an additional set of polynucleotides comprising one or more double-stranded or single-stranded polynucleotides encoded by a part of, and/or by the entire, character string of any of the parental sequences provided, and/or by other character string(s) that contain areas of identity and areas of heterology with any of the parental character strings provided. The polynucleotides of the additional set can be obtained by synthesis of oligonucleotides corresponding to any parental character string (or a homolog thereof), or by random fragmentation (for example, by enzymatic cleavage, e.g., by a DNase, or by chemical cleavage of the polynucleotide) and/or by restriction-enzyme fragmentation of the polynucleotide encoded by the character strings defined above, and/or by other character string(s) that contain areas of identity and areas of heterology with any of the parental character strings provided. That is, any nucleic acid generated by GAGGS can be further modified by any available method to produce additionally diversified nucleic acids. In addition, any diversified nucleic acid can serve as a substrate for additional rounds of GAGGS. The above methods are suitably adapted to a wide range of synthetic oligonucleotide lengths (e.g., 10-20 nucleotides or more, 20-40 nucleotides or more, 40-60 nucleotides or more, 60-100 nucleotides or
more, 100-150 nucleotides or more, etc.), a wide variety of types of parental sequences (for example, for therapeutic proteins such as EPO, insulin, growth hormones, antibodies or the like; agricultural proteins such as plant hormones, disease resistance factors, herbicide resistance factors (e.g., p450s); industrial proteins (e.g., those involved in bacterial oil desulfurization, synthesis of polymers, detoxification proteins and complexes, fermentation or the like)), and a wide range in the number of selection/screening cycles (for example, one or more cycles, two or more cycles, 3-4 or more cycles, 10 or more cycles, 10-50 or more cycles, 50-100 or more cycles, or more than 100 cycles). Rounds of GAGGS evolution can be combined with rounds of physical nucleic acid shuffling and/or selection assays in various formats (in vivo or in vitro). Selected nucleic acids (i.e., those with desirable properties) can be deconvoluted by sequencing or by other procedures such as restriction enzyme analysis, real-time PCR analysis or the like, so that the processes can be started over using the sequence information to guide gene synthesis, for example, without any physical manipulation of the DNA obtained from previous GAGGS rounds. Typically, in the methods above, synthesis of polynucleotides from single-stranded oligonucleotides is performed by assembly PCR. Other options for making nucleic acids include ligation reactions, cloning, and the like. In typical embodiments, the sets of character strings encoding single-stranded oligonucleotides that comprise fragments of the parental strings, including chimeric and mutated/degenerate fragments of predefined length, are generated using a device that comprises a processing element, such as a computer with software for sequence string manipulation. In one aspect, the invention provides single-parent GAGGS. These methods are set forth in greater detail in the examples herein.
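By way of illustration only, the provision of a set of fixed-length character strings encoding overlapping single-stranded oligonucleotides from both strands of a parental string can be sketched as follows. The function names `reverse_complement` and `design_assembly_oligos`, the 40-character oligo length and the half-length offset between strands are assumptions made for this sketch and are not taken from the disclosure; trailing windows shorter than the chosen length are simply kept as they are.

```python
# Hypothetical sketch: tile a parental character string into overlapping
# oligonucleotide strings suitable for assembly from both strands.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def design_assembly_oligos(parent: str, length: int = 40):
    """Split `parent` into top-strand oligos of `length` characters and
    bottom-strand oligos offset by half a length, so that each bottom-strand
    oligo bridges the junction between two adjacent top-strand oligos."""
    half = length // 2
    top = [parent[i:i + length] for i in range(0, len(parent), length)]
    bottom = [
        reverse_complement(parent[i:i + length])
        for i in range(half, len(parent), length)
    ]
    return top, bottom


if __name__ == "__main__":
    parent = "ATGC" * 25  # toy 100-character parental string
    top, bottom = design_assembly_oligos(parent, length=40)
    print(len(top), "top-strand oligos;", len(bottom), "bottom-strand oligos")
```

Applying the same function to each parental string, and pooling the resulting sets, gives a collection of oligonucleotide strings in which areas of identity between parents allow fragments of different parents to anneal and prime one another.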
BRIEF DESCRIPTION OF THE FIGURES

Figure 1 is a flow chart describing a portion of directed evolution by GAGGS.
Figure 2 is a flow chart describing a portion of directed evolution by GAGGS. The flow chart of Figure 2 is optionally contiguous with that of Figure 1.
Figure 3 is a flow chart describing a portion of directed evolution by GAGGS. The flow chart of Figure 3 is optionally contiguous with that of Figure 2.
Figure 4 is a flow chart describing a portion of directed evolution by GAGGS. The flow chart of Figure 4 is optionally contiguous with that of Figure 3.
Figure 5 is a chart and relational tree showing percent similarity for different subtilisins (an exemplary shuffling target).
Figure 6 is a pairwise dot-plot alignment showing areas of homology for different subtilisins.
Figure 7 is a pairwise dot-plot alignment showing areas of homology for seven different parental subtilisins.
Figure 8, Panels A-C, are pairwise histograms showing that the conditions determining the probability of crossover point selection can be controlled independently for any region over a selected gene length, as well as independently for the pairs of parents.
Figure 9 is a chart showing the introduction of indexed crossover point markers into the sequence of each parent.
Figure 10 shows a procedure for assembling oligonucleotides to make nucleic acids.
Figure 11 is a continuation of Figure 10, showing an oligonucleotide assembly scheme.
Figure 12 is a difference plot and relational tree for shuffling naphthalene dioxygenase.
Figure 13 is a schematic of a digital system of the invention.
Figure 14 is a schematic showing a geometric relationship between nucleotides.
Figure 15 is a schematic of an HMM matrix.

DETAILED DESCRIPTION

In the methods of the invention, "genetic" or "evolutionary" algorithms are used to produce sequence strings that can be converted into physical molecules, shuffled and tested for a desired property. This greatly expedites forced evolution procedures, since the ability to pre-select substrates for shuffling reduces the actual physical manipulation of nucleic acids in shuffling protocols. In addition, the use of character strings as "virtual substrates" for shuffling protocols, when coupled with gene reconstruction methods, eliminates the need to obtain physical parental molecules encoding genes.
Genetic algorithms (GAs) are used in a wide variety of fields to solve problems that are not fully characterized, or that are too complex to allow full characterization, but for which some analytical evaluation is available. That is, GAs are used to solve problems that can be evaluated by some quantifiable measure of the relative value of a solution (or at least the relative value of one potential solution compared to another). The basic concept of a genetic algorithm is to encode a potential solution to a problem as a series of parameters. A single set of parameter values is treated as the "genome", or genetic material, of an individual solution. A large population of candidate solutions is created. These solutions can be bred with each other for one or more simulated generations under the principle of survival of the fittest, meaning that the probability that an individual solution will pass on some of its parameter values to subsequent solution sets is directly related to the fitness of the individual (i.e., how good that solution is relative to the others in the population for the selected parameter). Breeding occurs through the use of operators such as crossover and mutation, which simulate basic biological recombination and mutation. The simple application of these operators with reasonable selection mechanisms has produced good results over a wide range of problems. An introduction to genetic algorithms can be found in David E. Goldberg (1989) Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley Pub Co; ISBN: 0201157675, and in Timothy Masters (1993) Practical Neural Network Recipes in C++ (Book & Disk edition) Academic Pr; ISBN: 0124790402.
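The general scheme just described (a population of candidate solutions, a quantifiable fitness measure, and breeding by crossover and mutation) can be illustrated by the following minimal sketch. It is a toy example under stated assumptions, not the method of the invention: the fitness measure (number of characters matching a hypothetical `target` string), the population size, the mutation rate and the practice of carrying the best string of each generation over unchanged are all choices made only for this illustration.

```python
import random

ALPHABET = "ACGT"


def fitness(s: str, target: str) -> int:
    """Toy fitness: number of positions at which `s` matches `target`."""
    return sum(a == b for a, b in zip(s, target))


def crossover(a: str, b: str, rng: random.Random) -> str:
    """Single-point crossover between two equal-length strings."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(s: str, rate: float, rng: random.Random) -> str:
    """Replace each character with a random one with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c for c in s)


def evolve(target: str, pop_size: int = 50, generations: int = 40,
           rate: float = 0.02, seed: int = 0):
    """Breed a population of strings toward `target`; return the best final
    string and the best fitness observed at the start of each generation."""
    rng = random.Random(seed)
    pop = ["".join(rng.choice(ALPHABET) for _ in target) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scores = [fitness(s, target) for s in pop]
        history.append(max(scores))
        best = pop[scores.index(max(scores))]
        # Fitness-proportional choice of parents (+1 keeps every weight positive).
        weights = [score + 1 for score in scores]
        parents = rng.choices(pop, weights=weights, k=2 * (pop_size - 1))
        children = [
            mutate(crossover(parents[i], parents[i + 1], rng), rate, rng)
            for i in range(0, len(parents), 2)
        ]
        pop = [best] + children  # the best string is carried over unchanged
    return max(pop, key=lambda s: fitness(s, target)), history


if __name__ == "__main__":
    best, history = evolve("ATGGCCAAAGGTTTCCATTAA")
    print(best, history[0], history[-1])
```

Because the best string of each generation is retained, the recorded best fitness never decreases from one generation to the next; in a GAGGS setting the toy fitness function would be replaced by in silico sequence analysis and/or by results of physical assays.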
A variety of more recent references discuss the use of genetic algorithms to solve a variety of difficult problems; see, for example, http://garage.that.msu.edu/papers/papers-index.html and the references cited therein, http://gaslab.cs.unr.edu/ and the references cited therein, http://www.aic.nrl.navy.mil/ and the references cited therein, http://www.cs.gmu.edu/research/gag/ and the references cited therein, and http://www.cs.gmu.edu/research/gag/pubs.html and the references cited therein. In the present invention, a genetic algorithm (GA) is used to provide a character-string-based representation of the process of generating bio-polymer diversity (computational evolution of character strings through the application of one or more genetic operators to a provided population (e.g., a parental library) of character strings, e.g., gene sequences). The representation of the GA-generated population of character strings (or "derivative library") is used as a set of sequence instructions in a form suitable for controlling polynucleotide synthesis (for example, by non-error-prone synthesis, error-prone synthesis, parallel synthesis, pooled synthesis, chemical synthesis, chemo-enzymatic synthesis (including assembly PCR of synthetic oligonucleotides) and the like). Polynucleotide synthesis is conducted with sequences encoded by a character string in the derivative library. This creates a physical representation (a polynucleotide library) of the computationally generated diversity of "genes" (or of any other character strings). Physical selection of polynucleotides having desired characteristics is also optionally (and typically) conducted. Such selection is based on the results of physical assays of the properties of the polynucleotides, or of the polypeptides they encode, whether translated in vitro or expressed in vivo. The sequences of those polynucleotides found to have the desired characteristics are deconvoluted
(for example, sequenced or, when positional information is available, identified by noting the position of the polynucleotide). This is accomplished by DNA sequencing, by reading a position on an array, by real-time PCR (e.g., TaqMan), by restriction enzyme digestion, or by any other method noted herein or currently available. These steps are optionally repeated, for example for 1-4 or more cycles, each time optionally using the deconvoluted sequences as a source of information to generate a new, modified set of character strings with which to start the procedure. Of course, any nucleic acid that is generated in silico can be synthesized and shuffled by any known DNA shuffling method, including those taught in the references by the inventors and their co-workers cited herein. Such synthesized DNA can also be mutagenized or otherwise modified according to existing techniques.
In summary, GAGGS is an evolutionary process that includes an information manipulation step (application of a genetic algorithm to a character string representing a biopolymer such as a nucleic acid or a protein) to create a set of defined information elements (e.g., character strings) that serve as templates for synthesizing physical nucleic acids. The information elements can be placed into a database or otherwise manipulated in silico, for example by recursive application of a GA to the sequences that are produced. The corresponding physical nucleic acids can be subjected to recombination/selection or to a variety of other diversity-generating methods, with the nucleic acids being deconvoluted (e.g., sequenced or otherwise analyzed) and the overall process repeated, as appropriate, to achieve a desired nucleic acid.

Exemplary Advantages of GAGGS

There are a variety of advantages to GAGGS as compared to the prior art. For example, GAGGS does not require physical access to genes/organisms, since sequence information is used for the design and selection of oligos. A variety of public databases provide extensive sequence information, including, for example, Genbank™ and those noted above. Additional sequence databases are available on a contract basis from a variety of companies that specialize in the generation and storage of genomic information. Similarly, sequences from inaccessible, non-culturable organisms can be used for GAGGS. For example, sequences from pathogenic organisms can be used without actual handling of the pathogens. All types of sequences suitable for physical DNA shuffling, including damaged and incomplete genes (e.g., pseudogenes), are amenable to GAGGS.
The genetic operations, which include the different types of mutagenesis and crossover, can be controlled fully and independently in a reproducible manner, removing human error and the variability of physical experiments involving DNA manipulations. GAGGS lends itself to the self-learning capabilities of artificial intelligence (optimization of algorithm output parameter profiles based on feedback input of yields, success and failure rates of physical screens, etc.). In GAGGS procedures, sequences with frame-shift mutations (which are generally undesirable) are eliminated or fixed (discarded from the set of character strings, or repaired, in silico). Similarly, entries with premature terminations are discarded or repaired, and entries that have lost sequence features known to be important for display of a desired property (e.g., conserved ligands for metal binding) are discarded or repaired. In addition, wild-type parents do not contaminate derivative libraries with multiple redundant parental molecules since, in a preferred embodiment, only a priori modified genes are subjected to physical shuffling and/or screening (which in some cases can be expensive, or low throughput, or otherwise less than ideal, depending on the available assay). In addition, because actual physical recombination is not required, protein sequences can be shuffled in silico in the same way as nucleic acid sequences, and back-translation of the resulting shuffled sequences can be used to alleviate codon usage problems and to minimize the number of oligos needed to construct one or more libraries of coding nucleic acids. In this regard, protein sequences can be shuffled in silico using genetic operators that are based on recognition of structural domains and folding motifs, rather than being bound by homology criteria based on annealing of DNA sequences, or on simple homology of amino acid sequences.
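The in silico discarding of frame-shifted entries, entries with premature terminations, and entries lacking a required sequence feature can be sketched as shown below. This is a simplified illustration under stated assumptions: each character string is taken to be a complete open reading frame beginning at its first codon, a frame shift is approximated as a length that is not a multiple of three, and a "required feature" is represented by a literal substring. The names `passes_filter`, `filter_library` and `required_motifs` are hypothetical.

```python
# Hypothetical sketch of in silico filtering of a derivative set of strings.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(orf: str):
    """Split a string into consecutive, complete three-character codons."""
    usable = len(orf) - len(orf) % 3
    return [orf[i:i + 3] for i in range(0, usable, 3)]


def passes_filter(orf: str, required_motifs=()) -> bool:
    """Return True if `orf` has no frame shift, no premature stop codon,
    and contains every substring listed in `required_motifs`."""
    if len(orf) % 3 != 0:  # length not a multiple of 3: treated as a frame shift
        return False
    if any(c in STOP_CODONS for c in codons(orf)[:-1]):  # stop before last codon
        return False
    return all(motif in orf for motif in required_motifs)


def filter_library(strings, required_motifs=()):
    """Keep only the character strings that pass all three checks."""
    return [s for s in strings if passes_filter(s, required_motifs)]


if __name__ == "__main__":
    library = ["ATGGCCCACTAA", "ATGGCCACTAA", "ATGTAACACTAA", "ATGGCCGGGTAA"]
    print(filter_library(library, required_motifs=("CAC",)))
```

In this sketch, failing entries are discarded; repair of an entry (for example, restoring a lost character to correct the reading frame) would be an alternative treatment consistent with the text above.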
Additionally, rational structure-based biases are easily incorporated in library construction when such information is available. The only significant operational cost of running GAGGS is the cost of synthesizing large libraries of genes represented in silico. Synthetic assembly of genes can be done, for example, by assembly PCR from 40-60 bp oligos, which can be synthesized inexpensively by current techniques.

EVOLUTION DIRECTED BY GAGGS

All changes in any DNA sequence during any evolutionary process can be described by a finite number of events, each resulting from the action of an elementary genetic operator. In any given parental sequence subspace, these changes can be accounted for and simulated exactly in a physical representation of an evolutionary process aimed at generating sequence diversity for subsequent physical screening for desired characteristics. Physical double-stranded polynucleotides are not required to start GAGGS processes; instead, they are generated following the initial GAGGS processes for the purpose of physical screening and/or selection, and/or as a result of that screening or selection. Generation of libraries too large for screening/selection is not required.

Genetic Algorithms (GA)

CHARACTER STRINGS: In general, character strings can be any representation of an array of characters (for example, a linear array of characters provides "words", while a nonlinear array can be used as a code to generate a linear array of characters). For practical GAGGS, the character strings are preferably those that encode polynucleotide or polypeptide strings, directly or indirectly, including any encrypted strings, or images, or arrangements of objects that can be unambiguously transformed into character strings representing sequences of monomers or multimers in polynucleotides, polypeptides or the like (whether made of natural or artificial monomers).
GENETIC ALGORITHM: Genetic algorithms are generally processes that mimic evolutionary processes. Genetic algorithms (GAs) are used in a wide variety of fields to solve problems that are not fully characterized, or that are too complex to allow full characterization, but for which some analytical evaluation is available. That is, GAs are used to solve problems that can be evaluated by some quantifiable measure of the relative value of a solution (or at least the relative value of one potential solution compared to another). In the context of the present invention, a genetic algorithm is a process for selecting or manipulating character strings in a computer, typically where the character strings can correspond to one or more biological polymers (e.g., a nucleic acid, protein, PNA or the like). A biological polymer is a polymer that shares some structural features with naturally occurring polymers such as RNAs, DNAs and polypeptides, including, for example, RNAs, RNA analogs, DNAs, DNA analogs, polypeptides, polypeptide analogs, peptide nucleic acids, etc.

DIRECTED EVOLUTION OF CHARACTER STRINGS OR
OBJECTS: A process of artificially changing a character string by artificial selection, i.e., one that occurs in a reproductive population in which there are (1) varieties of individuals, with some varieties being (2) heritable, of which some varieties (3) differ in fitness (reproductive success determined by the outcome of selection for a predetermined property (desired characteristic)). The reproductive population can be, for example, a physical population or a virtual population in a computer system.

GENETIC OPERATORS (GOs): user-defined operations, or sets of operations, each comprising a set of logical instructions for manipulating character strings. Genetic operators are applied to effect changes in populations of individuals in order to find interesting (useful) regions of the search space (populations of individuals with predetermined desired properties) by predetermined means of selection. The predetermined (or partially predetermined) means of selection include computational tools (operators comprising logical steps guided by analysis of the information describing libraries of character strings) and physical tools for analysis of the physical properties of physical objects, which can be built (synthesized) from matter for the purpose of physically creating a representation of the information describing libraries of character strings. In a preferred embodiment, some or all of the logical operations are performed in a computer.

Genetic Operators

All changes in any population of any type of character strings (and thus in any physical properties of physical objects encoded by such strings) can be described as the result of random and/or predetermined application of a finite set of logical algebraic functions comprising the various types of genetic operators. In its mathematical nature, this statement is not an abstract postulated axiom.
In fact, this statement is a derived theorem with a rigorous formal proof that is readily derived from Wiles's proof of Fermat's last theorem. The fundamental implication of the Wiles proof for evolutionary molecular biology lies in the proof of the central conjecture stating that all elliptic curves are, in essence, modular forms. In particular, all of the diversity and evolution of living matter in the universe (i.e., the plurality of objects whose properties can be described by a finite number of elliptic curves) can be described in the language of the five basic arithmetic operations: addition, subtraction, multiplication, division and modular forms (i.e., the evolution of life can be effectively described by a finite combination of simple changes of information in a finite population of character strings, e.g., all of the DNA in the universe). This being the case, it is possible to determine the language of nucleic acid-based life forms and to define all of the basic types of genetic operators that are applied to nucleic acids under evolutionary selection. Mathematical modeling of certain genetic operations has been proposed, for example, in Sun (1999) "Modeling DNA Shuffling" Journal of Computational Biology 6(1):77-90; Kelly et al. (1994) "A test of the Markovian Model of DNA evolution" Biometrics 50(3):653-64; Boehnke et al. (1991) "Statistical Methods for Multipoint Radiation Hybrid Mapping" Am. J. Hum. Genet. 49:1174-1188; Irvine et al. (1991) "SELEXION: Systematic evolution of ligands by exponential enrichment with integrated optimization by non-linear analysis" J. Mol. Biol.
222:739-761; Lander and Waterman (1988) "Genomic mapping by Fingerprinting Random Clones: a mathematical analysis" Genomics 2:231-239; Lange (1997) Mathematical and Statistical Methods for Genetic Analysis, Springer Verlag, NY; Sun and Waterman (1996) "A mathematical Analysis of in vitro Molecular Selection-Amplification" J. Mol. Biol. 258:650-660; Waterman (1995) Introduction to Computational Biology, Chapman and Hall, London, UK. A description of certain basic genetic operations applicable to the present invention is given below.

MULTIPLICATION (including duplication and replication) is a form of reproduction of character strings that produces additional copies of the character strings comprising the parental population/library of strings. Multiplication operators can have
many variations. They can be applied to individual strings or to groups of identical or non-identical strings. Selection of groups of strings for multiplication can be random or biased.

MUTATION: All types of mutation in each of the members of a set of strings can be described by several simple operations that can be reduced to elements comprising the replacement of one set of characters with another set of characters. One or more characters can be mutated in a single operation. When more than one character is mutated, the set of characters may or may not be contiguous over the entire length of a string (a feature useful for closely simulating the clustered mutations caused by certain chemical mutagens). A single-point mutation operator replaces a single character with another single character. The nature of the new characters may vary, and they may be drawn from the same character set that makes up the parental strings or from a different one (for example, to represent degenerate nucleobases, or unnatural nucleobases or amino acids, etc.). A deletion mutation is a more complex operator that removes one or more characters from the strings. Individual single-point deletions in strings encoding nucleic acids may be undesirable for the manipulation of strings representing polynucleotide sequences; however, 3x clustered (contiguous or dispersed) single-point deletions ("triple deletion frame shifts") are believed to be useful and acceptable for evolutionary computation on strings encoding polypeptides. Insertion mutations are similar to deletion mutations, except that one or more new characters are inserted. The nature of the added characters optionally varies, and they may be drawn from the same character set that makes up the parental strings or from a different one (for example, to represent degenerate nucleobases, or unnatural nucleobases or amino acids, etc.). Death can be defined simply as a variation of the deletion operator.
This occurs when the result of applying a genetic operator (or combination thereof) produces a deletion of an entire individual character string, or of an entire (sub)population of character strings. Death can also be defined as a variation of an elitism-biased multiplication operator (multiplication by zero of the values defining the abundance level of one or more strings). Death can also be defined as a default non-selection action in operators that effect selection of string subpopulations and transfer manipulations with various sorting and indexing operations on indexed libraries of strings (all strings that are not transferred can be considered dead or non-existent for subsequent computations).

FRAGMENTATION OF STRINGS is an important class of optional non-elementary (complex) operators that may have advantages for simulating the evolution of strings in various DNA shuffling formats. Operationally, fragmentation can be described as a formal variation of a combination of a deletion operator and a multiplication operator. However, one of skill will appreciate that there are many other simple algorithmic operations that allow any given character string to be fragmented into a progeny of shorter strings. Fragmentation operations can be random or biased. Different ranges of fragment sizes can be predetermined. String fragments can be left in the same population as the parental strings, or they can be transferred to a different population. String fragments from various population strings can be pooled to form new populations.

CROSSOVER (RECOMBINATION): This operator formally comprises joining a continuous part of one string with a continuous part of another string such that one or two hybrid strings (chimeras) are formed, where each of the chimeras contains at least two connected continuous string areas, each of which comprises a partial sequence of one of two different recombining strings.
The area/point at which the sequence characters from the different parental strings are joined is called the crossover/recombination area/point. Crossover operations can be combined with mutation operations that affect one or more characters of the recombined strings in the vicinity of the junction area/point. When applied recursively to a population of character strings, complex chimeras comprising consecutively connected partial sequences of more than two parental strings can be formed.

LIGATION is a variation of an insertion mutation operator in which essentially the entire content of one string is combined with the entire content of another string, such that the last character of the first string is followed by the first character of the other string. The ligation operation can be combined with a mutation operation that affects one or more characters of the ligated strings in the vicinity of the junction point. Ligation can also be viewed as a means of forming chimeras.

ELITISM is a concept that provides a useful form of bias, imposing discrimination criteria on the use of any of the genetic operators. Various types of bias, both positive and negative, can be designed and implemented. The rationale for the design of elitism operators is based on the concept of fitness. Fitness can be determined using string analysis tools that recognize various specific sequence features (GC content, frame shifts, terminations, sequence length, specific substrings, homology properties, ligand-binding and folding motifs, etc.) and/or indexed correlated parameters acquired from physical selection of the physical representations of the character strings (stability, enzymatic activity, ligand binding, etc.).
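The mutation, fragmentation, crossover and ligation operators described above can be sketched as simple string functions, for illustration only. All function names below are hypothetical. The sketch assumes that the two parental strings passed to `identity_areas` are already aligned position by position and are of equal length, and it places the crossover point at the middle of an identity area, which is one of the options mentioned in the Summary; it does not represent the only way such operators could be implemented.

```python
import random


def point_mutation(s: str, pos: int, new_char: str) -> str:
    """Single-point mutation: replace the character at `pos`."""
    return s[:pos] + new_char + s[pos + 1:]


def deletion(s: str, pos: int, n: int = 3) -> str:
    """Delete `n` contiguous characters (n = 3 keeps a reading frame)."""
    return s[:pos] + s[pos + n:]


def insertion(s: str, pos: int, chars: str) -> str:
    """Insert `chars` before position `pos`."""
    return s[:pos] + chars + s[pos:]


def fragment(s: str, min_len: int, max_len: int, rng: random.Random):
    """Random fragmentation into consecutive pieces of bounded size."""
    pieces, i = [], 0
    while i < len(s):
        n = rng.randint(min_len, max_len)
        pieces.append(s[i:i + n])
        i += n
    return pieces


def identity_areas(a: str, b: str, min_len: int):
    """(start, end) runs of at least `min_len` identical aligned characters."""
    areas, start = [], None
    n = min(len(a), len(b))
    for i in range(n):
        if a[i] == b[i]:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                areas.append((start, i))
            start = None
    if start is not None and n - start >= min_len:
        areas.append((start, n))
    return areas


def crossover(a: str, b: str, point: int):
    """Join the head of one string to the tail of the other (two chimeras)."""
    return a[:point] + b[point:], b[:point] + a[point:]


def ligate(a: str, b: str) -> str:
    """Ligation: the last character of `a` is followed by the first of `b`."""
    return a + b


if __name__ == "__main__":
    a, b = "AAAACCCCGGGG", "TTTACCCCGTTT"
    start, end = identity_areas(a, b, min_len=4)[0]
    print(crossover(a, b, (start + end) // 2))
```

Restricting crossover points to identity areas is an example of homology-based bias; other fitness-related criteria (for example, GC content or the presence of specific substrings) could be applied in the same way to decide which operator outputs are retained.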
It is understood that different elitism criteria can be applied separately to any of the genetic operators described above, or to combinations of operators. It is also possible to use elitism in the same evolutionary computation process with several operators of the same type, where the input/output parameters of each of the similar operators can be controlled independently (or interdependently). Different elitism criteria can be used to control the changes in the populations of character strings caused by the action of each of the individual operators.

SEQUENCE HOMOLOGY OR SEQUENCE SIMILARITY is an especially important form of sequence-specific elitism, useful for controlling the changes in populations of character strings caused by crossover/recombination operators in those genetic algorithms used to evolve character strings encoding polynucleotide and polypeptide sequences. Different approaches, methods and algorithms known in the art can be used to detect homology or similarity between different character strings. Optimal alignment of sequences for comparison can be conducted, for example, by the local homology algorithm of Smith and Waterman, Adv. Appl. Math. 2:482 (1981), by the homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48:443
(1970), by searching for the similarity method of
Pearson and Lipman, Proc. Nat'l Acad Sci. USA 85: 2444 (1988), through computerized implementations of these algorithms
(GAP, BESTFIT, FASTA and TFASTA in the Wisconsin Genetics Software package, from Genetics Computer Group, 575 Science Dr.7 Madison, Wl) or even by visual inspection (see in general, Ausubel et al. ., infra) An algorithm example that is suitable for determining the percent identity of the sequence and the. Similarity of the sequence is - the BLAST algorithm, which is described in Altschul et al. , J. Mol. Biol. . 215: 403-410 (1990). The software to perform the BLAST analyzes is available publicly through the National
Center for Biotechnology Information (National Center for Biotechnology Information)
(http://www.ncbi.nlm.nih.gov/). This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence that either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul et al., supra). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are then extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always > 0) and N (penalty score for mismatching residues; always < 0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T and X determine the sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a word length (W) of 11, an expectation (E) of 10, a cutoff of 100, M = 5, N = -4 and a comparison of both strands. For amino acid sequences, the BLASTP program uses as defaults a word length (W) of 3, an expectation (E) of 10 and the BLOSUM62 scoring matrix (see Henikoff & Henikoff (1989) Proc. Natl. Acad.
Sci. USA 89: 10915). In addition to calculating percent sequence identity, the BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin & Altschul (1993) Proc. Nat'l. Acad. Sci. USA 90: 5873-5787). One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance. For example, a nucleic acid is considered similar to a reference sequence (and, therefore, homologous) if the smallest sum probability in a comparison of the test nucleic acid to the reference nucleic acid is less than about 0.1, or less than about 0.01, or even less than about 0.001. A further example of a useful sequence alignment algorithm is PILEUP. PILEUP creates a multiple sequence alignment from a group of related sequences using progressive, pairwise alignments. It can also plot a tree showing the clustering relationships used to create the alignment. PILEUP uses a simplification of the progressive alignment method of Feng & Doolittle, J. Mol.
Evol. 35: 351-360 (1987). The method used is similar to the method described by Higgins & Sharp, CABIOS 5: 151-153 (1989). The program can align, for example, up to 300 sequences of a maximum length of 5,000 letters. The multiple alignment procedure begins with the pairwise alignment of the two most similar sequences, producing a cluster of two aligned sequences. This cluster can then be aligned to the next most related sequence or cluster of aligned sequences. Two clusters of sequences can be aligned by a simple extension of the pairwise alignment of two individual sequences. The final alignment is achieved by a series of progressive, pairwise alignments. The program can also be used to plot a dendrogram or tree representation of the clustering relationships. The program is run by designating specific sequences and their amino acid or nucleotide coordinates for regions of sequence comparison. In this way, different types of similarity, with various levels of identity and length, can be detected and recognized. For example, many homology determination methods have been designed for comparative analysis of biopolymer sequences, for spell checking in word processing, and for data retrieval from various databases. With an understanding of the pairwise complementary interactions in the double helix among the 4 principal nucleobases of natural polynucleotides, models that simulate annealing of complementary homologous polynucleotide strings can also be used as a foundation of sequence-specific elitism useful for controlling crossover operators. Homology-based elitism of crossover operators can thus be used (a) to find suitable pairs of strings for recombination in a population of strings and/or (b) to find/predetermine particular suitable/desired areas/points of recombination along the lengths of the character strings selected for recombination.
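Uses (a) and (b) of homology-based elitism can be sketched as follows. This is a simplified illustration that treats an exact shared word of a chosen length as the homology criterion; it is not BLAST or PILEUP, and the names `shared_word_positions` and `homology_crossover` and the parameter `word_len` are hypothetical.

```python
def shared_word_positions(a: str, b: str, word_len: int):
    """Return (i, j) pairs such that a[i:i+word_len] == b[j:j+word_len]."""
    index = {}
    for j in range(len(b) - word_len + 1):
        index.setdefault(b[j:j + word_len], []).append(j)
    hits = []
    for i in range(len(a) - word_len + 1):
        for j in index.get(a[i:i + word_len], []):
            hits.append((i, j))
    return hits

def homology_crossover(a: str, b: str, word_len: int):
    """Recombine a and b at their first shared word.

    Returns None when the pair shares no word of the required length,
    that is, when the pair fails the homology criterion for crossover.
    """
    hits = shared_word_positions(a, b, word_len)
    if not hits:
        return None
    i, j = hits[0]
    # Keep a through the end of the shared word, then continue with b.
    return a[:i + word_len] + b[j + word_len:]
```

Raising `word_len` makes the criterion more stringent, so fewer pairs of strings qualify for recombination and crossover points are confined to longer stretches of identity. An alignment score threshold could be substituted for the exact-word test where mismatches are to be tolerated.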
Setting predetermined types and stringency of similarity/homology as a condition for crossover to occur is a form of elitism for controlling the formation of chimeras between character strings that represent genes of varying degrees of homology.

RECURSIVE USE OF GENETIC OPERATORS FOR THE EVOLUTION OF CHARACTER STRINGS. All of the described genetic operators can be applied recursively, and the specific parameters for each occurrence can remain the same or can be varied systematically or randomly.

RANDOMIZATION IN THE APPLICATION OF GENETIC OPERATORS FOR THE EVOLUTION OF CHARACTER STRINGS. Each genetic operator can be applied to randomly selected strings and/or at randomly selected positions along the lengths of one or more strings, with randomly selected frequencies of occurrence within a range.

ASSEMBLY OF GOs INTO GAs. The order in which individual genetic operators (GOs) are applied to produce derivative libraries of strings can differ, and may depend on the composition of the particular set of individual GOs selected for practicing various formats of GAGGS. The order can be linear, cyclic, parallel or a combination of the three, and can typically be represented by a graph. Many arrangements of GOs can be used to simulate natural sexual and mutagenic processes for generating genetic diversity, or artificial protocols such as single-gene or family DNA shuffling. However, the purpose of a genetic algorithm (GA) is not limited to simulating known physical DNA manipulation methods. Its main purpose is to provide a formal and intelligent tool, based on an understanding of natural and artificial evolution processes, for creating and optimizing evolutionary protocols of practical utility that can provide effective advantages over currently practiced methods.
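The recursive, randomized application of an ordered set of GOs can be sketched as below. The sketch assumes a linear order repeated cyclically and two illustrative operators; the names `point_mutation`, `random_crossover` and `run_linear` are hypothetical, and a parallel arrangement would apply the operators to copies of the population and pool the results.

```python
import random

ALPHABET = "ACGT"

def point_mutation(population, rng):
    """GO: change one random position of one randomly chosen string."""
    pop = list(population)
    k = rng.randrange(len(pop))
    s = pop[k]
    pos = rng.randrange(len(s))
    new_char = rng.choice([c for c in ALPHABET if c != s[pos]])
    pop[k] = s[:pos] + new_char + s[pos + 1:]
    return pop

def random_crossover(population, rng):
    """GO: add one single-point chimera of two randomly chosen strings."""
    a, b = rng.sample(list(population), 2)
    point = rng.randint(1, min(len(a), len(b)) - 1)
    return list(population) + [a[:point] + b[point:]]

def run_linear(operators, population, rounds, rng):
    """Apply the operators in the given order, repeating the cycle `rounds` times."""
    for _ in range(rounds):
        for op in operators:
            population = op(population, rng)
    return population
```

Calling `run_linear([random_crossover, point_mutation], ["AAAAAAAA", "CCCCCCCC"], 3, random.Random(0))` yields a derivative library of five strings. Passing a seeded `random.Random` keeps a run reproducible, which is convenient when operator parameters are varied systematically between rounds.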
Gene Synthesis. The physical synthesis of genes encoded by derivative libraries of strings obtained by the operation of genetic algorithms is the primary means of creating a physical representation of matter that is amenable to a physical assay for a desired property, or of producing substrates that are further evolved by physical diversity generation methods. Thus, one aspect of the present invention relates to the synthesis of genes with sequences selected following one or more computer shuffling procedures as set forth herein. For GAGGS to be a time- and resource-effective technology, gene synthesis technology is typically used to construct libraries of genes in a consistent manner and in close fidelity to the sequence representations produced by GA manipulations. GAGGS typically uses gene synthesis methods that allow rapid construction of libraries of 10^4 to 10^9 gene variations. This is typically adequate for screening/selection protocols, since larger libraries are more difficult to make and maintain and sometimes cannot be sampled as completely by a physical assay or selection method. For example, existing physical assay methods in the art (including, e.g., "life-or-death" selection methods) generally allow sampling of about 10^9 variations or fewer by a particular screen of a particular library, and many assays are effectively limited to sampling 10^4 to 10^5 members. Thus, building several smaller libraries is a preferred method, since large libraries cannot easily be sampled completely. However, larger libraries can also be made and sampled, for example using high-throughput screening methods.

Gene Synthesis Technologies. There are many methods that can be used to synthesize genes with well-defined sequences.
For clarity of illustration only, this section focuses on one of the many possible and available types of known methods for the synthesis of genes and polynucleotides. The current art in polynucleotide synthesis is best represented by the well-known and mature phosphoramidite chemistry, which permits the effective preparation of oligos. It is possible, but somewhat impractical, to use this chemistry for routine synthesis of oligos significantly longer than 100 bp, since sequence quality deteriorates for longer oligos, and longer synthetic oligos are generally purified before use. Oligos of a "typical" size of 40-80 bp can be obtained routinely and directly with very high purity and without substantial sequence deterioration. For example, oligonucleotides, e.g., for use in in vitro gene reconstruction/amplification methods, for use as gene probes, or as shuffling targets (e.g., synthetic genes or gene segments), are typically synthesized chemically according to the solid phase phosphoramidite triester method described by Beaucage and Caruthers (1981), Tetrahedron Letts., 22(20): 1859-1862, e.g., using an automated synthesizer, as described in Needham-VanDevanter et al. (1984) Nucleic Acids Res., 12: 6159-6168. Oligonucleotides can also be custom made and ordered from a variety of commercial sources known to persons of skill. There are many commercial providers of oligo synthesis services, and thus this is a broadly accessible technology. Any nucleic acid can be custom ordered from any of a variety of commercial sources, such as The Midland Certified Reagent Company (mcrc@oligos.com), The Great American Gene Company (http://www.genco.com), ExpressGen Inc. (www.expressgen.com), Operon Technologies Inc. (Alameda, CA) and many others. Similarly, peptides and antibodies can be custom ordered from any of a variety of sources, such as PeptidoGenic (pkim@ccnet.com), HTI Bio-products, Inc. (http://www.htibio.com), BMA Biomedicals Ltd (U.K.), Bio-Synthesis, Inc. and many others. Dillon and Rosen (Biotechniques, 1990, 9
(3): 298-300) provide a relevant demonstration of total gene synthesis from small fragments that is readily amenable to optimization, parallelism and high throughput: rapid PCR-based assembly of a gene from a set of partially overlapping single-stranded oligonucleotides, without the use of ligase. Several groups have also described successful applications of variations of the same PCR-based gene assembly approach to the synthesis of various genes of increasing size, demonstrating its general applicability and its combinatorial nature for the synthesis of libraries of mutated genes. Useful references include Sandhu et al. (Biotechniques, 1992, 12 (1)? 5-16) (220 bp genes from 3 oligos of 77-86 bp); Prodromou and Pearl (Protein Engineering, 1992, 5(8): 827-829) (522 bp genes from 10 oligos of 54-86 bp); Chen et al., 1994 (JACS, 119X (11): 8799-8800) (779 bp genes); Hayashi et al., 1994 (Biotechniques, 1994, 17: 310-314) and others. More recently, Stemmer et al. (Gene, 1995, 164: 49-53) showed, for example, that PCR-based assembly methods are effectively useful for building larger genes of up to at least 2.7 kb from dozens or even hundreds of synthetic 40 bp oligos. These authors also demonstrated that, of the four basic steps that comprise PCR-based gene synthesis protocols (oligo synthesis, gene assembly, gene amplification and, optionally, cloning), the gene amplification step can be omitted if a "circular" assembly PCR is used. Numerous publications of the inventors and their co-workers, as well as other investigators in the art, also describe techniques that facilitate DNA shuffling, e.g., by providing for the assembly of genes from small fragments or even oligonucleotides.
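The design of a set of partially overlapping single-stranded oligonucleotides for PCR-based gene assembly can be sketched in silico as follows. The sketch assumes 40-base oligos with 20-base overlaps that alternate between the top and bottom strands; these values, and the function names `design_oligos` and `reassemble`, are illustrative assumptions rather than a protocol taken from the cited references.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def design_oligos(gene: str, oligo_len: int = 40, overlap: int = 20):
    """Tile `gene` with oligos so that consecutive oligos share `overlap` bases.

    Even-numbered oligos are top strand; odd-numbered oligos are written as
    the reverse complement so that neighbours can anneal to each other.
    The final oligo may be shorter than `oligo_len`.
    """
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than the oligo")
    step = oligo_len - overlap
    oligos, start, index = [], 0, 0
    while True:
        window = gene[start:start + oligo_len]
        oligos.append(window if index % 2 == 0 else reverse_complement(window))
        if start + oligo_len >= len(gene):
            break
        start += step
        index += 1
    return oligos

def reassemble(oligos, overlap: int) -> str:
    """In silico check: rebuild the top strand from the designed oligo set."""
    top = [o if i % 2 == 0 else reverse_complement(o) for i, o in enumerate(oligos)]
    gene = top[0]
    for window in top[1:]:
        gene += window[overlap:]
    return gene
```

Running `reassemble(design_oligos(gene), 20)` and comparing the result with `gene` confirms that the oligo set encodes the intended character string before any synthesis is ordered. Oligo length and overlap are the parameters most likely to need adjustment for a real assembly.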
One aspect of the present invention is the ability to use family shuffling oligonucleotides and crossover oligonucleotides as recombination templates/intermediates in various DNA shuffling methods. Indeed, numerous publications of the inventors and their co-workers, as well as other investigators in the art, also describe techniques that facilitate the reassembly of genes from small fragments, including oligonucleotides. In addition to the publications noted above, Stemmer et al. (1998) U.S. Pat. No. 5,834,252 "END COMPLEMENTARY POLYMERASE REACTION" describes processes for amplifying and detecting a target sequence (e.g., in a mixture of nucleic acids), as well as for assembling large polynucleotides. Crameri et al. (1998) Nature 391: 288-291 provides basic methodologies for gene reassembly, as does Crameri et al. (1998) Bio techniques 18(2): 194-196. More recently, numerous gene reassembly protocols that simultaneously recombine and reconstruct genes have been described in various applications of the inventors and their co-workers, such as "OLIGONUCLEOTIDE MEDIATED NUCLEIC ACID RECOMBINATION" by Crameri et al., filed February 5, 1999 (USSN 60/118,813), filed June 24, 1999 (USSN 60/141,049) and filed September 28, 1999 (USSN 09/408,392), and "USE OF CODON-BASED OLIGONUCLEOTIDE SYNTHESIS FOR SYNTHETIC SHUFFLING" by Welch et al., filed September 28, 1999 (USSN 09/408,393). In these embodiments, synthetic recombination methods are used in which oligonucleotides corresponding to different homologs are synthesized and reassembled in PCR or ligation reactions that include oligonucleotides corresponding to more than one parental nucleic acid, thereby generating new recombined nucleic acids.
An advantage of oligonucleotide-mediated recombination is the ability to recombine homologous nucleic acids with low sequence similarity, or even to recombine non-homologous nucleic acids. In these low-homology oligonucleotide shuffling methods, one or more sets of fragmented nucleic acids are recombined, e.g., with a set of crossover family diversity oligonucleotides. Each of these crossover oligonucleotides has a plurality of sequence diversity domains corresponding to a plurality of sequence diversity domains from homologous or non-homologous nucleic acids with low sequence similarity. The fragmented oligonucleotides, which are derived by comparison to one or more homologous or non-homologous nucleic acids, can hybridize to one or more regions of the crossover oligos, facilitating recombination. Such oligonucleotide sets are selected in silico according to the methods herein. When homologous nucleic acids are recombined, sets of overlapping family gene shuffling oligonucleotides (which are derived by comparison of homologous nucleic acids and synthesis of oligonucleotide fragment sets corresponding to regions of similarity and regions of diversity derived from the comparison) are hybridized and elongated (e.g., by reassembly PCR), providing a population of recombined nucleic acids, which can be selected for a desired trait or property. Typically, the set of overlapping family gene shuffling oligonucleotides includes a plurality of oligonucleotide member types that have consensus region subsequences derived from a plurality of homologous target nucleic acids. Typically, family gene shuffling oligonucleotides are provided by aligning homologous nucleic acid sequences to select conserved regions of sequence identity and regions of sequence diversity.
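The selection of conserved regions and regions of diversity from aligned homologs can be sketched as below. The sketch assumes the homologs have already been aligned to equal length (gaps written as "-") by a tool such as those described above, and it treats a column as conserved only when every homolog carries the same non-gap character; the names `classify_columns` and `regions` are hypothetical.

```python
def classify_columns(aligned):
    """Return one flag per alignment column: True where all homologs agree."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")
    return [len(set(col)) == 1 and col[0] != "-" for col in zip(*aligned)]

def regions(flags):
    """Collapse column flags into (start, end, is_conserved) half-open runs."""
    runs = []
    start = 0
    for i in range(1, len(flags) + 1):
        if i == len(flags) or flags[i] != flags[start]:
            runs.append((start, i, flags[start]))
            start = i
    return runs
```

Conserved runs are candidates for the shared overlap segments of the oligonucleotide set, while diverse runs mark positions at which separate oligonucleotides (or degenerate positions) would be needed for each homolog. A less strict rule, such as a majority consensus per column, could be substituted for the all-identical test.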
A plurality of family gene shuffling oligonucleotides are synthesized (serially or in parallel) which correspond to at least one region of sequence diversity. Additional details regarding family shuffling are found in USSN 09/408,392, cited above. Sets of fragments, or subsets of fragments, used in oligonucleotide shuffling approaches can be provided by cleaving one or more homologous nucleic acids (e.g., with a DNase) or, more commonly, by synthesizing a set of oligonucleotides corresponding to a plurality of regions of at least one nucleic acid (typically, oligonucleotides corresponding to a full-length nucleic acid are provided as members of a set of nucleic acid fragments). In the shuffling procedures herein, these cleavage fragments can be used in conjunction with family gene shuffling oligonucleotides, e.g., in one or more recombination reactions, to produce recombinant nucleic acids. Gene assembly by PCR from single-stranded, complementary, overlapping synthetic oligos is a method of choice for practicing GAGGS. This method can be optimized, e.g., by varying the lengths of the oligos, the number of oligos in the recombination reaction, the degree of oligonucleotide overlap, the levels and nature of sequence degeneracy, the specific reaction conditions and the particular polymerase enzymes used in the reassembly, and by controlling the stringency of gene assembly to decrease or increase the number of sequence deviations during gene synthesis. The method can also be practiced in a parallel mode, in which each individual member of a library, comprising the plurality of genes intended for subsequent physical screening, is synthesized in spatially separated vessels or arrays of vessels, or in a pooled fashion, in which all or part of the desired plurality of genes is synthesized in a single vessel.
Many other synthetic methods for making synthetic nucleotides are also known, and the specific advantages of using one versus another for practicing GAGGS can be readily determined by one skilled in the art.

Sequence Deconvolution
Sequence deconvolution is performed on those polynucleotide variants that are found to have desired properties, in order to confirm that changes in the corresponding character strings (i.e., those corresponding to physical sequences for biopolymers) yield the desired changes in the relevant composition of matter (e.g., a polynucleotide, polypeptide or the like). Sequencing and other standard recombinant techniques useful for the present invention, including for sequence deconvolution, are found, e.g., in Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology volume 152, Academic Press, Inc., San Diego, CA ("Berger"); Sambrook et al., Molecular Cloning - A Laboratory Manual (2nd Edition), Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 1989 ("Sambrook"); and Current Protocols in Molecular Biology, F. M. Ausubel et al., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc. (supplemented through 1999) ("Ausubel"). In addition to sequencing GAGGS products, unique restriction sites can also be used to detect particular sequences. Sufficient information to guide one of skill through restriction enzyme digestion is also found in Sambrook, Berger and Ausubel, id. Methods of transducing cells, including plant and animal cells, with nucleic acids generated by GAGGS, e.g., for cloning and sequencing and/or for expression and selection of encoded molecules, are generally available, as are methods of expressing proteins encoded by such nucleic acids. In addition to Berger, Ausubel and Sambrook,
useful general references for the culture of animal cells include Freshney (Culture of Animal Cells, a Manual of Basic Technique, third edition, Wiley-Liss, New York (1994)) and the references cited therein, Humason (Animal Tissue Techniques, fourth edition, W.H. Freeman and Company (1979)) and Ricciardelli et al., In Vitro Cell Dev. Biol. 25: 1016-1024 (1989). References for plant cell cloning, culture and regeneration include Payne et al. (1992) Plant Cell and Tissue Culture in Liquid Systems, John Wiley & Sons, Inc., New York, NY ("Payne"); and Gamborg and Phillips (eds.) (1995) Plant Cell, Tissue and Organ Culture; Fundamental Methods, Springer Lab Manual, Springer-Verlag (Berlin Heidelberg New York) ("Gamborg"). A variety of cell culture media are described in Atlas and Parks (eds.) The Handbook of Microbiological Media (1993) CRC Press, Boca Raton, FL ("Atlas"). Additional information for plant cell culture is found in available commercial literature such as the Life Science Research Cell Culture Catalog (1998) from Sigma-Aldrich, Inc. (St Louis, MO) ("Sigma-LSRCCC") and, e.g., the Plant Culture Catalog and supplement (1997), also from Sigma-Aldrich, Inc. (St Louis, MO) ("Sigma-PCCS"). In vitro amplification methods can also be used to amplify and/or sequence nucleic acids generated by GAGGS, e.g., for cloning and selection. Examples of techniques sufficient to direct persons of skill through typical in vitro amplification and sequencing methods, including the polymerase chain reaction (PCR), the ligase chain reaction (LCR), Qβ-replicase amplification and other RNA polymerase mediated techniques (e.g., NASBA), are found in Berger, Sambrook and Ausubel, id., as well as in Mullis et al. (1987) U.S. Pat. No.
4,683,202; PCR Protocols: A Guide to Methods and Applications (Innis et al.) Academic Press, Inc., San Diego, CA (1990) ("Innis"); Arnheim & Levinson (October 1, 1990) C&EN 36-47; The Journal of NIH Research (1991) 3, 81-94; Kwoh et al. (1989) Proc. Natl. Acad. Sci. USA 86, 1173; Guatelli et al. (1990) Proc. Natl. Acad. Sci. USA 87, 1874; Lomell et al. (1989) J. Clin. Chem. 35, 1826; Landegren et al. (1988) Science 241, 1077-1080; Van Brunt (1990) Biotechnology 8, 291-294; Wu and Wallace (1989) Gene 4, 560; Barringer et al. (199?) Gene 89, 117; and Sooknanan and Malek (1995) Biotechnology 13: 563-564. Improved methods of cloning in vitro amplified nucleic acids are described in Wallace et al., U.S. Pat. No. 5,426,039. Improved methods of amplifying large nucleic acids by PCR are summarized in Cheng et al. (1994) Nature 369: 684-685 and the references therein, in which PCR amplicons of up to 40 kb are generated.
PCR reassembly techniques are discussed supra. One of skill will appreciate that essentially any RNA can be converted into a double-stranded DNA suitable for restriction digestion, PCR expansion and sequencing using reverse transcriptase and a polymerase. See Ausubel, Sambrook and Berger, supra. If gene synthesis is essentially error-free (stringent) and is, e.g., performed in a parallel mode in which each individual member of the library was originally synthesized in spatially separated areas or containers (vessels), then deconvolution is performed by referencing the positional encoding index for each intended sequence of all members of the library. If the synthesis is performed in a pooled fashion (or if library members were pooled during selection), then one of the many known polynucleotide sequencing techniques is used.

Recursive GAGGS Processes. The recursive nature of directed evolution methods, aimed at progressive, round-by-round improvement of desired properties of polynucleotides and polypeptides, is well understood. In directed evolution
(DE) by GAGGS, one or more deconvoluted character strings that encode the sequences of those variants displaying certain changes in the level of desired properties (where the level is arbitrarily defined by increments/decrements/ratios between measures of various properties) can be used to comprise a new library of character strings for a new round of GAGGS. Recursive GAGGS, unlike typical DNA shuffling, does not use physical manipulation of polynucleotides to produce subsequent generations of gene diversity. Instead, GAGGS simply uses the sequence information describing the acquired beneficial changes as a foundation for generating additional changes that lead to further changes (improvements) in the desired properties of the molecules encoded by the character strings. Recursive GAGGS can be performed until the character strings evolve to the point where the encoded polynucleotides and polypeptides reach arbitrarily set levels of desired characteristics, or until further changes in the characteristics cannot be obtained (e.g., enzyme turnover reaches the theoretical diffusion rate limit under the conditions of a physical assay). Genetic algorithm parameters, gene synthesis methods and schemes, as well as physical assays and sequence deconvolution methods, can vary in each of the different rounds/cycles of directed evolution by recursive GAGGS. One particular aspect of this approach is that an initially random or pseudo-random approach to library generation can become progressively more directed as information about activity levels becomes available. For example, any heuristic learning approach or neural network approach gradually becomes more efficient at selecting "correct" (active) sequences. A variety of such approaches are set forth below, including principal component analysis, use of negative data, data parameterization and the like.
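Positional deconvolution followed by the start of a new round can be sketched as follows. The sketch assumes a parallel synthesis in which a hypothetical `plate_map` records the intended character string for each well and a hypothetical `assay` table records one measured activity per well; the well labels, the threshold and all function names are illustrative assumptions.

```python
def select_hits(assay, threshold):
    """Return the wells whose measured activity meets the threshold, best first."""
    hits = [well for well, value in assay.items() if value >= threshold]
    return sorted(hits, key=lambda well: assay[well], reverse=True)

def deconvolute(plate_map, wells):
    """Look up the intended character string for each selected well."""
    return [plate_map[well] for well in wells]

def midpoint_chimeras(parents):
    """Illustrative diversification step: midpoint crossover of every ordered pair."""
    return [a[:len(a) // 2] + b[len(b) // 2:]
            for a in parents for b in parents if a != b]

def next_round(plate_map, assay, threshold, diversify=midpoint_chimeras):
    """Build the character-string library for the next round of GAGGS."""
    parents = deconvolute(plate_map, select_hits(assay, threshold))
    return diversify(parents)
```

No polynucleotide is manipulated between rounds in this scheme: only the sequence information of the selected variants is carried forward. The `diversify` argument can be replaced by any combination of genetic operators, and the threshold can be raised from round to round as activity data accumulate.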
Integration of GAGGS, DNA Shuffling and Other Directed Evolution Technologies. GAGGS constitutes a self-sufficient, stand-alone technology that can be practiced independently of DNA shuffling or any other available directed evolution method. However, one or more rounds of GAGGS can be, and often are, practiced in combination with physical shuffling of nucleic acids and/or in combination with site-directed mutagenesis or error-prone PCR (e.g., as alternating cycles of a directed evolution process) or other diversity generation methods. Polynucleotide libraries generated by GAGGS can be subjected to nucleic acid shuffling, and polynucleotides found to have desired characteristics following rounds of in silico and/or physical shuffling can be selected and sequenced to provide character strings for evaluating GAGGS processes or for forming character strings for additional GAGGS operations. Thus, GAGGS can be performed as a stand-alone technology, or can be followed by shuffling, mutagenesis, random priming PCR, etc. Where the methods of the invention entail performing physical recombination ("shuffling") and screening or selection to evolve individual genes, whole plasmids, viruses, multigene clusters, or even whole genomes, the techniques of the inventors and their co-workers are particularly useful. For example, reiterative cycles of recombination and screening/selection can be performed to further evolve nucleic acids of interest that are generated by performing a GO on a character string (e.g., followed by synthesis of the corresponding oligonucleotides and gene generation/regeneration, e.g., by assembly PCR).
The following publications describe a variety of recursive recombination procedures and/or related diversity generation methods that can be practiced in conjunction with the in silico processes of the invention: Stemmer et al. (1999) "Molecular breeding of viruses for targeting and other clinical properties" Tumor Targeting 4: 1-4; Ness et al. (1999) "DNA Shuffling of subgenomic sequences of subtilisin" Nature Biotechnology 17: 893-896; Chang et al. (1999) "Evolution of a cytokine using DNA family shuffling" Nature Biotechnology 17: 793-797; Minshull and Stemmer (1999) "Protein evolution by molecular breeding" Current Opinion in Chemical Biology 3: 284-290; Christians et al. (1999) "Directed evolution of thymidine kinase for AZT phosphorylation using DNA family shuffling" Nature Biotechnology 17: 259-264; Crameri et al. (1998) "DNA shuffling of a family of genes from diverse species accelerates directed evolution" Nature 391: 288-291; Crameri et al. (1997) "Molecular evolution of an arsenate detoxification pathway by DNA shuffling" Nature Biotechnology 15: 436-438; Zhang et al. (1997) "Directed evolution of an effective fucosidase from a galactosidase by DNA shuffling and screening" Proceedings of the National Academy of Sciences, U.S.A. 94: 4504-4509; Patten et al. (1997) "Applications of DNA Shuffling to Pharmaceuticals and Vaccines" Current Opinion in Biotechnology 8: 724-733; Crameri et al. (1996) "Construction and evolution of antibody-phage libraries by DNA shuffling" Nature Medicine 2: 100-103; Crameri et al. (1996) "Improved green fluorescent protein by molecular evolution using DNA shuffling" Nature Biotechnology 14: 315-319; Gates et al. (1996) "Affinity selective isolation of ligands from peptide libraries through display on a lac repressor 'headpiece dimer'" Journal of Molecular Biology 255: 373-386; Stemmer (1996) "Sexual PCR and Assembly PCR" In: The Encyclopedia of Molecular Biology, VCH Publishers,
NY, pp. 447-457; Crameri and Stemmer (1995) "Combinatorial multiple cassette mutagenesis creates all the permutations of mutant and wildtype cassettes" BioTechniques 18: 194-195; Stemmer et al. (1995) "Single-step assembly of a gene and entire plasmid from large numbers of oligodeoxyribonucleotides" Gene 164: 49-53; Stemmer (1995) "The Evolution of Molecular Computation" Science 270: 1510; Stemmer (1995) "Searching Sequence Space" Bio/Technology 13: 549-553; Stemmer (1994) "Rapid evolution of a protein in vitro by DNA shuffling" Nature 370: 389-391; and Stemmer (1994) "DNA shuffling by random fragmentation and reassembly: In vitro recombination for molecular evolution" Proceedings of the National Academy of Sciences, U.S.A. 91: 10747-10751.

Additional details regarding DNA shuffling methods are found in U.S. Patents of the inventors and their co-workers, including: U.S. Pat. No. 5,605,793 to Stemmer (February 25, 1997), "METHODS FOR IN VITRO RECOMBINATION"; U.S. Pat. No. 5,811,238 to Stemmer et al. (September 22, 1998), "METHODS FOR GENERATING POLYNUCLEOTIDES HAVING DESIRED CHARACTERISTICS BY ITERATIVE SELECTION AND RECOMBINATION"; U.S. Pat. No. 5,830,721 to Stemmer et al. (November 3, 1998), "DNA MUTAGENESIS BY RANDOM FRAGMENTATION AND REASSEMBLY"; U.S. Pat. No. 5,834,252 to Stemmer et al. (November 10, 1998), "END-COMPLEMENTARY POLYMERASE REACTION"; and U.S. Pat. No. 5,837,458 to Minshull et al. (November 17, 1998), "METHODS AND COMPOSITIONS FOR CELLULAR AND METABOLIC ENGINEERING".
In addition, details and formats for nucleic acid shuffling are found in a variety of PCT publications and foreign patent applications, including: Stemmer and Crameri, "DNA MUTAGENESIS BY RANDOM FRAGMENTATION AND REASSEMBLY" WO 95/22625; Stemmer and Lipschutz, "END COMPLEMENTARY POLYMERASE CHAIN REACTION" WO 96/33207; Stemmer and Crameri, "METHODS FOR GENERATING POLYNUCLEOTIDES HAVING DESIRED CHARACTERISTICS BY ITERATIVE SELECTION AND RECOMBINATION" WO 97/0078; Minshull and Stemmer, "METHODS AND COMPOSITIONS FOR CELLULAR AND METABOLIC ENGINEERING" WO 97/35966; Punnonen et al., "TARGETING OF GENETIC VACCINE VECTORS" WO 99/41402; Punnonen et al., "ANTIGEN LIBRARY IMMUNIZATION" WO 99/41383; Punnonen et al., "GENETIC VACCINE VECTOR ENGINEERING" WO 99/41369; Punnonen et al., "OPTIMIZATION OF IMMUNOMODULATORY PROPERTIES OF GENETIC VACCINES" WO 99/41368; Stemmer and Crameri, "DNA MUTAGENESIS BY RANDOM FRAGMENTATION AND REASSEMBLY" EP 0934999; Stemmer, "EVOLVING CELLULAR DNA UPTAKE BY RECURSIVE SEQUENCE RECOMBINATION" EP 0932670; Stemmer et al., "MODIFICATION OF VIRUS TROPISM AND HOST RANGE BY VIRAL GENOME SHUFFLING" WO 99/23107; Apt et al., "HUMAN PAPILLOMAVIRUS VECTORS" WO 99/21979; del Cardayre et al., "EVOLUTION OF WHOLE CELLS AND ORGANISMS BY RECURSIVE SEQUENCE RECOMBINATION" WO 98/31837; Patten and Stemmer, "METHODS AND COMPOSITIONS FOR POLYNUCLEOTIDE ENGINEERING" WO 98/27230; and Stemmer et al., "METHODS FOR OPTIMIZATION OF GENE THERAPY BY RECURSIVE SEQUENCE SHUFFLING AND SELECTION" WO 98/13487.
Certain U.S. applications provide additional details regarding DNA shuffling and related techniques, including "SHUFFLING OF CODON ALTERED GENES" by Patten et al., filed September 29, 1998 (USSN 60/102,362), January 29, 1999 (USSN 60/117,729), and September 28, 1999 (USSN PCT/US99/22588); "EVOLUTION OF WHOLE CELLS AND ORGANISMS BY RECURSIVE SEQUENCE RECOMBINATION" by del Cardayre et al., filed July 15, 1999 (USSN 09/354,922); "OLIGONUCLEOTIDE MEDIATED NUCLEIC ACID RECOMBINATION" by Crameri et al., filed February 5, 1999 (USSN 60/118,813), June 24, 1999 (USSN 60/141,049), and September 28, 1999 (USSN 09/408,392); and "USE OF CODON-BASED OLIGONUCLEOTIDE SYNTHESIS FOR SYNTHETIC SHUFFLING" by Welch et al., filed September 28, 1999 (USSN 09/408,393).
As a review of the foregoing publications, patents, published applications and U.S. patent applications reveals, shuffling (or "recursive recombination") of nucleic acids to provide new nucleic acids with desired properties can be carried out by a number of established methods. Any of these methods is integrated with those of the present invention by incorporating nucleic acids corresponding to character strings produced by performing one or more genetic operators (GOs) on one or more selected parental character strings. Any of these methods can be adapted to the present invention to evolve GAGGS-produced nucleic acids, as discussed herein, to produce new nucleic acids with improved properties. Both the methods of making such nucleic acids and the nucleic acids produced by these methods are a feature of the invention.
In brief, at least five different general classes of recombination methods can be performed (separately or in combination) in accordance with the present invention. First, nucleic acids such as those produced by synthesis of sets of nucleic acids corresponding to character strings produced by GO manipulation of character strings, or available homologues of such sets, or both, can be recombined in vitro by any of a variety of techniques discussed in the references above, including, for example, DNAse digestion of the nucleic acids to be recombined followed by ligation and/or PCR reassembly of the nucleic acids. Second, sets of nucleic acids corresponding to character strings produced by GO manipulation of character strings, and/or available homologues of such sets, can be recursively recombined in vivo, for example by allowing recombination to occur between the nucleic acids while they are in cells.
Third, whole cell genome recombination methods can be used, in which whole genomes of cells are recombined, optionally including spiking of the genomic recombination mixtures with desired library components, such as sets of nucleic acids corresponding to character strings produced by GO manipulation of character strings, or available homologues of such sets. Fourth, synthetic recombination methods can be used, in which oligonucleotides corresponding to different homologues are synthesized and reassembled in PCR or ligation reactions that include oligonucleotides corresponding to more than one parental nucleic acid, thereby generating new recombined nucleic acids. The oligonucleotides can be made by standard nucleotide addition methods or by tri-nucleotide synthetic approaches. Fifth, purely in silico recombination methods can be performed, in which GOs are used in a computer to recombine sequence strings that correspond to nucleic acid or protein homologues. The resulting recombined sequence strings are optionally converted into nucleic acids by synthesis of nucleic acids corresponding to the recombined sequences, for example in concert with oligonucleotide synthesis/gene reassembly techniques. Any of the preceding general recombination formats can be practiced separately or together, in a reiterative fashion, to generate a more diverse set of recombinant nucleic acids.
The above references, in conjunction with the present disclosure, provide these and other basic recombination formats as well as many modifications of these formats. Regardless of the format that is used, the nucleic acids of the invention can be recombined (with each other or with related, or even unrelated, nucleic acids) to produce a diverse set of recombinant nucleic acids, including homologous nucleic acids.
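The fifth, purely in silico format can be sketched in a few lines of Python. This is a minimal illustration rather than the method of the invention: it assumes the parental character strings are already aligned to the same length, and the function names (`crossover`, `recombine_library`) and the choice of a random multi-point crossover as the genetic operator are assumptions made for the example.

```python
import random


def crossover(parent_a: str, parent_b: str, n_points: int, rng: random.Random) -> str:
    """Recombine two aligned, equal-length strings at n_points random positions."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must be aligned to the same length")
    if not 0 <= n_points < len(parent_a):
        raise ValueError("n_points must be between 0 and len(parent) - 1")
    # Cuts fall between characters, so valid cut sites are 1 .. len - 1.
    points = sorted(rng.sample(range(1, len(parent_a)), n_points))
    pieces = []
    source, other = parent_a, parent_b
    start = 0
    for cut in points + [len(parent_a)]:
        pieces.append(source[start:cut])
        source, other = other, source  # switch template at each cut
        start = cut
    return "".join(pieces)


def recombine_library(parents, size, n_points=2, seed=0):
    """Build a library of child strings from randomly chosen parent pairs."""
    rng = random.Random(seed)
    library = []
    for _ in range(size):
        parent_a, parent_b = rng.sample(parents, 2)
        library.append(crossover(parent_a, parent_b, n_points, rng))
    return library


if __name__ == "__main__":
    # Hypothetical aligned parental strings, for illustration only.
    parents = ["ATGGCTAAAGGT", "ATGGCAAAGGGA", "ATGTCTAAAGGC"]
    for child in recombine_library(parents, size=5):
        print(child)
```

Each child string could then be handed to oligonucleotide design and gene synthesis, which is the optional conversion step the text describes.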
Other diversity generating approaches can also be used to modify character strings or nucleic acids. Additional diversity can be introduced into input or output nucleic acids by methods that result in the alteration of individual nucleotides or groups of contiguous or non-contiguous nucleotides, i.e., mutagenesis methods. Mutagenesis methods include, for example, recombination (PCT/US98/05223; Publ. No. WO 98/42727); oligonucleotide-directed mutagenesis (for review see Smith, Ann. Rev. Genet. 19:423-462 (1985); Botstein and Shortle, Science 229:1193-1201 (1985); Carter, Biochem. J. 237:1-7 (1986); Kunkel, "The efficiency of oligonucleotide directed mutagenesis" in Nucleic Acids & Molecular Biology, Eckstein and Lilley, eds., Springer Verlag, Berlin (1987)). Included among these methods are oligonucleotide-directed mutagenesis (Zoller and Smith, Nucl. Acids Res. 10:6487-6500 (1982), Methods in Enzymol. 100:468-500 (1983), and Methods in Enzymol. 154:329-350 (1987)); phosphothioate-modified DNA mutagenesis (Taylor et al., Nucl. Acids Res. 13:8749-8764 (1985); Taylor et al., Nucl. Acids Res. 13:8765-8787 (1985); Nakamaye and Eckstein, Nucl. Acids Res. 14:9679-9698 (1986); Sayers et al., Nucl. Acids Res. 16:791-802 (1988); Sayers et al., Nucl. Acids Res. 16:803-814 (1988)); mutagenesis using uracil-containing templates (Kunkel, Proc. Nat'l. Acad. Sci. USA 82:488-492 (1985) and Kunkel et al., Methods in Enzymol. 154:367-382); and mutagenesis using gapped duplex DNA (Kramer et al., Nucl. Acids Res. 12:9441-9456 (1984); Kramer and Fritz, Methods in Enzymol. 154:350-367 (1987); Kramer et al., Nucl. Acids Res. 16:7207 (1988); and Fritz et al., Nucl. Acids Res. 16:6987-6999 (1988)). Additional suitable methods include point mismatch repair (Kramer et al., Cell 38:879-887 (1984)), mutagenesis using repair-deficient host strains (Carter et al., Nucl. Acids Res. 13:4431-4443 (1985); Carter, Methods in Enzymol. 154:382-403 (1987)), deletion mutagenesis (Eghtedarzadeh and Henikoff, Nucl. Acids Res. 14:5115 (1986)), restriction-selection and restriction-purification (Wells et al., Phil. Trans. R. Soc. Lond. A 317:415-423 (1986)), and mutagenesis by total gene synthesis (Nambiar et al., Science 223:1299-1301 (1984); Sakamar and Khorana, Nucl. Acids Res. 14:6361-6372
(1988); Wells et al., Gene 34:315-323 (1985); and Grundström et al., Nucl. Acids Res. 13:3305-3316 (1985)). Kits for mutagenesis are commercially available (e.g., Bio-Rad, Amersham International, Anglian Biotechnology).
Other diversity generation procedures are proposed in U.S. Patent No. 5,756,316; U.S. Patent No. 5,965,408; Ostermeier et al. (1999) "A combinatorial approach to hybrid enzymes independent of DNA homology" Nature Biotech 17:1205; U.S. Patent No. 5,783,431; U.S. Patent No. 5,824,485; U.S. Patent No. 5,958,672; Jirholt et al. (1998) "Exploiting sequence space: shuffling in vivo formed complementarity determining regions into a master framework" Gene 215:471; U.S. Patent No. 5,939,250; WO 99/10539; WO 98/58085; and others. These diversity generating methods can be combined with each other, or with shuffling reactions or in silico operations, in any combination selected by the user, to produce nucleic acid diversity, which can be screened using any available screening method.
Following recombination or other diversification reactions, any nucleic acid that is produced can be selected for a desired activity. In the context of the present invention, this can include testing for and identifying any detectable or assayable activity, by any relevant assay in the art. A variety of related (or even unrelated) properties can be assayed using any available assay. Accordingly, a recombinant nucleic acid produced by recursively recombining one or more polynucleotides of the invention (produced by GAGGS methods) with one or more additional nucleic acids forms part of the invention.
The one or more additional nucleic acids may include another polynucleotide of the invention; optionally, alternatively, or in addition, the one or more additional nucleic acids can include, for example, a nucleic acid encoding a naturally occurring sequence or subsequence, or any homologous sequence or subsequence. The recombination steps can be performed in vivo, in vitro, or in silico, as described in more detail in the references above and herein. Also included in the invention are a cell containing any resulting recombinant nucleic acid, nucleic acid libraries produced by recursive recombination of the nucleic acids set forth herein, and populations of cells, vectors, viruses, plasmids or the like comprising the library or comprising any recombinant nucleic acid resulting from recombination (or recursive recombination) of a nucleic acid as set forth herein with another such nucleic acid or an additional nucleic acid. Corresponding sequence strings in a database present in a computer system or computer-readable medium are a feature of the invention.
By way of example, a typical physical recombination procedure starts with at least two substrates that generally show at least some identity to each other (i.e., at least about 30%, 50%, 70%, 80% or 90% or more sequence identity) but differ from each other at certain positions (in purely in silico or cross-over oligonucleotide mediated formats, however, the nucleic acids can show little or no homology). For example, two or more nucleic acids can be recombined herein. The differences between the nucleic acids can be any type of mutation, for example, substitutions, insertions and deletions. Often, different segments differ from each other in about 1-20 positions. For recombination to generate increased diversity relative to the starting materials, the starting materials must differ from each other in at least two nucleotide positions.
That is, if there are only two substrates, there should be at least two divergent positions. If there are three substrates, for example, one substrate can differ from the second at a single position, and the second can differ from the third at a different single position. Of course, even if only one initial character string is provided, any of the GOs herein can be used to modify the nucleic acid to produce a diverse array of nucleic acids that can be screened for an activity of interest.
In physical shuffling procedures, the starting DNA segments can be natural variants of each other, for example allelic or species variants. More typically, they are derived from one or more homologous nucleic acid sequences. The segments can also be from non-allelic genes showing some degree of structural and usually functional relatedness. The starting DNA segments can also be induced variants of each other. For example, one DNA segment can be produced by error-prone PCR replication of the other, or by substitution of a mutagenic cassette. Induced mutants can also be prepared by propagating one (or both) of the segments in a mutagenic strain. In these situations, strictly speaking, the second DNA segment is not a single segment but a large family of related segments. The different segments forming the starting materials are often of the same length or substantially the same length. However, this need not be the case; for example, one segment can be a subsequence of another. The segments can be present as part of larger molecules, such as vectors, or can be in isolated form. In one option, the nucleic acids of interest are derived from DE by GAGGS.

CODON-VARIED OLIGONUCLEOTIDE METHODS

Codon-varied oligonucleotides are oligonucleotides that are similar in sequence but have one or more base variations, where the variations correspond to at least one encoded amino acid difference.
They can be synthesized using tri-nucleotide (i.e., codon-based) phosphoramidite coupling chemistry, in which tri-nucleotide phosphoramidites representing the codons for all 20 amino acids are used to introduce entire codons into oligonucleotide sequences synthesized by this solid-phase technique. Preferably, all of the oligonucleotides of a selected length (e.g., about 20, 30, 40, 50, 60, 70, 80, 90 or 100 or more nucleotides) that incorporate the chosen nucleic acid sequences are synthesized. In the present invention, codon-varied oligonucleotide sequences can be based on sequences from a selected set of nucleic acids, generated by any of the methods noted herein. Further details regarding tri-nucleotide synthesis are found in USSN 09/408,393, "USE OF CODON VARIED OLIGONUCLEOTIDE SYNTHESIS FOR SYNTHETIC SHUFFLING" by Welch et al., filed 09/28/1999. Oligonucleotides can be made by standard nucleotide addition methods or by tri-nucleotide synthetic methods. An advantage of selecting changes that correspond to encoded amino acid differences is that modifying codon triplets results in fewer frame shifts (and, therefore, probably fewer inactive library members). Also, synthesis that focuses on codon modification, rather than simply on base variation, reduces the total number of oligos needed for a synthesis protocol.
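The frame-preserving property of codon-level variation can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the parental coding sequences are taken to be aligned, in frame, and of equal length, and the helper names (`split_codons`, `codon_varied_oligos`) are hypothetical. At each codon position the sketch collects the distinct codons seen among the parents and enumerates every combination, so each resulting oligo differs from the others by whole codons only.

```python
from itertools import product


def split_codons(seq: str) -> list:
    """Split an in-frame coding sequence into consecutive codons."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def codon_varied_oligos(parents: list) -> list:
    """Enumerate oligos that vary only by whole codons found in the parents."""
    codon_lists = [split_codons(p) for p in parents]
    if len({len(codons) for codons in codon_lists}) != 1:
        raise ValueError("parents must have the same number of codons")
    choices = []
    for column in zip(*codon_lists):
        # Keep first-seen order while dropping duplicate codons.
        choices.append(list(dict.fromkeys(column)))
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    # Hypothetical parents encoding Met-Ala-Lys and Met-Ser-Arg.
    for oligo in codon_varied_oligos(["ATGGCTAAA", "ATGTCTAGA"]):
        print(oligo)
```

Because every variant is built from complete codons, all of the enumerated oligos have the same length and reading frame, which is the advantage the text attributes to codon-based variation.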
In general, sets of oligos can be combined for assembly in many different formats and combination schemes to effect correlation with genetic events and operators at the physical level. As noted, overlapping sets of oligonucleotides can be synthesized and then hybridized and elongated to form full-length nucleic acids. A full-length nucleic acid is any nucleic acid desired by an investigator that is longer than the oligos used in the gene reconstruction methods. This can correspond to any percentage of a naturally occurring full-length sequence, for example 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80% or 90% or more of the corresponding natural sequence.
Oligo sets often have at least about 5, sometimes about 10, often about 15, generally about 20 or more overlapping nucleotides to facilitate gene reconstruction. Oligo sets are optionally simplified for gene reconstruction purposes where regions of fortuitous overlap are present, i.e., where repeated sequence elements are present in, or designed into, a gene sequence to be synthesized. The lengths of the oligos in a set can be the same or different, as can the regions of sequence overlap. To facilitate hybridization and elongation (for example during PCR cycles), the overlap regions are optionally designed with similar melting temperatures.
Parental sequences can be gridded (conceptually or physically) and the common sequences used to select common sequence oligos, thereby combining oligo members into one or more sets to reduce the number of oligos required to make the full-length nucleic acids. Similarly, oligonucleotides with some sequence similarity can be generated by pooling and/or splitting synthesis schemes, in which pools of oligos under synthesis are split into different pools during the addition of heterologous bases, optionally followed by rejoining synthesis steps (pooling) at subsequent stages where the same additions to the oligos are required. In oligo shuffling formats, heterologous oligos corresponding to many different parents can be split and rejoined during synthesis. In simple degenerate synthetic approaches, more than one nucleobase can be added during single synthetic steps to produce two or more sequence variations in two or more resulting oligonucleotides. The relative percentage of nucleobase addition can be controlled to bias the synthesis towards one or more parental sequences. Similarly, partial degeneracy can be practiced to prevent the insertion of stop codons during degenerate oligonucleotide synthesis.
Oligos that correspond to similar subsequences from different parents can be of the same length or different lengths, depending on the subsequences. Thus, in split and pool formats, some oligos are optionally not elongated during every synthetic step (to avoid frame shifts, some oligos are not elongated for the steps corresponding to one or more codons).
When constructing oligos, crossover oligos can be constructed at one or more points of difference between two or more parental sequences (a base change or other difference is a genetic locus that can be treated as a point for a crossover event).
The crossover oligos have a region of sequence identity to a first parental sequence, followed by a region of identity to a second parental sequence, with the crossover point occurring at the locus. For example, every natural mutation can be a crossover point.
Another way of biasing sequence recombination is to spike a mixture of oligonucleotides with fragments of one or more parental nucleic acids (if more than one parental nucleic acid is fragmented, the resulting segments can be spiked into a recombination mixture at different frequencies to bias the recombination outcome towards one or more parents). Recombination events can also be engineered simply by omitting from a recombination mixture one or more oligonucleotides corresponding to one or more parents.
In addition to the use of families of related oligonucleotides, diversity can be modulated by adding selected, pseudo-random or random oligos to the elongation mixture, which can be used to bias the resulting full-length sequences. Similarly, mutagenic or non-mutagenic conditions can be selected for PCR elongation, resulting in more or less diverse libraries of full-length nucleic acids. In addition to mixing oligo sets corresponding to different parents in the elongation mixture, oligo sets corresponding to just one parent can be elongated to reconstruct that parent. In either case, any resulting full-length sequence can be fragmented and recombined, as in the DNA shuffling methods noted in the references cited herein.
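The construction of crossover oligos described above can be sketched as follows. This is a hedged illustration, not the procedure of the cited applications: it assumes two parental sequences that are already aligned to equal length, treats every point of difference as a candidate crossover locus, and uses a hypothetical `flank` parameter for the number of bases taken on each side of the locus.

```python
def crossover_oligos(parent_a: str, parent_b: str, flank: int = 10) -> list:
    """Return (locus, oligo) pairs, one per position where the parents differ.

    Each oligo is identical to parent_a upstream of the locus and identical
    to parent_b from the locus onward.
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must be aligned to the same length")
    oligos = []
    for locus, (base_a, base_b) in enumerate(zip(parent_a, parent_b)):
        if base_a == base_b:
            continue  # no difference here, so no crossover locus
        start = max(0, locus - flank)
        end = min(len(parent_a), locus + flank)
        oligos.append((locus, parent_a[start:locus] + parent_b[locus:end]))
    return oligos


if __name__ == "__main__":
    # Hypothetical parents differing at positions 2 and 7 (0-based).
    for locus, oligo in crossover_oligos("ACGTACGTAC", "ACCTACGAAC", flank=6):
        print(locus, oligo)
```

Swapping the two arguments yields the reciprocal set of oligos (parent B upstream, parent A downstream), so both crossover directions can be represented in an assembly mixture.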
Many other oligonucleotide sets and synthetic variations that can be correlated with genetic events and operators at the physical level are found in "OLIGONUCLEOTIDE MEDIATED NUCLEIC ACID RECOMBINATION" by Crameri et al., filed February 5, 1999 (USSN 60/118,813), June 24, 1999 (USSN 60/141,049), and September 28, 1999 (USSN 09/408,392), and "USE OF CODON-BASED OLIGONUCLEOTIDE SYNTHESIS FOR SYNTHETIC SHUFFLING" by Welch et al., filed September 28, 1999 (USSN 09/408,393).

TARGETS FOR CODON MODIFICATION AND SHUFFLING

Essentially any nucleic acid can be shuffled using the GAGGS methods herein. No attempt is made here to identify the hundreds of thousands of known nucleic acids. Common sequence repositories for known proteins include GenBank, EMBL, DDBJ and the NCBI. Other repositories can easily be identified by searching the internet.
One class of preferred targets for GAGGS methods includes nucleic acids that encode
therapeutic agents such as erythropoietin (EPO), insulin, peptide hormones such as human growth hormone; growth factors and cytokines such as epithelial Neutrophil Activating Peptide-78, GROα/MGSA, GROβ, GROγ, MIP-1α, MIP-1β, MCP-1, epidermal growth factor, fibroblast growth factor, hepatocyte growth factor, insulin-like growth factor, the interferons, the interleukins, keratinocyte growth factor, leukemia inhibitory factor, oncostatin M, PD-ECSF, PDGF, pleiotropin, SCF, c-kit ligand, VEGF, G-CSF, etc. Many of these proteins and their corresponding coding nucleic acids are commercially available (see, for example, the Sigma BioSciences 1997 catalogue and price list) and, in any case, the corresponding genes are well known.
Another class of preferred targets for GAGGS are transcription and expression activators. Examples of transcription and expression activators include genes and proteins that modulate cell growth, differentiation, regulation or the like. Expression and transcription activators are found in prokaryotes, viruses, and eukaryotes, including fungi, plants, and animals, including mammals, providing a wide range of therapeutic targets. It will be appreciated that expression and transcription activators regulate transcription by many mechanisms, for example by binding to receptors, stimulating a signal transduction cascade, regulating the expression of transcription factors, binding to promoters and enhancers, binding to proteins that bind to promoters and enhancers, unwinding DNA, splicing pre-mRNA, polyadenylating RNA, and degrading RNA. Expression activators include cytokines, inflammatory molecules, growth factors, their receptors, and oncogene products, for example interleukins (e.g., IL-1, IL-2, IL-8, etc.), interferons, FGF, IGF-I, IGF-II, PDGF, TNF, TGF-α, TGF-β, EGF, KGF, SCF/c-kit, CD40L/CD40, VLA-4/VCAM-1, ICAM-1/LFA-1, and hyalurin/CD44; signal transduction molecules and the corresponding oncogene products, for example Mos, Ras, Raf and Met; and transcriptional activators and suppressors, for example p53, Tat, Fos, Myc, Jun, Myb, Rel, and steroid hormone receptors such as those for estrogen, progesterone, testosterone, aldosterone, the LDL receptor ligand and corticosterone.
Similarly, proteins from infectious organisms for possible vaccine applications, described in
more detail below, including infectious fungi, for example Aspergillus and Candida species; bacteria, particularly E. coli, which serves as a model for pathogenic bacteria, as well as medically important bacteria such as Staphylococci (e.g., aureus), Streptococci (e.g., pneumoniae), Clostridia (e.g., perfringens), Neisseria (e.g., gonorrhoeae), Enterobacteriaceae (e.g., coli), Helicobacter (e.g., pylori), Vibrio (e.g., cholerae), Campylobacter (e.g., jejuni), Pseudomonas (e.g., aeruginosa), Haemophilus (e.g., influenzae), Bordetella (e.g., pertussis), Mycoplasma (e.g., pneumoniae), Ureaplasma (e.g., urealyticum), Legionella (e.g., pneumophila), Spirochetes (e.g., Treponema, Leptospira and Borrelia), Mycobacteria (e.g., tuberculosis, smegmatis), Actinomyces (e.g., israelii), Nocardia (e.g., asteroides), Chlamydia (e.g., trachomatis), Rickettsia, Coxiella, Ehrlichia, Rochalimaea, Brucella, Yersinia, Francisella and Pasteurella; protozoa such as sporozoa (e.g., Plasmodia), rhizopods (e.g., Entamoeba) and flagellates (Trypanosoma, Leishmania, Trichomonas, Giardia, etc.); viruses such as (+) RNA viruses (examples include Poxviruses, e.g., vaccinia; Picornaviruses, e.g., polio; Togaviruses, e.g., rubella; Flaviviruses, e.g., HCV; and Coronaviruses), (-) RNA viruses (examples include Rhabdoviruses, e.g., VSV; Paramyxoviruses, e.g., RSV; Orthomyxoviruses, e.g., influenza; Bunyaviruses; and Arenaviruses), dsDNA viruses (Reoviruses, for example), RNA to DNA viruses, i.e., Retroviruses, especially HIV and HTLV, and certain DNA to RNA viruses such as Hepatitis B virus.
Other nucleic acids that encode proteins
relevant to non-medical uses, such as inhibitors of transcription or toxins of crop pests (e.g., insects, fungi, weed plants and the like), are also preferred targets for GAGGS. Industrially important enzymes such as monooxygenases,
proteases, nucleases and lipases are also preferred targets. As an example, subtilisin can be evolved by shuffling selected forms of the gene for subtilisin (von der Osten et al., J. Biotechnol. 28:55-68 (1993) provide a subtilisin-coding nucleic acid). Proteins that aid in folding, such as the chaperonins, are also preferred.
Preferred known genes suitable for codon modification and shuffling also include the following: Alpha-1 antitrypsin, Angiostatin, Antihemolytic factor, Apolipoprotein, Apoprotein, Atrial natriuretic factor, Atrial natriuretic polypeptide, Atrial peptides, C-X-C chemokines (e.g., T39765, NAP-2, ENA-78, Gro-a, Gro-b, Gro-c, IP-10, GCP-2, NAP-4, SDF-1, PF4, MIG), Calcitonin, CC chemokines (e.g., monocyte chemoattractant protein-1, monocyte chemoattractant protein-2, monocyte chemoattractant protein-3, monocyte inflammatory protein-1 alpha, monocyte inflammatory protein-1 beta, RANTES, I309, R83915, R91733, HCC1, T58847, D31065, T64262), CD40 ligand, Collagen, Colony stimulating factor (CSF), Complement factor 5a, Complement inhibitor, Complement receptor 1, Factor IX, Factor VII, Factor VIII, Factor X, Fibrinogen, Fibronectin, Glucocerebrosidase, Gonadotropin, Hedgehog proteins (e.g., Sonic, Indian, Desert), Hemoglobin (for blood substitute; for radiosensitization), Hirudin, Human serum albumin, Lactoferrin, Luciferase, Neurturin, Neutrophil inhibitory factor (NIF), Osteogenic protein, Parathyroid hormone, Protein A, Protein G, Relaxin, Renin, Salmon calcitonin, Salmon growth hormone, Soluble complement receptor I, Soluble I-CAM, Soluble interleukin receptors (IL-1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15), Soluble TNF receptor, Somatomedin, Somatostatin, Somatotropin, Streptokinase, Superantigens, i.e., Staphylococcal enterotoxins (SEA, SEB, SEC1, SEC2, SEC3, SED, SEE), Toxic shock syndrome toxin (TSST-1), Exfoliating toxins A and B, Pyrogenic exotoxins A, B and C, and M. arthritidis mitogen, Superoxide dismutase, Thymosin alpha 1, Tissue plasminogen activator, Tumor necrosis factor beta (TNF beta), Tumor necrosis factor receptor (TNFR), Tumor necrosis factor alpha (TNF alpha), and Urokinase.
Other preferred genes for shuffling include p450s (these enzymes represent a very diverse set of natural diversity and catalyze many important reactions); see, for example, Ortiz de Montellano
(ed.) (1995) Cytochrome P450: Structure, Mechanism, and Biochemistry, Second Edition, Plenum Press (New York and London) and the references cited therein for an introduction to cytochrome P450. Other monooxygenases can be shuffled, as well as dioxygenases, acyl transferases (cis-diol), halogenated hydrocarbon dehalogenases, methyltransferases, terpene synthetases and the like.

THE USE OF CONSENSUS GENES IN DIRECTED EVOLUTION, INCLUDING "DIPLOMACY"

One of the factors involved in standard family shuffling of parental genes is the extent of identity of the genes to be physically recombined. Genes of limited identity are difficult to recombine without cross-over oligonucleotides, and often result in shuffled libraries that have unacceptable knockout rates, or no chimera formation, no activity, no functional library, etc. In one aspect, the present invention overcomes this difficulty by providing for the in silico design of a "diplomat" sequence that has an intermediate level of homology to each of the sequences to be recombined, thereby facilitating cross-over events between the sequences and facilitating chimera formation. This diplomat sequence can be a character string produced by any of a variety of GOs to establish intermediate sequence similarity in the diplomat sequence as compared to the sequences to be recombined, including by aligning the sequences to select a consensus sequence, by codon modification to optimize similarity between otherwise dissimilar nucleic acids, or the like.
As noted, one way in which a diplomat sequence is designed is simply to select a consensus sequence, for example using any of the approaches herein. The consensus sequence is generated by comparison and alignment/pile-up of a family of genes (DNA consensus), or by alignment/pile-up of the amino acid sequences (aa consensus). In the latter case, the amino acid consensus sequence is optionally back-translated using a desirable codon bias to further enhance homology, to suit the host organism used for expression, or to select alternative codon usages in order to enable access to alternative sets of amino acid codons. Different subsets of gene families can also be used to generate consensus sequences. Additionally, the consensus sequences themselves may encode an improved enzyme.
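The idea that a consensus can serve as a diplomat sequence, i.e., one that is closer to each parent than the parents are to each other, can be checked with a small Python sketch. The example is illustrative only: it assumes pre-aligned, equal-length amino acid strings without gaps, uses a simple majority vote per column as the consensus rule, and the names `consensus` and `identity` as well as the three toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def consensus(aligned: list) -> str:
    """Majority-vote consensus over the columns of an alignment."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))


def identity(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions at which two sequences agree."""
    return sum(a == b for a, b in zip(seq_a, seq_b)) / len(seq_a)


if __name__ == "__main__":
    # Toy family, invented for illustration.
    family = ["MKTAYIAR", "MKSAYLAK", "MRTAFLAK"]
    diplomat = consensus(family)
    print("consensus:", diplomat)
    for seq in family:
        print(seq, "identity to consensus:", identity(diplomat, seq))
    for seq_a, seq_b in combinations(family, 2):
        print(seq_a, seq_b, "pairwise identity:", identity(seq_a, seq_b))
```

In this toy family the consensus shares at least 75% identity with every parent, while no two parents share more than 62.5%, which is the intermediate-homology property the text asks of a diplomat sequence. Columns with ties would need an explicit tie-breaking rule in any real use.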
This has been observed elsewhere (e.g., the presentation at the international conference "Enzyme Opportunities for the Next Millennium", Chicago, IL, May 5 to 7, by Dr. Luis Pasamontes, Roche Vitamins, Inc., on "Development of Heat Stable Phytase": a consensus phytase had an increase of 16 °C in thermostability). Another example of a consensus protein having improved properties is consensus interferon (IFN-con1). Accordingly, diplomat sequences, separately or in conjunction with any of the techniques herein, can be designed using selected GO criteria and, optionally, physically synthesized and shuffled using any of the techniques herein.

EXAMPLE PROCESSING STEPS FOR REVERSE TRANSLATION AND OLIGO DESIGN

Automatic processing steps (for example, performed in a digital system as described herein) that perform the following functions facilitate the selection of oligonucleotides in the synthetic shuffling techniques herein. For example, the system can include an instruction set that permits input of the amino acid sequences of a family of proteins of interest. These sequences are back-translated with any desired codon usage parameters, for example optimal usage parameters for one or more organisms to be used for expression, or parameters that optimize sequence alignments to facilitate recombination, or both. For example, codon usage can be selected for multiple expression hosts, for example E. coli and S. cerevisiae. In some cases, simply optimizing codon usage for expression in a host cell will make homologous sequences more similar, because they lose their natural species codon bias. The sequences are aligned and a consensus sequence is produced, optionally showing degenerate codons.
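The processing steps just described (input of amino acid sequences, back-translation with a chosen codon usage, and a consensus that shows degenerate positions) can be sketched in Python. Everything specific in the block is an assumption for illustration: the `PREFERRED_CODON` table simply lists one valid codon per amino acid and is not measured usage data for any host, the input proteins are toy sequences of equal length that need no gaps, and the function names are hypothetical. The one-letter degenerate symbols are the standard IUPAC nucleotide ambiguity codes.

```python
# One valid codon per amino acid, chosen for illustration only; a real run
# would load measured codon usage for the intended expression host.
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}

# Standard IUPAC ambiguity codes, keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def back_translate(protein: str, table: dict = PREFERRED_CODON) -> str:
    """Back-translate a protein string using one codon per amino acid."""
    return "".join(table[aa] for aa in protein)


def degenerate_consensus(aligned_dna: list) -> str:
    """Collapse aligned DNA strings into one string of IUPAC codes."""
    if len({len(s) for s in aligned_dna}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return "".join(IUPAC[frozenset(col)] for col in zip(*aligned_dna))


if __name__ == "__main__":
    family = ["MKTAY", "MKSAF"]  # toy protein family
    dna = [back_translate(p) for p in family]
    print(dna)
    print(degenerate_consensus(dna))
```

Using a single shared codon table for every family member is what removes each gene's native codon bias, so positions that encode the same amino acid become identical at the nucleotide level and only true amino acid differences appear as degenerate symbols.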
Oligonucleotides are designed for the synthetic construction of one or more synthetic nucleic acids for shuffling. The input parameters for oligonucleotide design include the maximum and minimum lengths, the minimum length of identical sequence at the ends, the maximum degeneracy per oligonucleotide, the oligo overlap length, etc. As noted, an alternative to back-translation for optimal codon usage for expression in a particular organism is to back-translate the sequences so as to optimize nucleotide homology among family members. For example, the amino acid sequences are aligned. All possible codons for each amino acid are determined, and the codons that minimize the differences between the aligned family of sequences are selected at each position.

USE OF FAMILY SHUFFLING TO IDENTIFY STRUCTURAL MOTIFS CONFERRING SPECIFIC PROTEIN PROPERTIES

It is often of interest to identify regions of a protein that are responsible for specific properties, to facilitate the functional manipulation and design of related proteins. This identification is traditionally made using structural information, usually obtained by biophysical techniques such as X-ray crystallography. The present invention provides an alternative method in which variants are obtained and analyzed for specific properties, which are then correlated with sequence motifs. The sequences of naturally occurring enzymes that catalyze similar or even identical reactions can vary widely: sequences may be only 50% identical or less. Although each member of a family of such enzymes may catalyze an essentially identical reaction, other properties of these enzymes can differ significantly. These include physical properties such as stability to temperature and organic solvents, pH optima, solubility, the ability to retain activity when immobilized, ease of expression in different host systems, etc.
They also include catalytic properties, including activity (kcat and Km), the range of substrates accepted, and even the chemistries performed. The method described herein can also be applied to non-catalytic proteins (e.g., ligands such as cytokines) and even to nucleic acid sequences (such as promoters that can be induced by a number of different ligands), wherever multiple functional dimensions are encoded by a family of homologous sequences.

Because of the divergence between enzymes with similar catalytic functions, it is not usually possible to correlate specific properties with individual amino acids at certain positions, since there are simply too many differences. However, variant libraries can be prepared from a family of homologous natural sequences by family DNA shuffling. These libraries contain the diversity of the original set of sequences in a large number of different combinations. If individuals from the library are then tested under a specific set of conditions for a particular property, the optimal combinations of sequences from the parental set can be determined. If the test conditions are then altered in only one parameter, different individuals in the library will be identified as the best performers. Because the screening conditions are very similar, most of the amino acids are conserved between the two sets of best performers. Comparison of the sequences (e.g., in silico) of the best enzymes under the two different conditions identifies the sequence differences responsible for the differences in performance.

Principal component analysis is a powerful tool for identifying sequences that confer a particular property. For example, Partek Incorporated (St. Peters, Missouri; www.partek.com) provides pattern recognition software (e.g., the Partek Pro 2000 Pattern Recognition Software) which can be applied to genetic algorithms for multivariate data analysis, interactive visualization, variable selection, and neural and statistical modeling. Relationships can be analyzed, e.g., by Principal Components Analysis (PCA) mapped scatterplots and biplots, Multi-Dimensional Scaling (MDS) mapped scatterplots, star plots, etc.

Once sequence motifs have been identified, the proteins are manipulated in any of a variety of ways. For example, the identified changes are optionally introduced deliberately into other sequence backgrounds. Sequences that confer different specific properties can be combined. Identified sequence regions of importance for a specific function can be targeted for more thorough investigation, for example by complete randomization using degenerate oligonucleotides, e.g., selected by an in silico process.

IDENTIFICATION OF PARENTAL CONTRIBUTORS TO CHIMERAS PRODUCED BY FAMILY SHUFFLING

This example provides a method for identifying the parental contributors to chimeras produced by family shuffling. The method takes as input the sequences of the parental genes and the sequences of the chimeras, and compares each chimera with each parent. It then constructs sequence maps and graphical maps of each chimera that indicate the parental source of each chimeric fragment. Correlation of these maps with functional data permits identification of the parents that contribute to specific properties, and thereby facilitates the selection of parents for new, more focused libraries, which can be made by any of the methods noted herein and screened for any desired functional property.
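The parent-mapping method just described can be sketched as follows, purely as an illustration. The sketch assumes that the chimera and all parents are already aligned to a common length, treats only exact matches as shared segments (no mismatch tolerance), and uses a hypothetical minimum segment length `window`; the names `find_segments` and `map_parents` are likewise assumptions.

```python
def find_segments(parent, child, window=4):
    """Return (start, end) runs, at least `window` long, in which the
    child is identical to the parent at aligned positions."""
    segments, start = [], None
    for i in range(len(child) + 1):
        same = i < len(child) and child[i] == parent[i]
        if same and start is None:
            start = i
        elif not same and start is not None:
            if i - start >= window:
                segments.append((start, i))
            start = None
    return segments


def map_parents(parents, child, window=4):
    """Greedy map of a chimera onto its parents.

    `parents` maps a parent name to its aligned sequence. The result is
    a list of (start, end, parent_name) tuples covering the chimera
    from left to right; positions matching no parent are skipped."""
    segments = [
        (s, e, name)
        for name, seq in parents.items()
        for s, e in find_segments(seq, child, window)
    ]
    path, pos = [], 0
    while pos < len(child):
        covering = [seg for seg in segments if seg[0] <= pos < seg[1]]
        if covering:
            # Prefer the segment that extends furthest to the right.
            _, end, name = max(covering, key=lambda seg: seg[1])
            path.append((pos, end, name))
            pos = end
        else:
            pos += 1
    return path


# Hypothetical parents and a chimera with one crossover at position 6.
parents = {"A": "AAAAAAAAAAAA", "B": "CCCCCCCCCCCC"}
chimera_map = map_parents(parents, "AAAAAACCCCCC")
```

The resulting list can be tabulated against activity data so that each functional property is associated with the parents that contributed the corresponding fragments.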
In a simple example, the genes of families 3 and 4 contribute to an activity at, e.g., pH 5.5, while the genes of families 1 and 2 are better at pH 10. Thus, for a low pH application, the parental composition used to create a library would be biased towards 3 and 4, while for a high pH application a library based predominantly on 1 and 2 would be appropriate. In this way, a GO can be implemented that selects oligonucleotides for gene reconstruction predominantly from families 3 and 4. Additional details regarding the gene blending methods used in oligonucleotide shuffling are found in "OLIGONUCLEOTIDE MEDIATED NUCLEIC ACID RECOMBINATION" by Crameri et al., filed February 5, 1999 (USSN 60/118,813), filed June 24, 1999 (USSN 60/141,049), and filed September 28, 1999 (USSN 09/408,392).

Gene blending is similar to principal component analysis (PCA) for the identification of specific sequence motifs. Principal component analysis (PCA) is a mathematical procedure that transforms a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables called "principal components." The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible. Traditionally, principal component analysis is performed on a square symmetric SSCP matrix (pure sums of squares and cross products), covariance matrix (scaled sums of squares and cross products), or correlation matrix (sums of squares and cross products from standardized data). The results of analyses on SSCP and covariance matrices are similar. A correlation matrix is used when the variances of individual variables differ substantially or the units of measurement of the individual variables differ.
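For illustration, the first principal component of a small data set can be obtained from its covariance matrix by power iteration, as sketched below. Power iteration is one convenient numerical route chosen for this sketch and is not stated to be the method used by any software named herein; the function names and the toy data are assumptions. The sketch also assumes that the data have nonzero variance and that the starting vector is not orthogonal to the leading component.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix (scaled sums of squares and cross
    products) of a list of equal-length observation rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_principal_component(cov, iterations=200):
    """Leading eigenvector and eigenvalue of `cov` by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)
    )
    return v, eigenvalue


# Toy data in which the second variable is exactly twice the first.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
cov = covariance_matrix(data)
direction, variance = first_principal_component(cov)
explained = variance / (cov[0][0] + cov[1][1])
```

Because the two toy variables are perfectly correlated, a single component pointing along (1, 2) accounts for essentially all of the variability, which is the dimensionality reduction referred to above.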
The objectives of principal component analysis include, e.g., discovering or reducing the dimensionality of a data set, identifying new meaningful underlying variables, and the like. The main difference is that the present operation gives information on which parents to use in a mix (and thus directs the construction of new libraries based on natural genes), while PCA identifies specific motifs and is thus well suited to more general synthetic shuffling, with the identified discrete regions being altered in either a directed or randomized fashion. The Partek (PCA) software discussed above has an "experimental design" component, which identifies variables that appear to have an effect on a specific function. As applied to the present example, this is useful in an iterative process in which a family library is constructed and screened, and the resulting chimeras are analyzed for functional correlations with sequence variations. This is used to predict the regions of sequence responsible for a particular function, and an in silico library is selected by applying any desired GO-directed changes to the regions that correlate with functional activity. A focused library having diversity in those regions is constructed, e.g., by oligonucleotide synthetic methods as described herein. The members of the resulting libraries (chimeras) are analyzed for functional correlations with sequence variations. This approach focuses the search for variation in sequence space on the most relevant regions of a protein or other relevant molecule.

After the sequences that are active have been deconvoluted, the resulting sequence information is used to refine further predictions for in silico operations, e.g., in a neural network training approach. For example, neural network approaches can be coupled to genetic algorithm type programming. For example, NNUGA (Neural Network Using Genetic Algorithms) is a program, available from http://www.cs.bgu.ac.il/~omri/NNUGA/, that couples neural networks and genetic algorithms. An introduction to neural networks can be found, e.g., in Kevin Gurney (1999) An Introduction to Neural Networks, UCL Press, 1 Gunpowder Square, London EC4A 3DE, UK, and at http://www.shef.ac.uk/psychology/gurney/notes/index.html. Additional useful neural network references include those noted above in regard to genetic algorithms and, e.g., Christopher M. Bishop (1995) Neural Networks for Pattern Recognition, Oxford Univ. Press; ISBN: 0198538642; Brian D. Ripley, N. L. Hjort (Contributor)
(1995) Pattern Recognition and Neural Networks, Cambridge Univ. Pr. (Short); ISBN: 0521460867.

COUPLING OF RATIONAL PROTEIN DESIGN AND SHUFFLING

Protein design cycles involving cycling between theory and experiment have led to recent advances in rational protein design. A reductionist approach, in which protein positions are classified by their local environments, has aided the development of appropriate energy expressions. Protein design programs can be used to build or modify proteins with any selected set of design criteria. See, e.g., http://www.mayo.caltech.edu/; Gordon and Mayo (1999) "Branch-and-Terminate: A Combinatorial Optimization Algorithm for Protein Design" Structure with Folding and Design 7(9):1089-1098; Street and Mayo (1999) "Intrinsic β-sheet Propensities Result from van der Waals Interactions Between Side Chains and the Local Backbone" Proc. Natl. Acad. Sci. USA 96:9074-9076; Gordon et al. (1999) "Energy Functions for Protein Design" Current Opinion in Structural Biology 9(4):509-513; Street and Mayo (1999) "Computational Protein Design" Structure with Folding and Design 7(5):R105-R109; Strop and Mayo (1999) "Rubredoxin Variant Folds Without Iron" J. Am. Chem. Soc. 121(11):2341-2345; Gordon and Mayo (1998) "Radical Performance Enhancements for Combinatorial Optimization Algorithms Based on the Dead-End Elimination Theorem" J. Comp. Chem. 19:1505-1514; Malakauskas and Mayo (1998) "Design, Structure, and Stability of a Hyperthermophilic Protein Variant" Nature Struct. Biol. 5:470; Street and Mayo (1998) "Pairwise Calculation of Protein Solvent-Accessible Surface Areas" Folding & Design 3:253-258; Dahiyat and Mayo (1997) "De Novo Protein Design: Fully Automated Sequence Selection" Science 278:82-87; Dahiyat and Mayo (1997) "Probing the Role of Packing Specificity in Protein Design" Proc. Natl. Acad. Sci. USA 94:10172-10177; Dahiyat et al. (1997) "Automated Design of the Surface Positions of Protein Helices" Prot. Sci. 6:1333-1337; Dahiyat et al. (1997) "De Novo Protein Design: Towards Fully Automated Sequence Selection" J. Mol. Biol. 273:789-796; and Haney et al. (1997) "Structural Basis for Thermostability and Identification of Potential Active Site Residues for Adenylate Kinases from the Archaeal Genus Methanococcus" Proteins 28(1):117-30. These design methods generally rely on energy expressions to evaluate the quality of different amino acid sequences for target protein structures. In any case, the designed or modified proteins, or the character strings corresponding to the proteins, can be shuffled directly in silico, or back-translated and shuffled in silico and/or by physical shuffling. Thus, one aspect of the invention is the coupling of high-throughput rational design with in silico or physical shuffling and screening of genes to produce activities of interest.

Similarly, molecular dynamic simulations such as those mentioned above and, e.g., Ornstein et al.
(http://www.emsl.nl.gov:2080/homes/tms/bms.html; Curr Opin Struc Biol (1999) 9(4):509-13) provide, e.g., "rational" enzyme redesign by biomolecular modeling and simulation, to find new enzymatic forms that would otherwise have a low probability of evolving biologically. For example, the rational redesign of cytochrome P450 enzymes and of alkane dehalogenase enzymes is a target of current rational design efforts. Any rationally designed protein (e.g., new P450 homologues or new alkane dehalogenase proteins) can be evolved by back-translation and shuffling against any other designed protein or against related natural homologous enzymes. Details on P450s can be found in Ortiz de Montellano (ed.) 1995, Cytochrome P450: Structure, Mechanism, and Biochemistry, Second Edition, Plenum Press (New York and London). Comparison of protein crystal structures to predict crossover points on structural grounds, rather than on sequence homology considerations, can be conducted, and crossover can be effected by oligos to direct chimerization as discussed herein.

MULTIVARIATE SEQUENCE-ACTIVITY MODELING OF PROTEINS: OPTIMIZATION OF ENZYME ACTIVITY BY RATIONAL STATISTICS

This section describes how to analyze a large number of related protein sequences using modern statistical tools, and how to derive novel protein sequences with desirable features using rational statistics and multivariate analysis.

Background: Multivariate Data Analysis

Multivariate data analysis and experimental design are widely applied in industry, government and research centers. They are typically used for things like formulating gasoline or optimizing a chemical process. In the classic gasoline formulation example, there may be more than 25 different additives that can be added in different amounts and in different combinations. The output of the final product is also multifactorial (energy level, degree of pollution, stability, etc.). By using experimental design, a limited number of test formulations can be made in which the presence and amounts of all additives are altered in a non-random fashion so as to explore the relevant "formulation space" as fully as possible. The appropriate measurements of the different formulations are subsequently analyzed. By plotting the data points in a multivariate (multidimensional) fashion, the formulation space can be graphically envisioned and the ideal combination of additives can be extracted. One of the statistical tools most commonly used for this type of analysis is Principal Component Analysis (PCA).

In this example, this type of matrix is used to correlate each multidimensional data point with a specific output vector in order to identify the relationship between a matrix of dependent variables Y and a matrix of predictor variables X. A common analytical tool for this type of analysis is Partial Least Squares Projections to Latent Structures (PLS). This is, for instance, often used in investment banks' analysis of fluctuating stock prices, or in materials science predictions of the properties of novel compounds. Each data point can consist of hundreds of different parameters that are plotted against each other in an n-dimensional hyperspace (one dimension for each parameter). The manipulations are done in a computer system, which adds whatever number of dimensions is needed to handle the input data. The previously mentioned methods (PCA, PLS and others) can assist in finding projections and planes so that the hyperspace can be properly analyzed.

Background: Sequence Analysis
The analysis of nucleotide or amino acid sequences has traditionally concentrated on qualitative pattern recognition (e.g., sequence classification). This mainly involves identifying sequences based on similarity. This works well for predictions or identifications of classification, but does not always correlate with quantitative values. For example, a consensus transcriptional promoter may not be a good promoter in a particular application, but is instead the average promoter among an aligned group of related sequences. To access the quantitative features of related biological sequences (DNA/RNA or amino acids), one can analyze the systematic variations (i.e., the systematic absence of similarity) among aligned sequences with related biological activity. By applying different multivariate analysis tools (such as PLS) to protein sequences, it is possible to predict a sequence that generates better catalytic activity than the best one present in the analyzed set. Experimental data showing the success of the general method are described below.

Background: Multivariate Analysis of Promoter Activity

One of the few references in which multivariate data analysis has been applied to biological sequences focused on analyzing a set of defined transcriptional promoters in order to see whether one could predict a promoter stronger than any found in the training set (Jonsson et al. (1993) Nucleic Acids Res. 21:733-739). In this example, the promoter sequences were parameterized. For simplicity, the physico-chemical differences between the respective nucleotides (A, C, G, T) were selected as equal, i.e., no nucleotide was considered more closely related to any other nucleotide. The nucleotides were represented as four diametrically opposed corners of a cube, which form a perfect tetrahedron. By assigning the origin to the center of the cube, each corner can be represented by a numerical coordinate, which provides the numerical representation. Since the descriptors solely represent an equal distribution of properties, any nucleotide can be positioned at any corner. See Figure 17.

Previous studies (Brunner and Bujard (1987) EMBO J. 6:3139-3144; Knaus and Bujard (1988) EMBO J. 7:2919-2923; Lanzer and Bujard (1988) Proc. Natl. Acad. Sci. 85:8973-8977) analyzed a set of 28 promoters that had been studied in detail in an identical context. The promoters included E. coli promoters, T5 phage promoters and a number of chimeric and synthetic promoters. All were cloned as 68 bp fragments (-49 to +19) inserted in front of a DHFR coding region. The relative transcriptional level was measured by dot-blot using the vector-derived β-lactamase gene as an internal standard.
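The tetrahedral nucleotide parameterization described above can be illustrated with the following sketch. The particular assignment of bases to corners is arbitrary, as the text notes, and is an assumption of this sketch, as are the names `CORNERS`, `encode` and `squared_distance`.

```python
from itertools import combinations

# Four alternating corners of a cube centered on the origin. Together
# they form a regular tetrahedron. Which base sits at which corner is
# an arbitrary choice.
CORNERS = {
    "A": (1, 1, 1),
    "C": (1, -1, -1),
    "G": (-1, 1, -1),
    "T": (-1, -1, 1),
}


def encode(seq):
    """Flatten a nucleotide sequence into three coordinates per base."""
    vector = []
    for base in seq.upper():
        vector.extend(CORNERS[base])
    return vector


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


# Every pair of bases is equally far apart, so none is "more related".
pair_distances = {
    squared_distance(CORNERS[x], CORNERS[y])
    for x, y in combinations("ACGT", 2)
}

# A hypothetical 68-nucleotide promoter becomes a 204-dimensional point.
point = encode("ACGT" * 17)
```

Because all six pairwise distances are equal, the encoding treats every substitution as equivalent, and a 68-nucleotide sequence occupies 68 x 3 = 204 coordinates.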
In this example, when the 28 promoters, each with 68 nucleotides, were parameterized with the three descriptors as defined in Figure 14, the result is a 28x204 matrix (28 promoters x 68 nucleotides x 3 parameters (steric bulk, hydrophobicity and polarizability)). The unique sequence of each promoter can be represented as a single point in a 204-dimensional hyperspace. This compilation of 28 promoters thus formed a cluster of 28 points in this space. The experimental data from the previously published studies were revisited, and the transcription levels were plotted against the cluster of 28 points in the 204-dimensional hyperspace using PLS. Half of the promoters (14) were subsequently used to build a statistical model and the other half to test it. This was shown to generate a good correlation between calculated and observed promoter strength. Two new promoters were constructed based on extrapolations from the generated model, and in both cases were shown to be significantly better than the best promoter present among the initial 28 promoters.

Multivariate Analysis and Protein Sequence

The same analytical methods can be applied to protein sequences. Signal peptides have been characterized using multivariate analysis, showing good correlation between location in hyperspace and final physical localization (Sjöström et al. (1987) EMBO J. 6:823-831). The main difference between nucleotides and amino acids is that instead of qualitative descriptors of the nucleotides (see Figure 14), quantitative descriptors have to be used to parameterize amino acids. The relevant features of the amino acids (steric bulk, hydrophobicity and polarizability) have been determined and can be extracted from the literature (Hellberg et al. (1986) Acta Chem. Scand. B40:135-140; Jonsson et al. (1989) Quant. Struct.-Act. Relat. 8:204-209).

After shuffling and characterization of the shuffled proteins, the mutated proteins are analyzed, both those that are "better" and those that are "worse" than the initial sequences. Statistical tools (such as PLS) can be used to extrapolate novel sequences that are likely to be better than the best sequence present in the analyzed set.

MODELING PROTEIN SEQUENCE SPACE

As described above, any protein encoded by a DNA sequence can be plotted as a distinct point in multidimensional space using statistical tools. A "normal" 1 kb gene can encode, e.g., about 330 amino acids. Each amino acid can be described by, e.g., the three major physico-chemical quantitative descriptors (steric bulk, hydrophobicity and polarizability) for each amino acid (other descriptors for proteins are largely dependent on these three major descriptors). See also Jonsson et al. (1989) Quant. Struct.-Act. Relat. 8:204-209. Thus, a 1 kb gene is modeled in 330 (number of amino acids) x 20 (possible amino acids at each position) x 3 (the three main descriptors noted above, for each amino acid) = 19,800 dimensions. Because of the expansive nature of sequence space, a number of shuffled sequences are used to validate predictions related to sequence activity. The closer the surrounding sequences are in space (percent similarity), the higher the likelihood that predictive value can be extracted. Alternatively, the more sequence space is analyzed, the more accurate the predictions become. This modeling strategy can be applied to any available sequence.

The predictive value of plotting a large number of shuffled sequences is described above. Two additional approaches can also be used. First, one can plot as many chimeric progeny in a library as possible against an enzymatic activity using, e.g., PLS (partial least squares projections to latent structures). If enough data are available, the sequence-activity plot forms a function that can be extrapolated outside of the experimental data to produce an in silico sequence corresponding to an activity higher than the best of the training set. Second, all related sequences can be plotted, and certain sequences grouped with given related activities. Using this matrix, subsequent genes can be immediately grouped with the appropriate activity, or new related activities can be directly screened for using a subset of the sequenced genes.

A general requirement of the above strategy is the availability of sufficient related gene sequences, generated through shuffling, to provide useful information. An alternative to shuffling sequences is to apply the modeling tools to all available sequences, e.g., the GenBank database and other public sources. Although this entails massive computational power, current technologies make the approach feasible. Mapping all available sequences provides an indication of regions of interest in sequence space.
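As a hedged illustration of how a sequence-activity function can be fitted and then extrapolated beyond the training set, the following sketch implements a single-component PLS1 regression (one response variable, one latent component). It is a minimal teaching version under stated assumptions, not the PLS implementation used in the studies cited herein; the function names and the numeric toy descriptors are hypothetical.

```python
import math


def pls1_fit(X, y):
    """Fit a one-component PLS1 model.

    X is a list of descriptor rows and y the measured activities.
    Returns (x_mean, y_mean, weights, q), where q regresses the
    centered activity on the latent score."""
    n, d = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(d)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(d)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction in X with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(d)]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0:
        raise ValueError("descriptors carry no covariance with activity")
    w = [v / norm for v in w]
    # Latent scores and the regression of activity on those scores.
    t = [sum(Xc[i][j] * w[j] for j in range(d)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return x_mean, y_mean, w, q


def pls1_predict(model, x):
    """Predict the activity of one descriptor row."""
    x_mean, y_mean, w, q = model
    score = sum((xj - mj) * wj for xj, mj, wj in zip(x, x_mean, w))
    return y_mean + q * score


# Toy training set: activity doubles the first descriptor, while the
# second descriptor is constant and therefore uninformative.
X_train = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0], [4.0, 5.0]]
y_train = [2.0, 4.0, 6.0, 8.0]
model = pls1_fit(X_train, y_train)
predicted = pls1_predict(model, [5.0, 5.0])
```

The prediction for a descriptor value outside the training range exceeds every training activity, which mirrors the extrapolation to an in silico sequence better than the best member of the training set. Real sequence data would need many more descriptors, several components and cross-validation.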
Sequence-space mapping information of this kind can be used as a filter that is applied to in silico shuffling events to determine which virtual progeny are preferred candidates for physical implementation (e.g., synthesis and/or recombination as noted herein).

SHUFFLING CLADISTIC INTERMEDIATES

The present invention provides for the shuffling of "evolutionary intermediates." In the context of the present invention, evolutionary intermediates are artificial constructs that are intermediate in character between two or more homologous sequences, e.g., when the sequences are grouped in an evolutionary dendrogram.

Nucleic acids are often classified into evolutionary dendrograms (or "trees") that show evolutionary branch points and, optionally, relatedness. For example, cladistic analysis is a classification method in which organisms or traits (including nucleic acid or polypeptide sequences) are ordered and ranked on a basis that reflects origin from a postulated common ancestor (an intermediate form of the divergent traits or organisms). Cladistic analysis is primarily concerned with the branching of relatedness trees (or "dendrograms") that show relatedness, although the degree of difference can also be assessed (a distinction is sometimes made between evolutionary taxonomists, who consider degrees of difference, and those who simply determine branch points in an evolutionary dendrogram (classical cladistic analysis); however, for purposes of the present invention, relatedness trees produced by any method can produce evolutionary intermediates).

Cladistic or other evolutionary intermediates can be determined by selecting nucleic acids that are intermediate in sequence between two or more extant nucleic acids.
Although the sequence may not exist in nature, it still represents a sequence that is similar to a sequence in nature that has been selected for, i.e., an intermediate of two or more sequences represents a sequence similar to the postulated common ancestor of the two or more extant nucleic acids. Thus, evolutionary intermediates are a preferred shuffling substrate, as they represent "pseudo-selected" sequences, which are more likely to be active than randomly selected sequences.

One benefit of using evolutionary intermediates as substrates for shuffling (or of using oligonucleotides corresponding to such sequences) is that considerable sequence diversity can be represented in fewer starting substrates (i.e., if starting with parents A and B, a single intermediate "C" has at least a partial representation of both A and B). This simplifies the oligonucleotide synthesis scheme for gene reconstruction/recombination methods, improving the efficiency of the procedure. In addition, searching sequence databases with evolutionary intermediates increases the chances of identifying related nucleic acids using standard search programs such as BLAST.
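A deliberately simple way to construct an intermediate "C" between two aligned parents A and B is sketched below for illustration. Alternating the source parent at successive differing positions is an assumption of this sketch and is not a phylogenetic reconstruction of a common ancestor; the function names and example sequences are hypothetical.

```python
def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def intermediate(a, b):
    """Build a sequence lying between two aligned parents by taking
    differing positions alternately from parent a and parent b."""
    if len(a) != len(b):
        raise ValueError("parents must be aligned to the same length")
    out, take_a = [], True
    for x, y in zip(a, b):
        if x == y:
            out.append(x)
        else:
            out.append(x if take_a else y)
            take_a = not take_a
    return "".join(out)


# Hypothetical aligned parents differing at three positions.
parent_a = "ATGCATGC"
parent_b = "ATCCTTGA"
c = intermediate(parent_a, parent_b)
```

Every position of the intermediate agrees with at least one parent, so its distances to the two parents add up to the distance between the parents themselves, giving the partial representation of both A and B noted above.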
Intermediate sequences can also be selected between two or more synthetic sequences that are not represented in nature, simply by starting from two synthetic sequences. Such synthetic sequences can include evolutionary intermediates, proposed gene sequences, or other sequences of interest that are related by sequence. These "artificial intermediates" are also useful for reducing the complexity of gene reconstruction methods and for improving the ability to search evolutionary databases.

Accordingly, in a significant embodiment of the invention, character strings representing evolutionary or artificial intermediates are first determined using alignment and sequence relationship software, and then synthesized using oligonucleotide reconstruction methods. Alternatively, the intermediates can form the basis for selection of the oligonucleotides used in the gene reconstruction methods herein. Several of the following sections describe implementations of this approach using hidden Markov model threading and other approaches.

In Silico Shuffling Using Hidden Markov Model Threading

A concern with synthetic shuffling is the assumption that each amino acid present among the parents is an independent entity that adds (or does not add) function by itself in a given functional dimension. When shuffling using DNase I-based methods, this problem is avoided because recombination occurs during assembly as 20-200 bp fragments and, thus, each amino acid exists in its evolutionary context among other amino acids that have co-evolved in a given direction due to selective pressure on the functional unit (gene, promoter or other biological entity). By capturing the co-variation normally existing within a family of genes, whether wild type or generated through regular shuffling, a significant number of biologically inactive progeny are avoided, improving the quality of the generated library.
Artificially generated progeny may be inactive due to structural, modular or other subtle differences between the active parents and the progeny. One way to remove unwanted progeny that lack co-variation is to apply a statistical profile such as a Hidden Markov Model (HMM) to the parental sequences. A generated HMM matrix (e.g., as in Figure 15) can capture the complete variation among the family as probabilities between all possible states (i.e., all possible combinations of amino acids, deletions and insertions). The matrix resulting from the analyzed family is used to search assorted databases for additional family members that are not similar enough to any particular sequence to be identified by standard BLAST algorithms, but that are similar enough to be identified when probed using a probabilistic distribution pattern based on the original family.

The HMM matrix shown in Figure 5 exemplifies a family of 8-amino-acid peptides. At each position, the peptide can be a specific amino acid (one of the 20 present in the boxes), an insertion (diamonds) or a deletion (circles). The probability of each occurring depends on how frequently it occurs among the compiled parents. Any given parent can subsequently be "threaded" through the profile, such that every allowed path is assigned a probability factor.

HMMs can also be used in other ways. Instead of applying the generated profile to identify previously unidentified family members, the HMM profile can be used as a template to generate family members de novo (e.g., intermediate members of a cladistic tree of nucleic acids). For example, the HMMER program (http://hmmer.wustl.edu/) is available. This program builds an HMM profile from a defined set of family members. A sub-program, HMMEMIT, reads the profile and constructs de novo sequences based on it. The original purpose of HMMEMIT is to generate positive controls for the search pattern, but the program can be adapted to the present invention by using its output as in silico generated progeny of a shuffling defined by the HMM profile. According to the present invention, oligonucleotides corresponding to these nucleic acids are generated for recombination, gene reconstruction and screening.

Since the sequence context of each position is accounted for in a probabilistic fashion, the number of non-active progeny is significantly lower than in a shuffling reaction that simply selects such progeny at random. Crossover between genetic modules (structural, functional or non-defined) occurs where it occurs in nature (i.e., among the parents), and the co-evolution of point mutations or structural elements is retained throughout the shuffling process.
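The idea of emitting de novo family members from a statistical profile can be illustrated with a much-reduced stand-in for a profile HMM. The sketch below keeps only per-column residue frequencies (match states, with a gap symbol standing in for deletion) and omits insert states and transition probabilities, so it is not equivalent to HMMER or HMMEMIT and does not call them. All names and the example peptide family are assumptions.

```python
import random
from collections import Counter


def build_profile(aligned):
    """One residue-count table per alignment column ('-' marks a gap)."""
    return [Counter(column) for column in zip(*aligned)]


def emit(profile, rng):
    """Sample one sequence, drawing each column independently in
    proportion to how often each residue occurs among the parents."""
    residues = []
    for counts in profile:
        symbols = sorted(counts)
        weights = [counts[s] for s in symbols]
        pick = rng.choices(symbols, weights=weights, k=1)[0]
        if pick != "-":  # a sampled gap acts as a deletion
            residues.append(pick)
    return "".join(residues)


# Hypothetical aligned peptide family without gaps.
family = ["MKVLA", "MRVLA", "MKILG"]
profile = build_profile(family)
rng = random.Random(0)  # fixed seed for repeatable sampling
progeny = [emit(profile, rng) for _ in range(20)]
```

Each emitted sequence contains, at every position, only residues actually observed among the parents at that position, so residue combinations never seen in the family are sampled less often than under uniform random choice. Capturing insertions and state transitions would require a full profile HMM.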
Example Algorithm for Generating Sequence Intermediates from Sequence Alignments

The following is an outline of a program for generating sequence intermediates from alignments of related parental nucleic acids.

Given an alignment of sequences that encode the parents, and an alignment that encodes the descendants:

for each descendant sequence
    for each parental sequence
        for each window
            if the parental sequence and the descendant sequence match over this window
                if this window is not already covered by the segments in the segment list
                    try to extend the window 5' and 3' until there are too many mismatches
                    add the final extended segment to the segment list for this sequence

for each descendant sequence
    set the position to the start of the sequence
    repeat the following until the end of the descendant sequence is reached
        search the segment list for the segment that extends the furthest from a point before the position (this is the segment most like the parental segment)
        if one is found, add it to the optimal path list and set the position to the end of that segment
        if one is not found, increment the current position
    display the segments from the optimal path list

NORMALIZATION OF LIBRARIES: USE OF POSITIVE OR NEGATIVE ACTIVITY DATA

One aspect of the present invention is the use of positive or negative data in sequence design and selection methods, whether in silico, in physical processing steps, or both. Positive or negative data can be used in the context of a learning heuristic or a neural network, or simply to provide physical or logical filters in library synthesis and design processes. Learning networks are described above and provide a convenient way to use positive or negative data to increase the chances that additional sequences generated subsequently will have a desired activity.
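The algorithm for generating sequence intermediates outlined earlier in this section can be sketched in Python roughly as follows. This is only one possible reading of the outline, under stated assumptions: sequences are pre-aligned strings of equal length, the window size and the mismatch budget (`window`, `max_mismatches`) are hypothetical parameters, the mismatch budget is applied separately in each direction, and "a point before the position" is interpreted as "at or before the position". The function names are illustrative only.

```python
def _extend_right(child, parent, end, max_mismatches):
    """Extend 3' from `end` (exclusive); stop when mismatches exceed the budget."""
    mismatches, best = 0, end
    for i in range(end, len(child)):
        if child[i] != parent[i]:
            mismatches += 1
            if mismatches > max_mismatches:
                break
        else:
            best = i + 1  # a segment always ends on a matching position
    return best


def _extend_left(child, parent, start, max_mismatches):
    """Extend 5' from `start` (inclusive); stop when mismatches exceed the budget."""
    mismatches, best = 0, start
    for i in range(start - 1, -1, -1):
        if child[i] != parent[i]:
            mismatches += 1
            if mismatches > max_mismatches:
                break
        else:
            best = i
    return best


def find_segments(child, parents, window=3, max_mismatches=0):
    """Collect (start, end, parent_index) segments where child matches a parent."""
    segments = []
    for p_idx, parent in enumerate(parents):
        for start in range(len(child) - window + 1):
            end = start + window
            if child[start:end] != parent[start:end]:
                continue
            # Skip windows already covered by a segment in the list.
            if any(s <= start and end <= e for s, e, _ in segments):
                continue
            lo = _extend_left(child, parent, start, max_mismatches)
            hi = _extend_right(child, parent, end, max_mismatches)
            segments.append((lo, hi, p_idx))
    return segments


def optimal_path(child, segments):
    """Greedily tile the child with the segments that extend the furthest."""
    path, position = [], 0
    while position < len(child):
        candidates = [seg for seg in segments if seg[0] <= position < seg[1]]
        if candidates:
            best = max(candidates, key=lambda seg: seg[1])
            path.append(best)
            position = best[1]
        else:
            position += 1
    return path


if __name__ == "__main__":
    parents = ["AAAAAAAA", "CCCCCCCC"]  # toy aligned parents (hypothetical)
    child = "AAAACCCC"
    print(optimal_path(child, find_segments(child, parents)))
```

Each tuple in the returned path names a parent and the aligned interval inherited from it, so the boundaries between consecutive tuples indicate candidate crossover points.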
The ability to use negative data to reduce the size of the libraries to be screened provides a considerable advantage, since screening is often a limiting step in generating improved genes and proteins by forced evolution methods. Similarly, using positive data to bias libraries toward sequences of interest is another way of focusing libraries. For example, as noted, in addition to a neural network learning approach, positive or negative data can be used to provide a physical or logical "filter" for any system of interest. That is, sequences that are shown to be inactive provide useful information about the likelihood that closely related sequences will also prove to be inactive, particularly when active sequences are also identified. Similarly, active sequences provide useful information about the likelihood that closely related sequences will also prove to be active, particularly when inactive sequences are also identified. These active or inactive sequences can be used to provide a physical or virtual filter that biases libraries (physical or virtual) toward the production of more active members.

For example, when negative data are used, physical subtraction methods use hybridization to inactive members under selected stringency conditions (often high stringency, since many libraries produced by the methods herein comprise homologous members) to remove similar nucleic acids from the libraries that are generated. Similarly, hybridization rules or other parameters can be used to select against members that are likely to be similar to inactive sequences. For example, oligonucleotides used in gene reconstruction methods can be biased against sequences that have been shown to be inactive. Thus, in certain methods, libraries or character strings are filtered by subtracting from the library or set of character strings the members of an initial library of biological polymers that display activity below a desired threshold.
When positive data are used, physical enrichment methods use hybridization to active members under selected stringency conditions (often high stringency, since many libraries produced by the methods herein comprise homologous members) to isolate similar nucleic acids, which make up the members of the libraries to be produced. Similarly, hybridization rules or other parameters can be used to select members that are likely to be similar to active sequences. For example, oligonucleotides used in gene reconstruction methods can be biased toward sequences that have been shown to be active. Thus, in certain methods, libraries or character strings are filtered by biasing the library or set of character strings toward the members of an initial library of biological polymers that display activity above a desired threshold.

Similarly, in silico approaches can be used to produce libraries of inactive sequences rather than active sequences. That is, inactive sequences can be shuffled in silico to produce libraries whose members are less likely to be active. These inactive sequences can be physically generated and used to subtract libraries generated by other methods (typically through hybridization to library members). This subtraction reduces the size of the library to be screened, primarily by removing members that are likely to be inactive.
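A virtual (in silico) counterpart of the subtraction and enrichment filters described above might look like the following Python sketch. It is an assumption-laden illustration rather than the method of the invention: similarity is measured as simple per-position identity between equal-length aligned strings, standing in for hybridization stringency, and the `threshold` parameter and the function names `identity`, `subtract` and `enrich` are hypothetical.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length aligned strings."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def subtract(library, inactives, threshold):
    """Drop library members that are at least `threshold` identical to any inactive."""
    return [
        member
        for member in library
        if all(identity(member, bad) < threshold for bad in inactives)
    ]


def enrich(library, actives, threshold):
    """Keep library members that are at least `threshold` identical to any active."""
    return [
        member
        for member in library
        if any(identity(member, good) >= threshold for good in actives)
    ]


if __name__ == "__main__":
    library = ["ACGTACGT", "ACGTACGA", "TTTTTTTT"]  # toy character strings
    print(subtract(library, inactives=["ACGTACGT"], threshold=0.85))
    print(enrich(library, actives=["ACGTACGA"], threshold=0.85))
```

Raising the threshold corresponds loosely to using higher stringency: fewer members are considered "similar" to the reference sequences, so subtraction removes less and enrichment retains less.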
Example: Motif Filtering
Selections or screens often yield too many "positive" clones to sequence all of the positive or negative clones cost-effectively. However, if sequence motifs are identified that are enriched or depleted in either the positive or the negative clones, this bias can be used in the construction of synthetic libraries that are biased toward the "good" motifs and away from the "bad" motifs. If each contiguous selected region or motif (for example, the selected window can be a region of 20 bases) is considered as a separate gene or gene element, one can measure the change in gene frequencies before and after selection or screening. Motifs that increase in frequency in the positive clones are characterized as "good", and motifs whose frequency is reduced in the positive clones are characterized as "bad". Second generation libraries are then synthesized, in which the library is selected to be enriched for good motifs and depleted of bad motifs, using any filtering or learning process as set forth herein.

A variety of methods are available for measuring the frequencies of motifs in gene populations. For example, one can hybridize analyte sequences to a gene chip or other nucleic acid array in which the motifs of interest are encoded in a spatially addressable manner, for example using gene chips as provided by Affymetrix (Santa Clara, CA) or another gene chip provider. Similarly, hybridization to membranes containing spatially addressable motifs, and measurement of the relative signal intensities for probes before and after selection, can be carried out in an essentially similar manner, for example using Southern or northern blot methods. The relative proportions of desirable and undesirable features identified on the chips also provide an indication of overall library quality. Similarly, phage display or other expression libraries can be used to assess the features of the library, that is, by evaluating expression products.

Alternatively, quantitative real-time PCR (e.g., TaqMan) can be carried out, in which the PCR oligos are highly discriminating for the features of interest. This can be done, for example, by having a polymorphism unique to the motif present at or near the 3' end of an oligo, such that it will only prime the PCR efficiently if there is a perfect match. Real-time PCR product analysis by, for example, FRET or TaqMan (and the related real-time reverse transcription PCR) is a known family of techniques for real-time PCR monitoring that has been used in a variety of contexts (see, Laurendeau et al. (1999) "TaqMan PCR-based gene dosage assay for predictive testing in individuals from a cancer family with INK4 locus haploinsufficiency" Clin Chem 45(7):982-6; Laurendeau et al. (1999) "Quantitation of MYC gene expression in sporadic breast tumors with a real-time reverse transcription-PCR assay" Cancer Res 59(12):2759-65; and Kreuzer et al. (1999) "LightCycler technology for the quantitation of bcr/abl fusion transcripts" Cancer Research 59(13):3171-4).

If the family of genes of interest is highly similar to begin with (for example, more than 90% sequence identity), then one can simply sequence the gene population before and after selection. If several sequencing primers up and down the gene are used, then one can view the sequences in parallel on a sequencing gel. Sequence polymorphisms near the primer can be read off to observe the relative proportion of the bases at any given site.
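Where sequence data are available for populations before and after selection, the motif-frequency comparison described in this example can be sketched in silico as follows. The Python sketch assumes that motifs are fixed-length windows (k-mers) and that a motif's frequency is the fraction of sequences containing it; the window length `k`, the `min_change` cutoff and the function names are hypothetical choices, not part of the described method.

```python
from collections import Counter


def motif_frequencies(population, k):
    """Fraction of sequences in `population` that contain each k-base window."""
    counts = Counter()
    for seq in population:
        # A set is used so that each motif is counted once per sequence.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return {motif: n / len(population) for motif, n in counts.items()}


def classify_motifs(before, after, k, min_change=0.0):
    """Split motifs into 'good' (frequency rises) and 'bad' (frequency falls)."""
    f_before = motif_frequencies(before, k)
    f_after = motif_frequencies(after, k)
    good, bad = {}, {}
    for motif in set(f_before) | set(f_after):
        delta = f_after.get(motif, 0.0) - f_before.get(motif, 0.0)
        if delta > min_change:
            good[motif] = delta
        elif delta < -min_change:
            bad[motif] = delta
    return good, bad


if __name__ == "__main__":
    before = ["AAAC", "AAAG", "TTTC", "TTTG"]  # toy pre-selection population
    after = ["AAAC", "AAAG"]                   # toy positive clones
    good, bad = classify_motifs(before, after, k=3)
    print(sorted(good), sorted(bad))
```

The resulting "good" and "bad" motif sets could then feed any of the filtering or learning steps set forth herein when a second generation library is designed.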
For example, if the population starts with 50% T and 50% C at a given position, but has 90% T and 10% C after selection, one could easily quantify this base ratio from a sequencing run that originates near the polymorphism. This method is limited because, as one gets further from the primer and reads through more polymorphisms, the mobilities of the different sequences become increasingly variable and the traces begin to run together. However, as the cost of sequencing and the cost of oligos continue to decline, one solution is simply to sequence with many different oligos up and down the gene.

Example: Fractional Distillation of Sequence Space

Typical sequence spaces are very large compared to the number of sequences that can be physically cloned and characterized. Computational tools exist for describing subsets of sequence space that are predicted to be enriched for clones with properties of interest (see, supra). However, there are assumptions and computational limitations inherent in these models. Methods for fractionating sequence space, such that it is enriched for molecules that a given model predicts to have greater or lesser fitness with respect to a phenotype of interest, would be useful for testing such predictive models.

A simple example of how this would work is as follows. There are approximately 10^27 possible shuffled IFNs (a fairly typical protein in terms of size) based on the family of naturally occurring human IFN genes. This is far more than can easily be screened. If one's goal is to evolve shuffled human IFNs that are active, for example, on mouse cells, then one can use information in the literature showing that residues 121 and 125 from human IFN alpha 1 confer improved activity when transplanted into other human IFNs such as IFN-a2a. If one assumes that this motif confers improved activity in many different contexts, then one can make a large pool of shuffled IFN genes (typically on the order of 10^9 to 10^12), convert them to ssDNA, pass them over an affinity column consisting of an oligonucleotide complementary to human IFN alpha 1 over these residues, wash under appropriate stringency, elute the bound molecules, amplify the eluted genes by PCR, clone the material and perform functional tests on the expressed clones. This procedure allows one to physically bias a library of shuffled genes strongly in favor of containing this motif, which this very simple model predicts will confer an improvement in the desired activity.

Populations enriched for the motif and populations depleted of the motif are both analyzed (e.g., 1000 clones from each population). The hypothesis is then "tested" by asking whether this fractionation of sequence space biased the average fitness in the predicted way. If it did, then one can "accept" the hypothesis and scale up the screening of that library. One can also test a number of design algorithms by affinity-based fractionation, accept those that are supported by the results of the experiment, and then perform the affinity selections in series so that one enriches for clones that meet the criteria of multiple design algorithms.

In this model, shuffling, such as family shuffling, is used as the first order design algorithm. However, additional design algorithms are integrated downstream of the shuffling to further fractionate a sequence space based on simple design heuristics. The method can be performed at the nucleic acid level with any design algorithm that can be translated into a nucleic acid selection scheme. A number of variations on this example are useful for reducing the size of libraries that are produced by physical or virtual filtering processes.
For example, affinity selection of oligonucleotides encoding motifs of interest prior to gene recombination/resynthesis (either physically or in silico) reduces the diversity of the nucleic acid populations that are produced in gene recombination/resynthesis methods as noted herein. Similarly, oligonucleotides encoding motifs can be selected by enzymatically degrading molecules that do not match the oligos perfectly, e.g., again prior to gene recombination/resynthesis methods. Alternatively, genes that match the oligonucleotides imperfectly can be selected for, for example, by binding to mutS or other DNA mismatch repair proteins.

Polymerization events during gene recombination/synthesis procedures can be primed using one or more oligos that encode the motif(s) of interest. That is, mismatches at or near the 3' end of the hybridized nucleic acids reduce or block elongation. In this variant, only newly polymerized molecules are allowed to survive (that is, to be used in subsequent library construction/selection steps). This can be done, for example, by priming reverse transcription of RNA and then degrading the RNA. Another approach is to make the template specifically degradable. For example, DNA with a high frequency of uracil incorporation can be synthesized. Polymerase-based synthesis is primed with oligos and extended with dNTPs that do not contain uracil. The resulting products are treated with uracil glycosylase and a nuclease that cleaves at apurinic sites, and the degraded template is removed. Similarly, RNA nucleotides can be incorporated into DNA strands (synthetically or through enzymatic incorporation); these nucleotides then serve as targets for cleavage by RNA endonucleases. A variety of other cleavable residues are known, including certain residues that are targets for enzymes and other residues that serve as cleavage points in response to light, heat or the like. Where polymerases with activity that permits the incorporation of a desired cleavage target are not currently available, such polymerases can be produced using shuffling methods to modify the activity of existing polymerases, or to acquire new polymerase activities.

Localized motifs can easily be translated into affinity selection procedures. However, one sometimes wishes to impose the rule that molecules have multiple sequence features that are separated in space in the gene (e.g., 2, 3, 4, 5, 6, etc. sequence features). This can be built into a selection by making a nucleic acid template that contains all of the motifs of interest separated by a flexible linker. The Tm for molecules that have all of the motifs is greater than for molecules that have only one or two of the motifs. It is therefore possible to enrich for molecules that have all of the motifs by selecting molecules with high Tms for the selection oligo(s).

A "gene" of many such motifs strung together can be synthesized, with the motifs separated by flexible linkers or by bases such as inosine that can base pair promiscuously. One would then select genes with high Tms for the selecting nucleic acid. Careful design of the selecting nucleic acid template allows one to enrich for genes that have a large number of sequence motifs that are predicted to bias the genes containing them toward a phenotype of interest.

If there is little information about whether a given motif is predicted to bias the library favorably, the technique can still be used. A set of motifs is defined, for example, on the basis of sequence conservation between different homologs, or the motifs can even be randomly selected. Since sequence space is not isotropic (equally dense with good members in all directions), one can simply fractionate the sequence space based on a set of designed or random motifs, measure the average fitness of the clones in the region of sequence space of interest, and then explore more heavily in the regions that give the highest fitness.

In addition to simple sequence alignment methods, more sophisticated approaches are available for identifying regions of interest such as macromolecular binding sites. For example, U.S. Patent 5,867,402 to Schneider, et al. (1999) "COMPUTATIONAL ANALYSIS OF NUCLEIC ACID INFORMATION DEFINES BINDING SITES" proposes methods in which binding sites are defined based on the individual information content of a particular site of interest. Substitutions within the binding site sequences can be analyzed to determine whether the substitution causes a deleterious mutation or a benign polymorphism. Methods of identifying new binding sites using individual information content are also proposed.
This approach can be used in the context of the present invention as one way of identifying sequences of interest for in silico manipulation of the sequences.

MOTIF BREEDING

Rational design can be used to produce desired motifs in sequences or sequence spaces of interest. However, it is often difficult to predict whether a given designed motif will be expressed in a functional form, or whether its presence will affect another property of interest. An example of this is the process of designing glycosylation sites into proteins in such a way that they are accessible to the cellular glycosylation machinery and do not adversely affect other properties of the protein, for example by blocking binding to another protein through steric hindrance by the attached polysaccharide groups. One way to address these problems is to design motifs, or multiple variations of motifs, into multiple candidate sites within the target gene. The sequence space is then screened or selected for the phenotype(s) of interest. Molecules that meet the specified design criterion threshold are shuffled together, recursively, to optimize the properties of interest.

Motifs can be built into any gene. Exemplary motifs include: N-linked glycosylation sites (i.e., Asn-X-Ser), O-linked glycosylation sites (i.e., Ser or Thr), protease sensitive sites (i.e., cleavage by collagenase after X in P-X-G-P), Rho-dependent transcriptional termination sites for bacteria, RNA secondary structure elements that affect translation efficiency, transcriptional enhancer elements, transcriptional promoter elements, transcriptional silencing motifs, etc.

HIGH THROUGHPUT RATIONAL DESIGN

In addition to, or in conjunction with, the rational design approaches described above, high throughput rational design methods are also useful. In particular, high throughput rational design methods can be used to modify any given sequence in silico, for example, before recombination/synthesis.
For example, Protein Design Automation (PDA) is a computationally driven system for the design and optimization of proteins and peptides. PDA typically starts with a protein backbone structure and designs the amino acid sequence to modify the properties of the protein, while maintaining its three-dimensional folding properties. Large numbers of sequences can be manipulated using PDA, allowing the design of protein structures (sequences, subsequences, etc.). PDA is described in several publications, including, for example, Malakauskas and Mayo (1998) "Design, Structure and Stability of a Hyperthermophilic Protein Variant" Nature Struc. Biol. 5:470; Dahiyat and Mayo (1997) "De Novo Protein Design: Fully Automated Sequence Selection" Science 278:82-87; DeGrado (1997) "Proteins from Scratch" Science 278:80-81; Dahiyat, Sarisky and Mayo (1997) "De Novo Protein Design: Towards Fully Automated Sequence Selection" J. Mol. Biol. 273:789-796; Dahiyat and Mayo (1997) "Probing the Role of Packing Specificity in Protein Design" Proc. Natl. Acad. Sci. USA 94:10172-10177; Hellinga (1997) "Rational Protein Design - Combining Theory and Experiment" Proc. Natl. Acad. Sci. USA 94:10015-10017; Su and Mayo (1997) "Coupling Backbone Flexibility and Amino Acid Sequence Selection in Protein Design" Prot. Sci. 6:1701-1707; Dahiyat, Gordon and Mayo (1997) "Automated Design of the Surface Positions of Protein Helices" Prot. Sci. 6:1333-1337; and Dahiyat and Mayo (1996) "Protein Design Automation" Prot. Sci. 5:895-903. Additional details regarding PDA are available at http://www.xencor.com/.

In the context of the present invention, PDA and other design methods can be used to modify sequences in silico, which can then be synthesized/recombined in shuffling procedures as set forth herein. Similarly, PDA and other design methods can be used to manipulate nucleic acid sequences derived following selection methods. Thus, design methods can be used recursively in recursive shuffling processes.
NON-OLIGONUCLEOTIDE-DEPENDENT IN SILICO SHUFFLING

As discussed herein, many of the methods of the invention involve generating diversity in sequence strings in silico, followed by oligonucleotide-based gene recombination/synthesis methods. However, non-oligonucleotide-based recombination methods are also suitable. For example, instead of generating oligonucleotides, complete genes corresponding to any diversity created in silico can be made without the use of oligonucleotide intermediates.
This is particularly feasible when the genes are short enough that direct synthesis is possible. In addition, it is possible to generate peptide sequences directly from diverse populations of character strings, instead of going through oligonucleotide intermediates. For example, solid phase peptide synthesis can be performed. Arrays of solid phase peptides can be constructed by standard solid phase peptide synthesis methods, with the members of the arrays selected to correspond to the sequence strings generated in silico.

In this regard, solid phase synthesis of biological polymers, including peptides, has been carried out at least since the early "Merrifield" solid phase peptide synthesis methods, described, for example, in Merrifield (1963) J. Am. Chem. Soc. 85:2149-2154. Solid phase synthesis techniques are available for the synthesis of several peptide sequences on, for example, a number of "pins"; see, for example, Geysen et al. (1987) J. Immun. Meth. 102:259-274, incorporated herein by reference for all purposes. Other solid phase techniques include, for example, the synthesis of several peptide sequences on different cellulose discs supported in a column. See, Frank and Doring (1988) Tetrahedron 44:6031-6040. Still other solid phase techniques are described in U.S. Patent No. 4,728,502 issued to Hamill and in WO 90/00626. Methods for forming large arrays of peptides are also available. For example, Pirrung et al., U.S. Pat. No. 5,143,854 and Fodor et al., PCT Publication No. WO 92/10092 describe methods for forming arrays of peptides and other polymer sequences using, for example, light-directed synthesis techniques. See also, Stewart and Young, Solid Phase Peptide Synthesis, 2d ed., Pierce Chemical Co. (1984); Atherton et al. (1985) Solid Phase Peptide Synthesis, IRL Press; Greene, et al. (1991) Protective Groups In Organic Chemistry, 2nd Ed., John Wiley & Sons, New York, NY; and Bodanzszyky (1993) Principles of Peptide Synthesis, second edition, Springer Verlag, Inc., NY.

Other useful information regarding proteins is found in R. Scopes, Protein Purification, Springer-Verlag, N.Y. (1982); Deutscher, Methods in Enzymology Vol. 182: Guide to Protein Purification, Academic Press, Inc., N.Y. (1990); Sandana (1997) Bioseparation of Proteins, Academic Press, Inc.; Bollag et al. (1996) Protein Methods, 2nd Edition, Wiley-Liss, NY; Walker (1996) The Protein Protocols Handbook, Humana Press, NJ; Harris and Angal (1990) Protein Purification Applications: A Practical Approach, IRL Press at Oxford, Oxford, England; Harris and Angal, Protein Purification Methods: A Practical Approach, IRL Press at Oxford, Oxford, England; Scopes (1993) Protein Purification: Principles and Practice, 3rd Edition, Springer Verlag, NY; Janson and Ryden (1998) Protein Purification: Principles, High Resolution Methods and Applications, Second Edition, Wiley-VCH, NY; and Walker (1998) Protein Protocols on CD-ROM, Humana Press, NJ; and the references cited therein.

In addition to proteins and nucleic acids, it should be appreciated that the diversity of character strings generated in silico can correspond to other biopolymers. For example, the character strings can correspond to peptide nucleic acids (PNAs), which can be synthesized according to available techniques and screened for activity in any appropriate assay. See, for example, Peter E. Nielsen and Michael Egholm (eds) (1999) Peptide Nucleic Acids: Protocols and Applications, ISBN 1-898486-16-6, Horizon Scientific Press, Wymondham, Norfolk, UK, for an introduction to PNA synthesis and activity screening.

SELECTION BY PHYSICAL ASSAYS
Directed evolution by GAGGS, as in DNA shuffling, classical strain improvement, or any functional genomics technology, can use any physical assay known in the art to detect polynucleotides encoding desired phenotypes. Synthetic genes are amenable to conventional cloning and expression procedures; thus, the properties of the genes and of the proteins they encode can readily be examined after their expression in a cell. Synthetic genes can also be used to generate polypeptide products by in vitro (cell-free) transcription and translation. Polynucleotides and polypeptides can thus be examined for their ability to bind a variety of predetermined ligands, small molecules and ions, or polymeric and heteropolymeric substances, including other proteins and polypeptide epitopes, as well as microbial cell walls, viral particles, surfaces and membranes.

For example, many physical methods can be used to detect polynucleotides encoding phenotypes associated with the catalysis of chemical reactions, either by the polynucleotides directly or by encoded polypeptides. Solely for purposes of illustration, and depending on the specifics of the particular predetermined chemical reactions of interest, these methods can include a multitude of techniques well known in the art that account for a physical difference between substrate(s) and product(s), or for changes in the reaction medium associated with the chemical reaction (for example, changes in electromagnetic emissions, adsorption, dissipation and fluorescence, whether UV, visible or infrared (heat)).
These methods can also be selected from any combination of the following: mass spectrometry; nuclear magnetic resonance; isotopically labeled materials, partitioning and spectral methods that account for isotope distribution or labeled product formation; and spectral and chemical methods to detect accompanying changes in the ion or elemental composition of the reaction product(s) (including changes in pH, inorganic and organic ions, and the like). Other physical assay methods suitable for use in GAGGS can be based on the use of biosensors specific for the reaction product(s), including those that comprise antibodies with reporter properties, or those based on in vivo affinity recognition coupled with the expression and activity of a reporter gene. Enzyme-coupled assays for the detection of reaction products, and in vivo cell life-death-growth selections, can also be used where appropriate. Regardless of the specific nature of the physical assays, they are all used to select a desired property, or a combination of desired properties, encoded by the polynucleotides generated by GAGGS. Polynucleotides that are found to have the desired properties are thus selected from the libraries.

The methods of the invention optionally include selection and/or screening steps to select nucleic acids having desirable characteristics. The relevant assay used for the selection will depend on the application. Many assays are known for proteins, receptors, ligands and the like. Formats include binding to immobilized components, cell or organismal viability, production of reporter compositions, and the like.

In high throughput assays, it is possible to screen several hundred different shuffled variants in a single day.
For example, each orifice of a microtitre plate can be used to run a separate test, or, if the effects of concentration or incubation time are to be observed, every 5 ^ 10 holes can test a single variant ( for example, at different concentrations). In this way, a single standard microtiter plate can assay approximately 100. • (for example 96) reactions. If 1536-well plates are used, then a single plate can easily test from about -10-0- to about 1500 different reactions. It is possible to test several different plates per day; The assays examine up to about 6,000-20,000 different assays (ie including different nucleic acids, encoded proteins, concentrations, etc.) is possible using the integrated systems of the invention. More recently, microfluidic procedures for the handling of reagents have been developed, for example by Caliper Technologies (Mountain View, CA) which can provide very high throughput microfluidic assay methods. In one aspect, cells, viral plaques, spores or the like, comprising nucleic acids restructured by GAGGS, are separated in solid medium to produce individual colonies (or plaques). Using an automatic colony collector (for example the Q-bot, Genetix, U.K.) colonies or plaques are identified, collecting up to 10,000 different mutations inoculated in 96-well microtitre vessels containing two 3mm glass balls / holes. The Q-bot does not collect a complete colony but rather inserts a tip through the center of the colony and comes out with a small sample of cells (or mycelium) and spores (or viruses in plaque applications). The time that the tip is in the colony, the number of dives to inoculate the culture medium and the time in which the tip is in that medium, each one makes the inoculum size and each parameter can be controlled and optimized. The uniform process of collection -automatized colonies such as Q-bot decreases the error of handling -human and increases the rate to establish crops
(approximately 10,000 per 4 hours). These cultures are optionally shaken in a temperature- and humidity-controlled incubator. The optional glass balls in the microtiter plates act to promote uniform aeration of cells and dispersal of cell fragments (e.g., mycelial fragments), similar to the blades of a fermentor. Clones from cultures of interest can be isolated by limiting dilution. As described above, the plaques or cells constituting the libraries can also be screened directly for protein production, either by detecting hybridization, protein activity, protein binding to antibodies, or the like. To increase the chances of identifying a pool of sufficient size, a prescreen that increases the number of mutants processed 10-fold can be used. The goal of the primary screen is to quickly identify mutants having product titers equal to or better than the parent strain(s) and to move only these mutants forward to liquid cell culture for subsequent analysis. One approach to screening diverse libraries is to use a massively parallel solid-phase procedure to screen cells expressing shuffled nucleic acids, for example, those encoding enzymes with enhanced activity. Massively parallel solid-phase screening apparatus using absorption, fluorescence or FRET are available. See, for example, U.S. Patent 5,914,245 to Bylina et al. (1999); see also http://www.kairos-scientific.com/; Youvan et al. (1999) "Fluorescence Imaging Micro-Spectrophotometer (FIMS)" Biotechnology et alia <www.et-al.com> 1:1-16; Yang et al. (1998) "High Resolution Imaging Microscope (HIRIM)" Biotechnology et alia <www.et-al.com> 4:1-20; and Youvan et al.
(1999) "Calibration of Fluorescence Resonance Energy Transfer in Microscopy Using Genetically Engineered GFP Derivatives on Nickel Chelating Beads", posted at www.kairos-scientific.com. Following screening by these techniques, sequences of interest are typically isolated, optionally sequenced, and the sequences used as set forth herein to design new sequences for in silico or other shuffling methods. Similarly, a number of well-known robotic systems have also been developed for solution-phase chemistries useful in assay systems. These systems include automated workstations such as the automated synthesis apparatus developed by Takeda Chemical Industries, LTD. (Osaka, Japan) and many robotic systems utilizing robotic arms (Zymate II, Zymark Corporation, Hopkinton, Mass.; Orca, Beckman Coulter, Inc., Fullerton, CA) that mimic the manual synthetic operations performed by a scientist. Any of the above devices are suitable for use with the present invention, for example, for high-throughput screening of molecules encoded by codon-altered nucleic acids. The nature and implementation of modifications to these devices (if any) so that they can operate as discussed herein will be apparent to persons skilled in the relevant art. High-throughput screening systems are commercially available (see, for example, Zymark Corp., Hopkinton, MA; Air Technical Industries, Mentor, OH; Beckman Instruments, Inc., Fullerton, CA; Precision Systems, Inc., Natick, MA, etc.). These systems typically automate entire procedures, including all sample and reagent pipetting, liquid dispensing, timed incubations, and final readings of the microplate in detector(s) appropriate for the assay. These configurable systems provide high throughput and rapid start-up as well as a high degree of flexibility and customization.
The manufacturers of such systems provide detailed protocols for various high-throughput assays. Thus, for example, Zymark Corp. provides technical bulletins describing screening systems for detecting the modulation of gene transcription, ligand binding, and the like. A variety of commercially available peripheral equipment and software is available for digitizing, storing and analyzing digitized video or digitized optical or other assay images, for example, using PC (Intel x86 or Pentium chip-compatible DOS™, OS2™, WINDOWS™, WINDOWS NT™ or WINDOWS95™ based machines), MACINTOSH™, or UNIX-based (for example, SUN™ workstation) computers. Integrated systems for analysis typically include a digital computer with GO software for GAGGS and, optionally, high-throughput liquid control software, image analysis software, data interpretation software, a robotic liquid control armature for transferring solutions from a source to a destination operably linked to the digital computer, an input device (for example, a computer keyboard) for entering data into the digital computer to control GAGGS operations or high-throughput liquid transfer by the robotic liquid control armature and, optionally, an image scanner for digitizing label signals from labeled assay components. The image scanner can interface with the image analysis software to provide a measurement of probe label intensity. Typically, the probe label intensity measurement is interpreted by the data interpretation software to show whether the labeled probe hybridizes to the DNA on the solid support. Current state-of-the-art computational hardware resources are fully adequate for practical use in GAGGS (any mid-range priced Unix system (for example, from Sun Microsystems) or even higher-end Macintosh or PCs will suffice).
Current state-of-the-art software technology is similarly adequate (i.e., there is a multitude of mature programming languages and source code suppliers) for the design of an upgradable, open-architecture, object-oriented genetic algorithm package specialized for GAGGS users with a biological background.
A DIGITAL DEVICE FOR GOs
Various methods and genetic algorithms (GOs) can be used to perform desirable functions as noted herein. In addition, digital or analog systems such as digital or analog computer systems can control a variety of other functions such as the display and/or control of output files. For example, standard desktop applications such as word processing software (for example, Microsoft Word™ or Corel WordPerfect™) and database software (for example, spreadsheet software such as Microsoft Excel™ or Corel Quattro Pro™, or database programs such as Microsoft Access™ or Paradox™) can be adapted to the present invention by inputting one or more character strings into the software, which is loaded into the memory of a digital system, and performing a GO as noted herein on the character string. For example, systems can include the foregoing software having the appropriate character string information, for example, used in conjunction with a user interface (for example, a GUI in a standard operating system such as a Windows system,
Macintosh, or LINUX) to manipulate the character strings, with the GOs programmed into the applications, or with the GOs performed manually by the user (or both). As noted, specialized alignment programs such as PILEUP and BLAST can also be incorporated into the systems of the invention, for example, for alignment of nucleic acids or proteins (or the corresponding character strings) as a preparatory step to performing an additional GO on the resulting aligned sequences. Software for performing PCA can also be included in the digital system. Systems for handling GOs typically include, for example, a digital computer with GO software for aligning and manipulating sequences according to the GOs noted herein, or for performing
PCA, or the like, as well as data sets entered into the software system comprising the sequences to be manipulated. The computer can be, for example, a PC
(Intel x86 or Pentium chip-compatible DOS™, OS2™, WINDOWS™, WINDOWS NT™, WINDOWS95™, WINDOWS98™ or LINUX machine, or an Apple-compatible, MACINTOSH™-compatible, Power PC-compatible, or UNIX-compatible (for example, SUN™ workstation) machine) or another commercially common computer known to one of skill. Software for aligning or otherwise manipulating sequences can be constructed by one of skill using a standard programming language such as Visual Basic, Fortran, Basic, Java, or the like, according to the methods herein. Any controller or computer optionally includes a monitor, which can include, for example, a cathode ray tube ("CRT") display, a flat panel display (e.g., active matrix liquid crystal display, liquid crystal display), or others. Computer circuitry is often placed in a box that includes numerous integrated circuit chips, such as a microprocessor, memory, interface circuits, etc. The box also optionally includes a hard disk drive, a floppy disk drive, a high-capacity removable drive such as a writable CD-ROM, and other common peripheral elements. Input devices such as a keyboard or mouse optionally provide for input from a user and for user selection of sequences to be compared or otherwise manipulated in the relevant computer system. Computers typically include appropriate software for receiving user instructions, either in the form of user input into a set of parameter fields, for example in a GUI, or in the form of preprogrammed instructions, for example, preprogrammed for a variety of different specific operations. The software then converts these instructions to the appropriate language for instructing the system to carry out any desired operation. For example, in addition to performing GO manipulation of character strings, a digital system can instruct an oligonucleotide synthesizer to synthesize oligonucleotides for gene reconstruction, or even order oligonucleotides from commercial sources (e.g.
, by printing appropriate order forms or by linking to an order form on the internet). The digital system can also include output elements for controlling nucleic acid synthesis (e.g., based on a sequence or an alignment of a sequence herein); i.e., an integrated system of the invention optionally includes an oligonucleotide synthesizer or an oligonucleotide synthesis controller. The system can include other operations that occur downstream of an alignment or other operation performed using a character string corresponding to a sequence herein, for example, as noted above with reference to assays. In one example, the GOs of the invention are embodied in a fixed medium or transmissible program component containing logic instructions and/or data that, when loaded into an appropriately configured computing device, cause the device to perform a GO on one or more character strings. Figure 13 shows an example digital device 700, which should be understood as a logical apparatus that can read instructions from media 717, network port 719, user input keyboard 709, user input 711, or other inputting means. Apparatus 700 can thereafter use those instructions to direct GO modification of one or more character strings, for example, to construct one or more data sets (e.g., comprising a plurality of GO-modified sequences corresponding to nucleic acids or proteins). Another type of logical apparatus that can embody the invention is a computer system, as in computer system 700, comprising a CPU 707, optional user input devices keyboard 709 and GUI pointing device 711, as well as peripheral components such as disk drives 715 and monitor 705 (which displays the GO-modified character strings and provides for simplified selection of subsets of such strings by the user). Fixed media 717 is optionally used to program the overall system and can include, for example, a disk-type optical or magnetic medium or other electronic memory storage element. Communication port 719 can be used to program the system and can represent any type of communication connection. The invention can also be embodied within the circuitry of an application-specific integrated circuit (ASIC) or programmable logic device (PLD). In such a case, the invention is embodied in a computer-readable descriptor language that can be used to create an ASIC or PLD. The invention can also be embodied within the circuitry or logic processors of a variety of other digital apparatus, such as PDAs, laptop computer systems, displays, image editing equipment, etc. In a preferred aspect, the digital system comprises a learning component where the outcomes of physical oligonucleotide assembly schemes (compositions, abundance of products, different processes) are monitored in conjunction with physical assays, and correlations are established. Successful and unsuccessful combinations are documented in a database to provide justification/preferences for user-based or digital system-based selection of sets of parameters for subsequent GAGGS processes involving the same set of parental character strings/nucleic acids/proteins (or even unrelated sequences, where the information provides process improvement information). The correlations are used to modify subsequent GAGGS processes to optimize the process. This cycle of physical synthesis, selection and correlation is optionally repeated to optimize the system. For example, a learning neural network can be used to optimize outcomes.
EMBODIMENT IN A WEB SITE
The methods of this invention can be implemented in a localized or distributed computing environment. In a distributed environment, the methods can be implemented on a single computer comprising multiple processors or on a multiplicity of computers. The computers can be linked.
For example, they can be linked through a common bus, but more preferably the computer(s) are nodes on a network. The network can be a generalized or a dedicated local or wide-area network and, in certain preferred embodiments, the computers can be components of an intranet or the internet. In an internet embodiment, a client system typically executes a Web browser and is coupled to a server computer executing a Web server. The Web browser is typically a program such as IBM's Web Explorer, Internet Explorer, Netscape or Mosaic. The Web server is typically, but not necessarily, a program such as IBM's HTTP Daemon or another WWW daemon (for example, LINUX-based forms of the program). The client computer is bi-directionally coupled with the server computer over a line or via a wireless system. In turn, the server computer is bi-directionally coupled with a website (the server hosting the website) providing access to software implementing the methods of this invention.
A user of a client connected to the intranet or internet can cause the client to request resources that are part of the web site(s) hosting the application(s) that provide an implementation of the methods of this invention. The server program(s) then process the request to return the specified resources (assuming they are currently available). A standard naming convention, known as the Uniform Resource Locator ("URL"), has been adopted. This convention encompasses several types of location names, presently including subclasses such as Hypertext Transport Protocol ("http"), File Transfer Protocol ("ftp"), gopher, and Wide Area Information Server ("WAIS"). When a resource is downloaded, it can include the URLs of additional resources. Thus, the user of the client can easily learn of the existence of new resources that he or she has not specifically requested. The software implementing the method(s) of the invention can run locally on the server hosting the website in a true client-server architecture. Thus, the client computer posts requests to the host server, which runs the requested process(es) locally and then downloads the results back to the client. Alternatively, the methods of this invention can be implemented in a "multi-tier" format wherein a component of the method(s) is performed locally by the client. This can be implemented by software downloaded from the server on request by the client (for example, a Java application) or by software "permanently" installed on the client.
In one embodiment, the application(s) implementing the methods of this invention are divided into frames. In this paradigm, it is helpful to view an application not so much as a collection of features or functionality but, instead, as a collection of discrete frames or views. A typical application, for instance, generally includes a set of menu items, each of which invokes a particular frame, that is, a form that manifests certain functionality of the application. With this perspective, an application is viewed not as a monolithic body of code but as a collection of applets, or bundles of functionality. In this manner, from within a browser, a user would select a web page link which would, in turn, invoke a particular frame of the application (i.e., a sub-application). Thus, for example, one or more frames may provide functionality for inputting and/or encoding biological molecule(s) into one or more character strings, while another frame provides tools for generating and/or increasing the diversity of the encoded character string(s). In particularly preferred embodiments, the methods of the invention are implemented as one or more frames providing, for example, the following functionality: function(s) to encode two or more biological molecules into character strings to provide a collection of two or more different initial character strings, wherein each of the biological molecules comprises a selected set of subunits; functions to select at least two substrings from the character strings; functions to concatenate the substrings to form one or more product strings of about the same length as one or more of the initial character strings; functions to add (place) the product strings to a collection of strings; and functions to implement any feature of GAGGS or any GO or GA as set forth herein.
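As an illustration of the functions listed above, the following is a minimal Python sketch of encoding, substring selection, concatenation, and addition to a collection. The function names (`encode`, `select_substrings`, `concatenate`, `add_products`), the single random cut point, and the example sequences are assumptions made for illustration only; they are not taken from the specification, which leaves the implementation open.

```python
import random


def encode(molecules):
    """Encode molecules, given here as raw sequence text, as upper-case character strings."""
    return [m.strip().upper() for m in molecules]


def select_substrings(first, second, cut):
    """Select two substrings: the prefix of `first` and the suffix of `second` at `cut`."""
    return first[:cut], second[cut:]


def concatenate(substrings):
    """Join the selected substrings into one product string."""
    return "".join(substrings)


def add_products(collection, n_products, rng):
    """Return the collection extended with `n_products` single-crossover product strings."""
    extended = list(collection)
    for _ in range(n_products):
        first, second = rng.sample(collection, 2)  # two different entries of the collection
        # Hypothetical cut-point rule: any interior position shared by both strings.
        cut = rng.randrange(1, min(len(first), len(second)))
        extended.append(concatenate(select_substrings(first, second, cut)))
    return extended


# Hypothetical 12-character parents, for illustration only.
parents = encode(["atgaaacccggg", "atgtttccagga"])
library = add_products(parents, 5, random.Random(0))
for entry in library:
    print(entry)
```

Because each product takes a prefix from one parent and the remainder of the other from the same position, it has the same length as the second parent, which is consistent with product strings of about the same length as the initial character strings.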
The function(s) to encode two or more biological molecules can provide one or more windows in which the user can insert representation(s) of biological molecules. In addition, the encoding function also optionally provides access to private and/or public databases accessible through a local network and/or the internet, whereby one or more sequences contained in the databases can be input into the methods of the invention. Thus, for example, in one embodiment where the end user inputs a nucleic acid sequence into the encoding function, the user can optionally have the ability to request a search of GenBank and to input one or more of the sequences returned by such a search into the encoding and/or diversity-generating function. Methods of implementing intranet and/or internet embodiments of computational and/or data access processes are well known to those of skill in the art and are documented in great detail (see, for example, Cluet et al. (1992) "A General Framework for the Optimization of Object-Oriented Queries," Proc. SIGMOD International Conference on Management of Data, San Diego, California, June 2-5, 1992, SIGMOD Record, Vol. 21, Issue 2, June 1992; Stonebraker, M., Editor; ACM Press, pp. 383-392; ISO-ANSI, Working Draft, "Information Technology-Database Language SQL," Jim Melton, Editor, International Organization for Standardization and American National Standards Institute, July 1992; Microsoft Corporation, "ODBC 2.0 Programmer's Reference and SDK Guide. The Microsoft Open Database Standard for Microsoft Windows™ and Windows NT™, Microsoft Open Database Connectivity™ Software Development Kit," 1992, 1993, 1994, Microsoft Press, pp. 3-30 and 41-56; ISO Working Draft, "Database Language SQL-Part 2: Foundation (SQL/Foundation)," CD9075-2:199 SQL, September 11, 1997; and the like).
Additional relevant details regarding web-based applications are found in "METHODS OF POPULATING DATA STRUCTURES FOR USE IN EVOLUTIONARY SIMULATIONS" by Selifonov and Stemmer, Attorney Docket Number 3271.002WO0.
EXAMPLES
The following examples are intended to further illustrate the present invention and should not be considered limiting. One of skill will immediately recognize a variety of parameters that can be changed to achieve essentially similar results.
EXAMPLE 1: DECISION TREE FOR EXAMPLE GAGGS PROCESS
A set of flowcharts is included that provides a general representation of an exemplary process of Directed Evolution (DE) by GAGGS (Figures 1-4). Figure 1 provides an example decision-making process from an idea of a desired property to the selection of a genetic algorithm. Figure 2 provides a directed evolution decision tree from the selection of the genetic algorithm to a refined library of parental character strings. Figure 3 provides example processing steps from the refined parental library to a raw derivative library of character strings. Figure 4 processes the raw character strings to strings with a desired property. It is apparent that many modifications of this particular arrangement for DE by GAGGS, for example as set forth herein, can be developed and practiced. Various quality control modules and links, as well as most of the learning components of artificial neural networks and genetic algorithms, are omitted for clarity, but will be apparent to one of skill. The charts are in a continuous arrangement, each connectable head-to-tail. Additional material and implementations of individual GO modules, and many arrangements of GOs in working sequences and trees, as used in GAGGS, are available in various software packages.
Suitable references describing examples of existing software can be found, for example, at http://www.aic.nrl.navy.mil/galist/ and at http://www.cs.purdue.edu/coast/archive/clife/FAQ/www/Q20_2.htm. It will be apparent that many of the decision steps represented in Figures 1-4 are most easily performed with the assistance of a computer, using one or more software programs to facilitate the selection/decision processes.
EXAMPLE 2: MODELING COST ESTIMATES
Use of degenerate synthetic oligos with a very limited degree/low level of positional degeneracy (under 0.01-5% per position) can offer very substantial cost savings in building those libraries which incorporate substantial mutagenicity. However, for gene synthesis by assembly PCR, representation of all crossover events between parental entries uses synthesis of two dedicated oligos per simulated crossover event. Nevertheless, as will be apparent from the examples below and from the combinatorial nature of nucleic acid evolution algorithms, very large gene libraries (10^9-10^10) are built for physical screening using fewer than 10^3 individual 40-mer oligos for the evolution of a typical gene family of ~1.6 kb size.
Several typical examples below provide examples of the costs of gene synthesis components in GAGGS, where the cost calculation is arbitrarily based on $0.7 per base (for a 40-50 nmol quantity, which is adequate for gene assembly procedures) for exemplary purposes. Higher-volume demand for oligo synthesis services leads to substantially lower unit costs (for example, up to a 10-fold decrease), and the overall costs of oligo synthesis are falling. Oligo synthesis is an inherently parallel and routine process, easily amenable to automation and therefore to increases in throughput. Today, non-chip parallel devices for oligo synthesis provide an effective capacity for complete simultaneous synthesis (single load) of 196 (2x96) individual 60-mer oligos in less than 5 hours, with a hardware cost of less than $100K and a reagent cost of less than $0.07 per base. In view of these costs, the cost estimates made in the examples below can be reduced by at least 8-fold.
EXAMPLE 3: GAGGS OF A SINGLE-PARENT LOW-MUTAGENICITY LIBRARY
This example describes GAGGS of a single-parent low-mutagenicity library derived from an average gene (~1.6 kb), given the sequence information of a single 1.6 kb gene (encoding 500 aa plus "convenience" start/end oligos). The goal is to build a library of gene variants with all possible single amino acid changes, one aa change per gene copy in the library.
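The target library of Example 3 (every single amino acid change, one change per variant) can be enumerated in silico. The following Python sketch is illustrative only: the function name `single_substitution_variants` and the 10-residue parent are hypothetical, and the sketch works on the protein string rather than on the oligos that would encode it.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def single_substitution_variants(protein):
    """Yield (position, original, replacement, variant) for every single-residue change."""
    for i, original in enumerate(protein):
        for replacement in AMINO_ACIDS:
            if replacement != original:
                variant = protein[:i] + replacement + protein[i + 1:]
                yield i + 1, original, replacement, variant  # positions are 1-based


# Hypothetical 10-residue parent, for illustration only.
parent = "MKTAYIAKQR"
variants = list(single_substitution_variants(parent))
print(len(variants))  # 19 replacements at each of 10 positions: 190
```

For a 500 aa protein the same enumeration gives 500 x 19 = 9,500 variants.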
Relevant parameters include:
(1) the number of oligos, and their cost, to build one 1.6 kb parental gene from 40-mer oligos with complete 20+20 base overlaps, for example by non-error-prone assembly PCR;
(2) the number of all possible single aa replacement mutations;
(3) the number of distinct non-degenerate 40-mer oligos used to build all possible single aa mutations;
(4) the minimal number of distinct fixed-position single-codon-degenerate oligos used to incorporate all possible single aa mutations, but no terminations; and
(5) the minimal number of distinct fixed-position single-codon fully degenerate oligos used to incorporate all possible single aa mutations.
For a 1.6 kb gene, the corresponding values are:
(1) 1 x 1,600 / 40 x 2 = 80 oligos; $0.7 x 40 x 80 = $2,240.
(2) N = 500 x 19 = 9,500.
(3) 9,500 x 2 = 19,000 oligos; $532,000 at $0.7/base; $56 per gene, 1 per pool.
(4) 500 x 2 x 3 = 3,000 oligos; $84,000 at $0.7/base; $8.85 per gene, 20 phenotypes per pool, normalized abundance (for example, when using only three variable codons, two of which are degenerate: NNT, VAA, TGG).
(5) 500 x 2 = 1,000 oligos; $28,000 at $0.7/base; $2.94 per gene, 20 phenotypes per pool, skewed abundance (this results in the presence of significant numbers of truncated genes in the synthesized library).
The same inventory of physical oligos used for the first round of GAGGS is used in the second round of GAGGS to synthesize a library containing ~95% of all possible combinations of any two single aa changes. To have 100% coverage (including combinations of mutations within +/- 20 bp proximity), additional oligos are used. Where at least one mutation from the previous round has been identified as beneficial, coverage of all combinations of new mutations within +/- 20 bp of the beneficial mutations uses synthesis of no more than 42 new oligos. The cost of subsequent rounds of GAGGS grows only marginally and linearly, while the diversity sampled in a recursive mode grows exponentially.
EXAMPLE 4: GAGGS OF A RECOMBINOGENIC (NON-MUTAGENIC) LIBRARY PARENTED BY A FAMILY OF GENES (GAGGS EQUIVALENT OF A SINGLE ROUND OF FAMILY DNA SHUFFLING)
Sequence information is given for six genes of reasonably average size (1.6 kb), each having six areas of homology with each of the other parental genes (six "heads" and "tails" for each area of chimerizing homology). Relevant parameters include: the number of oligos and resulting cost to build the 6 parental 1.6 kb genes (from 40-mer oligos, with complete 20+20 overlap, by non-error-prone assembly PCR); the number of distinct pairwise crossovers between all matching homology areas, assuming 1 crossover event per pairwise homology region; the number of all possible chimeras using the crossovers; the theoretical library size; and the number of distinct oligos and cost to build all possible chimeras. As above, 6 x 1,600 / 40 x 2 = 480 oligos; $0.7 x 40 x 480 = $13,440, that is, $2,240 per gene built. N = 180, calculated according to the formula N = k x m x (m - 1), where m = 6 is the number of parents and k = 6 is the number of areas of pairwise homology satisfying the crossover conditions. X = ~5.315 x 10^9, calculated according to the formula.
Here X is the theoretical library size and n is the number of crossovers in each library entry (an integer from 1 to k). 2 x 180 + 480 = 840 oligos; $0.7 x 40 x 840 = $23,520; $0.000048 per gene built. If only 10^6 are screened, the cost of oligos is $0.024 per gene built; if only 10^5, the cost of oligos is $0.24 per gene built; and if 10^4 are screened, the cost of oligos is $2.35 per gene built. The cost of running multiple rounds of GAGGS is not additive, since most of the excess oligos from previous rounds can be reused in the synthesis of later-generation libraries. Even if only a small fraction of all gene constructs are actually screened (for example, 10^4, at a cost of $2.35 per gene built), the cost of oligos is comparable with the per-gene assay cost. In addition, the costs of industrial-scale oligo synthesis are falling.
EXAMPLE 5: STEPWISE GAGGS
This example provides a stepwise protocol for the GAGGS family model.
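As a check on the Example 4 figures, the oligo counts and oligo costs can be reproduced with a few lines of Python. This is a sketch only: the function names are hypothetical, the price is held in cents to keep the arithmetic exact, and the theoretical library size X is not recomputed because its formula is not reproduced in the text.

```python
CENTS_PER_BASE = 70   # $0.7 per base, the arbitrary price used in the examples
OLIGO_LENGTH = 40     # 40-mer oligos with 20+20 overlap
GENE_LENGTH = 1600    # bp per parental gene


def assembly_oligos(n_genes, gene_length=GENE_LENGTH, oligo_length=OLIGO_LENGTH):
    """Oligos covering both strands: gene_length / oligo_length per strand, times 2."""
    return n_genes * (gene_length // oligo_length) * 2


def pairwise_crossovers(m, k):
    """N = k x m x (m - 1) for m parents and k pairwise homology areas."""
    return k * m * (m - 1)


def oligo_cost_usd(n_oligos, oligo_length=OLIGO_LENGTH, cents_per_base=CENTS_PER_BASE):
    """Synthesis cost in dollars for n_oligos oligos of the given length."""
    return n_oligos * oligo_length * cents_per_base / 100


def cost_per_construct(total_cost, n_screened):
    """Oligo cost spread over the number of gene constructs actually screened."""
    return total_cost / n_screened


parental_oligos = assembly_oligos(6)              # 480
crossovers = pairwise_crossovers(m=6, k=6)        # 180
total_oligos = 2 * crossovers + parental_oligos   # 840, two oligos per crossover
total_cost = oligo_cost_usd(total_oligos)         # 23520.0 dollars

for n_screened in (10**4, 10**5, 10**6):
    print(n_screened, cost_per_construct(total_cost, n_screened))
```

The per-construct values agree with the $2.35, $0.24 and $0.024 quoted above to the precision given.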
A gene/protein family (DNA or aa sequences) is selected. All possible pairwise alignments are made to identify pairwise homology regions satisfying the crossover operator conditions (length, % identity, stringency). Crossover points are selected, one per pairwise homology substring, in the middle of each substring, or randomly, or according to an annealing-based probability model built on histograms of crossover probability ranks for each pair of parents. Oligos are selected for assembly PCR and synthesized. Genes/libraries are assembled from the synthesized oligos. The libraries are screened/selected as set forth above.
EXAMPLE 6: SUBTILISIN FAMILY MODEL
Amino acid sequences were aligned (codon usage can be optimized on back-translation for a preferred expression system, and the number of oligos for synthesis can be minimized). Pairwise alignment dot plots were made for all possible pairs of the 7 parents (Figures 5, 6, 7). Figure 5 is a percent similarity alignment for the 7 parents. Amino acid sequences were aligned, with the leader peptide excluded. Figure 6 is an alignment in
dot-plot form of the sequences to identify regions of similarity. Figure 7 is a dot plot showing pairwise crossover points in the alignment. Pairs 6 and 7 show 95% identity per each window of > 7 aa, while all other pairs show 80% identity per each window of > 7 aa. Note that the stringency of alignment (and the subsequent representation of crossover between parents) can be manipulated individually for each pair, so that low-homology crossovers can be represented at the expense of highly homologous parents. No structural biases or active-site biases were incorporated in this model. As an example GAGGS calculation for the subtilisin family model, assume 7 parents of approximately 400 amino acids and 1,200 bp each (including the leader), or approximately 275 amino acids and 825 bp for the mature protein; 7 x 825 x 2 + approximately 500 = approximately 12 kb of total sequence to be generated by gene synthesis. From 40-mers with full-overlap assembly (20+20 bp overlaps), approximately 300 oligos are used. For pairwise crossover oligos to build chimeras, based on the alignment results and with one crossover per homologous substring, there are approximately 180 homologous substrings, with 170 in the coding region and 10 in the leader region. With two 60-mers for each crossover point and 2 head-tail sets for each pair of parents, approximately 360 additional oligos dedicated to building crossovers can be used. The total number of oligos is approximately 660 (300 40-mers and 360 60-mers). At an oligo cost of $0.70 per base, the oligos would cost approximately $23,520. The reagent cost would run approximately $0.07 per base, for a total cost of approximately $2,352.
EXAMPLE 7: NAPHTHALENE DIOXYGENASE
Naphthalene dioxygenase is a non-heme reductive dioxygenase. There are at least three closely related but catalytically distinct types of naphthalene dioxygenases.
Figure 12 provides a schematic representation of the percent similarity for the three different types of naphthalene dioxygenases, with the amino acid sequence provided for the ISP large subunit (which is responsible for substrate specificity). At a size of approximately 1,400 amino acids, there are 3 x 1,400 total base pairs = 260 40-mer oligos for 20 + 20 overlap gene synthesis. A plot of the sequence alignment reveals that, at high stringency, (14 + 19 + 23) x 2 = 112 60-mer oligos are used in the recombination. At $0.70 per base, the oligos would cost approximately $12,000 to synthesize, using approximately 9 hours of synthesizer time. The estimated library size would be approximately 9.4 x 10^9 chimeras.
EXAMPLE 8: GAGGS CALCULATION FOR A SINGLE PARENT
As noted above, one aspect of the invention provides GAGGS from a single parent. In these methods, polynucleotides having desired characteristics are provided. This is accomplished by:
(a) providing a parental sequence character string encoding a polynucleotide or polypeptide;
(b) providing a set of character strings of a predefined length that encode single-stranded oligonucleotide sequences comprising overlapping sequence fragments of an entire parental character string and of an entire polynucleotide strand complementary to the parental character string (splitting the sequence of a parent into oligos suitable for assembly PCR);
(c) creating a set of derivatives of the parental sequence comprising variants with all possible single point mutations, with, for example, one mutation per variant string (defining all possible single point mutations);
(d) providing a set of overlapping character strings of a predefined length that encode both strands of the parental oligonucleotide sequence, and a set of overlapping character strings of a predefined length that encode the sequence areas that include the mutations (oligos incorporating the single point mutations, suitable for the same assembly PCR scheme);
(e) synthesizing the sets of single-stranded oligonucleotides according to step (c) (e.g., to build or rebuild the parental sequence or a variant thereof, e.g., incorporating single point mutations during gene assembly);
(f) assembling a library of mutated genes by assembly PCR from the single-stranded oligonucleotides (pooled, partially pooled, or one per container). For one-gene-per-container approaches (or other approaches involving physically separated library components, e.g., in arrays), the wild-type oligos are excluded at the mutations; and
(g) selecting or screening for recombinant polynucleotides that have evolved toward a desired property.
In an optional additional step (h), the method includes deconvoluting the sequence of the mutated polynucleotides that have evolved toward a desired property (i.e., determining which library members have a sequence of interest and what that sequence is) in order to determine the beneficial mutations. When the assembly PCR is in a one-per-container format, this is done by positional sequence deconvolution rather than by actual sequencing; that is, the physical location of the components is sufficient to identify the sequence.
In an optional additional step (i), the method includes assembling a library of recombinant variants that combine some or all of the possible beneficial mutations in some or all of the possible combinations, from single-stranded oligos by assembly PCR. This is performed from the same set of oligos; if any of the mutations are positionally close (within any one oligo), then additional single-stranded oligos are made which incorporate combinations of mutations.
An additional step (j) includes selecting or screening for recombinant polynucleotides that have further evolved toward a desired property.
Below is an example of a single-parent GAGGS calculation for a 1 kb sequence.
Parent length: 1000 bp.
Mutation rate of the first round: 1 amino acid/gene.
Number of oligonucleotides to construct the wild-type gene: 52 (40-mers, overlapping synthesis scheme, 20 + 20).
Number of oligonucleotides to provide all possible single point, non-terminating mutations (333 total possible positions):
Non-degenerate oligos: 13,320.
Partially degenerate 40-mers, one pg position per oligo: 1920.
Fully degenerate oligos, 40-mers, one fg position per oligo: 666.
Error-prone assembly PCR will also work, but sequence deconvolution is then performed, for example by sequencing, before the subsequent rounds. The number of additional oligos needed to allow construction of all possible recombinations of beneficial mutations would be approximately 10% of the preceding number of oligos. However, approximately 95% of all possible recombinants having beneficial mutations can be made from the initial set produced above.
EXAMPLE 9: A PROCESS FOR THE DESIGN OF CROSSOVER OLIGONUCLEOTIDES FOR THE SYNTHESIS OF CHIMERIC POLYNUCLEOTIDES
First, substrings are identified and selected in the parental strings for applying a crossover operator to form chimeric junctions. This is done by: a) identifying all or part of the pairwise homology regions between all parental character strings; b) selecting all or part of the identified pairwise homology regions for indexing at least one crossover point within each of the selected pairwise homology regions; c) selecting one or more regions without pairwise homology for indexing at least one crossover point within each of the selected non-homologous regions ("c" is an optional step that can be omitted, and it is also a step where structure-activity based elitism can be applied), thereby providing a description of a set of positionally and parentally indexed regions/areas (substrings) of character strings suitable for further selection of crossover points.
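Step a) above, identifying pairwise homology regions, can be sketched as a sliding-window identity scan over two parent strings that have already been aligned. This is only a minimal illustration under stated assumptions, not the procedure used in the examples: the function name homology_regions, the gap character "-", and the default window of 7 residues at 80% identity (loosely following the thresholds quoted in Example 6) are all hypothetical choices.

```python
def homology_regions(a, b, window=7, min_identity=0.8):
    """Return merged (start, end) intervals, end exclusive, in alignment
    coordinates, covering every window whose identity meets the threshold.

    a, b: two aligned parent strings of equal length ('-' marks a gap).
    Assumption: identity = identical non-gap characters / window size.
    """
    if len(a) != len(b):
        raise ValueError("aligned strings must have equal length")
    regions = []
    for start in range(len(a) - window + 1):
        end = start + window
        matches = sum(
            1 for x, y in zip(a[start:end], b[start:end])
            if x == y and x != "-"
        )
        if matches / window >= min_identity:
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous region: extend it.
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((start, end))
    return regions
```

Each returned interval is a candidate substring within which a crossover point could later be indexed; because whole windows are merged, an interval may include a few mismatched positions at its edges.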
Second, further selection of crossover points is carried out within each of the substrings in the set selected in step 1 above. This step includes: a) randomly selecting at least one crossover point in each of the selected substrings, and/or b) selecting at least one crossover point in each of the selected substrings using one or more annealing-simulation-based models to determine the probability of selecting the crossover point within each of the selected substrings, and/or c) selecting one crossover point approximately in the middle of each of the selected substrings, thereby creating a set of pairwise crossover points, where each point is indexed to the corresponding character positions in each of the parental strings that are to form a chimeric junction at that point.
Third, optional codon usage adjustments are carried out. The process may vary depending on the method used to determine homology (strings encoding DNA or AA). For example, if a DNA sequence is used: a) codon adjustment for the selected expression system is carried out for each parental string, and b) codons can be adjusted between parents to standardize codon usage for each given aa at each corresponding position. This process can significantly decrease the total number of distinct oligos needed for gene library synthesis, and can be particularly beneficial where AA homology is greater than DNA homology, or with families of highly homologous genes (e.g., 80%+ identity). This option should be exercised with caution, since it is in essence an expression of an elitism mutation operator. Thus, one weighs the benefit of cutting the number and resulting cost of the oligos against the introduction of this bias, which can have undesirable consequences. Most typically, one uses the codons that encode the AA at a given position in the majority of parents.
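The between-parent codon adjustment in step b) can be illustrated with a simple majority-codon rule: at each aligned position, every parent carrying a given amino acid is assigned the codon that most parents use for that amino acid at that position. The following is a minimal sketch under stated assumptions, not the method as practiced: parents are supplied as equal-length lists of codons with None at gap positions, the standard genetic code applies, and the names CODON_TABLE and harmonize_codons are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def harmonize_codons(parents):
    """Standardize codon usage across aligned parents, position by position.

    parents: list of equal-length codon lists (None = gap).
    Returns new codon lists; the encoded amino acids are unchanged.
    Ties go to the codon seen first (an arbitrary, assumed tie-break).
    """
    out = [list(p) for p in parents]
    for pos in range(len(parents[0])):
        # Count codon usage separately for each amino acid at this position.
        usage = {}
        for p in parents:
            codon = p[pos]
            if codon is not None:
                usage.setdefault(CODON_TABLE[codon], Counter())[codon] += 1
        for i, p in enumerate(parents):
            codon = p[pos]
            if codon is not None:
                counts = usage[CODON_TABLE[codon]]
                out[i][pos] = max(counts, key=counts.get)
    return out
```

Fewer distinct codons per position means more oligos are shared between parents, which is the cost saving described above; the trade-off is the bias the text warns about.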
If AA sequences are used: a) back-translate the sequences to degenerate DNA; b) define the degenerate nucleotides using position-by-position reference to the codon usage in the original DNA (of the majority of parents or of the corresponding parent), and/or exercise codon adjustments suitable for the selected expression system in which a physical assay will be performed. This step can also be used to introduce any restriction sites within the coding parts of the genes, if any, for subsequent identification/QA/deconvolution/manipulation of library entries. All crossover points identified in step 2 above (indexed to the pairs of parents) are correspondingly indexed to the adjusted DNA sequences.
Fourth, oligo arrangements are selected for a gene assembly scheme. This step includes several decision steps:
Uniform 40-60 mer oligos are typically used (using longer oligos decreases the number of oligos needed to build the parents), but additional dedicated oligos are used to represent closely located crossovers/mutations.
Choose whether shorter/longer oligos are allowed (i.e., a Yes/No decision). A "Yes" decision cuts the total number of oligos for high-homology genes of different lengths with gaps (deletion/insertion), especially for gaps of 1-2 aa.
Select the overlap length (typically 15-20 bases, which can be symmetric or asymmetric).
Select whether degenerate oligos are allowed (Yes/No). This is another cost-saving feature and also a powerful means of obtaining additional sequence diversity. Partial degeneracy schemes and minimized degeneracy schemes are especially beneficial in building mutagenic libraries.
If software tools are used for these operations, several variations of the parameters are run to select for maximum library complexity and minimum cost. Exercising complex assembly schemes using oligos of various lengths significantly complicates the indexing processes and, subsequently, the assembly of the libraries in positionally encoded parallel or partially pooled formats. If this is done without sophisticated software, a simple and uniform scheme can be used (for example, all oligos 40 bases long with 20-base overlaps).
Fifth, "convenience sequences" are designed at the front and back of the parental strings. Ideally, this is the set that will be built into every library entry at the end. This includes any restriction sites, primer sequences for identification of assembled products, RBS, leader peptides and other special or desirable features. In principle, the convenience sequences can be defined at a later stage, and at this stage a "dummy" set of appropriate length can be used, for example, a substring of easily recognizable forbidden letters.
Sixth, an indexed matrix of oligo strings is created for building each parent, according to the selected scheme. The index of each oligo includes: a parent identifier (parent ID), an indication of coding or complementary strand, and position numbers.
The crossover points are determined for the indexed coding strings of each parent, with the head and tail convenience substrings. A complementary string is generated for each string. Each coding string is divided according to the assembly PCR scheme selected in step 4 above (for example, in 40 bp increments). Each complementary string is divided according to the same scheme (for example, 40 bp with a 20 bp offset).
Seventh, an indexed matrix of oligos is created for each pairwise crossover operation. First, all oligos that have pairwise crossover markers are determined. Second, all sets of oligos that have the same position and the same pair of parent crossover markers are determined (4 per crossover point). Third, each set of 4 oligo strings labeled with the same crossover marker is taken, and a derivative set of 4 chimeric oligo strings is made, comprising characters encoding 2 coding and 2 complementary strings (for example, with a 20 bp offset in a 40 = 20 + 20 scheme). Two coding strings are possible, each having the forward-end sequence substring of one parent followed, after the crossover point, by the backward end of the second parent. The complementary strings are designed in the same way, thereby obtaining a complete indexed inventory of strings encoding oligos suitable for gene library assembly by PCR. This inventory can optionally be further refined by detecting all redundant oligos, counting them, and deleting them from the inventory, with the count value entered in an "abundance = amount" field in the index of each oligo string. This can be a very beneficial step for reducing the total number of oligos for library synthesis, particularly in cases where the parental sequences are highly homologous.
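The tiling and redundancy-removal steps described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure as practiced: it uses the simple uniform scheme (40-base oligos, complementary strand offset by 20 bases), leaves short end fragments as they fall, ignores convenience sequences and crossover markers, and the names Oligo, tile_parent and build_inventory are hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class Oligo(NamedTuple):
    parent_id: str
    strand: str  # "S" = coding strand, "A" = complementary strand
    start: int   # 0-based start on the coding strand
    end: int     # exclusive end on the coding strand
    seq: str     # oligo sequence written 5'->3'


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def tile_parent(parent_id, seq, length=40, offset=20):
    """Split one parent into indexed coding and complementary oligos."""
    oligos = []
    # Coding strand: consecutive blocks of `length` bases.
    for start in range(0, len(seq), length):
        end = min(start + length, len(seq))
        oligos.append(Oligo(parent_id, "S", start, end, seq[start:end]))
    # Complementary strand: same block size, shifted by `offset`.
    for start in range(offset, len(seq), length):
        end = min(start + length, len(seq))
        oligos.append(
            Oligo(parent_id, "A", start, end, reverse_complement(seq[start:end]))
        )
    return oligos


def build_inventory(parents, length=40, offset=20):
    """Collapse redundant oligos across parents.

    parents: dict mapping parent ID -> coding-strand DNA string.
    Returns (abundance, index): abundance counts how many times each unique
    oligo sequence occurs; index lists the (parent_id, strand, start, end)
    entries that share that sequence.
    """
    abundance = Counter()
    index = {}
    for parent_id, seq in parents.items():
        for oligo in tile_parent(parent_id, seq, length, offset):
            abundance[oligo.seq] += 1
            index.setdefault(oligo.seq, []).append(oligo[:4])
    return abundance, index
```

With highly homologous parents, many oligos are identical, so the number of unique sequences to synthesize (the length of abundance) can be much smaller than the total number of indexed oligos.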
EXAMPLE 10: PROGRAM ALGORITHM FOR DESIGNING OLIGONUCLEOTIDES FOR SYNTHESIS
The following is a program outline for designing oligonucleotides for use in synthetic/recombination protocols.
Given a protein alignment table and a codon bias table:
For each position in the protein alignment
    find a set of minimally degenerate codons that encode the amino acids at this position, using the codon bias table
    for each sequence in the alignment
        add the three-letter codon (DNA) encoding the amino acid at this position in this sequence to the DNA version of the sequence
Gaps are represented by a special codon.
For each DNA sequence created as above (note that gaps are ignored at this stage)
    for each window of approximately oligo size
        check end degeneracy
        try increasing and decreasing the window length to minimize end degeneracy while staying within the length limits
        add Oligos given the window limits and all the sequences
        add Oligos given the reverse window limits and all the sequences
For each position in the DNA sequences
    for each sequence
        if the current position is at the beginning of a gap
            add Oligos given limits that contain a minimum amount of sequence from the current sequence 5' of the gap and a minimum amount of sequence from the current sequence 3' of the gap
            repeat add Oligos for the reverse limits
add Oligos: given a list of sequences (DNA) and limits
    for each position within the limits
        get all the unique bases at this position from the DNA sequences in the list
        generate a base (or degenerate base symbol) for this position
    if the total number of degenerate positions is greater than the user-defined number
        divide the list of sequences into two
        add Oligos given sequence list one
        add Oligos given sequence list two (recursive)
    else add this oligo (set of bases for each position) to the oligo list
Display all the oligos in the oligo list.
EXAMPLE 11: CROSSOVER POINT SELECTION
Figures 8-11 are diagrams of various processes and process criteria for the selection of oligonucleotides for recombination between the parental nucleic acids.
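As a concrete illustration of the recursive add Oligos routine in the Example 10 outline, the sketch below collapses a window of equal-length DNA sequences into one degenerate oligo and splits the sequence list in two whenever the oligo would contain too many degenerate positions. It is a minimal sketch under stated assumptions: the sequences are gap-free within the window, degenerate positions are written with IUPAC ambiguity codes, and the list is split naively at its midpoint (the outline does not say how the list is divided). The names IUPAC and add_oligos are hypothetical.

```python
# IUPAC nucleotide ambiguity codes, keyed by the set of bases they stand for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def add_oligos(seqs, start, end, max_degenerate, oligo_list):
    """Append degenerate oligo(s) covering seqs[start:end] to oligo_list.

    seqs: list of aligned, gap-free DNA strings (assumption).
    max_degenerate: user-defined limit on degenerate positions per oligo.
    """
    # Unique bases observed at each position of the window.
    columns = [frozenset(s[i] for s in seqs) for i in range(start, end)]
    n_degenerate = sum(1 for col in columns if len(col) > 1)
    if n_degenerate > max_degenerate and len(seqs) > 1:
        # Too degenerate: divide the sequence list in two and recurse.
        mid = len(seqs) // 2
        add_oligos(seqs[:mid], start, end, max_degenerate, oligo_list)
        add_oligos(seqs[mid:], start, end, max_degenerate, oligo_list)
    else:
        oligo = "".join(IUPAC[col] for col in columns)
        if oligo not in oligo_list:
            oligo_list.append(oligo)
    return oligo_list
```

A higher max_degenerate yields fewer, more degenerate oligos; a lower value yields more oligos that each represent fewer parents, which is the cost and diversity trade-off discussed in Example 9.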
Figure 8, panel A shows a typical dot-plot alignment of two parents and the increase in crossover probability that results in regions of similarity. Panel B shows that crossovers can be selected based on a simple logical/physical filter, that is, the physical or virtual annealing temperature of the oligonucleotides, for example using a linear annealing temperature. Panel C shows several more complex filters that vary the annealing temperature to achieve specific crossovers, that is, by appropriately controlling the physical or virtual annealing temperature.
Figure 9 schematically represents the introduction of indexed crossover points into the sequence of each parent. In brief, the sequences are aligned and the positional index of each crossover point (marker field) is represented schematically by a vertical identifier mark. The crossover point for parents m and n, as represented in Figure 8, is represented by an identifier, a position number for parent m (a head) and a position number for parent n (a tail). This process is repeated for each parent in a data set, applying the oligonucleotide gridding operator (the grid of positional indices indicating the start and end of each oligonucleotide in an assembly PCR operation) to each of the parents.
Figure 10 schematically represents the complete inventory of oligonucleotide sequences for assembling all the parents. The data set is simplified by identifying all pairs of oligonucleotide sequences with matching pairwise crossover indices, providing a sub-inventory of oligos with crossover markers.
Figure 11 provides a scheme for obtaining an inventory of sequences for the chimeric oligonucleotides for each of the selected crossover points. In brief, two pairs of oligo sequences with matching pairwise crossover indices are selected (down arrow 1).
Chimeric oligonucleotide sequences are generated around the pairwise crossover points (head-tail, tail-head, strand S or A) (down arrow 2). In a "40 = 20 + 20" assembly scheme (where a 40-mer has 20 residues from each parent), only one oligo of greater than 60 bp is used for each chimerization (two for each crossover point, depending on the relative positions within each oligo). This empirical finding can be described by the variations of the cut-and-join operations on the S or A strands represented (for example, as in Figure 8), reducing the rule to a guidance table. In the chimeric oligos, the A or S strand and the head and tail sequence subfragments are defined using a subset of rules selected from the guidance table. The selection of rules from the guidance table can be automated, based on comparison of the relative positions of the crossover points in the oligos (Boolean operations) (down arrow 3). This process is repeated for each set of oligos with identical crossover indices to obtain an inventory of sequences for the chimeric oligos for each of the selected crossover points.
Modifications can be made to the methods and materials described above without departing from the spirit or scope of the invention as claimed, and the invention can be put to a number of different uses, including:
The use of an integrated system to generate shuffled nucleic acids and/or to test shuffled nucleic acids, including in an iterative process.
An assay, kit or system utilizing any one of the selection strategies, materials, components, methods or substrates described hereinabove.
Kits will optionally additionally comprise instructions for performing methods or assays, packaging materials, one or more containers which contain assay, device or system components, or the like.
In a further aspect, the present invention provides kits embodying the methods and apparatus of the present invention. Kits of the invention optionally comprise one or more of the following: (1) a shuffled component as described herein; (2) instructions for practicing the methods described herein and/or for operating the selection procedure herein; (3) one or more assay components; (4) a container for holding nucleic acids or enzymes, other nucleic acids, transgenic plants, animals, cells or the like; (5) packaging materials; and (6) software for performing any of the decision steps noted herein related to GAGGS.
In a further aspect, the present invention provides for the use of any component or kit herein, for the practice of any method or assay herein, and/or for the use of any apparatus or kit to practice any assay or method herein.
The above examples are illustrative and not limiting. One of skill will recognize a variety of non-critical parameters that can be altered to achieve essentially similar results. All patents, applications and publications cited herein are incorporated by reference in their entirety for all purposes.
Claims (88)
CLAIMS
1. A method for making a recombinant nucleic acid, the method comprising: providing a plurality of parental character strings corresponding to a plurality of nucleic acids, which character strings, when aligned for maximum identity, comprise at least one region of heterology; aligning the character strings; defining a set of character string subsequences, which set of subsequences comprises subsequences of at least two of the plurality of parental character strings; providing a set of oligonucleotides corresponding to the set of character string subsequences; annealing the set of oligonucleotides; and elongating one or more members of the set of oligonucleotides with a polymerase, or ligating at least two members of the set of oligonucleotides with a ligase, thereby producing one or more recombinant nucleic acids.
2. The method of claim 1, wherein the character strings, when aligned for maximum identity, comprise at least one region of similarity.
3. The method of claim 1, wherein at least one of the parental character strings is an evolutionary or artificial intermediate.
4. The method of claim 1, wherein at least one of the parental character strings corresponds to a designed nucleic acid.
5. The method of claim 4, wherein the designed nucleic acid represents an energy-minimized design for an encoded polypeptide.
6. The method of claim 1, further comprising applying one or more genetic operators to one or more of the parental character strings or to one or more of the character string subsequences, wherein the genetic operator is selected from: a mutation of the one or more parental character strings or one or more character string subsequences; a multiplication of the one or more parental character strings or one or more character string subsequences; a fragmentation of the one or more parental character strings or one or more character string subsequences; a crossover between any of the one or more parental character strings or one or more character string subsequences or an additional character string; a ligation of the one or more parental character strings or one or more character string subsequences; an elitism calculation; a calculation of sequence homology or sequence similarity of the aligned strings; a recursive use of one or more genetic operators for evolution of the character strings; application of a randomness operator to the one or more parental character strings or one or more character string subsequences; a deletion mutation of the one or more parental character strings or one or more character string subsequences; an insertion mutation into the one or more parental character strings or one or more character string subsequences; subtraction of the one or more parental character strings or one or more character string subsequences with an inactive sequence; selection of the one or more parental character strings or one or more character string subsequences with an active sequence; and death of the one or more parental character strings or one or more character string subsequences.
7. The method of claim 1, further comprising selecting a registered sequence, the registered sequence comprising an intermediate level of sequence similarity between two or more of the plurality of character strings.
8. The method of claim 1, wherein the set of oligonucleotides comprises a plurality of overlapping oligonucleotides.
9. The method of claim 1, wherein the set of character string subsequences is defined by selecting a length of the character strings and subdividing at least two of the plurality of parental character strings into segments of the selected length.
10. The method of claim 1, wherein the alignment of the character strings is performed in a digital computer or in a network-based system.
11. The method of claim 1, further comprising synthesizing a set of single-stranded oligonucleotides corresponding to the set of character string subsequences, thereby providing the set of oligonucleotides.
12. The method of claim 1, further comprising: pooling all or part of the set of oligonucleotides; hybridizing the resulting pooled oligonucleotides; and extending a plurality of the resulting hybridized oligonucleotides, wherein at least one of the resulting double-stranded nucleic acids comprises sequence from at least two of the plurality of parental character strings.
13. The method of claim 11, further comprising denaturing the double-stranded nucleic acids, thereby producing a heterogeneous mixture of single-stranded nucleic acids.
14. The method of claim 11, further comprising: (i) denaturing the double-stranded nucleic acids, thereby producing a heterogeneous mixture of single-stranded nucleic acids; (ii) rehybridizing the heterogeneous mixture of single-stranded nucleic acids; and (iii) extending the resulting rehybridized double-stranded nucleic acids with a polymerase.
15. The method of claim 13, further comprising repeating steps (i), (ii) and (iii) at least two times.
16. The method of claim 1, further comprising selecting the one or more recombinant nucleic acids for a desired property.
17. The method of claim 1, wherein the set of oligonucleotides is provided by synthesizing the oligonucleotides to comprise one or more modified parental character string subsequences, which subsequences comprise one or more of: a parental character string subsequence modified by one or more replacements of one or more characters of the parental character string subsequence with one or more different characters; a parental character string subsequence modified by one or more deletions or insertions of one or more characters of the parental character string subsequence; a parental character string subsequence modified by inclusion of a degenerate sequence character at one or more randomly or non-randomly selected positions; a parental character string subsequence modified by inclusion of a character string from a different character string from a second parental character string subsequence at one or more positions; a parental character string subsequence which is biased based on its frequency in a selected library of nucleic acids; and a parental character string subsequence comprising one or more sequence motifs, which sequence motif is artificially included in the subsequence.
18. The method of claim 17, wherein the sequence motif comprises an N-linked glycosylation sequence, an O-linked glycosylation sequence, a protease-sensitive sequence, a collagenase-sensitive sequence, a Rho-dependent transcriptional termination sequence, an RNA secondary structure sequence that affects the efficiency of transcription, an RNA secondary structure sequence that affects the efficiency of translation, a transcriptional enhancer sequence, a transcriptional promoter sequence or a transcriptional silencing sequence.
19. The method of claim 1, wherein the set of oligonucleotides contains one or more modified or degenerate positions as compared to the corresponding sequence of the one or more parental character strings.
20. The method of claim 1, further comprising selecting the one or more recombinant nucleic acids based on their hybridization to a selected nucleic acid or to a set of selected nucleic acids.
21. The method of claim 1, wherein the one or more parental character strings comprise at least two parental character strings, wherein the set of oligonucleotides comprises at least one oligonucleotide member comprising a chimeric nucleic acid sequence, the at least one oligonucleotide member comprising at least two oligonucleotide member subsequences, wherein the at least two oligonucleotide member subsequences correspond to at least two subsequences of the at least two parental character strings, the at least two oligonucleotide member subsequences being separated by a crossover point.
22. The method of claim 21, wherein the crossover point is selected by identifying a plurality of parental character substrings of the at least two parental character strings, aligning the substrings to display pairwise identity between the substrings, and selecting a point within the aligned subsequence as the crossover point.
23. The method of claim 21, wherein the crossover point is randomly selected.
24. The method of claim 21, wherein the crossover point is non-randomly selected.
25. The method of claim 21, wherein the crossover point is non-randomly selected by selecting a crossover point approximately in the middle of one or more identified pairwise identity regions.
26. The method of claim 21, wherein at least one crossover point for the at least one oligonucleotide member is selected from a region outside of an identified pairwise homology region.
27. The method of claim 1, further comprising adding one or more oligonucleotide members from the set of oligonucleotides at a concentration that is higher than that of at least one additional oligonucleotide member from the set of oligonucleotides.
The method of claim 1, further comprising incubating one or more members of the set of oligonucleotides with the recombinant nucleic acid and a polymerase 29. The method of claim 1, further comprising denaturing the recombinant nucleic acid and contacting the recombinant nucleic acid with at least one additional nucleic acid from the set of oligonucleotides 30. The method "of claim 1, further comprising denaturing the recombinant nucleic acid and contacting the recombinant nucleic acid with at least one additional nucleic acid produced by the division of a genitor nucleic acid encoded by at least one gene character string. The method of claim 1, further comprising denaturing the recombinant nucleic acid and contacting the recombinant nucleic acid with at least one additional nucleic acid produced by the division of a genitor nucleic acid encoded by at least one genitora string , whose genitor nucleic acid is divided by one or more of: chemical division, division with a DNase and division with a restriction endonuclease. 32. The method of claim 1 wherein the genitored character string encodes one or more "nucleic acids" corresponding to one or more proteins or a gene selected from: EPO, insulin, a peptide hormone, a cytosine, a factor of epidermal growth, fibroblast growth factor, hepatocyte growth factor, insulin-like growth factor, an interferon, an interleukin, a keratinocyte growth factor, a leukemia inhibitor, oncostatin M, PD-ECSF, PDGF, pleiotropin ", SCF, c-team ligand, VEGEF, G-CSF, an oncogene, a tumor suppressor, a steroid hormone receptor, a plant hormone, a disease-resistant gene, a gene resistant to herbicides, a bacterial gene, a monooxygenase, a protease, a nuclease and a lipase. The method of claim 1 wherein the set of oligonucleptides comprises one or more oligonucleotide members between about 20 and about 60 nucleotides in length. 34. 
The method of claim 1, further comprising selecting the recombinant nucleic acid for a desired characteristic or property., thereby providing a selected recombinant nucleic acid. 35. The method of claim 1, further comprising recombining the selected recombinant nucleic acid with one or more of: a homologous nucleic acid and "an oligonucleotide member from the set of oligonucleotides." 36. The method of claim 1, comprising further selecting the recombinant nucleic acid for a desired characteristic or property, thereby providing a selected recombinant nucleic acid wherein the desired characteristic or property is selected in an in vivo selection assay or a solid phase parallel assay. 37. The method of claim 1, further comprising selecting the recombinant nucleic acid for a desired characteristic or property, thereby providing a selected recombinant nucleic acid wherein the desired characteristic or property is selected in an in vitro screening assay. 38. The method of claim 1, further comprising deconvolution of the recombinant nucleic acid. 39. The method of claim 1, further comprising sequencing or cloning the recombinant nucleic acid 40. The method of claim 1 wherein the recombinant nucleic acid is synthesized in vitro by the binding PC. 41. The method of claim 1 wherein the recombinant nucleic acid is synthesized in vitro by prone error binding PCR. 42. The method of claim 1 wherein the strings of genomic traits or sets of oligonucleotides are selected in a computer. 43. 
A method of making character strings, the method comprising: a) providing a parental character string encoding a polynucleotide or a polypeptide; b) providing a set of oligonucleotide character strings of a pre-selected length encoding a plurality of single-stranded oligonucleotide sequences comprising sequence fragments of the parental character string and the complement thereof; c) creating a set of derivatives of the parental sequence comprising sequence variant strings, the set comprising a plurality of mutations, which have a variant chain mutation.
44. The method of claim 43 wherein a plurality of the plurality of single-stranded oligonucleotide sequences is sequenced.
45. The method of claim 43, further comprising applying one or more genetic operators to the parental character string or to one or more of the oligonucleotide character strings, wherein the genetic operator is selected from: a mutation of the parental character string or one or more of the oligonucleotide character strings, a multiplication of the parental character string or one or more of the oligonucleotide character strings, a fragmentation of the parental character string or one or more of the oligonucleotide character strings, a cross-over between any of the parental character strings or one or more of the oligonucleotide character strings or an additional character string, a ligation of the parental character string or one or more of the oligonucleotide character strings, an elitism calculation, a calculation of sequence homology or sequence similarity of an alignment comprising the parental character string or one or more of the oligonucleotide character strings, a recursive use of one or more genetic operators for the evolution of character strings, the application of a randomness operator to the parental character string or to one or more of the oligonucleotide character strings, a deletion mutation of the parental character string or one or more of the oligonucleotide character strings, an insertion mutation in the parental character string or one or more of the oligonucleotide character strings, the subtraction of the parental character string or one or more of the oligonucleotide character strings with an inactive sequence, the selection of the parental character string or one or more of the oligonucleotide character strings with an active sequence, and death of the parental character string or one or more of the oligonucleotide character strings.
46. The method of claim 43, further comprising: d) providing a set of overlapping character strings of a pre-defined length that encode both strands of the parental character string sequence; and e) synthesizing sets of single-stranded oligonucleotides according to steps (c) and (d).
47. The method of claim 46, further comprising: f) assembling a library of recombinant nucleic acids by assembly PCR from the single-stranded oligonucleotides.
48. A library prepared by the method of claim 47.
49. The method of claim 47, further comprising: g) selecting or screening the library for one or more recombinant polynucleotides having a desired property.
50. The method of claim 48, further comprising: h) deconvoluting the sequence of one or more of the selected polynucleotides.
51. The method of claim 46 wherein the sequence of one or more of the selected polynucleotides is deconvoluted by sequencing the selected polynucleotide or by digesting the one or more of the selected polynucleotides.
52. The method of claim 46 wherein the sequence is deconvoluted by positional deconvolution of one or more selected polynucleotides.
53. The method of claim 46, further comprising the restructuring or reiterative selection of the recombinant nucleic acid library.
54.
A method of facilitating recombination between two or more divergent nucleic acids, the method comprising: aligning the parental character strings corresponding to the divergent nucleic acids, thereby identifying regions of sequence identity and regions of sequence diversity; defining a diplomat character string that is intermediate in sequence between the parental character strings; synthesizing at least a portion of the diplomat sequence to produce a diplomat nucleic acid; and recombining a mixture of selected nucleic acids comprising the parental nucleic acids or fragments thereof and the diplomat nucleic acid.
55. The method of claim 54 wherein the diplomat nucleic acid is synthesized by synthesizing a plurality of overlapping oligonucleotides corresponding in sequence to the diplomat sequence, hybridizing the overlapping oligonucleotides and incubating the overlapping oligonucleotides with a polymerase.
56. The method of claim 54, further comprising synthesizing a group of oligonucleotides that corresponds to one or more of the parental character strings, which group of oligonucleotides is present in the mixture of selected nucleic acids.
57. Selected nucleic acids produced by the method of claim 56.
58. A method for regenerating and recombining nucleic acids, the method comprising: entering a plurality of amino acid sequence character strings into a digital system; reverse translating the amino acid character strings in the digital system into a plurality of nucleic acid character strings, wherein the reverse-translated nucleic acid sequences are selected by one or more of: codon-species bias in a selected expression host and optimized sequence similarity among the plurality of nucleic acid character strings; and synthesizing one or more sets of oligonucleotides corresponding to one or more of the reverse-translated nucleic acid sequences.
59.
The method of claim 58, further comprising hybridizing members of one or more sets of oligonucleotides to each other or to a set of fragmented nucleic acids encoding one or more amino acid polymers corresponding to one or more of the amino acid sequence character strings.
60. The method of claim 59, further comprising elongating one or more of the resulting hybridized nucleic acids with a polymerase.
61. The method of claim 58, further comprising fragmenting one or more of the resulting elongated nucleic acids and hybridizing the resulting fragmented nucleic acids to each other, to members of the one or more sets of oligonucleotides, or to a set of primarily fragmented nucleic acids encoding one or more of the amino acid polymers corresponding to one or more of the amino acid sequence character strings.
62. A method for optimizing the activity of a nucleic acid, the method comprising: parameterizing a set of nucleic acids or proteins to provide a set of multidimensional data points; extrapolating one or more postulated multidimensional data points from the set of multidimensional data points; and converting the postulated multidimensional data points into a new character string corresponding to a postulated nucleic acid or protein.
64. The method of claim 62, comprising principal component analysis of the set of multidimensional data points.
65. The method of claim 62, comprising restructuring the postulated nucleic acids, or a sequence thereof, with an additional nucleic acid.
66. The method of claim 62 wherein the set of nucleic acids or proteins is parameterized by correlating each nucleic acid or protein residue to a matrix of numerical indicators.
67.
The method of claim 66 wherein the matrix is represented graphically as a tetrahedron, which has an origin assigned at the center of the tetrahedron, with each corner given a numerical representation, each nucleic acid residue being placed at a different corner, thereby producing the matrix of numerical indicators.
68. The method of claim 62, comprising correlating each multidimensional data point with an output vector to identify a relationship between a matrix of dependent Y variables and a matrix of predictor X variables.
69. The method of claim 68 wherein the correlation is carried out by partial least squares projections to latent structures.
70. The method of claim 62 wherein each multidimensional data point comprises more than one different parameter, wherein the parameters are plotted together in n-dimensional hyperspace, the n-dimensional space comprising at least one dimension for each parameter.
71. A method for providing a library of recombinant nucleic acids that is enriched for a sequence of interest and selecting the library, the method comprising: producing an initial library of at least about 10^6 recombinant nucleic acids, which initial library of recombinant nucleic acids comprises at least about 10^5 different types of members, which 10^5 different types of members are not identical; hybridizing the library to one or more populations of nucleic acids, which one or more populations of nucleic acids correspond to one or more subsequences in the different members of the library; isolating the members of the library that hybridize to the one or more populations of nucleic acids, thereby enriching the nucleic acid library for the members that hybridized to the one or more populations of nucleic acids; and selecting the members of the resulting enriched library for one or more properties of interest.
72. The method of claim 71 wherein the initial library has between about 10^9 and 10^12 members.
73.
The method of claim 71 wherein the one or more populations of nucleic acids is attached to a solid substrate.
74. The method of claim 73 wherein the solid substrate comprises one or more of: a column matrix material and a nucleic acid chip.
75. The method of claim 71 wherein the initial library is produced by recombining one or more homologous nucleic acids.
76. The enriched library produced by the method of claim 71.
77. The method of claim 71, wherein the initial library is produced by: providing a plurality of parental character strings corresponding to a plurality of nucleic acids, which character strings, when aligned for maximum identity, comprise at least one region of similarity and at least one region of heterology; aligning the character strings; defining a set of character string subsequences, which set of subsequences comprises subsequences of at least two of the plurality of parental character strings; providing a set of oligonucleotides corresponding to the set of character string subsequences; annealing the set of oligonucleotides; and elongating one or more members of the set of oligonucleotides with a polymerase, thereby producing the initial nucleic acid library.
78. A method of generating a library of biological polymers, the method comprising: generating a diverse population of character strings in a computer, which character strings are generated by modifying pre-existing character strings; and synthesizing the diverse population of character strings, which diverse population comprises the library of biological polymers.
79. The method of claim 78 wherein the modification comprises the recombination of the pre-existing character strings.
80. The method of claim 78 wherein the biological polymers are selected from nucleic acids, polypeptides and peptide nucleic acids.
81. The method of claim 78, further comprising selecting the members of the biological polymer library for one or more activities.
82.
The method of claim 81, further comprising filtering an additional library or an additional set of character strings by subtracting the additional library or the additional set of character strings with members of the biological polymer library that display activity below a desired threshold.
83. The method of claim 81, further comprising filtering an additional library or an additional set of character strings by biasing the additional library or the additional set of character strings with members of the biological polymer library that display activity below a desired threshold.
84. An integrated system comprising a computer having: a first data set comprising a first character string; a second data set comprising a second character string; software for aligning the first and second character strings; software for performing a genetic operation on the first or second character string; an output file comprising a third data set comprising a third character string, the third character string comprising character string subsequences from the first and second character strings; and an oligonucleotide sequence output file comprising a plurality of overlapping oligonucleotide sequences corresponding to the third character string.
85. The integrated system of claim 84, the system further comprising an oligonucleotide synthesis machine for synthesizing the plurality of overlapping oligonucleotides.
86. The integrated system of claim 84, the system further comprising a plurality of oligonucleotides encoded by the plurality of overlapping oligonucleotide sequences, which oligonucleotides, when incubated in one or more cycles of chain extension, produce a third nucleic acid encoded by the third character string.
87. The integrated system of claim 84, wherein the system further comprises a program with a set of instructions for applying one or more genetic operators to the first or second character string or to any other character string.
88. The integrated system of claim 84, wherein the system further comprises a program with a set of instructions for applying one or more genetic operators to the first or second character string or to any other character string, wherein the genetic operator is selected from: a mutation, a multiplication, a string fragmentation, a cross-over between one or more strings, a string ligation, an elitism calculation, an alignment, a sequence homology or sequence similarity calculation, a recursive use of one or more genetic operators for the evolution of character strings, randomness, a deletion mutation, an insertion mutation and death.
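The data flow recited in claims 84 to 88 (two character strings, a genetic operation yielding a third string, and an output of overlapping oligonucleotide sequences for that third string) can be illustrated with a minimal sketch. This is not the claimed implementation. The function names `crossover`, `reverse_complement` and `overlapping_oligos`, the single-point cross-over, the 40-nucleotide oligonucleotide length and the 20-nucleotide overlap are all assumptions chosen for illustration, and the two parental strings are assumed to be already aligned to equal length.

```python
# Illustrative sketch only: names and parameters below are hypothetical,
# not taken from the patent.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def crossover(parent_a: str, parent_b: str, point: int) -> str:
    """Single-point cross-over between two pre-aligned, equal-length strings."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must be aligned to equal length")
    if not 0 <= point <= len(parent_a):
        raise ValueError("cross-over point out of range")
    # The child takes its prefix from parent_a and its suffix from parent_b.
    return parent_a[:point] + parent_b[point:]


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def overlapping_oligos(seq: str, length: int = 40, overlap: int = 20) -> list[str]:
    """Tile seq with oligo sequences that overlap their neighbours.

    Oligos alternate between the top strand and the bottom strand
    (reverse complement), so adjacent oligos can anneal over the
    shared `overlap` region. Length and overlap are assumed values.
    """
    if not 0 < overlap < length:
        raise ValueError("overlap must be positive and smaller than length")
    if not seq:
        return []
    step = length - overlap
    oligos = []
    # Stop once the previous window has already reached the end of seq.
    for i, start in enumerate(range(0, max(len(seq) - overlap, 1), step)):
        fragment = seq[start:start + length]
        oligos.append(fragment if i % 2 == 0 else reverse_complement(fragment))
    return oligos


if __name__ == "__main__":
    first = "ATGGCTAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTAGAT"
    second = "ATGGCAAGTAAGGGTGAGGAGCTGTTTACCGGTGTGGTGCCTATCCTGGTCGAGCTGGAC"
    third = crossover(first, second, 30)
    for oligo in overlapping_oligos(third):
        print(oligo)
```

A real system of this kind would choose cross-over points from an alignment rather than a fixed index, and would pick oligonucleotide lengths to suit the synthesis chemistry; the fixed values here only keep the example short.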
Applications Claiming Priority (8)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US60/116,447 | 1999-01-19 | ||
US60/118,854 | 1999-02-05 | ||
US60/118,813 | 1999-02-05 | ||
US60/141,049 | 1999-06-24 | ||
US09408392 | 1999-09-28 | ||
US09408393 | 1999-09-28 | ||
US09/416,375 | 1999-10-12 | ||
US09/416,837 | 1999-10-12 |
Publications (1)
Publication Number | Publication Date |
---|---|
MXPA00009026A (en) | 2001-07-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7853410B2 (en) | Method for making polynucleotides having desired characteristics | |
US7058515B1 (en) | Methods for making character strings, polynucleotides and polypeptides having desired characteristics | |
US7421347B2 (en) | Identifying oligonucleotides for in vitro recombination | |
WO2001075767A2 (en) | In silico cross-over site selection | |
JP5319865B2 (en) | Methods, systems, and software for identifying functional biomolecules | |
US6436675B1 (en) | Use of codon-varied oligonucleotide synthesis for synthetic shuffling | |
US20120040871A1 (en) | Oligonucleotide mediated nucleic acid recombination | |
US20060051795A1 (en) | Oligonucleotide mediated nucleic acid recombination | |
JP2005520244A (en) | Crossover optimization for directed evolution | |
CA2396320A1 (en) | Integrated systems and methods for diversity generation and screening | |
US20030054390A1 (en) | Oligonucleotide mediated nucleic acid recombination | |
US20110160071A1 (en) | Novel Proteins and Methods for Designing the Same | |
Stähler et al. | Another side of genomics: synthetic biology as a means for the exploitation of whole-genome sequence information | |
MXPA00009026A (en) | Methods for making character strings, polynucleotides and polypeptides having desired characteristics | |
WO2008127213A2 (en) | Methods, systems, and software for regulated oligonucleotide-mediated recombination | |
KR20010042037A (en) | Methods for making character strings, polynucleotides and polypeptides having desired characteristics | |
DK2253704T3 (en) | Oligonucleotide-mediated recombination nucleic acid |