WO2020165408A1 - Means and methods for preparing engineered target proteins by genetic code expansion in a target protein-selective manner - Google Patents


Info

Publication number
WO2020165408A1
Authority
WO
WIPO (PCT)
Prior art keywords
rna
poi
ncaa
amino acid
trna
Application number
PCT/EP2020/053883
Other languages
French (fr)
Inventor
Edward Anton LEMKE
Christopher Dieter REINKEMEIER
Gemma ESTRADA GIRONA
Original Assignee
European Molecular Biology Laboratory
Application filed by European Molecular Biology Laboratory
Priority to EP20703782.1A (publication EP3924365A1)
Priority to US17/426,338 (publication US20230098002A1)
Priority to CN202080028507.1A (publication CN113727993A)
Priority to JP2021545719A (publication JP2022521049A)
Priority to CA3129336A (publication CA3129336A1)
Publication of WO2020165408A1
Priority to IL285405A (publication IL285405A)


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/93 Ligases (6)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12P FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
    • C12P21/00 Preparation of peptides or proteins
    • C12P21/02 Preparation of peptides or proteins having a known sequence of two or more amino acids, e.g. glutathione
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/01 Fusion polypeptide containing a localisation/targetting motif
    • C07K2319/05 Fusion polypeptide containing a localisation/targetting motif containing a GOLGI retention signal
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C07K2319/70 Fusion polypeptide containing domain for protein-protein interaction
    • C07K2319/735 Fusion polypeptide containing domain for protein-protein interaction containing a domain for self-assembly, e.g. a viral coat protein (includes phage display)

Definitions

  • the present invention is concerned with orthogonal translation systems which allow for the site-specific introduction of non-canonical amino acid (ncAA) residues into a polypeptide of interest (POI) in a POI-mRNA-selective manner.
  • the present invention relates to fusion proteins which bring an RNA-targeting polypeptide (RNA-TP) segment and an orthogonal aminoacyl tRNA synthetase (O-RS) segment into spatial proximity of one another.
  • RNA-TP: RNA-targeting polypeptide
  • O-RS: orthogonal aminoacyl tRNA synthetase
  • RNA-TP/O-RS fusion protein: a fusion protein comprising an RNA-TP segment and an O-RS segment
  • APs: polypeptide segments which act as “assemblers”
  • AFPs: assembler fusion proteins
  • the invention also relates to AFP combinations and nucleic acid molecules comprising a POI-encoding sequence together with a targeting nucleotide sequence (TN) that is able to interact with an RNA-TP.
  • TN: targeting nucleotide sequence
  • the invention further relates to nucleic acid molecules, expression cassettes and expression vectors encoding said RNA-TP/O-RS fusion proteins or AFPs, cells comprising same, as well as methods and kits for translationally preparing POIs.
  • Introducing orthogonal (i.e. non-crossreactive) translation systems into living cells enables the site-specific introduction of new functionality into proteins.
  • this is a herculean task, as translation is a complex multistep process in which at least 20 different aminoacylated tRNAs, their cognate aminoacyl tRNA synthetases (RS), ribosomes and diverse other factors work in concert to synthesize a polypeptide chain from the RNA transcript.
  • RS: aminoacyl tRNA synthetases
  • An ideal orthogonal system would show no cross-reactivity with factors of the host machinery, minimizing its impact on the housekeeping translational activity and normal physiology of the cell.
  • GCE: genetic code expansion
  • an orthogonal (suppressor) RS can aminoacylate its cognate suppressor tRNA with non-canonical amino acids (ncAAs).
  • ncAAs are typically custom designed and harbor chemical functionalities that can, for example, enable protein function to be photocontrolled, encode posttranslational modifications or allow the introduction of fluorescent labels for microscopy studies using click chemistry.
  • the anticodon loop of the tRNA is chosen to decode and thus suppress one of the stop codons (see, e.g., Liu et al., Annu Rev Biochem 2010, 79:413-444; Lemke, ChemBioChem 2014, 15:1691-1694; Chin, Nature 2017, 550:53-60).
  • the Amber stop codon and the corresponding tRNA_CUA
  • E. coli to terminate endogenous proteins (~10%).
  • any Amber codon in the genome can be suppressed, potentially leading to unwanted background suppression of non-targeted host proteins.
  • this background incorporation might be tolerable as long as the yields of purified full-length protein are acceptable.
  • the challenge is different if the host is considered more than just a bioreactor vessel that can be sacrificed for its protein.
  • the physiological condition of that host cell is an important factor. In that context, minimization of background incorporation of the ncAA is particularly required to ensure well-controlled experiments.
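A toy scan makes the background problem concrete. Any host open reading frame (ORF) whose native stop codon matches the selector codon is a potential read-through target for a freely diffusing Amber suppressor tRNA_CUA. The sketch below is illustrative only: the ORF names and sequences are hypothetical, and a real analysis would need annotated host coding sequences.

```python
from typing import Dict, List, Optional

# Standard stop codons (DNA alphabet) and their conventional names.
STOP_CODONS = {"TAG": "Amber", "TGA": "Opal", "TAA": "Ochre"}


def terminal_stop(orf: str) -> Optional[str]:
    """Return the name of the stop codon that terminates a codon-aligned ORF, or None."""
    orf = orf.upper()
    if len(orf) < 3 or len(orf) % 3 != 0:
        return None  # not a complete, codon-aligned ORF
    return STOP_CODONS.get(orf[-3:])


def amber_terminated(orfs: Dict[str, str]) -> List[str]:
    """ORFs whose native stop is Amber (TAG), i.e. candidates for background
    read-through by a cytoplasmic Amber suppressor tRNA."""
    return [name for name, seq in orfs.items() if terminal_stop(seq) == "Amber"]


# Hypothetical toy ORFs, not real host genes.
host_orfs = {
    "orfA": "ATGAAACCCTAG",
    "orfB": "ATGGGGTAA",
    "orfC": "ATGTTTTGA",
    "orfD": "ATGCGTTAG",
}
print(amber_terminated(host_orfs))  # ['orfA', 'orfD']
```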
  • orthogonal translation systems which are able to selectively translate the mRNA of a POI can be created by generating spatial proximity between the mRNA of the POI and the O-RSs which allow for translationally introducing the ncAA residues into the growing polypeptide chain of the POI.
  • the inventors demonstrated for a variety of POIs, including membrane proteins, that their OT systems allow for site-specifically introducing ncAA residues into a POI in a mammalian cell with selectivity for the mRNA of the POI compared to other mRNAs in the cytoplasm that contain the same stop codon (that is used as selector codon for encoding the ncAA residue of the POI).
  • the spatial proximity is achieved by including a targeting sequence (TN) in the mRNA of the POI that can selectively interact with an RNA-targeting polypeptide (RNA-TP), and linking the O-RS with such RNA-TP.
  • Said linkage can be in a fusion protein comprising both the O-RS and the RNA-TP (RNA-TP/O-RS fusion protein).
  • Alternatively, this can be achieved by the action of one or more polypeptide segments which act as “assemblers” (APs) in facilitating a local enrichment of at least two assembler fusion proteins (AFPs), at least one of which comprises the one or more APs and an RNA-TP segment and at least one other of which comprises the one or more APs and an O-RS segment, thus bringing said RNA-TP and O-RS segments (RNA-TP and O-RS also designated “effector” or “EP”) into close proximity of one another.
  • RNA-TP and O-RS segments: also designated “effector” or “EP”
  • the local enrichment of the AFPs allows for the formation of assemblies (OT assemblies, also termed“OT organelles” herein) which can act as artificial orthogonally translating organelles.
  • a first type includes APs which drive local enrichment at (previously existing) intracellular structures (such as, e.g., microtubules or the cytoplasmic side of membranes such as the cell membrane or the nuclear membrane, the ER, mitochondrial or Golgi organelles), termed intracellular targeting polypeptide (IC-TP) segments.
  • IC-TP: intracellular targeting polypeptide
  • a second type of APs generates high local AFP concentrations by self-association in the cytoplasm (in particular by phase separation) termed phase separation polypeptide (PSP) segments herein.
  • PSP: phase separation polypeptide
  • Said AP types and EP types may also be combined with other polypeptide elements having the ability to form multimeric structures, in particular coiled-coil heterodimers such as those formed by synthetic SYNZIP polypeptide pairs.
  • Such multimer formation further improves local enrichment of AFPs.
  • AFPs combining different AP types are particularly useful.
  • AFPs are provided encompassing in a single polypeptide, i.e. fused together, both types of EP segments, i.e. the RNA-TP and O-RS segment, one or both types of AP segments, i.e. the IC-TP and/or PSP segment, optionally supplemented by said polypeptide elements having the ability to form multimeric structures (SYNZIP polypeptide).
  • the present invention relates to an assembler fusion protein (AFP) comprising:
  • (a1) a polypeptide segment derived from an intracellular targeting polypeptide (IC-TP segment), wherein said intracellular targeting polypeptide targets, and thus becomes locally enriched at, an intracellular structural element within or directly adjacent to the cytoplasm; and (a2) a polypeptide segment derived from a phase separation polypeptide (PSP segment), wherein said phase separation polypeptide has the ability to undergo self-association in the cytoplasm of a cell so as to create sites of high local concentration in the cytoplasm, and
  • polypeptide segments are functionally linked in said AFP.
  • the present invention relates to an assembler fusion protein (AFP) combination comprising at least two AFPs of the present invention as described herein.
  • the AFP combination comprises at least one AFP comprising an RNA-TP segment and at least one AFP comprising an O-RS segment.
  • Including a first SYNZIP element in at least one AFP of said combination and a second SYNZIP element in at least one other AFP of said combination, wherein said first and second SYNZIP elements act together by forming a heterodimer structure, represents another advantageous form of said second aspect.
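The composition rules for AFPs and AFP combinations can be expressed as a small data model. The sketch below is a hypothetical illustration and not part of the patent. It assumes four rules:

- An AFP is an ordered list of (segment type, name) tuples.
- Each AFP carries at least one assembler segment (IC-TP or PSP) and at least one effector segment (RNA-TP or O-RS).
- A combination contains an RNA-TP and an O-RS somewhere among its members.
- Two AFPs are coupled if each carries one member of the SYNZIP1/SYNZIP2 heterodimer pair.

The example AFPs mirror the LcK::FUS::SYNZIP1::PylRS*EWSR1::SYNZIP2::MCP system described for FIG. 8.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

AP_TYPES = {"IC-TP", "PSP"}    # assembler segment types
EP_TYPES = {"RNA-TP", "O-RS"}  # effector segment types

# SYNZIP1 and SYNZIP2 form a coiled-coil heterodimer.
SYNZIP_PAIRS = {frozenset({"SYNZIP1", "SYNZIP2"})}


@dataclass
class AFP:
    """Hypothetical model: ordered (segment_type, segment_name) tuples."""
    segments: List[Tuple[str, str]]

    def types(self) -> Set[str]:
        return {kind for kind, _ in self.segments}

    def synzips(self) -> Set[str]:
        return {name for kind, name in self.segments if kind == "SYNZIP"}

    def is_valid(self) -> bool:
        """At least one assembler and at least one effector segment."""
        kinds = self.types()
        return bool(kinds & AP_TYPES) and bool(kinds & EP_TYPES)


def valid_combination(afps: List[AFP]) -> bool:
    """Every AFP is valid, and an RNA-TP and an O-RS occur somewhere in the combination."""
    if not afps or not all(a.is_valid() for a in afps):
        return False
    kinds = set().union(*(a.types() for a in afps))
    return {"RNA-TP", "O-RS"} <= kinds


def synzip_coupled(a: AFP, b: AFP) -> bool:
    """True if a and b each carry one member of a heterodimerizing SYNZIP pair."""
    return any(frozenset({x, y}) in SYNZIP_PAIRS
               for x in a.synzips() for y in b.synzips())


afp_ors = AFP([("IC-TP", "LcK"), ("PSP", "FUS"), ("SYNZIP", "SYNZIP1"), ("O-RS", "PylRS")])
afp_rna = AFP([("PSP", "EWSR1"), ("SYNZIP", "SYNZIP2"), ("RNA-TP", "MCP")])
print(valid_combination([afp_ors, afp_rna]), synzip_coupled(afp_ors, afp_rna))  # True True
```

A single AFP carrying both an RNA-TP and an O-RS segment also passes `valid_combination` on its own. This reflects the case where both effectors sit in one and the same type of AFP.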
  • RNA-TP/O-RS fusion protein comprising:
  • polypeptide segments are functionally linked in said RNA-TP/O-RS fusion protein.
  • the present invention provides a nucleic acid molecule, or a combination of two or more nucleic acid molecules, comprising:
  • (i) a nucleotide sequence that encodes at least one RNA-TP/O-RS fusion protein of the present invention as described herein, or
  • the present invention provides an expression cassette comprising the nucleotide sequence of the nucleic acid molecule, or the combination of nucleic acid molecules, of the present invention as described herein.
  • the present invention provides an expression cassette comprising:
  • (i) a nucleotide sequence that encodes at least one RNA-TP/O-RS fusion protein of the present invention as described herein, or
  • the present invention provides an expression cassette comprising: (i) a nucleotide sequence that encodes at least one AFP combination of the present invention as described herein, or
  • the present invention provides an expression vector comprising at least one expression cassette of the present invention as described herein.
  • the present invention provides a cell comprising at least one nucleic acid molecule, or combination of nucleic acid molecules, of the present invention as described herein.
  • the cell comprises at least one expression cassette or at least one expression vector of the present invention as described herein.
  • the present invention relates to a method for preparing a polypeptide of interest (POI) comprising in its amino acid sequence one or more non-canonical amino acid (ncAA) residues.
  • Said method comprises expressing the POI in a cell of the present invention in the presence of said one or more ncAAs, wherein the cell comprises:
  • a targeting nucleotide sequence that is functionally linked to the CS^POI and is able to interact with an RNA-TP segment of at least one of the AFPs in the cell;
  • Said at least one AFP comprising an RNA-TP segment and said at least one AFP comprising an O-RS segment recited in (i) can be one and the same type of AFP, i.e. an AFP comprising both an RNA-TP segment and an O-RS segment.
  • said at least one AFP comprising an RNA-TP segment and said at least one AFP comprising an O-RS segment recited in (i) can be different AFPs.
  • the present invention relates to a method for preparing a polypeptide of interest (POI) comprising in its amino acid sequence one or more non-canonical amino acid (ncAA) residues.
  • Said method comprises expressing the POI in a cell of the present invention in the presence of said one or more ncAAs, wherein the cell comprises:
  • RNA-TP/O-RS fusion proteins of the present invention as described herein;
  • a targeting nucleotide sequence that is functionally linked to the CS^POI and is able to interact with an RNA-TP segment of at least one of the RNA-TP/O-RS fusion proteins in the cell;
  • one or more orthogonal tRNA^ncAA (O-tRNA^ncAA) molecules which carry the anticodon(s) complementary to the selector codon(s) of the CS^POI, and wherein said O-tRNA^ncAA molecules together with one or more O-RS segments of the RNA-TP/O-RS fusion proteins in the cell form one or more orthogonal O-RS/O-tRNA^ncAA pairs which allow for the introduction of said one or more ncAA residues into the amino acid sequence of the POI;
  • the method optionally further comprises recovering the expressed POI.
  • the present invention relates to a method for preparing a polypeptide of interest (POI) comprising in its amino acid sequence one or more non-canonical amino acid (ncAA) residues. Said method comprises the steps of:
  • orthogonal tRNA^ncAA molecules and one or more of the O-RS segments of the AFPs in the cell form one or more orthogonal aminoacyl tRNA synthetase/tRNA^ncAA (O-RS/O-tRNA^ncAA) pairs,
  • steps (a) and (b) can be performed concomitantly or sequentially in any order;
  • the POI-encoding nucleotide sequence (CS^POI) comprises one or more selector codons encoding said one or more ncAA residues
  • said CS^POI is functionally linked to a targeting nucleotide sequence (TN), thus forming a CS^POI/TN fusion sequence,
  • said CS^POI/TN fusion sequence is able to interact, via its TN, with an RNA-TP segment of at least one of the AFPs in the cell;
  • the present invention relates to a method for preparing a polypeptide of interest (POI) comprising in its amino acid sequence one or more non-canonical amino acid (ncAA) residues. Said method comprises the steps of:
  • orthogonal tRNA^ncAA molecules and one or more of the O-RS segments of the RNA-TP/O-RS fusion proteins in the cell form one or more orthogonal aminoacyl tRNA synthetase/tRNA^ncAA (O-RS/O-tRNA^ncAA) pairs,
  • said O-RS/O-tRNA^ncAA pairs allow for introducing said one or more ncAA residues into the amino acid sequence of said POI
  • steps (a) and (b) can be performed concomitantly or sequentially in any order;
  • the POI-encoding nucleotide sequence (CS^POI) comprises one or more selector codons encoding said one or more ncAA residues
  • said CS^POI is functionally linked to a targeting nucleotide sequence (TN), thus forming a CS^POI/TN fusion sequence,
  • said CS^POI/TN fusion sequence is able to interact, via its TN, with an RNA-TP segment of at least one of the RNA-TP/O-RS fusion proteins in the cell;
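The methods presuppose a POI-encoding sequence (CS^POI) that carries in-frame selector codons and is linked to a TN. Below is a minimal sketch, assuming a DNA-alphabet coding sequence. Placing the TN downstream of the terminating stop codon, in the untranslated region as described for the ms2, boxB and pp7 loops of FIG. 7, is an assumption made for illustration. The sketch locates the selector codons and builds the CS^POI/TN fusion. The sequences, the TN placeholder and the function names are hypothetical.

```python
from typing import List

STOP_CODONS = {"TAG", "TGA", "TAA"}


def codons(cds: str) -> List[str]:
    """Split a coding sequence into codons (length must be a multiple of 3)."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def selector_positions(cs_poi: str, selector: str = "TAG") -> List[int]:
    """1-based codon positions of internal selector codons
    (the final, terminating codon is excluded)."""
    return [i + 1 for i, c in enumerate(codons(cs_poi)[:-1]) if c == selector.upper()]


def fuse_with_tn(cs_poi: str, tn: str, spacer: str = "") -> str:
    """Return a CS_POI/TN fusion with the TN placed 3' of the stop codon."""
    cods = codons(cs_poi)
    if not cods or cods[-1] not in STOP_CODONS:
        raise ValueError("CS_POI should end with a terminating stop codon")
    return cs_poi + spacer + tn


cs_poi = "ATGGCCTAGAAATAA"  # hypothetical: Met-Ala-[ncAA]-Lys-stop
tn = "NNNNNNNNNN"           # placeholder standing in for a TN such as MS2 stem-loops
print(selector_positions(cs_poi))  # [3]
print(fuse_with_tn(cs_poi, tn))
```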
  • the present invention relates to a nucleic acid molecule comprising:
  • a nucleotide sequence (CS^POI) that encodes a polypeptide of interest (POI), said POI comprising one or more, identical or different, non-canonical amino acid (ncAA) residues which are encoded in the CS^POI by selector codons, and
  • the present invention relates to a kit for preparing a polypeptide of interest (POI) having at least one non-canonical amino acid (ncAA) residue, the kit comprising:
  • Said expression vector comprises at least one expression cassette comprising:
  • (i) a nucleotide sequence that encodes at least one RNA-TP/O-RS fusion protein of the present invention, at least one AFP of the present invention, or at least one AFP combination of the present invention, or
  • Figure 1 shows a schematic representation of the spatial separation of the components which allow for orthogonal translation so as to decode a specific stop codon in a uniquely tagged mRNA.
  • (A) Conventional expression of the synthetase PylRS leads to aminoacylation of its cognate stop codon suppressor tRNA^Pyl with a custom-designed ncAA. This leads to site-specific ncAA incorporation whenever the respective stop codon occurs in the mRNA of the POI. Given that many endogenous mRNAs terminate on the same stop codon, utilizing this approach in the cytoplasm potentially leads to misincorporation of the ncAA into unwanted proteins (left box).
  • (B) The present invention allows the mRNA encoding the POI and the orthogonal aminoacyl-tRNA synthetase (e.g., PylRS) to be brought into close proximity to one another through the use of an RNA-targeting polypeptide segment (e.g., MCP) and assemblers (APs).
  • MCP: MS2 coat protein, used here as the RNA-targeting polypeptide segment
  • APs: assemblers
  • This allows for spatial enrichment of all components so as to create an OT assembly (“OT organelle”), including the mRNA encoding the POI, the orthogonal aminoacyl-tRNA synthetase, the tRNA, and ribosomes (right box).
  • Aminoacylated tRNA^Pyl is thus particularly available in direct proximity to the OT organelle, so that stop codon suppression (of the POI mRNA) occurs preferentially there. This leads to a selective suppression of stop codons (and thus expression) of the POI mRNA over corresponding stop codons in mRNAs that are not targeted to the OT assembly. While in (A) GCE is stop codon-specific, in (B) it is intended to be both stop codon-specific and mRNA-specific.
  • Figure 2A shows a schematic representation of different assembler classes.
  • B: bimolecular MCP::PylRS fusion
  • P1: fusions to FUS and EWSR1
  • P2: SPD5
  • K1: truncation of kinesin KIF13A (KIF13A1-411,ΔP390)
  • K2: truncation of kinesin KIF16B (KIF16B1-400), and combinations thereof (K1::P1, K1::P2, K2::P1, K2::P2).
  • FIG. 2B shows a schematic representation of the dual-color reporter.
  • mRNAs encoding the fluorescent proteins GFP and mCherry, containing stop codons at permissive sites, are expressed from one plasmid, each with its own CMV promoter, ensuring a constant ratio of mRNA throughout each experiment.
  • the mRNA of the mCherry reporter is tagged with two MS2 RNA stem-loops (“ms2”, also referred to as MS2-tag herein), mRNA(mCherry)::ms2.
  • the light-gray bars represent the relative efficiency as defined by the mean fluorescence intensity of mCherry for each condition divided by cytoplasmic PylRS control (derived from FFC, see Fig. 2D, E). Shown are the mean values of at least three independent experiments; error bars represent the SEM. The box highlights the best performing OT organelle (OT K2::P1).
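The normalization behind the relative-efficiency bars can be sketched as follows, under stated assumptions. The description does not say whether the ratio is formed per replicate or from pooled means. This sketch divides each condition's mCherry mean fluorescence intensity (MFI) by the cytoplasmic PylRS control of the same replicate, then reports mean and SEM across replicates. The condition labels follow Fig. 2A; the MFI values are made up.

```python
import math
import statistics
from typing import Dict, List, Tuple


def mean_sem(values: List[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    if len(values) < 2:
        raise ValueError("at least two replicates are needed for an SEM")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


def relative_efficiency(mfi: Dict[str, List[float]],
                        control: str = "PylRS") -> Dict[str, Tuple[float, float]]:
    """mCherry MFI of each condition divided by the control MFI of the same
    replicate, summarized as (mean, SEM) across replicates."""
    ctrl = mfi[control]
    result = {}
    for condition, values in mfi.items():
        if len(values) != len(ctrl):
            raise ValueError(f"{condition}: replicate count differs from control")
        result[condition] = mean_sem([v / c for v, c in zip(values, ctrl)])
    return result


# Hypothetical MFI values from three independent experiments.
mfi = {
    "PylRS": [100.0, 120.0, 110.0],
    "K2::P1": [300.0, 360.0, 330.0],
    "MCP::PylRS": [80.0, 90.0, 100.0],
}
for condition, (mean, sem) in relative_efficiency(mfi).items():
    print(f"{condition}: {mean:.2f} ± {sem:.2f}")
```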
  • Figure 2D shows the results of the FFC analysis of the dual-color reporter expressed with the four indicated systems in transfected HEK293T cells and tRNA^Pyl in the presence of the ncAA SCO, a lysine derivative with a cyclooctyne side chain. Highly selective and efficient orthogonal translation was observed for the OT assembly (the black arrow indicates a bright, highly mCherry-positive population). Shown in the dot plots are the sums of at least three independent experiments. Axes indicate fluorescence intensity in arbitrary units.
  • Figure 2E shows FFC plots for the OT assembly selectively translating Opal and Ochre codons only of recruited mRNA(mCherry185TGA)::ms2 and mRNA(mCherry185TAA)::ms2, respectively.
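Reporter variants such as mCherry185TAG, mCherry185TGA and mCherry185TAA differ only in the codon placed at one position. A minimal sketch of that single-codon substitution is shown below. It assumes codon 1 is the start codon. The toy sequence is hypothetical; building the real reporters would require the actual mCherry coding sequence, which is not given here.

```python
from typing import Dict

STOP_VARIANTS = {"Amber": "TAG", "Opal": "TGA", "Ochre": "TAA"}


def set_codon(cds: str, position: int, codon: str) -> str:
    """Replace codon number `position` (1-based, codon 1 = start codon) in cds."""
    if len(codon) != 3:
        raise ValueError("codon must be three nucleotides long")
    start = (position - 1) * 3
    if position < 1 or start + 3 > len(cds):
        raise ValueError("position lies outside the coding sequence")
    return cds[:start] + codon + cds[start + 3:]


def stop_variants(cds: str, position: int) -> Dict[str, str]:
    """Amber, Opal and Ochre variants of cds at a single codon position."""
    return {name: set_codon(cds, position, stop) for name, stop in STOP_VARIANTS.items()}


toy_cds = "ATGAAAGGGCCCTAA"  # hypothetical 5-codon ORF
for name, seq in stop_variants(toy_cds, 3).items():
    print(name, seq)
```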
  • Figure 3 shows a schematic representation of the constructs composing the following systems: PylRS, MCP::PylRS, FUS::MCP::PylRS and LcK::FUS::PylRS*LcK::EWS::MCP.
  • Figure 4 shows the flow cytometry analysis of the dual reporter expression with the 4 different systems depicted in Figure 3.
  • HEK293T cells were transfected with constructs encoding the dual reporter, tRNA, LcK::FUS::PylRS and LcK::EWS::MCP or PylRS, MCP::PylRS, FUS::MCP::PylRS and pcDNA3.1. Shown is the sum of at least three independent experiments. Axes indicate fluorescence intensity in arbitrary units.
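The system labels in these figures follow a compact notation. Going by the descriptions of Figures 3 and 4 (and FIG. 8), "::" joins segments within one fusion protein and "*" separates constructs that are co-expressed as one system. Below is a hypothetical parser for that notation. It assumes segment order simply follows label order and strips stray spaces around the separators.

```python
from typing import List


def parse_system(label: str) -> List[List[str]]:
    """Split a system label into fusion proteins ('*') and their segments ('::')."""
    proteins = []
    for part in label.split("*"):
        segments = [s.strip() for s in part.split("::") if s.strip()]
        if segments:
            proteins.append(segments)
    return proteins


print(parse_system("LcK::FUS::PylRS*LcK::EWS::MCP"))
# [['LcK', 'FUS', 'PylRS'], ['LcK', 'EWS', 'MCP']]
print(parse_system("FUS::MCP:: PylRS"))
# [['FUS', 'MCP', 'PylRS']]
```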
  • Figure 5 shows a bar plot with the ratios of the mean fluorescence intensity of mCherry vs. GFP fluorescence for all the tested systems. Plots represent mean values of at least 3 biological replicates, error bars indicate standard error of means.
  • FIG. 6 provides an overview of different approaches of the present invention for generating OT organelles which target the surface of different intracellular structures. Different constructs are expressed, and the results of the respective fluorescence flow cytometry (FFC) analyses are shown.
  • FFC: fluorescence flow cytometry
  • the dual-color reporter construct GFP39TAG*mCherry185TAG::ms2 (see also Figure 2B), as applied in each of the schematically illustrated experiments A to G, is depicted together with a schematic illustration of the different targeted cellular compartments. Control experiments performed without the effector polypeptide MCP (-MCP) are also illustrated for each of experiments A to G:
  • (B) OT organelle targeted to microtubule plus ends, obtained by expressing the construct EB1::FUS::MCP::PylRS, or EB1::FUS::PylRS (control).
  • (D) OT organelle targeted to the mitochondrial membrane, obtained by expressing the system TOM20(1-70)::FUS::PylRS*TOM20(1-70)::EWSR1::MCP, or the construct TOM20(1-70)::FUS::PylRS (control).
  • (E) OT organelle targeted to the nuclear membrane, obtained by expressing the system CG1::FUS::PylRS*CG1::EWSR1::MCP, or the construct CG1::FUS::PylRS (control).
  • FIG. 7 provides an overview of different approaches of the present invention for recruiting RNA using the interaction of different RNA loops and respective RNA targeting proteins. The results of the respective fluorescence flow cytometry (FFC) analyses are shown and compared to the respective analysis as obtained for non-targeted PylRS alone:
  • System ms2-MCP incorporates the ms2 loops in the UTR of an mRNA molecule and recruits the mRNA with the MCP protein into the artificial organelle.
  • System boxB-λN22 incorporates the boxB loops in the UTR of an mRNA molecule and recruits the mRNA with the λN22 protein into the artificial organelle.
  • System pp7-PCP incorporates the pp7 loops in the UTR of an mRNA molecule and recruits the mRNA with the PCP protein into the artificial organelle.
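The three recruitment systems of FIG. 7 amount to pairing each RNA loop (the TN) with its RNA-targeting protein. Below is a toy lookup. It assumes an mRNA is recruited to an OT organelle only when it carries a loop bound by the RNA-TP present there. The function name and example inputs are hypothetical.

```python
from typing import Iterable

# RNA loop (TN) -> RNA-targeting protein (RNA-TP), as listed for FIG. 7.
TN_BINDER = {"ms2": "MCP", "boxB": "λN22", "pp7": "PCP"}


def is_recruited(mrna_loops: Iterable[str], organelle_rna_tp: str) -> bool:
    """True if any loop on the mRNA is bound by the organelle's RNA-TP."""
    return any(TN_BINDER.get(loop) == organelle_rna_tp for loop in mrna_loops)


print(is_recruited(["ms2", "ms2"], "MCP"))  # True
print(is_recruited(["ms2"], "PCP"))         # False
print(is_recruited([], "MCP"))              # False
```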
  • FIG. 8 illustrates a further approach of the present invention for generating OT organelles which will work on the surface of different cellular structures.
  • the particular approach is characterized by the pairwise incorporation of the synthetic heterodimeric coiled-coil peptides SYNZIP1 and SYNZIP2, fused into the system LcK::FUS::SYNZIP1::PylRS*EWSR1::SYNZIP2::MCP. Upon expression, SYNZIP1 and SYNZIP2 pair and recruit MCP to a plasma membrane-based OT organelle, which in turn enables the selective orthogonal translation of a subsequently recruited mRNA comprising the ms2 targeting nucleotide loops.
  • the polypeptide of interest (POI) that is translationally expressed by the OT system according to the present invention comprises one or more ncAA residues which are encoded in the nucleotide sequence encoding the POI (CS^POI) by selector codons.
  • the fusion proteins of the invention may be constructed in different ways.
  • a first type includes fusion proteins wherein at least two types of effector polypeptides (EPs), comprising at least one RNA-TP and at least one O-RS, are comprised by one and the same fusion protein (also designated as RNA-TP/O-RS fusion proteins).
  • EPs: effector polypeptides
  • a second type includes fusion proteins which comprise at least one assembler polypeptide (AP) and at least one type of EP selected from RNA-TP segments and O-RS segments (also designated AFPs).
  • AFPs can comprise both RNA-TP and O-RS segments, such as one or more RNA-TP segments and one or more O-RS segments in any sequential order, in addition to the at least one type of AP.
  • AFPs in particular are selected from the following fusion protein types (segments functionally linked in any order within the polypeptide chain; one or more segments of the same type in any order within the polypeptide chain):
  • APs are selected from IC-TPs and PSPs, and may be composed of one or more IC-TPs and/or one or more of PSPs in any sequential order.
  • AFPs more particularly are selected from the following fusion protein types (segments functionally linked in any order within the polypeptide chain; one or more segments of the same type in any order within the polypeptide chain):
  • APs and/or EPs may also comprise (as part of the fusion protein) heterooligomer forming, in particular heterodimer forming polypeptide segments, like in particular synthetic coiled coil SYNZIP peptides.
  • AFP combinations comprising such interacting SYNZIP pairs distributed between members of said AFP combination, so that each AFP comprises merely one member of such an interacting SYNZIP pair, are particular embodiments.
  • the term “segment” indicates that the thus designated element (e.g., RNA-TP, O-RS, IC-TP, PSP, SYNZIP) is part of the fusion protein, i.e. linked to the remainder of the fusion protein.
  • the segments of the fusion proteins of the invention are functionally linked, i.e. linked such that they still function as RNA-TP, O-RS, IC-TP and PSP or SYNZIP, respectively.
  • Said linkage is preferably covalent, and in particular is a peptidic linkage.
  • RNA-TP segment comprised in the fusion proteins of the present invention is a segment of the fusion protein that is derived from, and functions in the context of the fusion protein as, an RNA-TP, thus allowing the fusion protein to interact with (bind to) the targeted RNA, wherein said interaction is expediently a specific one.
  • an RNA-TP segment may comprise the (entire) amino acid sequence, or a functional fragment, of an RNA-targeting polypeptide as described herein.
  • an O-RS segment comprised by the fusion proteins of the present invention is a segment of the fusion protein that is derived from, and functions in the context of the fusion protein as, an O-RS, thus conferring to the fusion protein O-RS enzymatic activity, that is the ability to catalyze the aminoacylation of an O-tRNA with an ncAA.
  • an O-RS segment may comprise the (entire) amino acid sequence, or a functional fragment, of an O-RS as described herein.
  • AP refers to any polypeptide segment that allows for enrichment of AFPs comprising said segment at spatially distinct sites within a living cell. Expediently said spatially distinct sites are located within, or directly adjacent to, the cytoplasm of the cell and readily accessible to the translational machinery of the cell (which includes canonical aminoacylated tRNAs, translation factors, ribosomal subunits, etc.) as well as the O-tRNAs which allow for the introduction of the ncAA residues into the POI.
  • polypeptide segments which can serve as APs in the present invention.
  • One type of APs are polypeptide segments which are derived from, and function in the context of the fusion protein as, intracellular targeting polypeptides (IC-TPs). These IC-TP segments may comprise the (entire) amino acid sequence, or a functional fragment, of an IC-TP. IC-TPs target, and thus become locally enriched at, intracellular structural elements within, or directly adjacent to, the cytoplasm. Examples of such structural elements include microtubules and the cytoplasmic side of membranes such as the cell membrane, the nuclear membrane, the mitochondrial membrane, the Golgi membrane, the ER membrane, etc.
  • the fusion protein of the present invention comprises at least one IC-TP segment that targets, and facilitates local enrichment of the fusion protein at, microtubules, in particular the plus end or the minus end of the microtubules.
  • For instance, dyneins and kinesins (proteins of the dynein or kinesin family of proteins), and functional fragments and mutants thereof, can be used as IC-TPs for such function.
  • the fusion protein of the present invention comprises at least one IC-TP segment that is derived from, and functions as, a membrane anchor.
  • the fusion protein of the present invention comprises at least one IC-TP segment that targets, and facilitates local enrichment of the fusion protein at, the (inner) cell membrane (in particular the cytoplasmic side of the cell membrane).
  • the fusion protein of the present invention comprises at least one IC-TP segment that targets, and facilitates local enrichment of the fusion protein at, the (outer) nuclear membrane (in particular the cytoplasmic side of the nuclear membrane).
  • the fusion protein of the present invention comprises at least one IC-TP segment that targets, and facilitates local enrichment of the fusion protein at, the outer mitochondrial membrane (in particular the cytoplasmic side of the mitochondrial membrane). In further particular embodiments, the fusion protein of the present invention comprises at least one IC-TP segment that targets, and facilitates local enrichment of the fusion protein at, the outer ER membrane (in particular the cytoplasmic side of the ER membrane). In further particular embodiments, the fusion protein of the present invention comprises at least one IC-TP segment that targets, and facilitates local enrichment of the fusion protein at, the outer Golgi membrane (in particular the cytoplasmic side of the Golgi membrane). For instance, the transmembrane domain of membrane proteins, and functional fragments and mutants thereof, can be used as IC-TPs for such function.
  • Polypeptides which target, and thus become locally enriched at, intracellular structural elements as described above are known in the art and are useful as IC-TPs in the present invention.
  • IC-TPs include, but are not limited to:
  • optionally truncated kinesin polypeptides which constitutively move towards, and become locally enriched at, microtubule plus ends in living cells, for example optionally truncated kinesin family member 16B (KIF16B), e.g. optionally truncated Homo sapiens KIF16B (Uniprot: Q96L93), in particular the fragment covering KIF16B amino acid residues 1-400 (KIF16B1-400) comprising the amino acid sequence of SEQ ID NO:20; or optionally truncated kinesin family member 13A (KIF13A), e.g. optionally truncated Homo sapiens KIF13A (Uniprot: Q9H1H9), in particular the KIF13A fragment covering amino acid residues 1-411 wherein P390 is deleted (KIF13A1-411,ΔP390) comprising the amino acid sequence of SEQ ID NO:22;
  • polypeptides EB1, a microtubule tip-binding protein that binds to growing microtubule plus ends (Nehlig A, Molina A, Rodrigues-Ferreira S, Honore S, Nahmias C. Regulation of end-binding protein EB1 in the control of microtubule dynamics. Cell Mol Life Sci. 2017;74(13):2381-2393)
  • cell membrane-targeting polypeptides derived from transmembrane proteins such as lymphocyte-specific protein tyrosine kinase (LcK; e.g., Mus musculus LcK, Uniprot: P06240), CD4 (e.g., Mus musculus CD4, Uniprot: P06332), FRB (similar to Homo sapiens mTOR; Uniprot: P42345), CD28 (e.g., Mus musculus CD28, Uniprot: P31041) and combinations thereof, in particular polypeptides comprising the amino acid sequence of SEQ ID NO:26, SEQ ID NO:28 or SEQ ID NO:30;
  • CG1 polypeptides, CG1 (also designated Nup42; Uniprot: O15504) being a nucleoporin that binds to the cytoplasmic side of the nuclear pore complex (Fernandez-Martinez J, Kim SJ, Shi Y, et al. Structure and Function of the Nuclear Pore Complex Cytoplasmic mRNA Export Platform. Cell. 2016;167(5):1215-1228.e25. doi:10.1016/j.cell.2016.10.028), targeting the cytoplasmic side of the nuclear membrane and comprising the amino acid sequence of SEQ ID NO:304;
  • polypeptides derived from a Golgi membrane protein with one transmembrane helix (Engelsberg A, Hermosilla R, Karsten U, Schülein R, Dörken B, Rehm A. The Golgi protein RCAS1 controls cell surface expression of tumor-associated O-linked glycan antigens. J Biol Chem. 2003;278(25):22998-23007. doi:10.1074/jbc.M301361200);
  • polypeptide fragments of P450 2C1, an endoplasmic reticulum-resident protein (Fazal FM, Han S, Parker KR, et al. Atlas of Subcellular RNA Localization Revealed by APEX-Seq. Cell. 2019;178(2):473-490.e26. doi:10.1016/j.cell.2019.05.027) (Uniprot: P78382), targeting the cytoplasmic side of the ER membrane, in particular a fragment comprising the N-terminal first 27 amino acid residues (SEQ ID NO:298) or the first 29 amino acid residues (SEQ ID NO:300);
  • polypeptides derived from the transmembrane protein stomatin-like protein 3 (SLP-3; Homo sapiens, Uniprot: Q8TAV4), which localizes to the plasma membrane and vesicular membranes (Lapatsina L, Jira JA, Smith ES, et al. Regulation of ASIC channels by a stomatin/STOML3 complex located in a mobile vesicle pool in sensory neurons. Open Biol. 2012;2(6):120096. doi:10.1098/rsob.120096), in particular a fragment covering amino acid residues 1-59 and comprising the amino acid sequence of SEQ ID NO:310;
  • Said functional fragments and mutants may have at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of the polypeptide they are derived from.
  • a further type of AP comprises polypeptide segments which are derived from, and function in the context of the fusion protein as, phase separation polypeptides (PSPs).
  • PSPs are polypeptides which have the ability to self-assemble in the cytoplasm of a cell so as to create sites of high local concentration in the cytoplasm.
  • PSPs are able to drive phase separation (in particular liquid-liquid phase separation) leading to the formation of membrane-less compartments in the cytoplasm.
  • Said compartments may take the form of droplets, aggregates, condensates or a dense phase.
  • PSPs include intrinsically disordered proteins (IDPs), which are an important class of proteins that drive phase separation (see, e.g., Alberti et al., Bioessays 2016, 38:959-968 and references cited therein such as Patel et al., Cell 2015, 162:1066-1077; Han et al., Cell 2012, 149:768-779; Kato et al., Cell 2012, 149:753-767).
  • IDPs of each of the classes described below, or functional fragments or mutants thereof, can be used as PSPs in the present invention.
  • one class of IDPs contains so-called prion-like domains, which are devoid of charged residues and contain polar amino acid residues (Q, N, S, G) with interspersed aromatic residues (F, Y). See, e.g., Malinovska et al., Biochim Biophys Acta 2013, 1834:918-931; Alberti et al., Cell 2009, 137:146-158; Malinovska et al., Prion 2015, 9:339-346.
  • Another class of IDPs is also characterized by low sequence complexity but frequently contains acidic and basic amino acid side chains, e.g. RGG repeat-containing IDPs such as Ddx4. See Nott et al., Mol Cell 2015, 57:936-947.
  • suitable PSPs include, but are not limited to:
  • spindle-defective protein 5 (SPD5; Uniprot: P91349), in particular a polypeptide comprising the amino acid sequence of SEQ ID NO:32;
  • fused-in sarcoma (FUS), in particular a polypeptide comprising the amino acid sequence of SEQ ID NO:34;
  • Ewing sarcoma breakpoint region 1 (e.g., Homo sapiens EWSR1, Uniprot: Q01844), in particular a polypeptide comprising the amino acid sequence of SEQ ID NO:36;
  • the RGG domain (amino acid residues 1-168) of ATP-dependent RNA helicase LAF-1 (Caenorhabditis elegans, Uniprot: D0PV95; Schuster BS, Reed EH, Parthasarathy R, et al. Controllable protein phase separation and modular recruitment to form responsive membraneless organelles. Nat Commun. 2018;9(1):2985. Published 2018 Jul 30. doi:10.1038/s41467-018-05403-1), in particular a polypeptide comprising the amino acid sequence of SEQ ID NO:308;
  • functional fragments and mutants of these polypeptides.
  • Said functional fragments and mutants may comprise at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of the polypeptide they are derived from.
  • the number of APs comprised by fusion proteins of the present invention is not particularly limited, i.e. a fusion protein may comprise 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more same or different APs. Fusion proteins of the present invention which comprise at least one AP selected from IC-TP segments and at least one AP selected from PSP segments are particularly preferred.
  • the number of RNA-TP segments is not particularly limited and may be independently selected from 1, 2, 3, 4, 5 or more, for example 6, 7, 8, 9 or 10, different or same RNA-TP segments.
  • the number of O-RS segments is not particularly limited and may be independently selected from 1, 2, 3, 4, 5 or more, for example 6, 7, 8, 9 or 10, different or same O-RS segments.
  • RNA-TP/O-RS fusion protein structures comprising both types of EP segments.
  • x and y, independently of each other, are integers selected from 1, 2, 3, 4 and 5; the linkage symbol between segment blocks designates a peptidic linkage.
  • [RNA-TP]x for x ≥ 2 may include the same or different RNA-TP segments.
  • [O-RS]y for y ≥ 2 may include the same or different O-RS segments.
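The block notation above can be made concrete with a short sketch. This Python example assumes segments are written N->C and that x and y each range over 1 to 5 as stated; it lists both block orders and expands a block formula into an explicit segment list. The function names are hypothetical, and the segment labels are plain strings standing in for actual RNA-TP and O-RS sequences.

```python
from itertools import product


def rna_tp_ors_formulas(max_copies: int = 5) -> list[str]:
    """List block formulas [RNA-TP]x-[O-RS]y and [O-RS]y-[RNA-TP]x (N->C),
    with x and y each taking every integer value from 1 to max_copies."""
    formulas = []
    for x, y in product(range(1, max_copies + 1), repeat=2):
        formulas.append(f"[RNA-TP]{x}-[O-RS]{y}")
        formulas.append(f"[O-RS]{y}-[RNA-TP]{x}")
    return formulas


def expand_blocks(x: int, y: int) -> list[str]:
    """Expand [RNA-TP]x-[O-RS]y into an explicit N->C list of segment labels."""
    if not (1 <= x <= 5 and 1 <= y <= 5):
        raise ValueError("x and y must be integers from 1 to 5")
    return ["RNA-TP"] * x + ["O-RS"] * y


formulas = rna_tp_ors_formulas()
print(len(formulas))        # 50
print(expand_blocks(2, 1))  # ['RNA-TP', 'RNA-TP', 'O-RS']
```

Because each block may hold identical or different segments of its type, a real design would replace each label with a specific RNA-TP or O-RS; the sketch only tracks segment types.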
  • RNA-TP/O-RS fusion protein structures include, but are not limited to:
  • n and o, independently of each other, are integers selected from 1, 2, 3, 4 and 5, or from 1, 2, 3, 4, 5 and 6; the linkage symbol between segment blocks designates a peptidic linkage.
  • “m” is the integer 1.
  • n is an integer selected from 1 and 2.
  • “o” is an integer selected from 1, 2, 3, 4, 5 or 6 if EP is selected from RNA-TPs. In still another preferred embodiment, “o” is an integer selected from 1 or 2 if EP is selected from O-RSs.
  • Among the RNA-TP/O-RS fusion protein structures, preference is given to those wherein at least one IC-TP takes a C- or N-terminal position within the polypeptide chain.
  • Among the RNA-TP/O-RS fusion protein structures, preference is given to those wherein at least one EP takes a C- or N-terminal position within the polypeptide chain.
  • Among the RNA-TP/O-RS fusion protein structures, preference is given to those wherein at least one IC-TP takes a C- or N-terminal position within the polypeptide chain while at least one EP takes an N- or C-terminal position, respectively. Any PSP, if present in such a structure, is positioned internally within the polypeptide chain.
  • [IC-TP]m for m ≥ 2 may include the same or different IC-TP segments.
  • IC-TPs of the same functionality, targeting the same type of cellular structure (for example the same membrane type or the same type of organelle), are applied.
  • [PSP]n for n ≥ 2 may include the same or different PSP segments.
  • [EP]o for o ≥ 2 may include the same or different EPs. Where [EP]o includes different EPs, for example at least one EP may be an RNA-TP segment and at least one may be an O-RS segment.
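The positional preference for these structures (an IC-TP at one terminus, an EP at the other, any PSP internal) can be expressed as a simple check. This Python sketch is illustrative only: it assumes a fusion protein is represented as an N->C list of segment-type labels, and the function name is hypothetical.

```python
EP_TYPES = {"RNA-TP", "O-RS"}


def is_preferred_layout(segments: list[str]) -> bool:
    """Check an N->C segment list for the preferred arrangement: an IC-TP at
    one terminus and an EP (RNA-TP or O-RS) at the opposite terminus.

    Because both termini are then occupied by an IC-TP and an EP, any PSP
    in the list is necessarily internal, as the text requires."""
    if len(segments) < 2:
        return False
    first, last = segments[0], segments[-1]
    ic_tp_at_n_terminus = first == "IC-TP" and last in EP_TYPES
    ic_tp_at_c_terminus = first in EP_TYPES and last == "IC-TP"
    return ic_tp_at_n_terminus or ic_tp_at_c_terminus


print(is_preferred_layout(["IC-TP", "PSP", "PSP", "O-RS"]))  # True
print(is_preferred_layout(["PSP", "IC-TP", "RNA-TP"]))       # False: PSP is terminal
```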
  • the fusion proteins of the present invention provide an orthogonal translation (OT) system wherein the one or more O-RS (segments) required for the introduction of the one or more ncAA residues into the POI are brought into spatial proximity to at least one RNA-targeting polypeptide (RNA-TP) segment.
  • the mRNA of the POI comprises at least one targeting nucleotide sequence (TN) that is able to interact with an RNA-TP segment of at least one of the fusion proteins of the OT system. Said interaction is expediently a specific one.
  • the RNA-TP segments of the fusion proteins of the invention are preferably mRNA-targeting polypeptide segments.
  • RNA-TP segment of the fusion protein and the TN of the POI mRNA are expediently chosen so as to specifically interact with (bind to) one another.
  • Suitable pairs of RNA-TP segment and TN for this purpose can be selected from coat proteins of RNA viruses and the nucleic acid motifs bound by said coat proteins. Such viral coat proteins and protein-bound RNA motifs are known in the art.
  • RNA-TPs include, but are not limited to:
  • MS2 coat protein (MCP), i.e. the coat protein of Enterobacteria phage MS2, in particular a polypeptide comprising the amino acid sequence of SEQ ID NO:14;
  • λN22 (the 22-amino-acid RNA-binding domain of the lambda phage antiterminator protein N), in particular a polypeptide comprising the amino acid sequence of SEQ ID NO:16;
  • PP7 coat protein (PCP), i.e. the coat protein of bacteriophage PP7 (Wu B, Chao JA, Singer RH. Fluorescence fluctuation spectroscopy enables quantitative imaging of single mRNAs in living cells. Biophys J. 2012;102(12):2936-2944. doi:10.1016/j.bpj.2012.05.017), in particular a polypeptide comprising the amino acid sequence of SEQ ID NO:306;
  • functional fragments and mutants of these polypeptides.
  • Said functional fragments and mutants may comprise at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of the polypeptide they are derived from.
  • Suitable TNs include, but are not limited to:
  • Enterobacteria phage MS2 RNA stem-loops, in particular a polynucleotide having an RNA sequence corresponding to (encoded by) the nucleotide (DNA) sequence of SEQ ID NO:17;
  • BoxB (lambda phage RNA stem-loop, the specific binding site of λN22), in particular a polynucleotide having an RNA sequence corresponding to (encoded by) the nucleotide (DNA) sequence of SEQ ID NO:18;
  • bacteriophage PP7 RNA stem-loops (Wu B, Chao JA, Singer RH. Fluorescence fluctuation spectroscopy enables quantitative imaging of single mRNAs in living cells. Biophys J. 2012;102(12):2936-2944. doi:10.1016/j.bpj.2012.05.017), in particular a polynucleotide having an RNA sequence corresponding to (encoded by) the nucleotide (DNA) sequence of SEQ ID NO:289 or SEQ ID NO:290, as well as functional fragments and mutants thereof.
  • Said functional fragments and mutants may comprise at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% nucleotide sequence identity to the polynucleotide sequences they are derived from.
  • Such TNs may be used as a single-copy segment or as a multiple-copy segment composed of more than one, for example two, three, four, five, six or more, repetitive units of the TN.
  • the MCP specifically interacts with MS2 RNA stem-loops.
  • the mRNA of the POI expediently comprises one or more MS2 RNA stem-loops, e.g. two, three, four, five or six MS2 RNA stem-loops.
  • λN22 specifically interacts with BoxB.
  • the mRNA of the POI expediently comprises one or more BoxB motifs.
  • the RNA-TP segment(s) of the fusion protein(s) comprise (consist of) segments which are derived from, and function as, PCP
  • the mRNA of the POI expediently comprises one or more PP7 RNA stem-loops, e.g. two, three, four, five, six or more PP7 RNA stem-loops.
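The pairing rule just described (MCP with MS2 stem-loops, λN22 with BoxB, PCP with PP7 stem-loops) amounts to a lookup. A minimal Python sketch, assuming the TN elements of the POI mRNA are represented by name only; the dictionary and function names are hypothetical, and a real design would work with the sequences of SEQ ID NO:17, 18, 289 or 290.

```python
# RNA-TP -> cognate TN pairings named in the text (labels only, no sequences).
COGNATE_TN = {
    "MCP": "MS2 stem-loop",
    "lambdaN22": "BoxB",
    "PCP": "PP7 stem-loop",
}


def count_binding_sites(rna_tp: str, utr_elements: list[str]) -> int:
    """Count the TN copies in the POI mRNA that a given RNA-TP segment binds."""
    if rna_tp not in COGNATE_TN:
        raise KeyError(f"no cognate TN recorded for {rna_tp!r}")
    target = COGNATE_TN[rna_tp]
    return sum(1 for element in utr_elements if element == target)


utr = ["MS2 stem-loop"] * 4 + ["BoxB"]
print(count_binding_sites("MCP", utr))  # 4
print(count_binding_sites("PCP", utr))  # 0: no PP7 stem-loop present
```

A count of zero flags a mismatched design, i.e. a fusion protein whose RNA-TP has no binding site on the POI mRNA.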
  • RSs that have been used for genetic code expansion include the Methanococcus jannaschii tyrosyl-tRNA synthetase, the E. coli tyrosyl-tRNA synthetase, the E. coli leucyl-tRNA synthetase and pyrrolysyl-tRNA synthetases from certain Methanosarcina (such as M. mazei, M. barkeri, M. acetivorans, M. thermophila), Methanococcoides (M. burtonii) or Desulfitobacterium (D. hafniense) species.
  • Pyrrolysyl tRNA synthetases which can be used in methods and fusion proteins of the invention may be wildtype or genetically engineered PylRSs.
  • wildtype PylRSs include, but are not limited to, PylRSs from archaebacteria and eubacteria such as Methanosarcina mazei, Methanosarcina barkeri, Methanococcoides burtonii, Methanosarcina acetivorans, Methanosarcina thermophila and Desulfitobacterium hafniense. Genetically engineered PylRSs have been described, for example, by Neumann et al.
  • PylRSs which are used in the fusion proteins and methods of the present invention may be PylRSs lacking the NLS and/or comprising an NES as described, e.g., in WO 2018/069481.
  • Said functional fragments and mutants may comprise at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the aminoacyl tRNA synthetase they are derived from.
  • O-RS segments useful in the present invention that are derived from M. mazei pyrrolysyl-tRNA synthetases include, but are not limited to:
  • O-RS segments derived from PylRS AF (Methanosarcina mazei pyrrolysyl-tRNA synthetase double mutant Y306A, Y384F; Uniprot: Q8PWY1), for example O-RS segments comprising the amino acid sequence of SEQ ID NO:8;
  • O-RS segments derived from the Methanosarcina mazei pyrrolysyl-tRNA synthetase double mutant N346A, C348A (Uniprot: Q8PWY1), for example O-RS segments comprising the amino acid sequence of SEQ ID NO:10;
  • O-RS segments derived from the Methanosarcina mazei pyrrolysyl-tRNA synthetase quadruple mutant Y306A, N346A, C348A, Y384F (Uniprot: Q8PWY1), for example O-RS segments comprising the amino acid sequence of SEQ ID NO:12;
  • O-RS segments derived from IFRS1, a Methanosarcina mazei pyrrolysyl-tRNA synthetase mutant (L305M, Y306L, L309S, N346S, C348M), for example O-RS segments comprising the amino acid sequence of SEQ ID NO:224;
  • O-RS segments derived from CbzRS, a Methanosarcina mazei pyrrolysyl-tRNA synthetase mutant (Y306M, L309G, C348T), for example O-RS segments comprising the amino acid sequence of SEQ ID NO:226;
  • O-RS segments derived from CpkRS, a Methanosarcina mazei pyrrolysyl-tRNA synthetase mutant (A302S), for example O-RS segments comprising the amino acid sequence of SEQ ID NO:228;
  • Said functional fragments and mutants may comprise at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the aminoacyl tRNA synthetase they are derived from.
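The mutant designations above (e.g. Y306A, Y384F) use the common notation of wild-type residue, 1-based position, replacement residue. A short Python sketch of how such a list could be applied to a parent sequence while checking the stated wild-type residue at each position; the function name is hypothetical, and a toy sequence stands in for the M. mazei PylRS sequence (Uniprot: Q8PWY1), which is not reproduced here.

```python
import re

# One-letter amino acid codes; "Y306A" means the tyrosine at position 306
# (counted from 1) is replaced by alanine.
_MUTATION = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def apply_mutations(sequence: str, mutations: list[str]) -> str:
    """Apply point mutations in X###Y notation to a protein sequence,
    raising ValueError if the stated wild-type residue is not found."""
    residues = list(sequence)
    for mutation in mutations:
        match = _MUTATION.match(mutation)
        if match is None:
            raise ValueError(f"unrecognised mutation notation: {mutation}")
        wild_type, replacement = match.group(1), match.group(3)
        position = int(match.group(2))
        index = position - 1  # convert 1-based position to 0-based index
        if not 0 <= index < len(residues):
            raise ValueError(f"{mutation}: position outside the sequence")
        if residues[index] != wild_type:
            raise ValueError(f"{mutation}: found {residues[index]} at position {position}")
        residues[index] = replacement
    return "".join(residues)


# Toy sequence for illustration only (not a PylRS sequence).
print(apply_mutations("MKYLC", ["Y3A", "C5F"]))  # MKALF
```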
  • wild-type and mutant M. mazei PylRSs as described herein are used for aminoacylation of tRNA with ncAAs as described in WO2012/104422 or WO2015/107064.
  • ncAAs for this purpose include, but are not limited to, 2-amino-6-(cyclooct-2-yn-1-yloxycarbonylamino)hexanoic acid (SCO), 2-amino-6-(cyclooct-2-yn-1-yloxyethoxycarbonylamino)hexanoic acid, 2-amino-6-[(4E-cyclooct-4-en-1-yl)oxycarbonylamino]hexanoic acid (TCO), 2-amino-6-[(2E-cyclooct-2-en-1-yl)oxycarbonylamino]hexanoic acid (TCO*), 2-amino-6-(prop-2-ynoxycarbonylamino)hexanoic acid
  • the above-mentioned AP (IC-TP and PSP) segments and/or the above mentioned EP (RNA-TP and O-RS) segments, independently of each other, may be further combined with natural or, more particularly, synthetic protein segments, which induce and control macromolecular interactions.
  • such further protein segments are operably fused into the polypeptide chain of an AFP of the invention.
  • One or more, such as 2, 3, 4, 5, 6, 7, 8, 9 or 10, but preferably one, such protein segment may be operably fused into a single AFP of the invention. Fusion into the AFP polypeptide chain should be such that the activity of the other polypeptide segments, AP and EP, is substantially unaffected, in particular not inhibited.
  • SYNZIP peptides forming multimeric structures.
  • SYNZIPs having the ability to form specific heterodimeric coiled-coil protein structures.
  • SYNZIPs are pairs of synthetic peptides capable of interacting with each other and are used to induce and control macromolecular interactions.
  • Non-limiting examples are the pairs SYNZIP1:SYNZIP2, SYNZIP3:SYNZIP4 and SYNZIP5:SYNZIP6.
  • the heterospecific coiled-coil pair SYNZIP2:SYNZIP1 as described by Reinke, A.W., Grant, R.A., Keating, A.E. (2010) J Am Chem Soc 132, 6025-6031.
  • SYNZIP1: SEQ ID NO:312;
  • SYNZIP2: SEQ ID NO:314;
  • SYNZIP3: SEQ ID NO:316;
  • SYNZIP4: SEQ ID NO:318; as well as functional fragments and mutants of these SYNZIP polypeptides.
  • Said functional fragments and mutants may comprise at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of the polypeptide they are derived from.
  • these SYNZIPs are preferably used pairwise in AFP combinations as described herein. Through the interaction of such SYNZIP pairs integrated into different AFPs, the formation of OT organelles according to the present invention may be further supported.
  • a fusion protein of the invention may be further modified by introducing (fusing) at least one so-called “epitope tag”, i.e. a short oligopeptide sequence which serves as an antibody binding site, useful for detecting/quantifying the expressed fusion products of the invention.
  • VSV-G Vesicular stomatitis virus glycoprotein epitope tag (SEQ ID NO:680)
  • HA Human influenza hemagglutinin epitope tag (SEQ ID NO:682)
  • Myc Human c-Myc proto-oncogene epitope tag (SEQ ID NO:684)
  • Each individual exemplified construct may be construed in the N->C or C->N direction.
  • the depicted schemes are given in the N->C direction.
  • in segment blocks [IC-TP]m, [PSP]n, [O-RS]y and [RNA-TP]x wherein m, n, y or x is an integer >1, the repetitive segments within such a block may be identical or different, preferably identical.
  • the segments [IC-TP], [PSP], [O-RS], [RNA-TP] and [SYNZIP] as applied therein may be prepared from the respective examples of segments described above in section 1.1.
  • Intracellular structure-targeting monofunctional AFPs, i.e. AFPs comprising one type of EP
  • AFPs are the same AFPs as listed above in sections 1.2.1 and 1.2.2, with the only exception that at least one of the segments [IC-TP], [PSP], [O-RS] or [RNA-TP] is N- or C-terminally supplemented with a SYNZIP element.
  • An AFP may contain 1, 2, 3, 4 or 5, preferably 1 or 2, identical or different, preferably identical, SYNZIPs.
  • Non-limiting examples of such molecules are:
  • IC-TP and PSP may preferably be used in combination with an AFP molecule containing at least one IC-TP and/or PSP segment.
  • Specific examples of fusion proteins of the invention, and particular combinations thereof, are listed below in Tables 1, 2 and 3.
  • the content of Tables 1, 2 and 3 also forms part of the general disclosure of the specification; its content is not explicitly and literally repeated here in the general part.
  • the disclosure of Tables 1 and 2 in the respective column designated “Fusion protein(s) comprising O-RS and RNA-TP segments” shall be considered as disclosed independently from the content of the other columns of Tables 1 and 2 referring to specific reports and host cell lines.
  • functional fragments and mutants retain the relevant activity of their parent molecule, i.e. the RNA-binding activity of the parent RNA-TP, the targeting activity for intracellular structures of the parent IC-TP, the self-assembly activity of the parent PSP, the binding activity for RNA-TP of the parent TN, the enzymatic activity of the parent O-RS or the heterodimeric coiled-coil formation ability of the parent SYNZIPs, respectively.
  • Such fragments and mutants can be characterized by a minimum degree of sequence identity as described herein.
  • Said amino acid or nucleotide sequence identity means identity over the entire length of the thus characterized amino acid or nucleotide sequence, respectively.
  • the percentage identity values can be determined as known in the art on the basis of BLAST alignments, blastp algorithms (protein-protein BLAST), or using the Clustal method (Higgins et al., Comput Appl Biosci 1989, 5(2):151-1).
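Once an alignment has been produced (for example by blastp or Clustal), identity over the entire length of the characterized sequence reduces to a count. The Python sketch below assumes two pre-aligned strings of equal length with "-" marking gaps, and takes the number of residues in the characterized (query) sequence as the denominator; it does not perform the alignment itself and is not a reimplementation of BLAST or Clustal. Function names are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_reference: str, gap: str = "-") -> float:
    """Percent identity of an aligned query over its entire ungapped length."""
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have the same length")
    query_length = sum(1 for residue in aligned_query if residue != gap)
    if query_length == 0:
        raise ValueError("query contains no residues")
    # Count aligned columns where the query residue equals the reference residue.
    identical = sum(
        1
        for q, r in zip(aligned_query, aligned_reference)
        if q != gap and q == r
    )
    return 100.0 * identical / query_length


def meets_threshold(identity: float, threshold: float) -> bool:
    """True if the identity reaches a threshold such as 60, 90 or 99 percent."""
    return identity >= threshold


identity = percent_identity("MKTAYIAK", "MKTAYLAK")
print(identity)                       # 87.5
print(meets_threshold(identity, 80))  # True
print(meets_threshold(identity, 90))  # False
```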
  • Fragments and mutants of particular RNA-TPs, O-RSs, IC-TPs, SYNZIPs or PSPs which are useful in the present invention retain the relevant function (binding, self-assembly or enzymatic activity, respectively) of the parent polypeptide and can be obtained, e.g., by conservative amino acid substitution, i.e. the replacement of an amino acid residue with a different amino acid residue having similar biochemical properties (e.g. charge, hydrophobicity and size) as known in the art. Typical examples are substitution of Leu by Ile or vice versa, substitution of Asp by Glu or vice versa, substitution of Asn by Gln or vice versa, and others.
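The conservative substitutions named as typical examples can be checked mechanically. This Python sketch deliberately encodes only the three pairs given in the text (Leu/Ile, Asp/Glu, Asn/Gln); a practical screen would need a broader grouping or a substitution matrix, which the text does not specify. The constant and function names are hypothetical.

```python
# Only the example pairs listed in the text, in one-letter code.
LISTED_CONSERVATIVE_PAIRS = {frozenset("LI"), frozenset("DE"), frozenset("NQ")}


def is_listed_conservative(original: str, replacement: str) -> bool:
    """True if original -> replacement (in either direction) is a listed pair."""
    return frozenset((original, replacement)) in LISTED_CONSERVATIVE_PAIRS


def classify(substitutions: list[tuple[str, str]]) -> dict[str, int]:
    """Count listed-conservative versus other substitutions."""
    counts = {"listed_conservative": 0, "other": 0}
    for original, replacement in substitutions:
        if is_listed_conservative(original, replacement):
            counts["listed_conservative"] += 1
        else:
            counts["other"] += 1
    return counts


print(classify([("L", "I"), ("E", "D"), ("Y", "A")]))
# {'listed_conservative': 2, 'other': 1}
```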
  • translation system generally refers to a set of components necessary to incorporate a naturally occurring amino acid in a growing polypeptide chain (protein).
  • Components of a translation system can include, e.g., ribosomes, tRNAs, aminoacyl tRNA synthetases, mRNA and the like.
  • An aminoacyl tRNA synthetase (RS) is an enzyme capable of aminoacylating a tRNA with an amino acid or an amino acid analog.
  • An RS used in processes of the invention is capable of aminoacylating a tRNA with the corresponding ncAA, i.e. aminoacylating a tRNAncAA.
  • orthogonal refers to an element of a translation system (e.g., an orthogonal tRNA (O-tRNA) and/or an orthogonal aminoacyl tRNA synthetase (O-RS)) that is used with reduced efficiency by a translation system of interest (e.g., a cell).
  • an O-tRNA in a translation system of interest is aminoacylated by any endogenous RS of the translation system with reduced or even zero efficiency, when compared to aminoacylation of an endogenous tRNA by the endogenous RS.
  • an O-RS aminoacylates any endogenous tRNA in the translation system of interest with reduced or even zero efficiency, as compared to aminoacylation of the endogenous tRNA by an endogenous RS.
  • the term “orthogonal translation system” or “OT system” is used herein to refer to a translation system using an O-RS/O-tRNAncAA pair that allows for introducing ncAA residues into a growing polypeptide chain.
  • the O-tRNAncAA is preferentially aminoacylated with the ncAA by the O-RS.
  • the orthogonal pair functions in the translation system of interest (e.g., the cell) such that the O-tRNAncAA is used to incorporate the ncAA residue into the growing polypeptide chain of a POI. Incorporation occurs in a site-specific manner.
  • the O-tRNAncAA recognizes a selector codon (e.g., an Amber, Ochre or Opal stop codon) in the mRNA coding for the POI.
  • preferentially aminoacylates refers to an efficiency of, e.g., about 50%, about 70%, about 75%, about 85%, about 90%, about 95%, or about 99% or more, at which an O-RS aminoacylates an O-tRNA with an unnatural amino acid compared to an endogenous tRNA or amino acid of a translation system of interest (e.g., a cell).
  • tRNAs which can be used for being aminoacylated by a fusion protein of the present invention comprising at least one O-RS segment derived from a M. mazei pyrrolysyl tRNA synthetase include, but are not limited to pyrrolysyl tRNA of M.
  • the anticodon is the anticodon to a selector codon such as, e.g., the CUA anticodon to the Amber stop codon TAG, the anticodon UCA to the Opal stop codon TGA, and the anticodon UUA to the Ochre stop codon TAA.
  • pyrrolysyl tRNAs include, but are not limited to, those encoded by the nucleotide sequence of SEQ ID NO:4 (tRNAPylCUA), SEQ ID NO:5 (tRNAPylUCA) or SEQ ID NO:6 (tRNAPylUUA).
  • Non-limiting examples of further suitable tRNAs are the following ones derived from pyrrolysyl tRNA of M. mazei: tRNAPylCGA (pyrrolysyl tRNA for a serine codon), SEQ ID NO:229
  • “selector codon” as used herein refers to a codon that is recognized (i.e. bound) by the O-tRNAncAA in the translation process.
  • the term is also used for the corresponding codons in polypeptide-encoding sequences of polynucleotides which are not messenger RNAs (mRNAs), e.g. DNA plasmids.
  • the new OT systems described herein allow for orthogonal translation of POIs in a manner that is selective for the mRNA of said POIs compared to other mRNAs present in the cytoplasm of the cell.
  • the selector codon is a codon of low abundance in the cell chosen for expression, for example a codon of low abundance in naturally occurring eukaryotic cells.
  • the new OT systems bring the mRNA of the POIs, the O-RS and the tRNAncAA into proximity to one another, thus supporting the introduction of the ncAA (rather than the introduction of an amino acid of a different tRNA that might potentially bind to the selector codon) at the selector codon-encoded amino acid position of the POI.
  • the selector codon can be a sense codon.
  • the selector codon is a codon that is not recognized by endogenous tRNAs of the cell used for preparing the POI.
  • the anticodon of the O-tRNAncAA binds to a selector codon within an mRNA (the mRNA of the POI) and thus incorporates the ncAA site-specifically into the growing chain of the polypeptide (POI) encoded by said mRNA.
  • selector codons which are useful in the new OT systems described herein include, but are not limited to:
  • nonsense codons such as stop codons, e.g., Amber (UAG), Ochre (UAA), and Opal (UGA) codons;
  • codons consisting of more than three bases (e.g., four base codons);
  • a selector codon that is a sense codon (i.e., a natural three base codon)
  • the endogenous translation system of the cell used for POI expression according to a method of the present invention does not (or only scarcely) use said natural three base codon, e.g., a cell that is lacking, or has a reduced abundance of, a tRNA that recognizes the natural three base codon or a cell wherein the natural three base codon is a rare codon.
  • the use of one or more stop codons, such as one or more of Amber, Ochre and Opal, as selector codons in the present invention is particularly preferred.
  • a number of selector codons can be introduced into a polynucleotide encoding a desired polypeptide (target polypeptide, POI), e.g., one or more, two or more, more than three, etc. selector codons.
  • a POI can carry two or more ncAA residues. Said ncAA residues can be the same and encoded by the same type of selector codon, or can be different and encoded by different selector codons.
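Because each in-frame selector codon in the coding sequence marks one ncAA position in the POI, those positions can be read off directly. A Python sketch, assuming a triplet-codon DNA (or RNA) coding sequence that starts at the initiator codon and ends with a natural stop codon different from the selector codon(s); four-base selector codons are not handled, and the function name is hypothetical.

```python
def selector_positions(cds: str, selectors: set[str]) -> dict[str, list[int]]:
    """Map each selector codon to the 1-based POI residue positions it encodes.

    The final codon is skipped because it is assumed to be the natural stop
    codon terminating translation, not a position for ncAA incorporation."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    positions: dict[str, list[int]] = {
        codon.upper().replace("U", "T"): [] for codon in selectors
    }
    for start in range(0, len(cds) - 3, 3):
        codon = cds[start:start + 3]
        if codon in positions:
            positions[codon].append(start // 3 + 1)
    return positions


# ATG GCT TAG GAA TGA TAA: Amber at codon 3, Opal at codon 5, Ochre terminates.
result = selector_positions("ATGGCTTAGGAATGATAA", {"TAG", "TGA"})
print(sorted(result.items()))  # [('TAG', [3]), ('TGA', [5])]
```

Using two different selector codons, as in the example, corresponds to a POI carrying two different ncAAs.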
  • An anticodon has the reverse complement sequence of the corresponding codon.
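The reverse-complement rule reproduces the anticodon/selector codon pairs given above for the Amber, Opal and Ochre codons. A minimal Python sketch; it uses strict Watson-Crick pairing only and ignores wobble pairing, which is a simplifying assumption, and the function name is hypothetical.

```python
_RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon: str) -> str:
    """Return the anticodon (5'->3') that pairs with a codon given 5'->3'.

    DNA input (T) is converted to RNA (U) first; the result is the reverse
    complement, so codons of more than three bases are handled as well."""
    rna = codon.upper().replace("T", "U")
    return "".join(_RNA_COMPLEMENT[base] for base in reversed(rna))


for codon in ("TAG", "TGA", "TAA"):
    print(codon, anticodon(codon))
# TAG CUA
# TGA UCA
# TAA UUA
```

The same rule gives CGA as the anticodon for the serine codon UCG, matching the tRNAPylCGA listed above.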
  • a suppressor tRNA is a tRNA (such as an O-tRNAncAA) that alters the reading of a messenger RNA (mRNA) in a given translation system (e.g., a cell).
  • a suppressor tRNA can read through, e.g., a stop codon, a four base codon, or a rare codon.
  • the O-tRNA is preferentially aminoacylated by O-RS (rather than endogenous synthetases) and is capable of decoding a selector codon, as described herein.
  • O-RS recognizes the O-tRNA, e.g., with an extended anticodon loop, and preferentially aminoacylates the O-tRNA with an ncAA.
  • the O-tRNA and the O-RS used in the methods and/or fusion proteins of the invention can be naturally occurring or can be derived by mutation of a naturally occurring tRNA and/or RS from a variety of organisms.
  • the tRNA and RS are derived from at least one organism.
  • the tRNA is derived from a naturally occurring or mutated naturally occurring tRNA from a first organism and the RS is derived from naturally occurring or mutated naturally occurring RS from a second organism.
  • a suitable (orthogonal) tRNA/RS pair may be selected from libraries of mutant tRNA and RS, e.g., based on the results of a library screening.
  • a suitable tRNA/RS pair may be a heterologous tRNA/synthetase pair that is imported from a source species into the translation system.
  • the cell used as translation system is different from said source species.
  • Methods for evolving tRNA/RS pairs are described, e.g., in WO 02/085923 and WO 02/06075. Conventional site-directed mutagenesis can be used to introduce selector codons into the coding sequence of a POI.
  • the invention also relates to nucleic acid molecules (single-stranded or double-stranded DNA and RNA sequences, for example cDNA, mRNA), or combinations of such nucleic acid molecules, comprising a nucleotide sequence that encodes for at least one of the fusion proteins of the present invention, and/or a nucleotide sequence complementary thereto.
  • nucleic acid molecules comprising (i) a nucleotide sequence (CSPOI) that encodes at least one POI, said POI comprising one or more ncAA residues which are encoded in the CSPOI by selector codons, and (ii) a targeting nucleotide sequence (TN) as described herein, wherein an RNA molecule comprising (the RNA version of) said TN is able to interact via said TN with an RNA-targeting polypeptide (RNA-TP).
  • the nucleic acid molecules of the invention can in addition contain untranslated sequences of the 3'- and/or 5'-end of the coding gene region.
  • the TN is preferably located at the 3' end of the nucleic acid molecule encoding the POI(s).
  • nucleic acid molecules of the invention encoding the POI(s) can be prepared by introducing at least one TN at (in particular 3' of) the 3' untranslated region using common cloning techniques known in the art.
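Placing one or more TN copies 3' of the POI coding sequence can be sketched as simple string assembly. In this Python example the CDS, TN and spacer strings are placeholders, not the MS2, BoxB or PP7 sequences of the SEQ ID NOs; the optional spacer between repeats is an assumption rather than something the text specifies, and the function names are hypothetical.

```python
def assemble_poi_construct(cds: str, tn: str, copies: int = 1, spacer: str = "") -> str:
    """Place `copies` tandem TN units, joined by `spacer`, 3' of the POI CDS."""
    if copies < 1:
        raise ValueError("at least one TN copy is required")
    return cds + spacer.join([tn] * copies)


def tn_is_downstream(construct: str, cds: str, tn: str) -> bool:
    """True if the construct starts with the CDS and a TN copy lies 3' of it."""
    return construct.startswith(cds) and construct.find(tn, len(cds)) != -1


# Placeholder sequences for illustration only.
construct = assemble_poi_construct("ATGTAA", "GGGCCC", copies=3, spacer="TT")
print(construct)                                        # ATGTAAGGGCCCTTGGGCCCTTGGGCCC
print(tn_is_downstream(construct, "ATGTAA", "GGGCCC"))  # True
```

A real expression construct would additionally carry the further 3' UTR, terminator and regulatory elements described below.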
  • the invention further relates to, in particular recombinant, expression constructs or expression cassettes containing, under the genetic control of regulatory nucleic acid sequences, the nucleic acid sequence of the nucleic acid molecule, or combination of nucleic acid molecules, of the invention as described herein.
  • the expression cassettes of the invention thus comprise the nucleic acid sequence coding for at least one POI (plus TN) or at least one fusion protein of the invention, and/or a nucleic acid sequence complementary thereto.
  • the invention also relates to, in particular recombinant, vectors, comprising at least one of these expression constructs (expression vectors).
  • An expression cassette typically comprises a promoter sequence that is located 5' (upstream) of, and functionally linked with, the nucleic acid sequence encoding the to-be-expressed POI(s) or fusion protein(s), a terminator sequence 3' (downstream) of said encoding sequence and optionally further regulatory elements.
  • further regulatory elements include, but are not limited to, targeting sequences, enhancers, polyadenylation signals, selectable markers, amplification signals, replication origins and the like. Suitable regulatory sequences are described for example in Goeddel, Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, CA (1990).
  • the natural regulation of these sequences can still be present before the actual structural genes and optionally can have been genetically altered, so that the natural regulation has been switched off and expression of the genes has been increased.
  • the nucleic acid construct can, however, also be of simpler construction, i.e. no additional regulatory signals have been inserted before the coding sequence and the natural promoter, with its regulation, has not been removed. Instead, the natural regulatory sequence is mutated so that regulation no longer takes place and gene expression is increased.
  • a "functional" linkage of elements of nucleic acid molecules means that these elements are arranged such that the encoding sequence can be transcribed and the optional regulatory elements can perform their regulation of said transcription. This can be achieved by a direct linkage of the elements in one and the same nucleic acid molecule. However, such direct linkage is not necessarily required. Genetic control sequences, for example enhancer sequences, can even exert their function on the target sequence from more remote positions or even from other DNA molecules. Arrangements are preferred in which the nucleic acid sequence to be transcribed is positioned downstream (i.e. at the 3'-end of) the promoter sequence, so that the two sequences are joined together covalently. The distance between the promoter sequence and the nucleic acid sequence to be expressed can be smaller than 200 base pairs, or smaller than 100 base pairs or smaller than 50 base pairs.
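The spacing criterion above requires a promoter-to-coding-sequence distance smaller than 200, 100 or 50 base pairs, and it can be checked directly on a construct sequence. The minimal sketch below makes a simplifying assumption: the sequence to be expressed starts at the first ATG downstream of the promoter. In a real construct, the annotated start codon should be used instead. The function names and sequences are hypothetical.

```python
"""Sketch: measure the spacer between a promoter and the sequence to be expressed.

Simplification: the coding sequence is assumed to start at the first ATG
downstream of the promoter. Sequences used below are hypothetical.
"""
from __future__ import annotations

SPACING_LIMITS_BP = (200, 100, 50)


def promoter_spacing(construct: str, promoter: str, start_codon: str = "ATG") -> int:
    """Return the number of bp between the promoter's 3' end and the downstream start codon."""
    construct, promoter = construct.upper(), promoter.upper()
    promoter_pos = construct.find(promoter)
    if promoter_pos < 0:
        raise ValueError("promoter not found in construct")
    promoter_end = promoter_pos + len(promoter)
    start_pos = construct.find(start_codon, promoter_end)
    if start_pos < 0:
        raise ValueError("no start codon downstream of the promoter")
    return start_pos - promoter_end


def limits_met(spacing_bp: int) -> list[int]:
    """Return the limits (200, 100, 50 bp) that the spacing is smaller than."""
    return [limit for limit in SPACING_LIMITS_BP if spacing_bp < limit]


if __name__ == "__main__":
    promoter = "CCCCGGGG"  # hypothetical promoter placeholder
    construct = "AA" + promoter + "T" * 60 + "ATGAAATAA"
    spacing = promoter_spacing(construct, promoter)
    print(spacing, limits_met(spacing))  # 60 [200, 100]
```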
  • the expression cassette is advantageously inserted into an expression vector.
  • Expression vectors are chosen according to the cell to be used for expression, so as to make optimal expression of the encoding nucleotide sequences in that cell possible. Vectors are well known to a person skilled in the art and are given, for example, in "Cloning vectors" (Pouwels P. H. et al., Ed., Elsevier, Amsterdam-New York-Oxford, 1985). Examples of expression vectors include, but are not limited to, plasmids, viral vectors (e.g., SV40, CMV, baculovirus and adenovirus vectors), phages, transposons, IS elements, phasmids, cosmids, and linear or circular DNA.
  • for the expression of a POI in a cell according to the present invention, it is possible, e.g., to introduce a nucleic acid molecule which encodes the POI (e.g. an expression vector of the invention) into the cell.
  • an existing gene of the cell can be modified so as to comprise selector codons at those amino acid positions where the POI is intended to carry ncAA residues.
  • expression describes, in the context of the invention, the production of polypeptides encoded by the corresponding nucleic acid sequence in a cell.
  • expression is also used for the production of tRNA molecules encoded by nucleic acid sequences in the cell.
  • nucleic acid molecules of the invention including the expression cassettes and expression vectors of the invention can be prepared using common cloning techniques known in the art. Common recombination and cloning techniques are used, as described for example in T. Maniatis, E.F. Fritsch and J. Sambrook, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1989) and in T.J. Silhavy, M.L. Berman and L.W. Enquist, Experiments with Gene Fusions, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1984) and in Ausubel, F.M. et al., Current Protocols in Molecular Biology, Greene Publishing Assoc and Wiley Interscience (1987).
  • nucleic acid molecules, or combinations of nucleic acid molecules, of the invention, including expression cassettes and expression vectors of the invention, can be isolated, for example by methods known in the art.
  • an isolated nucleic acid molecule is separated from other nucleic acid molecules that are present in the natural source of the nucleic acid, and moreover can be essentially free of other cellular material or culture medium when it is produced by recombinant techniques, or free of chemical precursors or other chemicals when it is chemically synthesized.
  • a nucleic acid molecule according to the invention can be isolated by standard techniques of molecular biology and the sequence information provided according to the invention.
  • cDNA can be isolated from a suitable cDNA-bank, using one of the concretely disclosed complete sequences or a segment thereof as hybridization probe and standard hybridization techniques (as described for example in Sambrook, J., Fritsch, E.F. and Maniatis, T. Molecular Cloning: A Laboratory Manual. 2nd edition, Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1989).
  • a nucleic acid molecule comprising one of the disclosed sequences or a segment thereof, can be isolated by polymerase chain reaction, using the oligonucleotide primers that were constructed on the basis of this sequence.
  • the nucleic acid thus amplified can be cloned into a suitable vector and can be characterized by DNA sequence analysis.
  • the oligonucleotides according to the invention can moreover be produced by standard methods of synthesis, e.g. with an automatic DNA synthesizer.
  • ncAA refers generally to any non-canonical or non-natural amino acid, or amino acid residue, that is not among the 22 naturally occurring proteinogenic amino acids. Numerous ncAAs are well known in the art (see, e.g., Liu et al., Annu Rev Biochem 2010, 79:413-444; Lemke, ChemBioChem 2014, 15:1691-1694).
  • the term “ncAA” also refers to amino acid derivatives, for example α-hydroxy acids (rather than α-amino acids). Such derivatives have been shown to be translationally incorporable as well. See, e.g., Ohta et al., 2008, ChemBioChem 9:2773-2778.
  • in particular, ncAAs for use in the present invention are those which can be post-translationally further modified, for example using click chemistry reactions.
  • Such click reactions include strain-promoted inverse-electron-demand Diels-Alder cycloadditions (SPIEDAC; see, e.g., Devaraj et al., Angew Chem Int Ed Engl 2009, 48:7013) as well as cycloadditions between strained cycloalkynyl groups (or strained cycloalkynyl analog groups having one or more of the ring atoms not bound by the triple bond substituted by amino groups) and azides, nitrile oxides, nitrones or diazocarbonyl reagents (see, e.g., Sanders et al., J Am Chem Soc 2010, 133:949; Agard et al., J Am Chem Soc 2004, 126:15046), for example strain-promoted alkyne-azide cycloadditions (SPAAC).
  • These click reactions can be used for coupling the ncAA residues of target polypeptides, via their docking groups, to suitable labeling groups of a coupling partner molecule.
  • Pairs of docking and labeling groups which can react via the above-mentioned click reactions are known in the art.
  • suitable ncAAs for use in the present invention comprising docking groups include, but are not limited to, the ncAAs (“unnatural amino acids”, “UAAs”) described, e.g., in WO 2012/104422 and WO 2015/107064.
  • Optionally substituted strained alkenyl groups include, but are not limited to, optionally substituted trans-cyclooctenyl groups, such as those described in.
  • Optionally substituted strained alkynyl groups include, but are not limited to, optionally substituted cyclooctynyl groups, such as those described in WO 2012/104422 and WO 2015/107064.
  • Optionally substituted tetrazinyl groups include, but are not limited to, those described in WO 2012/104422 and WO 2015/107064.
  • ncAAs used in the context of the present invention can be used in the form of their salts.
  • Salts of an ncAA as described herein mean acid or base addition salts, especially addition salts with physiologically tolerated acids or bases.
  • Physiologically tolerated acid addition salts can be formed by treatment of the base form of an ncAA with appropriate organic or inorganic acids.
  • ncAAs containing an acidic proton may be converted into their non-toxic metal or amine addition salt forms by treatment with appropriate organic and inorganic bases.
  • Salts of carboxyl groups of ncAAs can be produced in a manner known in the art and comprise inorganic salts, for example sodium, calcium, ammonium, iron and zinc salts, and salts with organic bases, for example amines, such as triethanolamine, arginine, lysine, piperidine, etc.
  • ncAAs may also be used in the form of salts of acid addition, for example salts with mineral acids, such as hydrochloric acid or sulfuric acid and salts with organic acids, such as acetic acid and oxalic acid.
  • the ncAAs and salts thereof which are useful in the present invention also comprise the hydrates and solvent addition forms thereof, e.g. hydrates, alcoholates and the like.
  • Physiologically tolerated acids or bases are in particular those which are tolerated by the translation system used for preparing the POI with ncAA residues, e.g. those which are substantially non-toxic to living eukaryotic cells.
  • ncAAs, and salts thereof, useful in the context of the present invention can be prepared by analogy to methods which are well known in the art and are described, e.g., in the various publications cited herein.
  • the nature of the coupling partner molecule depends on the intended use.
  • the target polypeptide may be coupled to a molecule suitable for imaging methods or may be functionalized by coupling to a bioactive molecule.
  • a coupling partner molecule may comprise a group selected from, but not limited to, dyes (e.g., fluorescent, luminescent or phosphorescent dyes such as dansyl, coumarin, fluorescein, acridine, rhodamine, silicon-rhodamine, BODIPY or cyanine dyes), molecules able to emit fluorescence upon contact with a reagent, chromophores (e.g., phytochrome, phycobilin, bilirubin, etc.), radiolabels (e.g., radioactive forms of hydrogen, fluorine, carbon, phosphorous, sulphur or iodine, such as tritium, 18F, 11C, 14C, 32P, 33P, 33S, 35S, 111In, 125I, 123I, 131I, 212B, 90Y or 186Rh), MRI-sensitive spin labels, affinity tags, polyethylene glycol groups (e.g., a branched PEG, a linear PEG, PEGs of different molecular weights, etc.), photocrosslinkers (such as p-azidoiodoacetanilide), NMR probes and X-ray probes.
  • Suitable bioactive compounds include, but are not limited to, cytotoxic compounds (e.g., cancer chemotherapeutic compounds), antiviral compounds, biological response modifiers (e.g., hormones, chemokines, cytokines, interleukins, etc.), microtubule affecting agents, hormone modulators, and steroidal compounds.
  • useful coupling partner molecules include, but are not limited to, a member of a receptor/ligand pair; a member of an antibody/antigen pair; a member of a lectin/carbohydrate pair; a member of an enzyme/substrate pair; biotin/avidin; biotin/streptavidin and digoxin/antidigoxin.
  • ncAA residues to be coupled covalently in situ to (the docking groups of) conjugation partner molecules, in particular by a click reaction as described herein, can be used for detecting a target polypeptide having such ncAA residue(s) within a eukaryotic cell or tissue expressing the target polypeptide, and for studying the distribution and fate of the target polypeptides.
  • the method of the present invention for preparing a POI by expression in (e.g., eukaryotic) cells can be combined with super-resolution microscopy (SRM) to detect the POI within the cell or a tissue of such cells.
  • SRM methods are known in the art and can be adapted so as to utilize click chemistry for detecting a target polypeptide expressed by a eukaryotic cell of the present invention.
  • SRM methods include DNA-PAINT (DNA point accumulation for imaging in nanoscale topography; described, e.g., by Jungmann et al., Nat Methods 11:313-318, 2014), dSTORM (direct stochastic optical reconstruction microscopy) and STED (stimulated emission depletion) microscopy.
  • the OT systems provided by the invention allow for the translational preparation of a POI in a cell.
  • the cell used for preparing a POI according to the invention can be a prokaryotic cell.
  • the cell used for preparing a POI according to the invention can be a eukaryotic cell.
  • the cell used for preparing a POI according to the invention can be a separate cell such as, e.g., a single-cell microorganism or a cell line derived from cells of multicellular organisms.
  • the cell used for preparing a POI according to the invention can be present in (and part of) a tissue, an organ, a body part or an entire multicellular organism.
  • the methods of the invention for preparing a POI can be performed with a separate cell or a cell culture, or with a tissue or tissue culture, organ, body part or (entire multicellular) organism.
  • Eukaryotic cells are often more difficult to handle and manipulate than prokaryotes such as, e.g., E. coli, and are therefore not accessible, or accessible only with great difficulty, to known approaches for POI-selective orthogonal translation such as those described in the "Background of the invention" section above.
  • the OT system and the methods of the invention are therefore particularly advantageous when used for POI expression in eukaryotic cells (including, e.g., single- and multicellular eukaryotic organisms, and eukaryotic cell lines).
  • prokaryotic or eukaryotic cells can be used for preparing a POI according to a method of the present invention.
  • Microorganisms such as, e.g., bacteria, fungi or yeasts can be used, as well as eukaryotic cells, such as, e.g., mammalian cells, insect cells, yeast cells and plant cells. Eukaryotic cells and in particular mammalian cells are particularly preferred.
  • the cell used for preparing a POI according to the invention carries a POI-encoding nucleotide sequence (CS^POI) wherein the ncAA residue(s) of the POI are encoded by selector codon(s).
  • Said CS^POI is functionally linked with one or more targeting nucleotide sequences (TNs). Transcription yields an mRNA comprising the CS^POI and the TN(s).
  • the cell further comprises one or more fusion proteins of the present invention, wherein said fusion protein(s) comprise at least one O-RS segment and at least one RNA-TP segment.
  • Said O-RS and RNA-TP can be on separate fusion proteins (e.g. AFPs) of the invention.
  • said O-RS and RNA-TP can be on one and the same fusion protein (e.g. on an RNA-TP/O-RS fusion protein or an AFP) of the invention.
  • Via (at least one of) its TN(s), said mRNA can interact with (bind to) at least one of the RNA-TP segments of the fusion proteins of the invention in the cell.
  • the cell further comprises one or more orthogonal tRNA^ncAA molecules (O-tRNA^ncAA) which carry the anticodon(s) complementary to the selector codon(s) of the CS^POI.
  • Said O-tRNA^ncAA molecules and one or more of the O-RS segments of the fusion proteins in the cell form one or more orthogonal O-RS/O-tRNA^ncAA pairs which allow for introducing the ncAA residue(s) into the amino acid sequence of the (translationally prepared) POI.
  • The interaction of the mRNA comprising CS^POI and TN(s) with the RNA-TP segment(s), the aminoacylation of the O-tRNA^ncAA with the ncAAs by the O-RS segment(s), and the translational preparation of the POI, including the introduction of the ncAA residue(s), are thought to take place in the cytoplasm, more particularly in the OT assembly (OT organelle), of the cell in the presence of the ncAAs.
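The intended POI selectivity can be illustrated with a simplified toy model. In this model, a selector codon is decoded as an ncAA only in an mRNA that carries a TN; in any other mRNA it is treated as an ordinary stop codon. The model makes several assumptions: TAG is the selector codon, the ncAA is represented by the letter X, and only a few codons of the standard genetic code are included. Selectivity is also idealized as all-or-nothing. This is an illustration of the outcome the OT system aims for, not a model of the cellular mechanism itself.

```python
"""Toy model of TN-dependent selector codon decoding (illustration only).

Assumptions: TAG is the selector codon, "X" stands for the ncAA residue,
only a few codons of the standard genetic code are included, and decoding
is idealised as all-or-nothing.
"""

CODON_TABLE = {"ATG": "M", "AAA": "K", "TAT": "Y", "GGC": "G"}  # small subset
STOP_CODONS = {"TAA", "TAG", "TGA"}


def translate(cds: str, has_tn: bool, selector: str = "TAG", ncaa: str = "X") -> str:
    """Translate `cds`; the selector codon yields the ncAA only if the mRNA has a TN."""
    residues = []
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        codon = cds[i:i + 3]
        if has_tn and codon == selector:
            # TN-bearing mRNA: selector codon read by the aminoacylated O-tRNA^ncAA
            residues.append(ncaa)
        elif codon in STOP_CODONS:
            # stop codon, including the selector codon in mRNAs without a TN
            break
        else:
            residues.append(CODON_TABLE[codon])  # KeyError for codons outside the subset
    return "".join(residues)


if __name__ == "__main__":
    mrna_cds = "ATGAAATAGGGCTAA"
    print(translate(mrna_cds, has_tn=True))   # MKXG
    print(translate(mrna_cds, has_tn=False))  # MK
```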
  • the mRNA comprising CS^POI and TN(s) can be generated from a recombinant construct (e.g. expression vector) introduced into the cell.
  • one or more endogenous genes of the cell can be modified so as to comprise one or more selector codons and one or more TNs.
  • Techniques for introducing recombinant constructs into a cell as well as methods for modifying endogenous genes of a cell are well known in the art.
  • the tRNA^ncAA molecules and fusion proteins of the invention can be generated from a recombinant construct (e.g. expression vector) introduced into the cell.
  • recombinant cells can be produced which can be used for preparing a POI using a method of the present invention.
  • the recombinant vectors according to the invention, described above are introduced into a suitable cell and expressed.
  • the cell used for preparing a POI as described herein can be prepared by introducing nucleotide sequences encoding the fusion protein(s), the tRNA^ncAA molecule(s) and the POI into the cell.
  • Said nucleotide sequences can be located on separate nucleic acid molecules (vectors) or on the same nucleic acid molecule (e.g., vector), in any combination, and can be introduced into the cell in combination or sequentially.
  • cloning and transfection techniques are used, for example co-precipitation, protoplast fusion, electroporation, virus-mediated gene delivery, lipofection, microinjection or others, for introducing the stated nucleic acid molecules in the respective cell. Suitable techniques are described for example in Current Protocols in Molecular Biology, F. Ausubel et al., Ed., Wiley Interscience, New York 1997, or Sambrook et al. Molecular Cloning: A Laboratory Manual. 2nd edition, Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1989.
  • the cell used for POI expression is grown or cultured in a manner known by a person skilled in the art.
  • a liquid medium can be used for culturing.
  • Culture can be batchwise, semi-batchwise or continuous. Nutrients can be present at the beginning of the culturing or can be supplied later, semi-continuously or continuously.
  • the expressed POIs can be purified by known techniques, such as, e.g., molecular sieve chromatography (gel filtration), ion exchange chromatography (e.g., Q-Sepharose chromatography) and hydrophobic chromatography, and other common protein purification techniques such as ultrafiltration, crystallization, salting-out, dialysis and native gel electrophoresis. Suitable methods are described, for example, in Cooper, T. G., Biochemische Arbeitsmethoden [Biochemical processes], Verlag Walter de Gruyter, Berlin, New York or in Scopes, R., Protein Purification, Springer Verlag, New York, Heidelberg, Berlin.
  • tags for protein purification are well known in the art and include, e.g., histidine tags (e.g., His6 tag) and epitopes that can be recognized as antigens of antibodies (described for example in Harlow, E. and Lane, D., 1988, Antibodies: A Laboratory Manual. Cold Spring Harbor (N.Y.) Press). These tags can serve for attaching the proteins to a solid carrier, for example a polymer matrix, which can for example be used as packing in a chromatography column, or can be used on a microtiter plate or on some other carrier.
  • a tag linked to a POI can also serve for detecting the POI.
  • Tags for protein detection are well known in the art and include, e.g., fluorescent dyes, enzyme markers, which form a detectable reaction product after reaction with a substrate, and others.
  • the expression can be achieved by culturing the cell in the presence of one or more ncAAs corresponding to the ncAA residue(s) of the POI (wherein said ncAAs may expediently be comprised in the culture medium) for a time suitable to allow translation of the POI.
  • it may be required to induce expression by adding a compound that induces transcription, such as, e.g., arabinose, isopropyl β-D-thiogalactoside (IPTG) or tetracycline.
  • the POI may optionally be recovered from the translation system.
  • the POI can be recovered and purified, either partially or substantially to homogeneity, according to procedures known to and used by those of skill in the art.
  • recovery usually requires cell disruption.
  • Methods of cell disruption include physical disruption, e.g., by (ultrasound) sonication, liquid-shear disruption (e.g., via French press), mechanical methods (such as those utilizing blenders or grinders) or freeze-thaw cycling, as well as chemical lysis using agents which disrupt lipid-lipid, protein-protein and/or protein-lipid interactions (such as detergents), and combinations of physical disruption techniques and chemical lysis.
  • Standard procedures for purifying polypeptides from cell lysates or culture media are also well known in the art and include, e.g., ammonium sulfate or ethanol precipitation, acid or base extraction, column chromatography, affinity column chromatography, anion or cation exchange chromatography, phosphocellulose chromatography, hydrophobic interaction chromatography, hydroxylapatite chromatography, lectin chromatography, gel electrophoresis and the like. Protein refolding steps can be used, as desired, in making correctly folded mature proteins. High performance liquid chromatography (HPLC), affinity chromatography or other suitable methods can be employed in final purification steps where high purity is desired.
  • Antibodies made against the polypeptides of the invention can be used as purification reagents, i.e. for affinity-based purification of the polypeptides.
  • a variety of purification/protein folding methods are well known in the art, including, e.g., those set forth in Scopes, Protein Purification, Springer, Berlin (1993); and Deutscher, Methods in Enzymology Vol. 182: Guide to Protein Purification, Academic Press (1990); and the references cited therein.
  • polypeptides can possess a conformation different from the desired conformations of the relevant polypeptides.
  • polypeptides produced by prokaryotic systems often are optimized by exposure to chaotropic agents to achieve proper folding.
  • the expressed polypeptide is optionally denatured and then renatured. This is accomplished, e.g., by solubilizing the proteins in a chaotropic agent such as guanidine HCl.
  • guanidine, urea, DTT, DTE, and/or a chaperonin can be added to a translation product of interest.
  • Methods of reducing, denaturing and renaturing proteins are well known to those of skill in the art.
  • Polypeptides can be refolded in a redox buffer containing, e.g., oxidized glutathione and L-arginine.
  • polypeptides produced by the methods of the invention are also described. Such polypeptides can be prepared by a method of the invention that makes use of the OT system described herein.
7. Kits
  • the present invention also provides kits for preparing a POI having at least one non-canonical amino acid (ncAA) residue.
  • the kit of the invention may comprise at least one expression vector for at least one fusion protein of the present invention.
  • the fusion protein(s) encoded by the expression vector(s) in the kit may comprise at least one O-RS segment and at least one RNA-TP segment.
  • the kit may further comprise at least one ncAA, or salt thereof, corresponding to the at least one ncAA residue of the POI.
  • said O-RS segment is capable of aminoacylating a tRNA with the at least one ncAA.
  • the kit may further comprise at least one expression vector for an orthogonal tRNA^ncAA (O-tRNA^ncAA) molecule.
  • Further components of the kit may include at least one expression vector comprising a multiple cloning site and a targeting nucleotide sequence (TN), wherein an RNA molecule comprising said TN is able to interact via said TN with an RNA-targeting polypeptide (RNA-TP).
  • the kit may further comprise at least one reporter construct encoding an easily detectable (e.g. fluorescent) reporter polypeptide having at least one non-canonical amino acid (ncAA) residue such that the mRNA transcribed from said construct comprises a TN as described herein.
  • kits of the present invention can be used in methods of the invention for preparing ncAA-residue-containing POIs as described herein.
  • the present invention further provides the following non-limiting embodiments E1 to E50.
  • E1 An assembler fusion protein (AFP) comprising:
  • (a1) a polypeptide segment derived from an intracellular targeting polypeptide (IC-TP segment), wherein said intracellular targeting polypeptide targets, and thus becomes locally enriched at, an intracellular structural element within or directly adjacent to the cytoplasm; and (a2) a polypeptide segment derived from a phase separation polypeptide (PSP segment), wherein said phase separation polypeptide has the ability to undergo self-association in the cytoplasm of a cell so as to create sites of high local concentration in the cytoplasm, and
  • wherein said polypeptide segments are functionally linked in said AFP.
  • E2 The AFP of E1 comprising at least two APs, preferably at least one IC-TP segment and at least one PSP segment.
  • E3 The AFP of E1 or E2 having one of the following structures (from the N-terminus to the C-terminus):
  • wherein n and o, independently of each other, are integers selected from 1, 2, 3 and 4, and the linking symbol in said structures designates a peptidic linkage.
  • E4 The AFP of any one of E1-E3, wherein the at least one EP is selected from RNA-TP segments.
  • E5 The AFP of any one of E1-E3, wherein the at least one EP is selected from O-RS segments.
  • E6 The AFP of any one of E1-E3 comprising at least one EP selected from RNA-TP segments and at least one EP selected from O-RS segments.
  • E7 The AFP of any one of E1-E6 comprising at least one IC-TP segment selected from dyneins and kinesins, and fragments and mutants of dyneins and kinesins, which retain the ability to target, and become enriched at, the plus or the minus end of microtubules.
  • E8 The AFP of any one of E1-E6 comprising at least one IC-TP segment selected from transmembrane domains of membrane proteins, and functional fragments and mutants of transmembrane domains which retain the ability to target, and become enriched at, the cytoplasmic side of membranes, in particular membranes selected from the cell membrane, nuclear membrane and mitochondrial membrane.
  • E9 The AFP of any one of E1-E8 comprising at least one IC-TP segment selected from:
  • KIF16B(1-400) comprising the amino acid sequence of SEQ ID NO:20, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:20;
  • KIF13A(1-411, Δ390) comprising the amino acid sequence of SEQ ID NO:22, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:22;
  • TOMM20(1-70) comprising the amino acid sequence of SEQ ID NO:24, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:24;
  • LcK comprising the amino acid sequence of SEQ ID NO:26, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:26;
  • FRB-CD28 comprising the amino acid sequence of SEQ ID NO:28, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:28;
  • FUS-CD28 comprising the amino acid sequence of SEQ ID NO:30, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:30;
  • EB1 comprising the amino acid sequence of SEQ ID NO:302, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:302;
  • CG1 comprising the amino acid sequence of SEQ ID NO:304, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:304;
  • EBAG9 comprising the amino acid sequence of SEQ ID NO:292 (full length), or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to SEQ ID NO:292;
  • CMP Sia Tr comprising the amino acid sequence of SEQ ID NO:296, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:296; or
  • P450 2C1 targeting the cytoplasmic side of the ER membrane, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity thereto, in particular a fragment comprising the N-terminal first 27 (SEQ ID NO:298) or first 29 (SEQ ID NO:300) amino acid residues, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to SEQ ID NO:298 or 300.
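Once two sequences have been aligned, a simple calculation shows which of the sequence-identity thresholds listed above a variant meets (at least 60% up to at least 99%). The minimal sketch below assumes that identity is counted over ungapped alignment columns. Other conventions, such as dividing by the full length of the reference SEQ ID, give different values. The alignment itself is assumed to come from a separate tool, and the function names are hypothetical.

```python
"""Sketch: percent identity between two pre-aligned amino acid sequences.

Assumption: identity = identical columns / ungapped columns x 100.
"""
from __future__ import annotations

IDENTITY_THRESHOLDS = (60, 70, 80, 90, 95, 96, 97, 98, 99)


def percent_identity(aligned_ref: str, aligned_query: str, gap: str = "-") -> float:
    """Percentage of ungapped alignment columns with identical residues."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_ref, aligned_query) if a != gap and b != gap]
    if not columns:
        raise ValueError("alignment contains no ungapped columns")
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


def thresholds_met(identity: float) -> list[int]:
    """Return every 'at least N%' threshold satisfied by `identity`."""
    return [t for t in IDENTITY_THRESHOLDS if identity >= t]


if __name__ == "__main__":
    # Hypothetical 10-residue sequences differing at the last position.
    identity = percent_identity("MKTAYIAKQR", "MKTAYIAKQA")
    print(identity, thresholds_met(identity))  # 90.0 [60, 70, 80, 90]
```

If gapped columns were included in the denominator instead, the value for alignments with insertions or deletions would be lower.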
  • E10 The AFP of any one of E1-E9 comprising at least one PSP segment selected from intrinsically disordered proteins (IDPs), in particular prion-like domains, and functional fragments and mutants of IDPs or prion-like domains which retain the ability to undergo self-association in the cytoplasm of a cell so as to create sites of high local concentration in the cytoplasm.
  • I DPs intrinsically disordered proteins
  • E11 The AFP of any one of E1-E10 comprising at least one PSP segment selected from:
  • SPD5 comprising the amino acid sequence of SEQ ID NO:32, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:32;
  • FUS comprising the amino acid sequence of SEQ ID NO:34, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:34; and
  • EWSR1 comprising the amino acid sequence of SEQ ID NO:36, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:36.
  • E12 The AFP of any one of E1-E11 comprising at least one RNA-TP segment selected from RNA-binding segments of viral coat proteins, and functional fragments and mutants of RNA-binding segments of viral coat proteins which retain the ability to interact specifically with an RNA motif of the virus.
  • E13 The AFP of any one of E1-E12 comprising at least one RNA-TP segment selected from:
  • MCP comprising the amino acid sequence of SEQ ID NO:14, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:14;
  • λN22 comprising the amino acid sequence of SEQ ID NO:16, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:16;
  • POP comprising the amino acid sequence of SEQ ID NO:306, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:306.
  • E14 The AFP of any one of E1-E13 comprising at least one O-RS segment selected from:
  • Methanococcus jannaschii tyrosyl-tRNA synthetase
  • Methanosarcina thermophila pyrrolysyl-tRNA synthetase
  • Desulfitobacterium hafniense pyrrolysyl-tRNA synthetase
  • E15 The AFP of any one of E1-E14 comprising at least one O-RS segment selected from:
  • PylRS AF comprising the amino acid sequence of SEQ ID NO:8, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:8;
  • PylRS ⁇ comprising the amino acid sequence of SEQ ID NO:10, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:10;
  • PylRS ⁇ 1 comprising the amino acid sequence of SEQ ID NO:12, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:12;
  • IFRS1 comprising the amino acid sequence of SEQ ID NO:224, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:
  • OMeRS comprising the amino acid sequence of SEQ ID NO:236 or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:236.
  • E16 An assembler fusion protein (AFP) combination comprising at least two AFPs of any one of E1-E15.
  • E17 The AFP combination of E16 comprising at least one first AFP comprising at least one RNA-TP segment, and at least one second AFP comprising at least one O-RS segment.
  • An RNA-TP/O-RS fusion protein comprising:
  • RNA-TP: RNA-targeting polypeptide
  • The RNA-TP/O-RS fusion protein of E18 having one of the following structures (from the N-terminus to the C-terminus):
  • x and y, independently of each other, are integers selected from 1, 2, 3, 4 and 5; and the connecting symbol designates a peptidic linkage.
  • E20 The RNA-TP/O-RS fusion protein of E18 or E19 comprising at least one RNA-TP segment selected from RNA-binding segments of viral coat proteins, and functional fragments and mutants of RNA-binding segments of viral coat proteins which retain the ability to interact specifically with an RNA motif of the virus.
  • E21 The RNA-TP/O-RS fusion protein of any one of E18-E20 comprising at least one RNA-TP segment selected from:
  • MCP comprising the amino acid sequence of SEQ ID NO:14, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:14;
  • λN22 comprising the amino acid sequence of SEQ ID NO:16, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:16;
  • POP comprising the amino acid sequence of SEQ ID NO:306, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:306.
  • E22 The RNA-TP/O-RS fusion protein of any one of E18-E21 comprising at least one O-RS segment selected from:
  • Methanococcus jannaschii tyrosyl-tRNA synthetase
  • Methanosarcina thermophila pyrrolysyl-tRNA synthetase
  • Desulfitobacterium hafniense pyrrolysyl-tRNA synthetase
  • E23 The RNA-TP/O-RS fusion protein of any one of E18-E22 comprising at least one O-RS segment selected from:
  • PylRS AF comprising the amino acid sequence of SEQ ID NO:8, or a functional fragment or mutant thereof having at least 60% sequence identity to the amino acid sequence of SEQ ID NO:8;
  • PylRS ⁇ comprising the amino acid sequence of SEQ ID NO:10, or a functional fragment or mutant thereof having at least 60% sequence identity to the amino acid sequence of SEQ ID NO:10;
  • PylRS ⁇ 1 comprising the amino acid sequence of SEQ ID NO:12, or a functional fragment or mutant thereof having at least 60% sequence identity to the amino acid sequence of SEQ ID NO:12;
  • IFRS1 comprising the amino acid sequence of SEQ ID NO:224, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:224;
  • CbzRS comprising the amino acid sequence of SEQ ID NO:226, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:226;
  • CpkRS comprising the amino acid sequence of SEQ ID NO:228 or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of
  • OMeRS comprising the amino acid sequence of SEQ ID NO:236 or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:236.
  • E24 A nucleic acid molecule, or a combination of two or more nucleic acid molecules, comprising:
  • E25 A nucleic acid molecule, or a combination of two or more nucleic acid molecules, comprising:
  • E26 An expression cassette comprising the nucleotide sequence of the nucleic acid molecule, or the combination of nucleic acid molecules, of E24 or E25.
  • E27 An expression vector comprising at least one expression cassette of E26.
  • E28 A cell comprising at least one nucleic acid molecule, or combination of nucleic acid molecules, of E24 or E25, at least one expression cassette of E26, or at least one expression vector of E27.
  • E29 The cell of E28 which is a eukaryotic cell.
  • E30 The cell of E28 which is a mammalian cell.
  • E31 The cell of any one of E28-E30 comprising at least one nucleic acid molecule, or combination of nucleic acid molecules, of E24, or at least one expression cassette comprising the nucleotide sequence of said nucleic acid molecule, or combination of nucleic acid molecules, or at least one expression vector comprising said expression cassette.
  • E32 The cell of E31 comprising a nucleotide sequence that encodes, or is complementary to a nucleotide sequence encoding, at least one AFP of any one of E1-E3 and E7-E15 comprising at least one EP selected from RNA-TP segments and at least one EP selected from O-RS segments.
  • E33 The cell of E31 comprising a nucleotide sequence that encodes, or is complementary to a nucleotide sequence encoding, at least one AFP of any one of E1-E3 and E7-E15 comprising at least one EP selected from RNA-TP segments, and at least one AFP of any one of E1-E3 and E7-E15 comprising at least one EP selected from O-RS segments.
  • E34 The cell of any one of E28-E30 comprising at least one nucleic acid molecule, or combination of nucleic acid molecules, of E25, or at least one expression cassette comprising the nucleotide sequence of said nucleic acid molecule, or combination of nucleic acid molecules, or at least one expression vector comprising said expression cassette.
  • E35 The cell of any one of E28-E34, wherein the cell expresses the at least one AFP, the at least one AFP combination or the at least one RNA-TP/O-RS fusion protein, respectively, that is encoded by the nucleotide sequence of said nucleic acid molecule, or combination of nucleic acid molecules.
  • E36 A method for preparing a polypeptide of interest (POI) comprising in its amino acid sequence one or more non-canonical amino acid (ncAA) residues, wherein the method comprises expressing the POI in a cell of any one of E31-E33 in the presence of said one or more ncAAs, wherein the cell comprises:
  • a targeting nucleotide sequence that is functionally linked to the CS POI and is able to interact with an RNA-TP segment of at least one of the AFPs in the cell;
  • the method optionally further comprises recovering the expressed POI.
  • E37 A method for preparing a polypeptide of interest (POI) comprising in its amino acid sequence one or more non-canonical amino acid (ncAA) residues, wherein the method comprises expressing the POI in a cell of E35 in the presence of said one or more ncAAs, wherein the cell comprises:
  • a targeting nucleotide sequence that is functionally linked to the CS POI and is able to interact with an RNA-TP segment of at least one of the RNA-TP/O-RS fusion proteins in the cell;
  • the method optionally further comprises recovering the expressed POI.
  • E38 A method for preparing a polypeptide of interest (POI) comprising in its amino acid sequence one or more non-canonical amino acid (ncAA) residues, said method comprising the steps of:
  • orthogonal tRNA ncAA molecules and one or more of the O-RS segments of the AFPs in the cell form one or more orthogonal aminoacyl-tRNA synthetase/tRNA ncAA (O-RS/O-tRNA ncAA) pairs,
  • said O-RS/O-tRNA ncAA pairs allow for introducing said one or more ncAA residues into the amino acid sequence of said POI
  • steps (a) and (b) can be performed concomitantly or sequentially in any order; (c) then, expressing said POI in said cell in the presence of said one or more ncAAs, wherein
  • the POI-encoding nucleotide sequence (CS POI) comprises one or more selector codons encoding said one or more ncAA residues
  • said CS POI is functionally linked to a targeting nucleotide sequence (TN), thus forming a CS POI/TN fusion sequence,
  • TN: targeting nucleotide sequence
  • said CS POI/TN fusion sequence is able to interact, via its TN, with an RNA-TP segment of at least one of the AFPs in the cell;
  • E39 A method for preparing a polypeptide of interest (POI) comprising in its amino acid sequence one or more non-canonical amino acid (ncAA) residues, said method comprising the steps of:
  • orthogonal tRNA ncAA molecules and one or more of the O-RS segments of the RNA-TP/O-RS fusion proteins in the cell form one or more orthogonal aminoacyl-tRNA synthetase/tRNA ncAA (O-RS/O-tRNA ncAA) pairs,
  • said O-RS/O-tRNA ncAA pairs allow for introducing said one or more ncAA residues into the amino acid sequence of said POI
  • steps (a) and (b) can be performed concomitantly or sequentially in any order;
  • the POI-encoding nucleotide sequence (CS POI) comprises one or more selector codons encoding said one or more ncAA residues
  • said CS POI is functionally linked to a targeting nucleotide sequence (TN), thus forming a CS POI/TN fusion sequence,
  • said CS POI/TN fusion sequence is able to interact, via its TN, with an RNA-TP segment of at least one of the RNA-TP/O-RS fusion proteins in the cell;
  • E40 The method of any one of E36-E39, wherein the TN is selected from viral RNA motifs bound by a viral coat protein, and functional fragments and mutants thereof which retain the ability to be bound by a viral coat protein.
  • E41 The method of any one of E36-E40, wherein the TN is selected from:
  • MS2 RNA stem-loop comprising the RNA sequence encoded by the nucleotide sequence of SEQ ID NO:17, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% sequence identity to the nucleotide sequence of SEQ ID NO:17;
  • BoxB comprising the RNA sequence encoded by the nucleotide sequence of SEQ ID NO:18, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% sequence identity to the nucleotide sequence of SEQ ID NO:18; and
  • pp7 RNA stem-loop, existing in at least two different versions, in particular a polynucleotide having an RNA sequence corresponding to (encoded by) the nucleotide (DNA) sequence of SEQ ID NO:289 or SEQ ID NO:290, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% sequence identity to the nucleotide sequence of SEQ ID NO:289 or 290.
  • E42 The method of any one of E36-E41, wherein the selector codon(s) encoding the ncAA residue(s) of the POI are selected from Amber, Ochre and Opal stop codons.
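  • E42 names the Amber, Ochre and Opal stop codons as selector codons; the orthogonal tRNA that decodes a selector codon carries the matching anticodon, i.e. its reverse complement. The short Python sketch below is an illustration only, not part of the described method: it maps each stop codon to that anticodon and locates a selector codon within a coding sequence. The helper names and example sequences are hypothetical.

```python
# Hypothetical helpers illustrating selector codons and suppressor-tRNA anticodons.
STOP_CODONS = {"Amber": "TAG", "Ochre": "TAA", "Opal": "TGA"}

_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA string made of A, C, G and T."""
    return dna.upper().translate(_DNA_COMPLEMENT)[::-1]


def anticodon_rna(codon_dna: str) -> str:
    """Anticodon (written 5'->3', as RNA) that base-pairs with a DNA-coded codon."""
    return reverse_complement(codon_dna).replace("T", "U")


def selector_codon_positions(cds: str, codon: str) -> list:
    """1-based codon numbers, read in frame from the first base, at which `codon` occurs."""
    cds = cds.upper()
    return [i // 3 + 1 for i in range(0, len(cds) - 2, 3) if cds[i:i + 3] == codon]


# Anticodon each stop codon would require in a suppressor tRNA
suppressor_anticodons = {name: anticodon_rna(c) for name, c in STOP_CODONS.items()}
```

  For the Amber codon TAG this gives the anticodon CUA. The position helper counts the start codon as codon 1; whether residue numbering such as "GFP 39" follows the same convention is not stated here.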
  • E43 A nucleic acid molecule comprising: (i) a nucleotide sequence (CS POI) that encodes a polypeptide of interest (POI), said POI comprising one or more non-canonical amino acid (ncAA) residues which are encoded in the CS POI by selector codons, and
  • E44 The nucleic acid molecule of E43, wherein the TN is selected from viral RNA motifs bound by a viral coat protein, and functional fragments and mutants thereof which retain the ability to be bound by a viral coat protein.
  • E45 The nucleic acid molecule of E43 or E44, wherein the TN is selected from:
  • MS2 RNA stem-loop comprising the RNA sequence encoded by the nucleotide sequence of SEQ ID NO:17, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% sequence identity to the nucleotide sequence of SEQ ID NO:17;
  • BoxB comprising the RNA sequence encoded by the nucleotide sequence of SEQ ID NO:18, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% sequence identity to the nucleotide sequence of SEQ ID NO:18; and
  • pp7 RNA stem-loop, existing in at least two different versions, in particular a polynucleotide having an RNA sequence corresponding to (encoded by) the nucleotide (DNA) sequence of SEQ ID NO:289 or SEQ ID NO:290, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% sequence identity to the nucleotide sequence of SEQ ID NO:289 or 290.
  • E46 The nucleic acid molecule of any one of E43-E45, wherein the selector codon(s) encoding the ncAA residue(s) of the POI are selected from Amber, Ochre and Opal stop codons.
  • E47 A kit for preparing a polypeptide of interest (POI) having at least one non-canonical amino acid (ncAA) residue, the kit comprising:
  • At least one expression vector of E27.
  • E48 The kit of E47, wherein the expression vector encodes a fusion protein comprising at least one O-RS segment and at least one RNA-TP segment.
  • E49 The kit of E47 or E48 further comprising at least one expression vector for an orthogonal tRNA ncAA (0-tRNA ncAA ) molecule.
  • E50 The kit of any one of E47-E49 further comprising at least one expression vector comprising a multiple cloning site and a targeting nucleotide sequence (TN), wherein an RNA molecule comprising said TN is able to interact via said TN with an RNA-targeting polypeptide (RNA-TP).
  • AP: i.e., IC-TP and PSP
  • EP: RNA-TP or O-RS
  • Synthetic protein segments which induce and control macromolecular interactions.
  • One or more (e.g., 2, 3, 4, 5, 6, 7, 8, 9 or 10), preferably one, such protein segments may be operably fused into a single AFP of the invention.
  • SYNZIPs have the ability to form heterodimeric coiled-coil protein structures. SYNZIPs are pairs of synthetic peptides that interact with each other and are used to induce and control macromolecular interactions.
  • Non-limiting examples are the pairs SYNZIP1:2, SYNZIP3:4 and SYNZIP5:6.
  • Particularly preferred according to the invention is the heterospecific coiled-coil pair SYNZIP2:SYNZIP1 as described by Reinke, A.W., Grant, R.A., Keating, A.E. (2010) J Am Chem Soc 132, 6025-6031 (SYNZIP1: SEQ ID NO:312; SYNZIP2: SEQ ID NO:314; SYNZIP3: SEQ ID NO:316; SYNZIP4: SEQ ID NO:318), as well as functional fragments and mutants of these SYNZIP polypeptides.
  • Said functional fragments and mutants may have at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of the polypeptide from which they are derived.
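  • The identity thresholds recited throughout (60% to 99%) presuppose a way of computing percent identity, which this excerpt does not specify. As a minimal sketch, assuming the two sequences have already been aligned (for example with a global aligner such as EMBOSS Needle) and that columns gapped in both sequences are ignored, percent identity and the recited thresholds it meets could be computed as follows; the function names are hypothetical.

```python
IDENTITY_THRESHOLDS = (60, 70, 80, 90, 95, 96, 97, 98, 99)


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned, equal-length sequences.

    Assumed convention: identical residues divided by all alignment columns,
    skipping columns that are gaps in both sequences.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("alignment contains no residues")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def thresholds_met(identity: float) -> list:
    """All recited identity thresholds (in %) that a given identity satisfies."""
    return [t for t in IDENTITY_THRESHOLDS if identity >= t]


# Toy peptides differing at one of eight positions
identity = percent_identity("MKTAYIAK", "MKTAYLAK")
```

  Here the toy peptides share 7 of 8 residues (87.5%), which meets the 60%, 70% and 80% thresholds. Other conventions, such as dividing by the length of the shorter sequence, give different values, so the denominator is a choice that should be stated explicitly.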
  • HEK293T cells (ATCC, CRL-3216) and COS-7 cells (ATCC, CRL-1651) were maintained in Dulbecco's modified Eagle's medium (Life Technologies, 41965-039) supplemented with 1% penicillin-streptomycin (Sigma; 10,000 U/ml penicillin, 10 mg/ml streptomycin, 0.9% NaCl), 2 mM L-glutamine (Sigma), 1 mM sodium pyruvate (Life Technologies) and 10% FBS (Sigma). Cells were cultured at 37°C in a 5% CO2 atmosphere and passaged every 2-3 days for up to 15-20 passages.
  • ncAAs: Stock and working solutions for all ncAAs used were prepared as described in Nikic et al. (Nat Protoc 10(5):780-791, 2015).
  • SCO: cyclooctyne lysine (SiChem, SC-8000)
  • 3-Iodophenylalanine (Chem-Impex International Inc.) was used at a final concentration of 1 mM.
  • SCO is efficiently recognized by PylRS AF (Y306A, Y384F) (see Plass et al., Angew Chem 2011, 50:3878-3881).
  • 3-Iodophenylalanine is recognized by PylRS ⁇ (C346A, N348A) (see Wang et al., ACS Chem Biol 2013, 8:405-415).
  • HEK293T cells were harvested one day after transfection, resuspended in 1x PBS and passed through a 100 µm nylon mesh. Co-transfections for flow cytometry were performed at a 1:1:1:1 ratio with 1.2 µg total DNA with:
  • tRNA Pyl: a plasmid encoding tRNA Pyl with an anticodon that matches (i.e., is the reverse complement of) the stop codon in the POI-encoding sequence
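  • The 1:1:1:1 co-transfection fixes the DNA mass per plasmid: 1.2 µg of total DNA split four ways gives 0.3 µg (300 ng) of each plasmid. A small hypothetical helper makes the arithmetic explicit. Only the tRNA Pyl plasmid is named in this excerpt, so the sketch works with ratios alone.

```python
def split_dna(total_ug: float, ratio: list) -> list:
    """Split a total DNA mass (µg) across plasmids according to an integer ratio."""
    if not ratio or any(r <= 0 for r in ratio):
        raise ValueError("ratio must be a non-empty list of positive integers")
    parts = sum(ratio)
    return [total_ug * r / parts for r in ratio]


# 1.2 µg total DNA at a 1:1:1:1 ratio
per_plasmid_ug = split_dna(1.2, [1, 1, 1, 1])
per_plasmid_ng = [round(m * 1000) for m in per_plasmid_ug]
```

  The same helper covers unequal ratios, e.g. a 1:3 split of 1.0 µg gives 0.25 µg and 0.75 µg.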
  • The cell culture medium was exchanged 4-6 h post-transfection for fresh medium containing the ncAA to be incorporated into the POI, which was left on the cells until harvesting.
  • FSC-A: forward scatter area
  • SSC-A: side scatter area
  • SSC-W: side scatter width
  • The cells were rinsed with 1x PBS, fixed in 2% paraformaldehyde in 1x PBS for 10 min at RT, rinsed with 1x PBS again and then permeabilized in 0.5% Triton X in 1x PBS for 15 min at RT. After rinsing twice with 1x PBS, the permeabilized samples were incubated in blocking solution (3% BSA in 1x PBS) for 90 min at RT and then with 1 µg/ml primary antibody (polyclonal rat anti-PylRS, prepared as described in Nikic et al.
  • The cell samples were rinsed with 1x PBS and incubated with 2 µg/ml secondary antibody (chicken anti-rat IgG(H+L) cross-adsorbed Alexa Fluor 594-conjugated antibody (Thermo Fisher Scientific, A-21471) and/or goat anti-rabbit IgG(H+L) cross-adsorbed Alexa Fluor 647-conjugated F(ab')2 (Thermo Fisher Scientific, A-21246)) in blocking solution for 60 min at RT.
  • DNA was stained with Hoechst 33342 (1 µg/ml in 1x PBS) for 10 min at RT. If only DNA was stained, the cells were fixed and permeabilized as described above and then stained with Hoechst 33342 (1 µg/ml in 1x PBS) for 10 min at RT. Finally, the cells were rinsed twice with 1x PBS.
  • FISH experiments were performed one day after transfection analogously to the FISH experiments described in Nikic et al. (Angew Chem Int Ed Engl 2016, 55(52): 16172-16176).
  • The hybridization protocol was adapted for 24-well plates from Pierce et al. (Methods Cell Biol 122:415-436, 2014).
  • The hybridization probe 5'-CTAACCCGGCTGAACGGATTTAGAGTCCATTCGATC-3' (labelled at the 5' terminus with Cy5; SEQ ID NO:1) was used at 0.25 µM. After four washes with SSC and one wash with TN buffer (0.1 M Tris-HCl, 150 mM NaCl), cells were incubated for 1 h at RT with 3% BSA in TN buffer prior to standard immunofluorescence labeling as described above.
  • The hybridization probe for tRNA Pyl (5'-CTAACCCGGCTGAACGGATTTAGAGTCCATTCGATC-3', labelled at the 5' terminus with digoxigenin; SEQ ID NO:2) was used at 0.16 µM.
  • The hybridization probe for the MS2 RNA stem-loop sequence (5'-CTGCAGACATGGGTGATCCTCATGTTTTCTA-3', labelled at the 5' terminus with Alexa Fluor 647; SEQ ID NO:3) was used at 0.75 µM.
  • The cells were incubated for 1 h at RT in blocking buffer (0.1 M Tris-HCl, 150 mM NaCl, 1x blocking reagent (Sigma 11096176001)). Then, the cells were incubated with fluorescein-conjugated sheep anti-digoxigenin Fab (Sigma 11207741910) at a 1:200 dilution in blocking buffer overnight at 4°C. The next day, three 5-minute washes were performed in Tween buffer (0.1 M Tris-HCl, 150 mM NaCl, 0.5% Tween20). DNA was stained with Hoechst 33342 (1 µg/ml in 1x PBS) for 10 min at RT.
  • Confocal images were acquired on a Leica SP8 STED 3X microscope equipped with a 63x/1.40 oil immersion objective using the following laser lines for excitation: 405 nm for Hoechst 33342, 488 nm for fluorescein and GFP, 548 nm for mOrange, 594 nm for Alexa Fluor 594, 647 nm for Alexa Fluor 647 and Cy5. Emission light was collected with HyD detectors at 420-500 nm and 605-680 nm respectively.
  • Ribosomal immunofluorescence images were taken on an Olympus Fluoroview FV3000 microscope equipped with a 60x/1.40 oil immersion objective using the following laser lines for excitation: 488 nm for GFP, 594 nm for Alexa Fluor 594, 640 nm for Alexa Fluor 647.
  • Two different fluorescent protein reporters were cloned into a pBI-CMV1 vector (Clontech 631630), one reporter into each of the vector's two multiple cloning sites.
  • the CDS for one of the reporters encoded an mRNA carrying two MS2 RNA stem-loops fused to the 3' untranslated region (“MS2-tag”), while the encoded mRNA of the other reporter was not MS2-tagged.
  • NLS::GFP 39TAG::MS2-tag reporter: NLS::GFP 39TAG was cloned with two copies of the MS2 RNA stem-loop into the pBI-CMV1 vector as a reporter for successful Amber suppression in imaging experiments.
  • GFPs which are applicable in the context of the invention are:
  • GFP 66ATA: GFP with isoleucine site (SEQ ID NO:248)
  • mCherrys which are applicable in the context of the invention are:
  • mCherry constructs comprising different TN loops which are applicable in the context of the invention are:
  • mCherry 190TAG-2xPP7: mCherry with amber site and 2x pp7 loops (SEQ ID NO:216)
  • mCherry 190TAG-4xPP7: mCherry with amber site and 4x pp7 loops (SEQ ID NO:218)
  • mCherry 190TAG-6xPP7: mCherry with amber site and 6x pp7 loops (SEQ ID NO:220)
  • H2B-mCherry 190TAG-2xMS2: human histone H2B type 1-J (UniProt: P06899) fused to mCherry with amber site and 2x ms2 loops (SEQ ID NO:222)
  • AFP molecules may be fused into the polypeptide chain of any of the AFP molecules described herein, in particular at a position within the fusion molecule which does not inhibit the function of any one of the other polypeptide segments (APs and EPs) of the AFP molecule.
  • APs and EPs: polypeptide segments
  • Examples of such epitope-tag containing AFP molecules are given below.
  • Constructs for OT assemblies were prepared as follows: tRNA Pyl was cloned under the control of a human U6 promoter, and all other constructs were cloned under CMV promoters in the pcDNA3.1 vector (Invitrogen V86020). MCP was cloned from Addgene plasmid #31230 and FUS from Addgene plasmid #26374. In all FUS fusions, amino acids 1-478 (S108N) were used, with the C-terminal NLS region replaced by a Flag-tag.
  • KIF13A1-411 and KIF16B1-400 were cloned from human cDNA and inserted into pcDNA3.1 via restriction cloning. P390 of KIF13A1-411 was removed via site-directed mutagenesis. KIF13A1-411,ΔP390 and KIF16B1-400 fusions with MCP, PylRS AF, EWSR1::MCP, FUS::PylRS AF, FUS ⁇ PylRS ⁇, SPD5::MCP and SPD5::PylRS AF were assembled via Gibson assembly (see Gibson et al., Nat Methods 2009, 6:343-345).
  • INSR 676TAG::mOrange was fused to an MS2-tag by replacing Vim 116TAG-mOrange in the pBI vector bearing Nup153::EGFP 149TAG and Vim 116TAG::mOrange::MS2-tag, to yield a bicistronic vector with INSR 676TAG::mOrange in one and
  • Multicistronic Amber suppression vectors for COS-7 cell experiments: As COS-7 cells have a lower transfection efficiency, we generated multicistronic vectors harboring the components of an OT assembly.
  • To assemble multicistronic Amber suppression vectors, one copy of tRNA Pyl under the control of a human U6 promoter was first inserted into the pBI-CMV1 vector via Gibson assembly. Subsequently, the AFP CDS KIF16B::FUS::PylRS AF and then the AFP CDS KIF16B::EWSR1::MCP were inserted via Gibson assembly.
  • An OT assembly ("OT organelle", Fig. 1) was engineered having the following components: i) An mRNA-targeting system in which two MS2 RNA stem-loops (MS2-tag) were fused to the mRNA of choice coding for the POI, creating an mRNA::ms2 fusion.
  • The MS2-tag binds specifically to the MS2 bacteriophage coat protein (MCP) (see Bertrand et al., Mol Cell 1998, 2:437-445), thus forming a stable and specific mRNA::ms2-MCP complex in cells.
  • MCP: MS2 bacteriophage coat protein
  • The MS2-tag was always fused to the 3' untranslated region (3' UTR) of the mRNA; because the 3' UTR is not translated, the final POI is scarless.
  • A tRNA/RS suppressor pair.
  • The orthogonal tRNA/RS pair from the Methanosarcina mazei pyrrolysyl system (tRNA Pyl/PylRS) was chosen because it has enabled the encoding of more than 200 ncAAs with diverse functionalities into proteins using GCE in a multitude of cell types and species, including E. coli, mammalian cells and even living mice (see, e.g., Liu et al., Annu Rev Biochem 2010, 79:413-444; Lemke, ChemBioChem 2014, 15:1691-1694; Chin, Nature 2017, 550:53-60).
  • the assembler (AP) was the key component required to form an OT assembly.
  • the purpose of the assembler was to create membrane-less structures in the form of a dense phase, aggregate, droplet or condensate, in which the mRNA::ms2-MCP complex is brought into close proximity of the tRNA Pyl /PylRS pair.
  • The Caenorhabditis elegans spindle-defective protein 5 (SPD5) has been shown to phase separate into particularly large (several-micron-sized) droplets (see Woodruff et al., Cell 2017, 169:1066-1077.e10).
  • Within these droplets, SPD5 is concentrated several orders of magnitude above its remaining soluble fraction in the cytoplasm. It was therefore expected that a protein fused to SPD5 would also condense into droplets.
  • PylRS fused to SPD5 and MCP fused to SPD5 were expected to be highly enriched.
  • P2 is denoted SPD5::PylRS * SPD5::MCP.
  • K1 is denoted KIF13A1-411,ΔP390::PylRS * KIF13A1-411,ΔP390::MCP.
  • K2 is denoted KIF16B1-400::PylRS * KIF16B1-400::MCP.
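  • The designations P2, K1 and K2 appear to use two operators: "::" joins segments fused into one polypeptide chain (N- to C-terminus), and "*" separates AFPs co-expressed in the same cell. This reading is an interpretation inferred from the examples. The hypothetical parser below turns a designation into its component fusion proteins and labels each by its C-terminal effector, assuming that only PylRS (an O-RS) and MCP (an RNA-TP) occur as effectors.

```python
from dataclasses import dataclass

# Assumption for this sketch: PylRS acts as the O-RS and MCP as the RNA-TP.
EFFECTOR_ROLES = {"PylRS": "O-RS", "MCP": "RNA-TP"}


@dataclass(frozen=True)
class FusionProtein:
    segments: tuple  # segment names ordered from N- to C-terminus

    def __str__(self) -> str:
        return "::".join(self.segments)

    @property
    def role(self) -> str:
        """Role of the C-terminal segment, or 'unknown' if it is not a listed effector."""
        return EFFECTOR_ROLES.get(self.segments[-1], "unknown")


def parse_ot_system(notation: str) -> list:
    """Split a designation such as 'SPD5::PylRS * SPD5::MCP' into fusion proteins."""
    proteins = []
    for part in notation.split("*"):
        segments = tuple(s.strip() for s in part.split("::") if s.strip())
        if not segments:
            raise ValueError(f"empty fusion protein in {notation!r}")
        proteins.append(FusionProtein(segments))
    return proteins


k2 = parse_ot_system("KIF16B1-400::PylRS * KIF16B1-400::MCP")
```

  Each parsed system should contain one O-RS fusion and one RNA-TP fusion sharing the same assembler segment, which is an easy consistency check when many constructs are tracked.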
  • A dual-reporter construct was used in which GFP and mCherry mutants are simultaneously expressed from two different expression cassettes on one plasmid, ensuring that the mRNA ratio between them is constant across all experiments. Stop codons were introduced at permissive sites into GFP at position 39 (GFP 39STOP) and into mCherry at position 185 (mCherry 185STOP; Fig. 2B). The corresponding green or red fluorescent protein is produced only if stop codon suppression is successful.
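  • The reporter logic (a full-length fluorescent protein only if the stop codon is suppressed) can be illustrated with a toy translation. In this Python sketch the codon table is a deliberately tiny subset of the standard genetic code, the coding sequence is hypothetical, and "X" merely stands in for the incorporated ncAA. It models decoding outcomes only, not the selectivity of an OT system.

```python
# Deliberately tiny subset of the standard genetic code, enough for the toy CDS below.
CODON_TABLE = {
    "ATG": "M", "GGC": "G", "AAA": "K", "GAA": "E", "CTG": "L",
    "TAG": "*", "TAA": "*", "TGA": "*",  # stop codons
}


def translate(cds: str, suppress_amber: bool, ncaa: str = "X") -> str:
    """Translate `cds` codon by codon from its first base.

    If `suppress_amber` is True, TAG is decoded as `ncaa` (a stand-in for the
    ncAA); otherwise TAG terminates translation like the other stop codons.
    Codons missing from CODON_TABLE raise KeyError.
    """
    protein = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3].upper()
        if codon == "TAG" and suppress_amber:
            protein.append(ncaa)
            continue
        residue = CODON_TABLE[codon]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


toy_cds = "ATGGGCAAATAGGAACTGTAA"  # hypothetical: M G K [TAG] E L stop
full_length = translate(toy_cds, suppress_amber=True)   # "MGKXEL"
truncated = translate(toy_cds, suppress_amber=False)    # "MGK"
```

  Without suppression, translation stops at the Amber codon and only the truncated N-terminal fragment is made, which in the reporter corresponds to no green or red fluorescence.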
  • Transfected cells (tRNA Pyl and ncAA were always present unless specifically noted otherwise) were analyzed by fluorescence flow cytometry (FFC). Settings were adjusted so that an approximately diagonal distribution results in the FFC plots when GFP and mCherry are expressed from this plasmid using the conventional cytoplasmic PylRS system, which cannot differentiate between mRNAs.
  • FFC: fluorescence flow cytometry
  • a selective and functional OT organelle should selectively express mCherry only if the MS2-tag is fused to the 3’ UTR of the mCherry mRNA, leading to the appearance of a vertical line in the cytometry plot (Fig. 2B).
  • This ncAA is efficiently encoded by a Y306A, Y384F double mutant of PylRS (for simplicity, this mutant is designated PylRS herein unless otherwise specified) (see Nikic et al., Angew Chem 2014, 53:2245-2249; Plass, Angew Chem 2012, 51:4166-4170; Plass et al., Angew Chem 2011, 50:3878-3881). Omission of the ncAA served as a standard negative control and led to no expression of GFP or mCherry. The performance of each OT system was evaluated according to its selectivity and relative efficiency.
  • Selectivity is defined as the ratio r of the mean mCherry FFC signal divided by the mean GFP signal. Final values are expressed as fold selectivity relative to that of cytoplasmic PylRS. Relative efficiency is defined as the mean mCherry signal of each system divided by the mean mCherry signal of the cytoplasmic PylRS system, which serves as the reference (here defined as 100%). All results on selectivity (dark-gray positive bars) and efficiency (light-gray negative bars) are summarized in the bar plot in Figure 2C. Selected FFC data is also shown in Figure 2D.
  • the simplest strategy B (MCP fused to PylRS) showed an about 1.5-fold selectivity gain (Fig. 2C).
  • the OT system P1 (based on phase separation of FUS/EWSR1) had a somewhat lower selectivity gain (Fig. 2C, D).
  • the P2 system (based on SPD5) showed an approximate twofold selectivity gain (Fig. 2C).
  • For K1, a twofold increase in selectivity was observed (Fig. 2C).
  • the K2 system behaved similarly (Fig. 2C, D). Overall, the selectivity gains were relatively small, but they were robustly detected and distinguishable from a simple drop in efficiency.
  • AFPs comprising combinations of the APs described in example 1 were tested in an analogous manner; these were:
  • K1::P1: KIF13A 1-411,ΔP390::FUS::PylRS * KIF13A 1-411,ΔP390::EWSR1::MCP,
  • K2::P1: KIF16B 1-400::FUS::PylRS * KIF16B 1-400::EWSR1::MCP,
  • K1::P2: KIF13A 1-411,ΔP390::SPD5::PylRS * KIF13A 1-411,ΔP390::SPD5::MCP,
  • K2::P2: KIF16B 1-400::SPD5::PylRS * KIF16B 1-400::SPD5::MCP.
  • EXAMPLE 3 - AFPs comprising a combination of APs including a membrane-targeting AP
  • AFPs comprising combinations of APs derived from the phase separation polypeptides (PSPs) FUS and EWSR1 (also termed EWS herein), optionally fused to SYNZIP segments, and of different APs acting as membrane-targeting signals (LcK, EB1, CG1, EBAG9 full length, EBAG9 1-29, CMP Sia Tr, P450 2C1 1-27 and P450 2C1 1-29) were tested in a manner analogous to example 2.
  • PSPs phase separation polypeptides
  • FUS and EWSR1 also termed EWS herein
  • LcK is a cell membrane-targeting signal (Resh, BBA-Mol Cell Res 1999, 1451:1-16) that adds an amphipathic helix post-translationally to the POI.
  • the AFPs LcK::FUS::PylRS and LcK::EWSR1::MCP were co-expressed in HEK293T cells (see Figs. 3 and 6C).
  • Testing of this system with the same dual reporter resulted in a dramatic shift in the signal and strong selectivity for expression of only the MS2-tagged mCherry compared to the control PylRS. Figs. 4 and 5 show a 26-fold selectivity gain compared to the control.
  • IF and FISH for MCP, PylRS and tRNA showed a clear membrane signal with occasional droplet-like structures and perfect co-localization of all components.
  • EB1 is a microtubule plus end-targeting signal (Nehlig A, Molina A, Rodrigues-Ferreira S, Honore S, Nahmias C. Regulation of end-binding protein EB1 in the control of microtubule dynamics. Cell Mol Life Sci. 2017;74(13):2381-2393. doi:10.1007/s00018-017-2476-2).
  • EB1::FUS::PylRS with EB1::EWSR1::MCP, or EB1::FUS::MCP::PylRS, were expressed in HEK293T cells. Testing of this system with the same dual reporter resulted in a shift in the signal and strong selectivity for expression of only the MS2-tagged mCherry compared to the control PylRS. See Fig. 6B.
  • CG1 is a nuclear membrane-targeting signal (Kim SJ, Fernandez-Martinez J, Nudelman I, et al. Integrative structure and functional anatomy of a nuclear pore complex. Nature. 2018;555(7697):475-482. doi:10.1038/nature26003).
  • the AFP constructs CG1::FUS::PylRS and CG1::EWSR1::MCP were co-expressed in HEK293T cells. Testing of this system with the same dual reporter resulted in a shift in the signal and strong selectivity for expression of only the MS2-tagged mCherry compared to the control PylRS. See Fig. 6E.
  • EBAG9 full length and EBAG9 1-29 are Golgi membrane-targeting signals (Engelsberg A, Hermosilla R, Karsten U, Schülein R, Dörken B, Rehm A. The Golgi protein RCAS1 controls cell surface expression of tumor-associated O-linked glycan antigens. J Biol Chem. 2003;278(25):22998-23007. doi:10.1074/jbc.M301361200).
  • the AFP constructs EBAG9 1-29::FUS::PylRS and EBAG9 1-29::EWSR1::MCP were co-expressed in HEK293T cells.
  • CMP Sia Tr is a Golgi membrane-targeting signal (Eckhardt M, Gotza B, Gerardy-Schahn R. Membrane topology of the mammalian CMP-sialic acid transporter. J Biol Chem. 1999;274(13):8779-8787. doi:10.1074/jbc.274.13.8779).
  • P450 2C1 1-27 is an ER membrane-targeting signal (Fazal FM, Han S, Parker KR, et al. Atlas of Subcellular RNA Localization Revealed by APEX-Seq. Cell. 2019;178(2):473-490.e26. doi:10.1016/j.cell.2019.05.027).
  • the AFP constructs P450 2C1 1-27::FUS::PylRS and P450 2C1 1-27::EWSR1::MCP, or P450 2C1 1-29::FUS::MCP::PylRS, were co-expressed in HEK293T cells. Testing of this system with the same dual reporter resulted in a shift in the signal and strong selectivity for expression of only the MS2-tagged mCherry compared to the control PylRS. See Fig. 6G.
  • GCE can also be used to introduce multiple ncAAs into the same POI (see, e.g., Liu et al., Annu Rev Biochem 2010, 79:413-444; Lemke, ChemBioChem 2014, 15:1691-1694; Chin, Nature 2017, 550:53-60).
  • ncAA 3-iodophenylalanine
  • a phenylalanine derivative instead of a lysine derivative (such as SCO)
  • a PylRS mutant N346A, C348A
  • nucleoporin 153 (Nup153) versus cytoskeletal vimentin.
  • Nup153 localizes to the nuclear pore complex and is nearly 1500 amino acids long. Hence, its mRNA is approximately six-fold longer than those of the fluorescent protein reporters used above.
  • transmembrane proteins can be selectively expressed using the OT K2::P1 assembly.
  • Membrane protein expression represents another layer of translational complexity, as ribosomes need to bind the endoplasmic reticulum during translation, where the proteins are co-translationally inserted into the membrane.
  • a fusion of insulin receptor 1 carrying an Amber codon at position 676 to mOrange (INSR676TAG::mOrange)
  • This construct was tagged with an MS2-tag in the 3’ UTR and cloned with Nup153::EGFP149TAG into one dual-cassette plasmid. The construct was then expressed in HEK293T cells either in the presence of the cytoplasmic PylRS system or in the presence of the OT K2::P1 assembly. In the presence of the OT K2::P1 assembly, selective expression of the MS2-tagged protein and the expected plasma membrane localization of INSR676TAG::mOrange were observed (data not shown), indicating the potential of the OT system of the present invention to participate in even more complex membrane-associated translational processes.
  • EXAMPLE 10 - Spatial distribution of elements of the OT system in the cell
  • The spatial distribution of AFPs, and particularly of PylRS, in cells was assessed using immunofluorescence (IF). Additionally, fluorescence in situ hybridization (FISH) was used to detect tRNAPyl. In contrast to the dual-color reporter used in the FFC experiments above, all IF/FISH experiments used a single-color NLS-GFP39TAG reporter fused to an MS2-tag (nls-gfp39TAG::ms2) to identify cells active in Amber suppression (this yields a green nucleus if Amber suppression is successful and helped to keep the color channels distinguishable).
  • IF immunofluorescence
  • FISH fluorescence in situ hybridization
  • Ribosomes were stained to see whether they co-localize with the OT K2::P1 assembly. IF staining of the ribosomal protein RPL26L1 revealed strong co-localization with the OT K2::P1 organelle (data not shown), demonstrating ribosome recruitment, tentatively due to binding to mRNA::ms2 during translation. High ribosomal mobility can also explain why it was possible to successfully express the membrane protein INSR (construct: INSR676TAG::mOrange::ms2).
  • tRNAPyl itself is recruited to the OT K2::P1 assembly due to its affinity for assembler::PylRS and can readily co-partition into the droplet to be aminoacylated with its cognate ncAA, while assembler::MCP recruits MS2-tagged mRNA.
  • EXAMPLE 11 Further OT systems
  • a variety of other OT systems were tested and found to allow for selective orthogonal translation of the reporter (i.e. the POI).
  • a summary of these experiments is shown in Table 1 below.
  • the cytoplasmic NES-PylRS system as previously described by Nikic et al. (Angew Chem Int Ed Engl 2016, 55(52):16172-16176), but with the corresponding AF, AA or AAAF mutations, was used as a nonspecific reference (negative control). All experiments were performed in the presence of the codon-specific tRNAPyl and the ncAAs corresponding to the respective PylRS mutants.
  • EXAMPLE 12 Further OT systems
  • a variety of similar OT systems were tested, differing with respect to the mRNA targeting components, and were found to allow for selective orthogonal translation of the reporter (i.e. the POI).
  • a summary of these experiments is shown in Table 2 below. The results are shown in Figures 7A, B and C.
  • the cytoplasmic NES-PylRS system as previously described by Nikic et al. (Angew Chem Int Ed Engl 2016, 55(52):16172-16176) was used as a nonspecific reference (negative control). All experiments were performed in the presence of the codon-specific tRNAPyl and the ncAAs corresponding to the respective PylRS mutants.
  • SYNZIP1 forms a pair with SYNZIP2
  • SYNZIP3 forms a pair with SYNZIP4.
  • all other described SYNZIP pairs should work similarly (https://pubs.acs.org/doi/pdf/10.1021/ja907617a).
  • CG1 CG1 (Nup42) nucleoporin protein for targeting to nuclear membrane
  • CMPSiaTr CMP sialic acid transporter for targeting to Golgi membrane
  • EWSR1 Ewing sarcoma breakpoint region 1 (also termed EWS herein)
  • FRB-CD28 synthetic membrane targeting domain derived from transmembrane proteins CD4, FRB (similar to mTOR) and CD28
  • FUS-CD28 synthetic membrane targeting fusion polypeptide derived from CD4, FUS and CD28
  • IFRS1 Methanosarcina mazei PylRS (L305M, Y306L, L309S, N346S,
  • INSR676TAG insulin receptor with amino acid position 676 encoded by an Amber codon
  • KIF13A kinesin family member 13A - “KIF13A” specifically refers to the fragment covering amino acid residues 1-411 of KIF13A wherein P390 is deleted (KIF13A 1-411,ΔP390).
  • KIF16B kinesin family member 16B - Unless specified otherwise herein, “KIF16B” specifically refers to the fragment covering amino acid residues 1-400 of KIF16B (KIF16B 1-400).
  • OMeRS Methanosarcina mazei PylRS (A302T, Y384F, N346V, C348W,
  • OT assembly spatially enriched components of the GCE machinery in a
  • P450 2C1 1-27 P450 2C1 residues 1-27 (N-terminal) for targeting of ER membranes
  • POI TAG POI comprising an Amber-(TAG-)encoded amino acid residue (or coding sequence therefor)
  • PylRS AAAF mutant M. mazei pyrrolysyl tRNA synthetase comprising amino acid substitutions Y306A, N346A, C348A and Y384F
  • tRNA Pyl tRNA that is coupled to pyrrolysyl or another non-canonical amino acid residue by a wild-type or modified PylRS and has an anticodon that, for site-specific incorporation of a (non-canonical) amino acid residue into a POI, is preferably the reverse complement of a selector codon.
  • the tRNAPyl used in the examples carried the anticodon against the stop codon Amber (tRNAPylCUA), Ochre (tRNAPylUUA) or
  • CTGCAGACATGGGTGATCCTCATGTTTTCTA (SEQ ID NO: 3)
  • PylRS AF Methanosarcina mazei pyrrolysyl tRNA synthetase double mutant: Y306A, Y384F; Uniprot: Q8PWY1
  • PylRS AAAF Methanosarcina mazei pyrrolysyl tRNA synthetase quadruple mutant: Y306A, N346A, C348A, Y384F; Uniprot: Q8PWY1
  • MCP coat protein of Enterobacteria phage MS2
  • KIF16B 1-400 Homo sapiens kinesin family member 16B fragment covering amino acid residues 1-400; Uniprot: Q96L93
  • KIF13A 1-411,ΔP390 Homo sapiens kinesin family member 13A fragment covering amino acid residues 1-411 wherein P390 is deleted; Uniprot: Q9H1H9
  • LcK posttranslational modification site for plasma membrane targeting of Mus musculus lymphocyte-specific protein tyrosine kinase; Uniprot: P06240
  • DNA GGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTC (SEQ ID NO: 25)
  • FRB-CD28 synthetic membrane targeting fusion polypeptide derived from Mus musculus CD4 (Uniprot: P06332), FRB (similar to Homo sapiens mTOR; Uniprot: P42345) and Mus musculus CD28 (Uniprot: P31041)
  • FUS-CD28 synthetic membrane targeting fusion polypeptide derived from Mus musculus CD4 (Uniprot: P06332), Homo sapiens fused-in sarcoma (Uniprot: P35637) and Mus musculus CD28 (Uniprot: P31041)
  • EWSR1 Homo sapiens Ewing sarcoma breakpoint region 1; Uniprot: Q01844.
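The selectivity and relative-efficiency metrics defined in the FFC experiments reduce to simple ratios of mean signals. The following Python sketch shows the arithmetic under that reading; the function names and the per-cell intensity values are hypothetical placeholders chosen for illustration, not measured data from these examples.

```python
from statistics import mean

def selectivity_ratio(mcherry, gfp):
    """r = mean mCherry FFC signal / mean GFP FFC signal for one OT system."""
    return mean(mcherry) / mean(gfp)

def fold_selectivity(system, reference):
    """Selectivity of an OT system relative to the cytoplasmic PylRS reference.

    Each argument is a (mcherry_signals, gfp_signals) tuple of per-cell values.
    """
    return selectivity_ratio(*system) / selectivity_ratio(*reference)

def relative_efficiency(system, reference):
    """Mean mCherry of the OT system as a percentage of the reference (100%)."""
    return 100.0 * mean(system[0]) / mean(reference[0])

# Hypothetical per-cell intensities (arbitrary units), for illustration only.
cyto_pylrs = ([100.0, 120.0, 80.0], [100.0, 110.0, 90.0])
ot_system = ([60.0, 50.0, 70.0], [30.0, 20.0, 10.0])

print(round(fold_selectivity(ot_system, cyto_pylrs), 2))     # 3.0
print(round(relative_efficiency(ot_system, cyto_pylrs), 1))  # 60.0
```

Keeping the two metrics separate mirrors the distinction drawn in the examples: a system can look "selective" simply because it suppresses GFP more than mCherry, so the efficiency value is needed to rule out a plain efficiency drop.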

Abstract

The present invention is concerned with orthogonal translation systems which allow for the site-specific introduction of non-canonical amino acid residues into a target protein (POI) in a POI-mRNA-selective manner. Specifically, the present invention relates to assembler fusion proteins which bring an RNA-targeting polypeptide (RNA-TP) segment and an orthogonal aminoacyl tRNA synthetase (O-RS) segment into spatial proximity of one another, either by direct linkage in RNA-TP/O-RS fusion proteins, or through the action of "assemblers" fused to each of these segments in assembler fusion proteins (AFPs). The invention also relates to AFP combinations and nucleic acid molecules comprising a POI-encoding sequence together with a targeting nucleotide sequence that is able to interact with an RNA-TP. The invention further relates to nucleic acid molecules, expression cassettes and expression vectors encoding said RNA-TP/O-RS fusion proteins or AFPs, cells comprising same, as well as methods and kits for translationally preparing POIs.

Description

Means and methods for preparing engineered target proteins by genetic code expansion in a target protein-selective manner
FIELD OF THE INVENTION
The present invention is concerned with orthogonal translation systems which allow for the site-specific introduction of non-canonical amino acid (ncAA) residues into a polypeptide of interest (POI) in a POI-mRNA-selective manner. Specifically, the present invention relates to fusion proteins which bring an RNA-targeting polypeptide (RNA-TP) segment and an orthogonal aminoacyl tRNA synthetase (O-RS) segment into spatial proximity of one another. This is achieved by combining an RNA-TP segment and an O-RS segment in one and the same fusion protein (RNA-TP/O-RS fusion protein), or by the action of one or more polypeptide segments which act as “assemblers” (APs) and facilitate a local enrichment of assembler fusion proteins (AFPs) comprising the one or more APs together with an RNA-TP segment or an O-RS segment, thus bringing said RNA-TP and O-RS segments into close proximity of one another. The invention also relates to AFP combinations and nucleic acid molecules comprising a POI-encoding sequence together with a targeting nucleotide sequence (TN) that is able to interact with an RNA-TP. The invention further relates to nucleic acid molecules, expression cassettes and expression vectors encoding said RNA-TP/O-RS fusion proteins or AFPs, cells comprising same, as well as methods and kits for translationally preparing POIs.
BACKGROUND OF THE INVENTION
The ability to engineer orthogonal (i.e. non-crossreactive) translation systems into living cells enables the site-specific introduction of new functionality into proteins. However, this is a herculean task, as translation is a complex multistep process in which at least 20 different aminoacylated tRNAs, their cognate aminoacyl tRNA synthetases (RS), ribosomes and diverse other factors work in concert to synthesize a polypeptide chain from the RNA transcript. An ideal orthogonal system would show no cross-reactivity with factors of the host machinery, minimizing its impact on the housekeeping translational activity and normal physiology of the cell. Towards this goal, genetic code expansion (GCE) is a method that enables reprogramming of a specific codon. With GCE, an orthogonal (suppressor) RS (O-RS) aminoacylates its cognate suppressor tRNA with non-canonical amino acids (ncAAs). These ncAAs are typically custom designed and harbor chemical functionalities that can, for example, enable protein function to be photocontrolled, encode posttranslational modifications or allow the introduction of fluorescent labels for microscopy studies using click chemistry. To introduce ncAAs site-specifically into a polypeptide of interest (POI), the anticodon loop of the tRNA is chosen to decode, and thus suppress, one of the stop codons (see, e.g., Liu et al., Annu Rev Biochem 2010, 79:413-444; Lemke, ChemBioChem 2014, 15:1691-1694; Chin, Nature 2017, 550:53-60). To minimize the impact on the host cell machinery, the Amber stop codon (decoded by tRNACUA) is often utilized, owing to its particularly low usage as a termination codon of endogenous proteins in E. coli (<10%). Nevertheless, in principle any Amber codon in the genome can be suppressed, potentially leading to unwanted background suppression of non-targeted host proteins.
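The decoding rule described above, in which the suppressor tRNA anticodon is the reverse complement of the stop codon it reads (compare the tRNAPyl glossary entry, e.g. tRNAPylCUA for Amber), can be illustrated with a short Python sketch. The helper name is illustrative, and the sketch deliberately ignores wobble pairing.

```python
# RNA Watson-Crick pairing partners (wobble pairing is ignored in this sketch).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

STOP_CODONS = {"Amber": "UAG", "Ochre": "UAA", "Opal": "UGA"}

def anticodon_for(codon: str) -> str:
    """Return the anticodon (written 5'->3') that pairs with a 5'->3' codon."""
    codon = codon.upper().replace("T", "U")  # also accept DNA notation, e.g. TAG
    return "".join(COMPLEMENT[base] for base in reversed(codon))

for name, codon in STOP_CODONS.items():
    print(f"{name:5s} {codon} -> tRNA anticodon {anticodon_for(codon)}")
```

Running it yields CUA for Amber and UUA for Ochre, matching the tRNAPylCUA and tRNAPylUUA variants named for the examples.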
If ncAA-modified proteins are recombinantly produced for in vitro applications, this background incorporation might be tolerable as long as the yields of purified full-length protein are acceptable. However, the challenge is different if the host is considered more than just a bioreactor vessel that can be sacrificed for its protein. In order to study the function of a host-cell POI in situ, the physiological condition of that host cell is an important factor. In that context, minimization of background incorporation of the ncAA is particularly required to ensure well-controlled experiments.
At least three elegant approaches have been developed to enable orthogonal translation in E. coli, that is, to decode a specific codon only in the RNA of the POI and not throughout the entire genome: i) Orthogonal ribosomes recognizing a unique Shine-Dalgarno sequence have been developed to decode quadruplet codons, which are then used instead of stop codons to site-specifically encode an ncAA into a POI (see, e.g., Neumann et al., Nature 2010, 464:441-444; Orelle et al., Nature 2015, 524:119-124; Fried et al., Angew Chem 2015, 54:12791-12794). ii) Recently, genome engineering has advanced to the stage that E. coli strains can be depleted of selected native codons, providing a genetically clean (e.g. Amber codon-free) host background for selective decoding of specific codons only in the POI (see, e.g., Isaacs et al., Science 2011, 333:348-353; Lajoie et al., Science 2013, 342:357-360; Ostrov et al., Science 2016, 353:819-822; Wang et al., Nature 2016, 539:59-64). iii) Unique non-canonical codons have been designed using an artificial base pair encoded only in the coding sequence of the POI, which lowers the risk of nonspecific decoding in other parts of the genome (see Zhang et al., Nature 2017, 551:644-647). However, owing to genome complexity, it is not straightforward to transfer these orthogonal translation approaches to eukaryotes (see, e.g., Thompson et al., ACS Chem Biol 2018, 13:313-325), in which, additionally, the Amber codon is highly abundant (about 20% of stop codons in mammalian cells).
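The Amber-codon frequencies quoted above (<10% of endogenous termination codons in E. coli, about 20% in mammalian cells) are fractions of coding sequences that end in TAG. A minimal Python sketch of how such a fraction could be computed from a set of coding sequences (CDSs) follows; the function name and the toy sequences are invented for illustration and do not represent any real genome.

```python
from collections import Counter

STOPS = ("TAA", "TAG", "TGA")

def stop_codon_usage(cds_list):
    """Fraction of CDSs terminated by each stop codon (last in-frame codon)."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        if len(cds) % 3 != 0:
            continue  # skip sequences that are not a whole number of codons
        last = cds[-3:]
        if last in STOPS:
            counts[last] += 1
    total = sum(counts.values())
    return {stop: counts[stop] / total for stop in STOPS} if total else {}

# Toy CDSs (hypothetical), each ending in a stop codon.
toy = ["ATGAAATAA", "ATGGCCTGA", "ATGTTTTAG", "ATGCCCTGA", "ATGGGGTAA"]
usage = stop_codon_usage(toy)
print(usage["TAG"])  # 0.2
```

With a real annotated genome, a low TAG fraction means fewer endogenous transcripts end in the codon being reassigned, which is why Amber is the usual choice of selector codon.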
There is therefore a high demand for strategies for POI-selective orthogonal translation which are versatile and work not only for well-characterized prokaryotes such as E.coli, which are relatively easy to handle and manipulate, but are also applicable to eukaryotic cells. It was therefore an object of the present invention to address this challenge.
SUMMARY OF THE INVENTION
The inventors found that orthogonal translation systems (OT systems) able to selectively translate the mRNA of a POI can be created by generating spatial proximity between the mRNA of the POI and the O-RSs that allow the ncAA residues to be introduced translationally into the growing polypeptide chain of the POI. The inventors demonstrated for a variety of POIs, including membrane proteins, that their OT systems allow for site-specifically introducing ncAA residues into a POI in a mammalian cell with selectivity for the mRNA of the POI over other cytoplasmic mRNAs that contain the same stop codon (i.e. the stop codon used as the selector codon for encoding the ncAA residue of the POI).
In the orthogonal translation systems of the invention, the spatial proximity is achieved by including in the mRNA of the POI a targeting sequence (TN) that can selectively interact with an RNA-targeting polypeptide (RNA-TP), and by linking the O-RS to such an RNA-TP. Said linkage can be in a fusion protein comprising both the O-RS and the RNA-TP (RNA-TP/O-RS fusion protein).
In another approach, this can be achieved by the action of one or more polypeptide segments which act as “assemblers” (APs) in facilitating a local enrichment of at least two assembler fusion proteins (AFPs), at least one of which comprises the one or more APs and an RNA-TP segment and at least one other of which comprises the one or more APs and an O-RS segment, thus bringing said RNA-TP and O-RS segments (RNA-TP and O-RS are also designated “effectors” or “EPs”) into close proximity of one another. The local enrichment of the AFPs allows for the formation of assemblies (OT assemblies, also termed “OT organelles” herein) which can act as artificial orthogonally translating organelles.
The inventors demonstrated that different types of APs can be used. A first type includes APs which drive local enrichment at (previously existing) intracellular structures (such as, e.g., microtubules or the cytoplasmic side of membranes such as the cell membrane, the nuclear membrane, or the membranes of the ER, mitochondria or Golgi apparatus), termed intracellular targeting polypeptide (IC-TP) segments herein. A second type of APs, termed phase separation polypeptide (PSP) segments herein, generates high local AFP concentrations by self-association in the cytoplasm (in particular by phase separation). Said AP types may also be combined with other polypeptide elements having the ability to form multimeric structures, in particular coiled-coil heterodimers as formed by synthetic SYNZIP polypeptide pairs. Similarly, said EP types may also be combined with such multimer-forming polypeptide elements. Such multimer formation further improves local enrichment of AFPs.
The inventors further found that AFPs combining different AP types are particularly useful.
In still another approach, AFPs are provided that combine in a single polypeptide (i.e. fused together) both types of EP segments (the RNA-TP and O-RS segments) and one or both types of AP segments (the IC-TP and/or PSP segments), optionally supplemented by said polypeptide elements having the ability to form multimeric structures (SYNZIP polypeptides). This provides the advantage that all the elements required for generating an OT system of the invention are included in one single AFP.
Thus, in a first aspect, the present invention relates to an assembler fusion protein (AFP) comprising:
(a) at least one first polypeptide segment acting as assembler (AP) that is selected from:
(a1) a polypeptide segment derived from an intracellular targeting polypeptide (IC-TP segment), wherein said intracellular targeting polypeptide targets, and thus becomes locally enriched at, an intracellular structural element within or directly adjacent to the cytoplasm; and
(a2) a polypeptide segment derived from a phase separation polypeptide (PSP segment), wherein said phase separation polypeptide has the ability to undergo self-association in the cytoplasm of a cell so as to create sites of high local concentration in the cytoplasm, and
(b) at least one second polypeptide segment acting as an effector (EP) that is selected from:
(b1) an RNA-targeting polypeptide (RNA-TP) segment, and
(b2) an orthogonal aminoacyl tRNA synthetase (O-RS) segment;
wherein said polypeptide segments are functionally linked in said AFP.
In a second aspect, the present invention relates to an assembler fusion protein (AFP) combination comprising at least two AFPs of the present invention as described herein. Preferably, the AFP combination comprises at least one AFP comprising an RNA-TP segment and at least one AFP comprising an O-RS segment. In another advantageous form of said second aspect, at least one AFP of said combination includes a first SYNZIP element and at least one other AFP of said combination includes a second SYNZIP element, wherein said first and second SYNZIP elements act together by forming a heterodimer structure.
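The first two aspects can be read as composition rules: each AFP carries at least one assembler (AP: IC-TP or PSP segment) and at least one effector (EP: RNA-TP or O-RS segment), and a preferred AFP combination supplies both an RNA-TP and an O-RS across its members. The Python sketch below checks these rules for constructs written in the "::" fusion notation used in the examples (e.g. the K2::P1 assembly). The segment-to-type table is a hypothetical, partial classification assembled from the examples in this document, not an exhaustive or authoritative list.

```python
# Hypothetical, partial classification of segment names used in the examples.
SEGMENT_TYPES = {
    "KIF13A": "IC-TP", "KIF16B": "IC-TP", "LcK": "IC-TP", "EB1": "IC-TP",
    "CG1": "IC-TP",
    "FUS": "PSP", "EWSR1": "PSP", "SPD5": "PSP",
    "MCP": "RNA-TP",
    "PylRS": "O-RS",
}
ASSEMBLERS = {"IC-TP", "PSP"}
EFFECTORS = {"RNA-TP", "O-RS"}

def segment_types(construct: str) -> set:
    """Classify each '::'-separated segment; fragment suffixes are ignored."""
    types = set()
    for part in construct.split("::"):
        name = part.strip().split(" ")[0]  # drop e.g. ' 1-400'
        types.add(SEGMENT_TYPES.get(name, "unknown"))
    return types

def is_afp(construct: str) -> bool:
    """At least one assembler (IC-TP or PSP) and one effector (RNA-TP or O-RS)."""
    t = segment_types(construct)
    return bool(t & ASSEMBLERS) and bool(t & EFFECTORS)

def is_functional_combination(constructs) -> bool:
    """Every member is an AFP and, together, they supply an RNA-TP and an O-RS."""
    if not all(is_afp(c) for c in constructs):
        return False
    combined = set().union(*(segment_types(c) for c in constructs))
    return EFFECTORS <= combined

k2_p1 = ["KIF16B 1-400::FUS::PylRS", "KIF16B 1-400::EWSR1::MCP"]
print(is_functional_combination(k2_p1))  # True
print(is_afp("FUS::EWSR1"))              # False: no effector segment
```

A single AFP that carries both effectors, such as EB1::FUS::MCP::PylRS, passes the combination check on its own, which corresponds to the single-polypeptide approach described above.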
In a third aspect, the present invention relates to a fusion protein (RNA-TP/O-RS fusion protein) comprising:
(i) at least one RNA-targeting polypeptide (RNA-TP) segment; and
(ii) at least one orthogonal aminoacyl tRNA synthetase (O-RS) segment,
wherein said polypeptide segments are functionally linked in said RNA-TP/O-RS fusion protein.
In a further aspect, the present invention provides a nucleic acid molecule, or a combination of two or more nucleic acid molecules, comprising:
(i) a nucleotide sequence that encodes at least one RNA-TP/O-RS fusion protein of the present invention as described herein, or
(ii) a nucleic acid sequence complementary to (i), or
(iii) both of (i) and (ii).
In a further aspect, the present invention provides a nucleic acid molecule, or a combination of two or more nucleic acid molecules, comprising:
(i) a nucleotide sequence that encodes at least one AFP of the present invention as described herein, or
(ii) a nucleic acid sequence complementary to (i), or
(iii) both of (i) and (ii).
In a further aspect, the present invention provides a nucleic acid molecule, or a combination of two or more nucleic acid molecules, comprising:
(i) a nucleotide sequence that encodes at least one AFP combination of the present invention as described herein, or
(ii) a nucleic acid sequence complementary to (i), or
(iii) both of (i) and (ii).
In further aspects, the present invention provides an expression cassette comprising the nucleotide sequence of the nucleic acid molecule, or the combination of nucleic acid molecules, of the present invention as described herein.
In particular embodiments, the present invention provides an expression cassette comprising:
(i) a nucleotide sequence that encodes at least one RNA-TP/O-RS fusion protein of the present invention as described herein, or
(ii) a nucleic acid sequence complementary to (i), or
(iii) both of (i) and (ii).
In further particular embodiments, the present invention provides an expression cassette comprising:
(i) a nucleotide sequence that encodes at least one AFP of the present invention as described herein, or
(ii) a nucleic acid sequence complementary to (i), or
(iii) both of (i) and (ii).
In further particular embodiments, the present invention provides an expression cassette comprising:
(i) a nucleotide sequence that encodes at least one AFP combination of the present invention as described herein, or
(ii) a nucleic acid sequence complementary to (i), or
(iii) both of (i) and (ii).
In further aspects, the present invention provides an expression vector comprising at least one expression cassette of the present invention as described herein.
In further aspects, the present invention provides a cell comprising at least one nucleic acid molecule, or combination of nucleic acid molecules, of the present invention as described herein. In particular embodiments, the cell comprises at least one expression cassette or at least one expression vector of the present invention as described herein.
In a further aspect, the present invention relates to a method for preparing a polypeptide of interest (POI) comprising in its amino acid sequence one or more non-canonical amino acid (ncAA) residues. Said method comprises expressing the POI in a cell of the present invention in the presence of said one or more ncAAs, wherein the cell comprises:
(i) at least one AFP comprising an RNA-TP segment and at least one AFP comprising an O-RS segment as described herein;
(ii) a POI-encoding nucleotide sequence (CSPOI) wherein said one or more ncAA residues of the POI are encoded by selector codon(s),
(iii) a targeting nucleotide sequence (TN) that is functionally linked to the CSPOI and is able to interact with an RNA-TP segment of at least one of the AFPs in the cell;
(iv) one or more orthogonal tRNAncAA (O-tRNAncAA) molecules which carry the anticodon(s) complementary to the selector codon(s) of the CSPOI, and wherein said O-tRNAncAA molecules together with one or more O-RS segments of the AFPs in the cell form one or more orthogonal O-RS/O-tRNAncAA pairs which allow for the introduction of said one or more ncAA residues into the amino acid sequence of the POI;
and wherein the method optionally further comprises recovering the expressed POI. Said at least one AFP comprising an RNA-TP segment and said at least one AFP comprising an O-RS segment recited in (i) can be one and the same type of AFP, i.e. an AFP comprising both an RNA-TP segment and an O-RS segment. Alternatively, said at least one AFP comprising an RNA-TP segment and said at least one AFP comprising an O-RS segment recited in (i) can be different AFPs.
In a further aspect, the present invention relates to a method for preparing a polypeptide of interest (POI) comprising in its amino acid sequence one or more non-canonical amino acid (ncAA) residues. Said method comprises expressing the POI in a cell of the present invention in the presence of said one or more ncAAs, wherein the cell comprises:
(i) RNA-TP/O-RS fusion proteins of the present invention as described herein;
(ii) a POI-encoding nucleotide sequence (CSPOI) wherein said one or more ncAA residues of the POI are encoded by selector codon(s),
(iii) a targeting nucleotide sequence (TN) that is functionally linked to the CSPOI and is able to interact with an RNA-TP segment of at least one of the RNA-TP/O-RS fusion proteins in the cell;
(iv) one or more orthogonal tRNAncAA (O-tRNAncAA) molecules which carry the anticodon(s) complementary to the selector codon(s) of the CSPOI, and wherein said O-tRNAncAA molecules together with one or more O-RS segments of the RNA-TP/O-RS fusion proteins in the cell form one or more orthogonal O-RS/O-tRNAncAA pairs which allow for the introduction of said one or more ncAA residues into the amino acid sequence of the POI;
and wherein the method optionally further comprises recovering the expressed POI.
In a further aspect, the present invention relates to a method for preparing a polypeptide of interest (POI) comprising in its amino acid sequence one or more non-canonical amino acid (ncAA) residues. Said method comprises the steps of:
(a) expressing in a cell one or more AFPs comprising at least one RNA-TP segment and one or more AFPs comprising at least one O-RS segment as described herein;
(b) expressing in said cell one or more orthogonal tRNAncAA (O-tRNAncAA) molecules, wherein
- said orthogonal tRNAncAA molecules and one or more of the O-RS segments of the AFPs in the cell form one or more orthogonal aminoacyl tRNA synthetase/tRNAncAA (O-RS/O-tRNAncAA) pairs,
- said O-RS/O-tRNAncAA pairs allow for introducing said one or more ncAA residues into the amino acid sequence of said POI,
wherein steps (a) and (b) can be performed concomitantly or sequentially in any order;
(c) then, expressing said POI in said cell in the presence of said one or more ncAAs, wherein
- the POI-encoding nucleotide sequence (CSPOI) comprises one or more selector codons encoding said one or more ncAA residues,
- said selector codons match the anticodons of said one or more O-tRNAncAA molecules;
- said CSPOI is functionally linked to a targeting nucleotide sequence (TN), thus forming a CSPOI/TN fusion sequence,
- said CSPOI/TN fusion sequence is able to interact, via its TN, with an RNA-TP segment of at least one of the AFPs in the cell;
and
(d) optionally recovering the expressed POI.
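Step (c) requires the CSPOI to carry the selector codon in frame at the intended ncAA position, as in the GFP39TAG, EGFP149TAG and INSR676TAG constructs of the examples. The Python sketch below shows that codon-level edit and a check of where in-frame selector codons sit. It assumes that position n means the n-th codon counted from the start codon (codon 1); the function names and the toy open reading frame are hypothetical, and real constructs are designed at permissive sites using standard cloning methods.

```python
def introduce_selector_codon(cds: str, position: int, selector: str = "TAG") -> str:
    """Replace codon `position` (1-based; codon 1 = start codon) with `selector`."""
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    start = 3 * (position - 1)
    if not 0 <= start < len(cds) - 3:  # leave the native terminal stop codon alone
        raise ValueError("position outside the coding region")
    return cds[:start] + selector + cds[start + 3:]

def selector_positions(cds: str, selector: str = "TAG") -> list:
    """1-based codon positions of in-frame selector codons, excluding the last codon."""
    return [i // 3 + 1 for i in range(0, len(cds) - 3, 3) if cds[i:i + 3] == selector]

toy_cds = "ATGAGCAAAGGCTAA"  # hypothetical 4-codon ORF followed by a TAA stop
mutant = introduce_selector_codon(toy_cds, 3)
print(mutant)                      # ATGAGCTAGGGCTAA
print(selector_positions(mutant))  # [3]
```

Terminating the toy ORF with TAA rather than TAG keeps the natural stop codon from being read through by the Amber suppressor tRNA, so only the engineered internal position is decoded as the ncAA.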
In a further aspect, the present invention relates to a method for preparing a polypeptide of interest (POI) comprising in its amino acid sequence one or more non-canonical amino acid (ncAA) residues. Said method comprises the steps of:
(a) expressing in a cell RNA-TP/O-RS fusion proteins of the present invention as described herein;
(b) expressing in said cell one or more orthogonal tRNAncAA (O-tRNAncAA) molecules, wherein
- said orthogonal tRNAncAA molecules and one or more of the O-RS segments of the RNA-TP/O-RS fusion proteins in the cell form one or more orthogonal aminoacyl-tRNA synthetase/tRNAncAA (O-RS/O-tRNAncAA) pairs,
- said O-RS/O-tRNAncAA pairs allow for introducing said one or more ncAA residues into the amino acid sequence of said POI,
wherein steps (a) and (b) can be performed concomitantly or sequentially in any order;
(c) then, expressing said POI in said cell in the presence of said one or more ncAAs, wherein
- the POI-encoding nucleotide sequence (CSPOI) comprises one or more selector codons encoding said one or more ncAA residues,
- said selector codons match the anticodons of said one or more O-tRNAncAA molecules,
- said CSPOI is functionally linked to a targeting nucleotide sequence (TN), thus forming a CSPOI/TN fusion sequence,
- said CSPOI/TN fusion sequence is able to interact, via its TN, with an RNA-TP segment of at least one of the RNA-TP/O-RS fusion proteins in the cell;
and
(d) optionally recovering the expressed POI.
In a further aspect, the present invention relates to a nucleic acid molecule comprising:
(i) a nucleotide sequence (CSPOI) that encodes a polypeptide of interest (POI), said POI comprising one or more, identical or different, non-canonical amino acid (ncAA) residues which are encoded in the CSPOI by selector codons, and
(ii) a targeting nucleotide sequence (TN), wherein an RNA molecule comprising said TN is able to interact via said TN with an RNA-targeting polypeptide (RNA-TP).
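A minimal sketch of how such a nucleic acid molecule could be assembled in silico: chosen codons of a coding sequence are replaced by an amber (TAG) selector codon, and a TN is appended downstream of the stop codon, i.e. in the UTR, as described for Figure 7. The toy CDS, the function names and the use of the commonly used 19-nt MS2 operator hairpin as TN are assumptions for illustration; the TN of SEQ ID NO:17 may differ.

```python
# Hypothetical sketch of a CSPOI/TN construct (DNA, 5'->3').
# Assumptions: the selector codon is amber (TAG); the TN is placed in the
# 3' UTR, downstream of the terminal stop codon; all sequences are illustrative.

SELECTOR = "TAG"
STOPS = {"TAA", "TAG", "TGA"}
# Commonly used MS2 operator hairpin; assumed here, not taken from SEQ ID NO:17.
MS2_TN = "ACATGAGGATTACCCATGT"


def codons(seq: str) -> list[str]:
    """Split an in-frame sequence into triplets."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def build_cspoi_tn(cds: str, ncaa_positions: list[int], tn: str) -> str:
    """Replace the given codons (1-based, counted from ATG) by the selector
    codon and append the targeting nucleotide sequence after the CDS."""
    cds = cds.upper()
    if len(cds) % 3 or not cds.startswith("ATG") or cds[-3:] not in STOPS:
        raise ValueError("CDS must start with ATG, end with a stop codon, be in frame")
    triplets = codons(cds)
    for pos in ncaa_positions:
        if not 1 < pos < len(triplets):  # keep start codon and terminal stop intact
            raise ValueError(f"position {pos} is outside the coding region")
        triplets[pos - 1] = SELECTOR
    return "".join(triplets) + tn


def selector_sites(construct: str, cds_len: int) -> list[int]:
    """1-based positions of in-frame selector codons, excluding the last codon."""
    trip = codons(construct[:cds_len])
    return [i + 1 for i, c in enumerate(trip[:-1]) if c == SELECTOR]
```

For example, `build_cspoi_tn("ATGGCTAAAGGCTGA", [3], MS2_TN)` turns codon 3 (AAA) into TAG and appends the MS2 hairpin; in this toy model any native in-frame TAG would also be read as a selector codon.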
In a further aspect, the present invention relates to a kit for preparing a polypeptide of interest (POI) having at least one non-canonical amino acid (ncAA) residue, the kit comprising:
at least one ncAA, or salt thereof, corresponding to the at least one ncAA residue of the POI, and
at least one expression vector of the present invention as described herein.
Said expression vector comprises at least one expression cassette comprising:
(i) a nucleotide sequence that encodes at least one RNA-TP/O-RS fusion protein of the present invention, at least one AFP of the present invention, or at least one AFP combination of the present invention, or
(ii) a nucleic acid sequence complementary to (i), or
(iii) both of (i) and (ii).
BRIEF DESCRIPTION OF THE FIGURES
Figure 1 shows a schematic representation of the spatial separation of the components that allow for orthogonal translation so as to decode a specific stop codon in a uniquely tagged mRNA. (A) Conventional expression of the synthetase PylRS leads to aminoacylation of its cognate stop codon suppressor tRNAPyl with a custom-designed ncAA. This leads to site-specific ncAA incorporation whenever the respective stop codon occurs in the mRNA of the POI. Given that many endogenous mRNAs terminate on the same stop codon, using this approach in the cytoplasm potentially leads to misincorporation of the ncAA into unwanted proteins (left box). (B) To avoid this, the present invention allows the mRNA encoding the POI and the orthogonal aminoacyl-tRNA synthetase (e.g., PylRS) to be brought into close proximity to one another through the use of an RNA-targeting polypeptide segment (e.g., MCP) and assemblers (APs). This allows for spatial enrichment of all components so as to create an OT assembly (“OT organelle”), including the mRNA encoding the POI, the orthogonal aminoacyl-tRNA synthetase, the tRNA, and ribosomes (right box). Aminoacylated tRNAPyl is thus available particularly in the direct vicinity of the OT organelle, so that stop codon suppression (of the POI mRNA) occurs preferentially there. This leads to selective suppression of stop codons in (and thus expression of) the POI mRNA over corresponding stop codons in mRNAs that are not targeted to the OT assembly. While in (A) GCE is stop codon-specific, in (B) it should be both stop codon-specific and mRNA-specific.
Figure 2A shows a schematic representation of different assembler classes. B = bimolecular MCP::PylRS fusion, P1 = fusions to FUS and EWSR1, P2 = SPD5, K1 = truncation of kinesin KIF13A (KIF13A(1-411,ΔP390)), K2 = truncation of kinesin KIF16B (KIF16B(1-400)), and combinations thereof (K1::P1, K1::P2, K2::P1, K2::P2).
Figure 2B shows a schematic representation of the dual-color reporter. mRNAs encoding the fluorescent proteins GFP and mCherry, containing stop codons at permissive sites, are expressed from one plasmid, each from its own CMV promoter, ensuring a constant ratio of the two mRNAs throughout each experiment. The mRNA of the mCherry reporter is tagged with two MS2 RNA stem-loops (“ms2”, also referred to herein as MS2-tag), i.e. mRNA(mCherry)::ms2. In the presence of ncAA and tRNAPyl, cytoplasmic PylRS yields both GFP39STOP and mCherry185STOP, leading to a diagonal population in fluorescence flow cytometry (FFC) analysis (left box). Under the same conditions, however, orthogonal translation in OT organelles enables selective stop codon suppression of mRNA(mCherry)::ms2, resulting in an mCherry-positive, GFP-negative population (drawn schematically as a vertical population in the right box). In both schemes, non-transfected HEK293T cells are represented by a gray circle at the bottom.
Figure 2C shows the selectivity and relative efficiency of various exemplary OT systems. For all experiments, the indicated constructs were co-expressed with tRNAPyl (anticodon corresponding to the indicated codon) and the dual reporter (GFP39STOP*mCherry185STOP::ms2). GCE was performed in the presence of the indicated ncAAs, and cells were analyzed by FFC. The dark gray bars (normalized to cytoplasmic PylRS) represent the fold change in the ratio r of the mean fluorescence intensities of mCherry versus GFP (derived from FFC, see Fig. 2D, E) for all systems tested. The light gray bars represent the relative efficiency, defined as the mean fluorescence intensity of mCherry for each condition divided by that of the cytoplasmic PylRS control (derived from FFC, see Fig. 2D, E). Shown are the mean values of at least three independent experiments; error bars represent the SEM. The box highlights the best-performing OT organelle (OTK2::P1).
Figure 2D shows the results of the FFC analysis of the dual-color reporter expressed with the four indicated systems in transfected HEK293T cells and tRNAPyl in the presence of the ncAA SCO, a lysine derivative with a cyclooctyne side chain. Highly selective and efficient orthogonal translation was observed for the OT assembly (the black arrow indicates a bright, highly mCherry-positive population). Shown in the dot plots are the sums of at least three independent experiments. Axes indicate fluorescence intensity in arbitrary units.
Figure 2E shows FFC plots for the OT assembly selectively translating Opal and Ochre codons only of recruited mRNA(mCherry185TGA)::ms2 and mRNA(mCherry185TAA)::ms2, respectively.
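The two quantities plotted in Figure 2C can be sketched in Python as follows. The sketch assumes that both are computed per independent experiment from mean fluorescence intensities (MFIs) and then summarized as mean and SEM; the legend does not state whether normalization precedes averaging, and the function names and any numbers passed to them are hypothetical.

```python
import statistics
from math import sqrt


def selectivity_and_efficiency(system, control):
    """Return (fold change in r, relative efficiency) for one experiment.

    system, control: (mean mCherry MFI, mean GFP MFI); control is the
    cytoplasmic PylRS condition, and r = MFI(mCherry) / MFI(GFP).
    """
    r_system = system[0] / system[1]
    r_control = control[0] / control[1]
    fold_change = r_system / r_control       # dark gray bars
    relative_efficiency = system[0] / control[0]  # light gray bars
    return fold_change, relative_efficiency


def mean_and_sem(values):
    """Mean and standard error of the mean over independent experiments."""
    values = list(values)
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / sqrt(len(values)) if len(values) > 1 else 0.0
    return mean, sem


if __name__ == "__main__":
    # Hypothetical MFIs (arbitrary units): (system, control) per experiment.
    runs = [((800.0, 100.0), (1000.0, 1000.0)),
            ((760.0, 95.0), (950.0, 1010.0)),
            ((820.0, 110.0), (1020.0, 990.0))]
    folds, effs = zip(*(selectivity_and_efficiency(s, c) for s, c in runs))
    print("fold change in r: %.2f +/- %.2f" % mean_and_sem(folds))
    print("relative efficiency: %.2f +/- %.2f" % mean_and_sem(effs))
```

A selective OT system raises the fold change in r well above 1, while the relative efficiency indicates how much mCherry signal is retained compared with cytoplasmic PylRS.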
Figure 3 shows a schematic representation of the constructs composing the following systems: PylRS, MCP::PylRS, FUS::MCP::PylRS and LcK::FUS::PylRS*LcK::EWS::MCP.
Figure 4 shows the flow cytometry analysis of the dual reporter expression with the four different systems depicted in Figure 3. HEK293T cells were transfected with constructs encoding the dual reporter, tRNA, and either LcK::FUS::PylRS and LcK::EWS::MCP, or PylRS, MCP::PylRS or FUS::MCP::PylRS together with pcDNA3.1. Shown is the sum of at least three independent experiments. Axes indicate fluorescence intensity in arbitrary units.
Figure 5 shows a bar plot of the ratios of the mean fluorescence intensity of mCherry vs. GFP for all tested systems. Bars represent mean values of at least three biological replicates; error bars indicate the standard error of the mean.
Figure 6 provides an overview of different approaches of the present invention for generating OT organelles that are targeted to the surface of different intracellular structures. Different constructs are expressed and the results of the respective fluorescence flow cytometry (FFC) analyses are shown. At the top of the figure, the dual-color reporter construct GFP39TAG*mCherry185TAG::ms2 (see also Figure 2B), as applied in each of the schematically illustrated experiments A to G, is depicted together with a schematic illustration of the different targeted cellular compartments. Control experiments performed without the effector polypeptide MCP (-MCP) are also illustrated for each of the experiments A to G:
A: OT organelle targeted to microtubules and obtained by expressing the system KIF16B(1-400)::FUS::PylRS*KIF16B(1-400)::EWSR1::MCP or the construct KIF16B(1-400)::FUS::PylRS (control);
B: OT organelle targeted to microtubule plus ends and obtained by expressing the construct EB1::FUS::MCP::PylRS or EB1::FUS::PylRS (control);
C: OT organelle targeted to the plasma membrane and obtained by expressing the system LcK::FUS::PylRS*LcK::EWSR1::MCP or the construct LcK::FUS::PylRS (control);
D: OT organelle targeted to the mitochondrial membrane and obtained by expressing the system TOM20(1-70)::FUS::PylRS*TOM20(1-70)::EWSR1::MCP or the construct TOM20(1-70)::FUS::PylRS (control);
E: OT organelle targeted to the nuclear membrane and obtained by expressing the system CG1::FUS::PylRS*CG1::EWSR1::MCP or the construct CG1::FUS::PylRS (control);
F (left side): OT organelle targeted to the Golgi membrane and obtained by expressing the system EBAG9(1-29)::FUS::PylRS*EBAG9(1-29)::EWSR1::MCP or the construct EBAG9(1-29)::FUS::PylRS (control);
F (right side): OT organelle targeted to the Golgi membrane and obtained by expressing the system CMP Sia Tr::FUS::PylRS*CMP Sia Tr::MCP or the construct CMP Sia Tr::FUS::PylRS (control);
G: OT organelle targeted to the ER membrane and obtained by expressing the system P450 2C1(1-27)::FUS::PylRS*P450 2C1(1-27)::EWSR1::MCP or the construct P450 2C1(1-27)::FUS::PylRS (control).
Figure 7 provides an overview of different approaches of the present invention for recruiting RNA using the interaction of different RNA loops with their respective RNA-targeting proteins. The results of the respective fluorescence flow cytometry (FFC) analyses are shown and compared with the corresponding analysis obtained for non-targeted PylRS alone:
A: System ms2-MCP incorporates the ms2 loops in the UTR of an mRNA molecule and recruits the mRNA into the artificial organelle via the MCP protein.
B: System boxB-λN22 incorporates the boxB loops in the UTR of an mRNA molecule and recruits the mRNA into the artificial organelle via the λN22 protein.
C: System pp7-PCP incorporates the pp7 loops in the UTR of an mRNA molecule and recruits the mRNA into the artificial organelle via the PCP protein.
Figure 8 illustrates a further approach of the present invention for generating OT organelles that operate on the surface of different cellular structures, exemplified here by targeting to the plasma membrane. This approach is characterized by the pairwise incorporation of the synthetic heterodimeric coiled-coil peptides SYNZIP1 and SYNZIP2 into the system LcK::FUS::SYNZIP1::PylRS*EWSR1::SYNZIP2::MCP. Upon expression, SYNZIP1 and SYNZIP2 pair and recruit MCP to a plasma membrane-based OT organelle, which in turn enables the selective orthogonal translation of a subsequently recruited mRNA comprising the ms2 targeting nucleotide loops. Selective translation is illustrated by the results of the respective FFC analysis (A). In a comparative approach with the system LcK::FUS::PylRS*EWSR1::SYNZIP2::MCP, in which SYNZIP1 is missing, no selectivity of translation was observed (B).
DETAILED DESCRIPTION OF THE INVENTION
Unless otherwise defined herein, scientific and technical terms as used in the context of the present invention shall have the meanings that are commonly understood by those of ordinary skill in the art. The meaning and scope of the terms should be clear. However, in the event of any latent ambiguity, definitions provided herein take precedence over any dictionary or extrinsic definition. Further, unless otherwise required by context, singular terms shall include pluralities and plural terms shall include the singular. If not otherwise stated, nucleotide sequences are depicted herein in the 5' to 3' direction. If not otherwise stated, amino acid sequences are depicted herein in the direction from N-terminus to C-terminus.
If not otherwise stated, the polypeptide of interest (POI) that is translationally expressed by the OT system according to the present invention comprises one or more ncAA residues which are encoded in the nucleotide sequence encoding the POI (CSPOI) by selector codons.
1. Fusion proteins
1.1. General
The fusion proteins of the invention may be constructed in different ways.
A first type includes fusion proteins wherein at least two types of effector polypeptides (EPs), namely at least one RNA-TP and at least one O-RS, are contained in one and the same fusion protein (also designated RNA-TP/O-RS fusion proteins).
A second type includes fusion proteins which comprise at least one assembler polypeptide (AP) and at least one type of EP selected from RNA-TP segments and O-RS segments (also designated AFPs). In particular, AFPs can comprise both RNA-TP and O-RS segments, such as one or more RNA-TP segments and one or more O-RS segments in any sequential order, in addition to the at least one type of AP. Thus, AFPs in particular are selected from the following fusion protein types (segments functionally linked in any order within the polypeptide chain; one or more segments of the same type in any order within the polypeptide chain):
(RNA-TP/AP)
(O-RS/AP)
(RNA-TP/O-RS/AP)
APs are selected from IC-TPs and PSPs, and may be composed of one or more IC-TPs and/or one or more of PSPs in any sequential order. Thus, AFPs more particularly are selected from the following fusion protein types (segments functionally linked in any order within the polypeptide chain; one or more segments of the same type in any order within the polypeptide chain):
(RNA-TP/IC-TP)
(O-RS/IC-TP)
(RNA-TP/O-RS/IC-TP)
(RNA-TP/PSP)
(O-RS/PSP)
(RNA-TP/O-RS/PSP)
(RNA-TP/PSP/IC-TP)
(O-RS/PSP/IC-TP)
(RNA-TP/O-RS/PSP/IC-TP)
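Because the segment order is free, the nine AFP types listed above are every combination of a non-empty set of EP types with a non-empty set of AP types (3 x 3 = 9). A short Python sketch with hypothetical names that regenerates the list as unordered descriptors:

```python
from itertools import combinations

# EP and AP segment types as named in the text; the order inside a label is
# irrelevant because segments may be linked in any order.
EFFECTORS = ("RNA-TP", "O-RS")
ASSEMBLERS = ("PSP", "IC-TP")


def nonempty_subsets(items):
    """All non-empty subsets of items, as tuples in input order."""
    return [c for k in range(1, len(items) + 1) for c in combinations(items, k)]


def afp_types():
    """One label per AFP type: at least one EP type plus at least one AP type."""
    return [
        "/".join(eps + aps)
        for eps in nonempty_subsets(EFFECTORS)
        for aps in nonempty_subsets(ASSEMBLERS)
    ]


if __name__ == "__main__":
    for label in afp_types():
        print(f"({label})")
```

Copy numbers are deliberately left out; each label stands for one or more segments of each named type.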
APs and/or EPs may also comprise (as part of the fusion protein) heterooligomer-forming, in particular heterodimer-forming, polypeptide segments, such as in particular synthetic coiled-coil SYNZIP peptides. Particular embodiments are AFP combinations in which such interacting SYNZIP pairs are distributed between members of the AFP combination, so that each AFP comprises only one member of a given interacting SYNZIP pair.
The term “segment” as used herein in the context of fusion proteins indicates that the thus designated element (e.g., RNA-TP, O-RS, IC-TP, PSP, SYNZIP) is part of the fusion protein, i.e. linked to the remainder of the fusion protein. The segments of the fusion proteins of the invention are functionally linked, i.e. linked such that they still function as RNA-TP, O-RS, IC-TP, PSP or SYNZIP, respectively. Said linkage is preferably covalent, and in particular is a peptidic linkage.
For example, the RNA-TP segment comprised in the fusion proteins of the present invention is a segment of the fusion protein that is derived from, and functions in the context of the fusion protein as, an RNA-TP, thus allowing the fusion protein to interact with (bind to) the targeted RNA, wherein said interaction is expediently a specific one. Thus, an RNA-TP segment may comprise the (entire) amino acid sequence, or a functional fragment, of an RNA-targeting polypeptide as described herein. Analogously, an O-RS segment comprised by the fusion proteins of the present invention is a segment of the fusion protein that is derived from, and functions in the context of the fusion protein as, an O-RS, thus conferring to the fusion protein O-RS enzymatic activity, that is the ability to catalyze the aminoacylation of an O-tRNA with an ncAA. Thus, an O-RS segment may comprise the (entire) amino acid sequence, or a functional fragment, of an O-RS as described herein.
The assembler fusion proteins (AFPs) described herein comprise at least one polypeptide segment acting as an assembler (AP). As used herein the term AP refers to any polypeptide segment that allows for enrichment of AFPs comprising said segment at spatially distinct sites within a living cell. Expediently said spatially distinct sites are located within, or directly adjacent to, the cytoplasm of the cell and readily accessible to the translational machinery of the cell (which includes canonical aminoacylated tRNAs, translation factors, ribosomal subunits, etc.) as well as the O-tRNAs which allow for the introduction of the ncAA residues into the POI.
There are different types of polypeptide segments which can serve as APs in the present invention. One type of APs are polypeptide segments which are derived from, and function in the context of the fusion protein as, intracellular targeting polypeptides (IC-TPs). These IC-TP segments may comprise the (entire) amino acid sequence, or a functional fragment, of an IC-TP. IC-TPs target, and thus become locally enriched at, intracellular structural elements within, or directly adjacent to, the cytoplasm. Examples of such structural elements include microtubules and the cytoplasmic side of membranes such as the cell membrane, the nuclear membrane, the mitochondrial membrane, the Golgi membrane, the ER membrane, etc.
Accordingly, in particular embodiments, the fusion protein of the present invention comprises at least one IC-TP segment that targets, and facilitates local enrichment of the fusion protein at, microtubules, in particular the plus end or the minus end of the microtubules. For instance, dyneins and kinesins (proteins of the dynein or kinesin family), and functional fragments and mutants thereof, can be used as IC-TPs for this function.
In further particular embodiments, the fusion protein of the present invention comprises at least one IC-TP segment that is derived from, and functions as, a membrane anchor. For example, the fusion protein of the present invention comprises at least one IC-TP segment that targets, and facilitates local enrichment of the fusion protein at, the (inner) cell membrane (in particular the cytoplasmic side of the cell membrane). In another example, the fusion protein of the present invention comprises at least one IC-TP segment that targets, and facilitates local enrichment of the fusion protein at, the (outer) nuclear membrane (in particular the cytoplasmic side of the nuclear membrane). In further particular embodiments, the fusion protein of the present invention comprises at least one IC-TP segment that targets, and facilitates local enrichment of the fusion protein at, the outer mitochondrial membrane (in particular the cytoplasmic side of the mitochondrial membrane). In further particular embodiments, the fusion protein of the present invention comprises at least one IC-TP segment that targets, and facilitates local enrichment of the fusion protein at, the outer ER membrane (in particular the cytoplasmic side of the ER membrane). In further particular embodiments, the fusion protein of the present invention comprises at least one IC-TP segment that targets, and facilitates local enrichment of the fusion protein at, the outer Golgi membrane (in particular the cytoplasmic side of the Golgi membrane). For instance, the transmembrane domain of membrane proteins, and functional fragments and mutants thereof, can be used as IC-TPs for such function.
Polypeptides which target, and thus become locally enriched at, intracellular structural elements as described above, are known in the art and are useful as IC-TPs in the present invention. Specific examples of suitable IC-TPs include, but are not limited to:
optionally truncated kinesin polypeptides which constitutively move towards, and become locally enriched at, microtubule plus ends in living cells, for example optionally truncated kinesin family member 16B (KIF16B), e.g. optionally truncated Homo sapiens KIF16B (Uniprot: Q96L93), in particular the fragment covering KIF16B amino acid residues 1-400 (KIF16B(1-400)) comprising the amino acid sequence of SEQ ID NO:20; or optionally truncated kinesin family member 13A (KIF13A), e.g. optionally truncated Homo sapiens KIF13A (Uniprot: Q9H1H9), in particular the KIF13A fragment covering amino acid residues 1-411 wherein P390 is deleted (KIF13A(1-411,ΔP390)) comprising the amino acid sequence of SEQ ID NO:22;
polypeptides EB1, a microtubule tip-binding protein that binds to growing microtubule plus ends (Nehlig A, Molina A, Rodrigues-Ferreira S, Honore S, Nahmias C. Regulation of end-binding protein EB1 in the control of microtubule dynamics. Cell Mol Life Sci. 2017;74(13):2381-2393. doi:10.1007/s00018-017-2476-2) (Uniprot: Q15691), which hence targets the organelle to microtubule plus ends, comprising the amino acid sequence of SEQ ID NO:302;
polypeptides targeting the outer mitochondrial membrane derived from transmembrane proteins such as, e.g., optionally truncated translocase of outer mitochondrial membrane 20 (TOMM20), for example optionally truncated Homo sapiens TOMM20 (Uniprot: Q15388), in particular the fragment covering amino acid residues 1-70 of TOMM20 (TOMM20(1-70)) comprising the amino acid sequence of SEQ ID NO:24;
cell membrane-targeting polypeptides derived from transmembrane-proteins such as, e.g., lymphocyte-specific protein tyrosine kinase (LcK; e.g., Mus musculus LcK, Uniprot: P06240), CD4 (e.g., Mus musculus CD4, Uniprot: P06332), FRB (similar to Homo sapiens mTOR; Uniprot: P42345), CD28 (e.g., Mus musculus CD28, Uniprot: P31041) and combinations thereof, in particular polypeptides comprising the amino acid sequence of SEQ ID NO:26, SEQ ID NO:28 or SEQ ID NO:30;
polypeptides CG1 (also designated Nup42), a nucleoporin that binds to the cytoplasmic side of the nuclear pore complex (Fernandez-Martinez J, Kim SJ, Shi Y, et al. Structure and Function of the Nuclear Pore Complex Cytoplasmic mRNA Export Platform. Cell. 2016;167(5):1215-1228.e25. doi:10.1016/j.cell.2016.10.028) (Uniprot: O15504), targeting the cytoplasmic side of the nuclear membrane and comprising the amino acid sequence of SEQ ID NO:304;
polypeptides EBAG9, a Golgi membrane protein with one transmembrane helix (Engelsberg A, Hermosilla R, Karsten U, Schülein R, Dörken B, Rehm A. The Golgi protein RCAS1 controls cell surface expression of tumor-associated O-linked glycan antigens. J Biol Chem. 2003;278(25):22998-23007. doi:10.1074/jbc.M301361200) (Uniprot: O00559), targeting the cytoplasmic side of the Golgi membrane and comprising the amino acid sequence of SEQ ID NO:292 (full length) or the first 29 N-terminal amino acid residues of SEQ ID NO:294; or polypeptides CMP Sia Tr, the CMP-sialic acid transporter, a Golgi protein with 10 transmembrane helices (Eckhardt M, Gotza B, Gerardy-Schahn R. Membrane topology of the mammalian CMP-sialic acid transporter. J Biol Chem. 1999;274(13):8779-8787. doi:10.1074/jbc.274.13.8779) (Uniprot: P78382), targeting the cytoplasmic side of the Golgi membrane and comprising the amino acid sequence of SEQ ID NO:296;
polypeptide fragments of P450 2C1, an endoplasmic reticulum-resident protein (Fazal FM, Han S, Parker KR, et al. Atlas of Subcellular RNA Localization Revealed by APEX-Seq. Cell. 2019;178(2):473-490.e26. doi:10.1016/j.cell.2019.05.027) (Uniprot: P78382), targeting the cytoplasmic side of the ER membrane, in particular a fragment comprising the first 27 (SEQ ID NO:298) or the first 29 (SEQ ID NO:300) N-terminal amino acid residues;
the transmembrane protein stomatin-like protein 3 (SLP-3) (Homo sapiens, Uniprot: Q8TAV4), in particular a fragment covering amino acid residues 1-59 and comprising the amino acid sequence of SEQ ID NO:310, localizing to the plasma membrane and vesicular membranes (Lapatsina L, Jira JA, Smith ES, et al. Regulation of ASIC channels by a stomatin/STOML3 complex located in a mobile vesicle pool in sensory neurons. Open Biol. 2012;2(6):120096. doi:10.1098/rsob.120096);
as well as functional fragments and mutants of these polypeptides. Said functional fragments and mutants may have at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of the polypeptide they are derived from.
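The text does not state how sequence identity is to be computed. The following Python sketch shows one common convention under stated assumptions: the two sequences are already aligned (e.g. by a global alignment tool), gaps are written as "-", and identical positions are divided by the alignment length. Function names and example sequences are hypothetical.

```python
# Identity thresholds recited in the text (percent).
THRESHOLDS = (60, 70, 80, 90, 95, 96, 97, 98, 99)


def percent_identity(query: str, reference: str) -> float:
    """Percent identity of two sequences that are already aligned.

    Both strings must have the same length; '-' denotes a gap. A column
    counts as identical only if both residues match and neither is a gap.
    The denominator is the alignment length (one of several conventions).
    """
    if len(query) != len(reference) or not reference:
        raise ValueError("expected two aligned sequences of equal, non-zero length")
    identical = sum(
        1 for a, b in zip(query.upper(), reference.upper()) if a == b and a != "-"
    )
    return 100.0 * identical / len(reference)


def thresholds_met(query: str, reference: str) -> list[int]:
    """Which of the recited identity thresholds a variant satisfies."""
    pid = percent_identity(query, reference)
    return [t for t in THRESHOLDS if pid >= t]
```

With a single substitution in a 10-residue stretch, `percent_identity("MKTAYIAKQR", "MKTAYIAKQK")` gives 90.0, satisfying the 60% to 90% thresholds but not 95%. Other conventions (e.g. dividing by the reference length only) can give different values for gapped alignments.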
A further type of APs are polypeptide segments which are derived from, and function in the context of the fusion protein as, phase separation polypeptides (PSPs). PSPs are polypeptides that have the ability to self-assemble in the cytoplasm of a cell so as to create sites of high local concentration in the cytoplasm. Specifically, PSPs are able to drive phase separation (in particular liquid-liquid phase separation), leading to the formation of membrane-less compartments in the cytoplasm. Said compartments may take the form of droplets, aggregates, condensates or a dense phase. In particular, PSPs include intrinsically disordered proteins (IDPs), which are an important class of proteins that drive phase separation (see, e.g., Alberti et al., BioEssays 2016, 38:959-968 and references cited therein such as Patel et al., Cell 2015, 162:1066-1077; Han et al., Cell 2012, 149:768-779; Kato et al., Cell 2012, 149:753-767). There are three different classes of IDPs; proteins of each class, or functional fragments or mutants thereof, can be used as PSPs in the present invention. One prominent class of IDPs contains so-called prion-like domains, which are devoid of charges and contain polar amino acid residues (Q, N, S, G) with interspersed aromatic residues (F, Y). See, e.g., Malinovska et al., Biochim Biophys Acta 2013, 1834:918-931; Alberti et al., Cell 2009, 137:146-158; Malinovska et al., Prion 2015, 9:339-346. Another class of IDPs is also characterized by low sequence complexity but frequently contains acidic and basic amino acid side chains, e.g. RGG repeat-containing IDPs such as Ddx4. See Nott et al., Mol Cell 2015, 57:936-947. Specific examples of suitable PSPs include, but are not limited to:
spindle-defective protein 5 (SPD5) (e.g., Caenorhabditis elegans SPD5; Uniprot: P91349), in particular a polypeptide comprising the amino acid sequence of SEQ ID NO:32;
fused-in sarcoma (FUS) (e.g., Homo sapiens FUS; Uniprot: P35637), in particular a polypeptide comprising the amino acid sequence of SEQ ID NO:34;
Ewing sarcoma breakpoint region 1 (EWSR1) (e.g., Homo sapiens EWSR1, Uniprot: Q01844), in particular a polypeptide comprising the amino acid sequence of SEQ ID NO:36;
ATP-dependent RNA helicase laf-1 (LAF-1) (e.g., Caenorhabditis elegans LAF-1, Uniprot: D0PV95), in particular its RGG domain covering amino acid residues 1-168 and comprising the amino acid sequence of SEQ ID NO:308 (Schuster BS, Reed EH, Parthasarathy R, et al. Controllable protein phase separation and modular recruitment to form responsive membraneless organelles. Nat Commun. 2018;9(1):2985. Published 2018 Jul 30. doi:10.1038/s41467-018-05403-1);
as well as functional fragments and mutants of these polypeptides. Said functional fragments and mutants may comprise at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of the polypeptide they are derived from.
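The compositional description of prion-like domains given above (polar Q, N, S, G residues with interspersed F, Y and essentially no charges) can be turned into a toy composition check. This Python sketch is not a validated predictor (dedicated tools such as PLAAC exist for that purpose); the cutoffs, names and example sequences are arbitrary assumptions chosen only to illustrate the idea.

```python
# Toy composition profile; NOT a validated prion-like domain predictor.
POLAR = set("QNSG")
AROMATIC = set("FY")
CHARGED = set("DEKR")


def composition_profile(seq: str) -> dict[str, float]:
    """Fractions of polar (QNSG), aromatic (FY) and charged (DEKR) residues."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")

    def fraction(residues):
        return sum(aa in residues for aa in seq) / len(seq)

    return {
        "polar_QNSG": fraction(POLAR),
        "aromatic_FY": fraction(AROMATIC),
        "charged_DEKR": fraction(CHARGED),
    }


def looks_prion_like(seq: str, min_polar: float = 0.5,
                     min_aromatic: float = 0.05, max_charged: float = 0.05) -> bool:
    """Hypothetical cutoffs: polar-rich, some aromatics, (almost) no charges."""
    p = composition_profile(seq)
    return (p["polar_QNSG"] >= min_polar
            and p["aromatic_FY"] >= min_aromatic
            and p["charged_DEKR"] <= max_charged)
```

A Q/N/S/G-rich stretch with interspersed tyrosines such as "SYGQSNYGQS" passes these cutoffs, whereas a charge-rich stretch such as "DEKRDEKRQN" does not; RGG-type IDPs, which contain charged residues, would deliberately fail this particular check.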
The number of APs comprised by fusion proteins of the present invention is not particularly limited, i.e. a fusion protein may comprise 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more identical or different APs. Fusion proteins of the present invention which comprise at least one AP selected from IC-TP segments and at least one AP selected from PSP segments are particularly preferred. Likewise, the number of RNA-TP segments is not particularly limited and may be independently selected from 1, 2, 3, 4, 5 or more, for example 6, 7, 8, 9 or 10, identical or different RNA-TP segments. Likewise, the number of O-RS segments is not particularly limited and may be independently selected from 1, 2, 3, 4, 5 or more, for example 6, 7, 8, 9 or 10, identical or different O-RS segments. This applies to both AFPs and RNA-TP/O-RS fusion proteins. The number of segments naturally influences the size of the fusion proteins of the present invention, which is not particularly limited but is typically less than 3500 amino acid residues, such as less than 3000 amino acid residues. The order of the segments within the fusion proteins of the invention is not particularly limited either; the RNA-TP, O-RS and/or AP segments may thus be functionally linked in any order. Examples of RNA-TP/O-RS fusion protein structures (comprising both types of EP segments) include, but are not limited to:
[RNA-TP]x-[O-RS]y
[O-RS]y-[RNA-TP]x
wherein x and y, independently of each other, are integers selected from 1, 2, 3, 4 and 5, and “-” designates a peptidic linkage.
[RNA-TP]x for x ≥ 2 may include the same or different RNA-TP segments. [O-RS]y for y ≥ 2 may include the same or different O-RS segments.
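The two block formulas above, with x and y each ranging over 1 to 5, describe 2 x 5 x 5 = 50 block layouts. A minimal Python sketch (hypothetical names) that lists them in the document's own notation and expands a layout into individual segments from N- to C-terminus:

```python
import re

# Matches one block such as "[RNA-TP]2" -> ("RNA-TP", "2").
BLOCK = re.compile(r"\[([^\]]+)\](\d+)")


def ep_fusion_layouts(max_copies: int = 5) -> list[str]:
    """[RNA-TP]x-[O-RS]y and [O-RS]y-[RNA-TP]x for 1 <= x, y <= max_copies."""
    layouts = []
    for x in range(1, max_copies + 1):
        for y in range(1, max_copies + 1):
            layouts.append(f"[RNA-TP]{x}-[O-RS]{y}")
            layouts.append(f"[O-RS]{y}-[RNA-TP]{x}")
    return layouts


def expand(layout: str) -> list[str]:
    """Expand block notation into segment types listed from N- to C-terminus."""
    segments = []
    for name, copies in BLOCK.findall(layout):
        segments.extend([name] * int(copies))
    return segments
```

This counts block layouts only; as stated above, copies within a block may be the same or different RNA-TP (or O-RS) segments, which multiplies the number of concrete constructs.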
Examples of AFP structures (comprising at least one AP segment and at least one EP segment) include, but are not limited to:
[IC-TP]m - [EP]o
[EP]o - [IC-TP]m
[PSP]n - [EP]o
[EP]o - [PSP]n
[IC-TP]m - [EP]o - [PSP]n
[PSP]n - [EP]o - [IC-TP]m
[IC-TP]m - [PSP]n - [EP]o
[EP]o - [PSP]n - [IC-TP]m
[PSP]n - [IC-TP]m - [EP]o
[EP]o - [IC-TP]m - [PSP]n
wherein m, n and o, independently of each other, are integers selected from 1, 2, 3, 4 or 5, or are selected from 1, 2, 3, 4, 5 and 6, and “-” designates a peptidic linkage.
In a preferred embodiment, “m” is the integer 1.
In another preferred embodiment, “n” is an integer selected from 1 and 2.
In still another preferred embodiment, “o” is an integer selected from 1, 2, 3, 4, 5 and 6 if EP is selected from RNA-TPs. In still another preferred embodiment, “o” is an integer selected from 1 and 2 if EP is selected from O-RSs.
In still another preferred embodiment, those structures are preferred wherein at least one IC-TP occupies a C- or N-terminal position within the polypeptide chain.
In still another preferred embodiment, those structures are preferred wherein at least one EP occupies a C- or N-terminal position within the polypeptide chain.
In still another preferred embodiment, those structures are preferred wherein at least one IC-TP occupies a C- or N-terminal position within the polypeptide chain while at least one EP occupies the N- or C-terminal position, respectively. Any PSP, if present in such a structure, is positioned internally within the polypeptide chain.
[IC-TP]m for m ≥ 2 may include the same or different IC-TP segments. Preferably, IC-TPs of the same functionality (i.e., targeting the same type of cellular structure, for example the same membrane type or the same type of organelle) are used. [PSP]n for n ≥ 2 may include the same or different PSP segments. [EP]o for o ≥ 2 may include the same or different EPs. Where [EP]o includes different EPs, for example, at least one EP may be an RNA-TP segment and at least one may be an O-RS segment.
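The ten block orders listed above are exactly the orderings of one EP block with an IC-TP block, a PSP block, or both (2 + 2 + 6 = 10). A Python sketch with hypothetical names that regenerates them and applies the preference that an IC-TP sits at one terminus and an EP at the other, with any PSP internal; copy numbers m, n and o are ignored here.

```python
from itertools import permutations

# Block-level view: "EP" stands for the effector block [EP]o; copy numbers
# are ignored, and at least one AP block (IC-TP and/or PSP) is present.
AP_CHOICES = (("IC-TP",), ("PSP",), ("IC-TP", "PSP"))


def afp_architectures() -> list[tuple[str, ...]]:
    """Every N- to C-terminal order of the EP block with one or both AP blocks."""
    return [order for aps in AP_CHOICES for order in permutations(("EP",) + aps)]


def ic_tp_and_ep_terminal(arch: tuple[str, ...]) -> bool:
    """IC-TP at one terminus and EP at the other, so any PSP is internal."""
    return {arch[0], arch[-1]} == {"IC-TP", "EP"}


if __name__ == "__main__":
    for arch in afp_architectures():
        flag = "(preferred)" if ic_tp_and_ep_terminal(arch) else ""
        print(" - ".join(arch), flag)
```

Under this filter four of the ten orders remain: [IC-TP]m-[EP]o, [EP]o-[IC-TP]m, [IC-TP]m-[PSP]n-[EP]o and [EP]o-[PSP]n-[IC-TP]m.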
The fusion proteins of the present invention provide an orthogonal translation (OT) system wherein the one or more O-RS (segments) required for the introduction of the one or more ncAA residues into the POI are brought into spatial proximity to at least one RNA-targeting polypeptide (RNA-TP) segment. The mRNA of the POI comprises at least one targeting nucleotide sequence (TN) that is able to interact with an RNA-TP segment of at least one of the fusion proteins of the OT system. Said interaction is expediently a specific one. The RNA-TP segments of the fusion proteins of the invention are preferably mRNA-targeting polypeptide segments. The RNA-TP segment of the fusion protein and the TN of the POI mRNA are expediently chosen so as to specifically interact with (bind to) one another. Suitable pairs of RNA-TP segment and TN for this purpose can be selected from coat proteins of RNA viruses and the nucleic acid motifs bound by said coat proteins. Such viral coat proteins and protein-bound RNA motifs are known in the art.
Specific examples of suitable RNA-TPs include, but are not limited to:
MCP (coat protein of Enterobacteria phage MS2), in particular a polypeptide comprising the amino acid sequence of SEQ ID NO: 14;
AN22 (22 amino acid RNA-binding domain of lambda phage antiterminator protein N), in particular a polypeptide comprising the amino acid sequence of SEQ ID NO:16;
PCP (coat protein of bacteriophage PP7; Wu B, Chao JA, Singer RH. Fluorescence fluctuation spectroscopy enables quantitative imaging of single mRNAs in living cells. Biophys J. 2012;102(12):2936-2944. doi:10.1016/j.bpj.2012.05.017), in particular a polypeptide comprising the amino acid sequence of SEQ ID NO:306;
as well as functional fragments and mutants of these polypeptides. Said functional fragments and mutants may comprise at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid of the polypeptide they are derived from.
Specific examples of suitable TNs include, but are not limited to:
Enterobacteria phage MS2 RNA stem-loop, in particular a polynucleotide having an RNA sequence corresponding to (encoded by) the nucleotide (DNA) sequence of SEQ ID NO:17;
BoxB (lambda phage RNA stem-loop, specific binding site of AN22), in particular a polynucleotide having an RNA sequence corresponding to (encoded by) the nucleotide (DNA) sequence of SEQ ID NO:18;
Bacteriophage PP7 RNA stem-loops (Wu B, Chao JA, Singer RH. Fluorescence fluctuation spectroscopy enables quantitative imaging of single mRNAs in living cells. Biophys J. 2012;102(12):2936-2944. doi:10.1016/j.bpj.2012.05.017), in particular a polynucleotide having an RNA sequence corresponding to (encoded by) the nucleotide (DNA) sequence of SEQ ID NO:289 or SEQ ID NO:290;
as well as functional fragments and mutants thereof. Said functional fragments and mutants may comprise at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% nucleotide sequence identity to the polynucleotide sequences they are derived from. Such TNs may be used as a single-copy segment or as a multiple-copy segment composed of more than one, for example two, three, four, five, six or more, repetitive units of the TN.
MCP specifically interacts with MS2 RNA stem-loops. Thus, where the RNA-TP segment(s) of the fusion protein(s) comprise (or consist of) segments which are derived from, and function as, MCP, the mRNA of the POI expediently comprises one or more MS2 RNA stem-loops, e.g. two, three, four, five or six MS2 RNA stem-loops. AN22 specifically interacts with BoxB. Thus, where the RNA-TP segment(s) of the fusion protein(s) comprise (or consist of) segments which are derived from, and function as, AN22, the mRNA of the POI expediently comprises one or more BoxB motifs, e.g. one, two, three, four, five, six or more BoxB motifs. PCP specifically interacts with PP7 RNA stem-loops. Thus, where the RNA-TP segment(s) of the fusion protein(s) comprise (or consist of) segments which are derived from, and function as, PCP, the mRNA of the POI expediently comprises one or more PP7 RNA stem-loops, e.g. two, three, four, five, six or more PP7 RNA stem-loops.
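The cognate pairing rule described above can be expressed as a simple design check. The following is a minimal sketch only: the dictionary labels and the function name `check_tn_design` are hypothetical, and no actual RNA-TP or TN sequences (SEQ ID NOs) are encoded; motifs are compared by label alone.

```python
# Cognate RNA-TP / TN pairs named in the text (labels only, no sequences).
COGNATE_TN = {
    "MCP": "MS2 stem-loop",
    "AN22": "BoxB",
    "PCP": "PP7 stem-loop",
}


def check_tn_design(rna_tp: str, tn_motifs) -> bool:
    """Hypothetical check: True if the POI mRNA carries at least one TN copy
    and every TN copy is the motif bound by the chosen RNA-TP segment."""
    if rna_tp not in COGNATE_TN:
        raise ValueError(f"unknown RNA-TP: {rna_tp}")
    expected = COGNATE_TN[rna_tp]
    return len(tn_motifs) >= 1 and all(m == expected for m in tn_motifs)


# An MCP-based fusion protein with four MS2 stem-loops in the POI mRNA.
print(check_tn_design("MCP", ["MS2 stem-loop"] * 4))  # True
# A PCP-based fusion protein cannot bind BoxB motifs.
print(check_tn_design("PCP", ["BoxB", "BoxB"]))       # False
```

The check deliberately requires every TN copy to match, mirroring the text's pairing of one RNA-TP type with repeated copies of its own motif.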
Several RSs have been used for genetic code expansion, including the Methanococcus jannaschii tyrosyl-tRNA synthetase, E. coli tyrosyl-tRNA synthetase, E. coli leucyl-tRNA synthetase and pyrrolysyl-tRNA synthetases from certain Methanosarcina (such as M. mazei, M. barkeri, M. acetivorans, M. thermophila), Methanococcoides (M. burtonii) or Desulfitobacterium (D. hafniense) species. Corresponding orthogonal RS/tRNA pairs have been used to genetically encode a variety of functionalities in polypeptides (Chin, Annu Rev Biochem 2014, 83:379-408; Chin et al., J Am Chem Soc 2001, 124:9026; Chin et al., Science 2003, 301:964; Nguyen et al., J Am Chem Soc 2009, 131:8720; Yanagisawa et al., Chem Biol 2008, 15:1187). Depending on the cell used for the translation of the POI, these RSs can be used as O-RSs in the present invention.
Pyrrolysyl tRNA synthetases (PylRSs) which can be used in methods and fusion proteins of the invention may be wildtype or genetically engineered PylRSs. Examples of wildtype PylRSs include, but are not limited to, PylRSs from archaebacteria and eubacteria such as Methanosarcina mazei, Methanosarcina barkeri, Methanococcoides burtonii, Methanosarcina acetivorans, Methanosarcina thermophila and Desulfitobacterium hafniense. Genetically engineered PylRSs have been described, for example, by Neumann et al. (Nat Chem Biol 2008, 4:232), by Yanagisawa et al. (Chem Biol 2008, 15:1187), and in EP2192185A1. The efficiency of genetic code expansion using PylRS can be increased by modifying the amino acid sequence of the PylRS such that it is not directed to the nucleus. To this end, the nuclear localization signal (NLS) can be removed from the PylRS or can be overridden by introducing a suitable nuclear export signal (NES). PylRSs which are used in the fusion proteins and methods of the present invention may be PylRSs lacking the NLS and/or comprising a NES as described, e.g., in WO 2018/069481.
Accordingly, examples of O-RS segment(s) which can be used in the fusion proteins of the present invention include, but are not limited to:
Methanococcus jannaschii tyrosyl-tRNA synthetase;
Escherichia coli tyrosyl-tRNA synthetase;
Escherichia coli leucyl-tRNA synthetase;
Methanosarcina mazei pyrrolysyl-tRNA synthetase;
Methanosarcina barkeri pyrrolysyl-tRNA synthetase;
Methanosarcina acetivorans pyrrolysyl-tRNA synthetase;
Methanosarcina thermophila pyrrolysyl-tRNA synthetase;
Methanococcoides burtonii pyrrolysyl-tRNA synthetase;
Desulfitobacterium hafniense pyrrolysyl-tRNA synthetase;
as well as functional (i.e., enzymatically active) fragments and mutants of these polypeptides. Said functional fragments and mutants may comprise at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the aminoacyl tRNA synthetase they are derived from.
Particular examples of O-RS segments useful in the present invention which are derived from M. mazei pyrrolysyl-tRNA synthetase include, but are not limited to:
O-RS segments derived from PylRSAF ( Methanosarcina mazei pyrrolysyl tRNA synthetase double mutant: Y306A, Y384F; Uniprot: Q8PWY1), for example O-RS segments comprising the amino acid sequence of SEQ ID NO:8;
O-RS segments derived from PylRS^ ( Methanosarcina mazei pyrrolysyl tRNA synthetase double mutant: N346A, C348A; Uniprot: Q8PWY1), for example O-RS segments comprising the amino acid sequence of SEQ ID NO: 10;
O-RS segments derived from PylRS^1 ( Methanosarcina mazei pyrrolysyl tRNA synthetase quadruple mutant: Y306A, N346A, C348A, Y384F; Uniprot: Q8PWY1), for example O-RS segments comprising the amino acid sequence of SEQ ID NO:12;
O-RS segments derived from IFRS1, a Methanosarcina mazei pyrrolysyl tRNA synthetase mutant (L305M, Y306L, L309S, N346S, C348M), for example O-RS segments comprising the amino acid sequence of SEQ ID NO:224;
O-RS segments derived from CbzRS, a Methanosarcina mazei pyrrolysyl tRNA synthetase mutant (Y306M, L309G, C348T), for example O-RS segments comprising the amino acid sequence of SEQ ID NO:226;
O-RS segments derived from CpkRS, a Methanosarcina mazei pyrrolysyl tRNA synthetase mutant (A302S), for example O-RS segments comprising the amino acid sequence of SEQ ID NO:228;
O-RS segments derived from OMeRS, a Methanosarcina mazei pyrrolysyl tRNA synthetase mutant (A302T, Y384F, N346V, C348W, V401L), for example O-RS segments comprising the amino acid sequence of SEQ ID NO:236;
as well as functional (i.e., enzymatically active) fragments and mutants of these polypeptide segments. Said functional fragments and mutants may comprise at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the aminoacyl tRNA synthetase they are derived from.
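The mutants above are given in standard point-mutation notation (e.g. Y306A: wild-type Tyr at position 306 replaced by Ala). As a minimal sketch of how such a list could be applied to a parent sequence while checking each wild-type residue, consider the following; the function name `apply_mutations` and the toy sequence are hypothetical, and the actual PylRS sequence (Uniprot Q8PWY1) is not reproduced here.

```python
import re

# Point-mutation notation: wild-type residue, 1-based position, new residue (e.g. "Y306A").
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutations(seq: str, mutations) -> str:
    """Apply point mutations to a protein sequence, verifying each wild-type residue."""
    residues = list(seq)
    for m in mutations:
        match = MUTATION.match(m.replace(" ", ""))  # tolerate stray spaces such as "V401 L"
        if match is None:
            raise ValueError(f"cannot parse mutation {m!r}")
        wt, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{m}: position {pos} outside sequence")
        if residues[pos - 1] != wt:
            raise ValueError(f"{m}: expected {wt} at {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = new
    return "".join(residues)


# Toy 6-residue sequence (not PylRS), double mutant analogous in form to "Y306A, Y384F".
print(apply_mutations("MKYLLY", ["Y3A", "Y6F"]))  # MKALLF
```

Checking the wild-type residue before substituting guards against applying a mutation list to the wrong parent sequence or numbering.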
According to particular embodiments, wild-type and mutant M. mazei PylRSs as described herein are used for aminoacylation of tRNA with ncAAs as described in WO2012/104422 or WO2015/107064. Exemplary ncAAs for this purpose include, but are not limited to, 2-amino-6-(cyclooct-2-yn-1-yloxycarbonylamino)hexanoic acid (SCO), 2-amino-6-(cyclooct-2-yn-1-yloxyethoxycarbonylamino)hexanoic acid, 2-amino-6-[(4E-cyclooct-4-en-1-yl)oxycarbonylamino]hexanoic acid (TCO), 2-amino-6-[(2E-cyclooct-2-en-1-yl)oxycarbonylamino]hexanoic acid (TCO*), 2-amino-6-(prop-2-ynoxycarbonylamino)hexanoic acid (PrK) and 2-amino-6-(9-bicyclo[6.1.0]non-4-ynylmethoxycarbonylamino)hexanoic acid (BCN).
In another embodiment of the present invention, the above-mentioned AP (IC-TP and PSP) segments and/or the above-mentioned EP (RNA-TP and O-RS) segments may, independently of each other, be further combined with natural or, more particularly, synthetic protein segments that induce and control macromolecular interactions. In particular, such further protein segments are operably fused into the polypeptide chain of an AFP of the invention. One or more, such as 2, 3, 4, 5, 6, 7, 8, 9 or 10, but preferably one, such protein segment may be operably fused into a single AFP of the invention. Fusion into the AFP polypeptide chain should be such that the activity of the other polypeptide segments, AP and EP, is substantially unaffected, in particular not inhibited (i.e. AP and EP remain operable), while the ability of the additional polypeptide segment to induce and control macromolecular interactions is retained. So-called SYNZIP peptides, which form multimeric structures, have been described in the literature. Of particular interest in the context of the invention are SYNZIPs having the ability to form specific heterodimeric coiled-coil protein structures. Such SYNZIPs are pairs of synthetic peptides capable of interacting with each other and are used to induce and control macromolecular interactions. Non-limiting examples are the pairs SYNZIP1:SYNZIP2, SYNZIP3:SYNZIP4 and SYNZIP5:SYNZIP6. Particularly preferred according to the invention is the heterospecific coiled-coil pair SYNZIP2:SYNZIP1 as described by Reinke, A.W., Grant, R.A., Keating, A.E. (2010) J Am Chem Soc 132:6025-6031 (SYNZIP1: SEQ ID NO:312; SYNZIP2: SEQ ID NO:314; SYNZIP3: SEQ ID NO:316; SYNZIP4: SEQ ID NO:318), as well as functional fragments and mutants of these SYNZIP polypeptides.
Said functional fragments and mutants may comprise at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of the polypeptide they are derived from. As pairwise use is required to induce macromolecular interaction, these SYNZIPs are preferably used pairwise in AFP combinations as described herein. The interaction of such SYNZIP pairs integrated in different AFP fusion proteins may further support the formation of OT organelles according to the present invention.
In still another embodiment of the present invention, a fusion protein of the invention may be further modified by fusing in at least one so-called “epitope tag”, i.e. a short oligopeptide sequence that serves as an antibody binding site, useful for detecting/quantifying the expressed fusion products of the invention. Non-limiting examples of such tags are the following:
VSV-G: Vesicular stomatitis virus glycoprotein epitope tag (SEQ ID NO:680)
HA: Human influenza hemagglutinin epitope tag (SEQ ID NO:682)
Myc: Human c-Myc proto-oncogene epitope tag (SEQ ID NO:684)
1.2 Particular examples of AFP constructs of the invention
Each individual exemplified construct may be construed in the N->C or C->N direction. The depicted schemes are given in the N->C direction. In the case of segment blocks [IC-TP]m, [PSP]n, [O-RS]y and [RNA-TP]x, wherein m, n, y or x is an integer >1, the repetitive segments within such a block may be identical or different, preferably identical.
The segments [IC-TP], [PSP], [O-RS], [RNA-TP] and [SYNZIP] as applied therein may be prepared from the respective examples of segments described above in section 1.1.
1.2.1. Intracellular structure-targeting monofunctional AFPs
1.2.1.1 Intracellular structure-targeting monofunctional AFPs (i.e. comprising one type of EP)
Individually preferred examples thereof are:
[IC-TP]m - [O-RS]y with m = 1 or 2, preferably 1; y = 1 or 2, preferably 1;
[IC-TP]m - [RNA-TP]x with m = 1 or 2, preferably 1; x = 1, 2, 3, 4, 5 or 6, preferably 2, 3 or 4;
[IC-TP]m - [PSP]n - [O-RS]y with m = 1 or 2, preferably 1; n = 1, 2 or 3, preferably 1 or 2; y = 1 or 2, preferably 1;
[IC-TP]m - [PSP]n - [RNA-TP]x with m = 1 or 2, preferably 1; n = 1, 2 or 3, preferably 1 or 2; x = 1, 2, 3, 4, 5 or 6, preferably 2, 3 or 4;
[IC-TP]m - [O-RS1]y - [PSP]n - [O-RS2]y with m = 1 or 2, preferably 1; n = 1, 2 or 3, preferably 1 or 2; y independently of each other = 1 or 2, preferably 1; and O-RS1 and O-RS2 identical or different, preferably identical;
[IC-TP]m - [PSP1]n - [O-RS]y - [PSP2]n with m = 1 or 2, preferably 1; n independently of each other = 1, 2 or 3, preferably 1 or 2; y independently of each other = 1 or 2, preferably 1; and PSP1 and PSP2 identical or different;
[IC-TP]m - [RNA-TP1]x - [PSP]n - [RNA-TP2]x with m = 1 or 2, preferably 1; n = 1, 2 or 3, preferably 1 or 2; x independently of each other = 1, 2, 3, 4, 5 or 6, preferably 2, 3 or 4; and RNA-TP1 and RNA-TP2 identical or different, preferably identical;
[IC-TP]m - [PSP1]n - [O-RS1]y - [PSP2]n - [O-RS2]y with m = 1 or 2, preferably 1; n independently of each other = 1, 2 or 3, preferably 1 or 2; y independently of each other = 1 or 2, preferably 1; O-RS1 and O-RS2 identical or different, preferably identical; and PSP1 and PSP2 identical or different;
[IC-TP]m - [PSP1]n - [RNA-TP1]x - [PSP2]n - [RNA-TP2]x with m = 1 or 2, preferably 1; n independently of each other = 1, 2 or 3, preferably 1 or 2; x independently of each other = 1, 2, 3, 4, 5 or 6, preferably 2, 3 or 4; RNA-TP1 and RNA-TP2 identical or different; and PSP1 and PSP2 identical or different.
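Each block formula above stands for a small family of explicit N->C segment layouts, one per allowed combination of copy numbers. A minimal sketch of that expansion follows; the function name `expand_layouts` is hypothetical, and the sketch treats repeated segments within a block as identical (the preferred case), so it does not enumerate mixed segment variants.

```python
from itertools import product


def expand_layouts(blocks):
    """Expand a block formula into explicit N->C segment layouts.

    blocks: list of (segment_name, allowed_copy_numbers), e.g. ("IC-TP", [1, 2]).
    Returns one tuple per combination of copy numbers, with each segment name
    repeated as many times as its block's copy number.
    """
    names = [name for name, _ in blocks]
    layouts = []
    for counts in product(*(allowed for _, allowed in blocks)):
        layout = []
        for name, k in zip(names, counts):
            layout.extend([name] * k)
        layouts.append(tuple(layout))
    return layouts


# [IC-TP]m - [RNA-TP]x with m = 1 or 2 and x = 1, 2, 3, 4, 5 or 6
layouts = expand_layouts([("IC-TP", [1, 2]), ("RNA-TP", range(1, 7))])
print(len(layouts))  # 12
print(layouts[0])    # ('IC-TP', 'RNA-TP')
```

Restricting the ranges to the "preferably" values (e.g. m = 1 and x = 2, 3 or 4) would narrow the same formula to the preferred subset.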
1.2.1.2 Intracellular structure-targeting bifunctional AFPs (comprising two types of EP)
Individually preferred examples thereof are:
[IC-TP]m - [O-RS]y - [RNA-TP]x with m = 1 or 2, preferably 1; x = 1, 2, 3, 4, 5 or 6, preferably 2, 3 or 4; y = 1 or 2, preferably 1;
[IC-TP]m - [RNA-TP]x - [O-RS]y with m = 1 or 2, preferably 1; x = 1, 2, 3, 4, 5 or 6, preferably 2, 3 or 4; y = 1 or 2, preferably 1;
[IC-TP]m - [PSP]n - [O-RS]y - [RNA-TP]x with m = 1 or 2, preferably 1; n = 1, 2 or 3, preferably 1 or 2; x = 1, 2, 3, 4, 5 or 6, preferably 2, 3 or 4; y = 1 or 2, preferably 1;
[IC-TP]m - [PSP]n - [RNA-TP]x - [O-RS]y with m = 1 or 2, preferably 1; n = 1, 2 or 3, preferably 1 or 2; x = 1, 2, 3, 4, 5 or 6, preferably 2, 3 or 4; y = 1 or 2, preferably 1;
[IC-TP]m - [O-RS]y - [PSP]n - [RNA-TP]x with m = 1 or 2, preferably 1; n = 1, 2 or 3, preferably 1 or 2; x = 1, 2, 3, 4, 5 or 6, preferably 2, 3 or 4; y = 1 or 2, preferably 1;
[IC-TP]m - [RNA-TP]x - [PSP]n - [O-RS]y with m = 1 or 2, preferably 1; n = 1, 2 or 3, preferably 1 or 2; x = 1, 2, 3, 4, 5 or 6, preferably 2, 3 or 4; y = 1 or 2, preferably 1;
[IC-TP]m - [PSP1]n - [O-RS]y - [PSP2]n - [RNA-TP]x with m = 1 or 2, preferably 1; n independently of each other = 1, 2 or 3, preferably 1 or 2; x = 1, 2, 3, 4, 5 or 6, preferably 2, 3 or 4; y = 1 or 2, preferably 1; and PSP1 and PSP2 identical or different;
[IC-TP]m - [PSP1]n - [RNA-TP]x - [O-RS1]y - [PSP2]n - [O-RS2]y with m = 1 or 2, preferably 1; n independently of each other = 1, 2 or 3, preferably 1 or 2; x = 1, 2, 3, 4, 5 or 6, preferably 2, 3 or 4; y independently of each other = 1 or 2, preferably 1; PSP1 and PSP2 identical or different; and O-RS1 and O-RS2 identical or different, preferably identical;
[IC-TP]m - [PSP1]n - [O-RS1]y - [PSP2]n - [O-RS2]y - [RNA-TP]x with m = 1 or 2, preferably 1; n independently of each other = 1, 2 or 3, preferably 1 or 2; x = 1, 2, 3, 4, 5 or 6, preferably 2, 3 or 4; y independently of each other = 1 or 2, preferably 1; PSP1 and PSP2 identical or different; and O-RS1 and O-RS2 identical or different, preferably identical.
1.2.2. Non-intracellular structure-targeting monofunctional AFPs
These are the same AFPs as listed above in section 1.2.1, with the only exception that the [IC-TP] segments are missing, while the [PSP] segments are retained.
1.2.3. SYNZIP Variants
These are the same AFPs as listed above in sections 1.2.1 and 1.2.2, with the only exception that at least one of the segments [IC-TP], [PSP], [O-RS] or [RNA-TP] is N- or C-terminally supplemented with a SYNZIP element. An AFP may contain 1, 2, 3, 4 or 5, preferably 1 or 2, identical or different, preferably identical, SYNZIPs. Non-limiting examples of such molecules are:
1.2.3.1 Monofunctional SYNZIP AFPs
Individually preferred examples thereof are:
[PSP]n - [SYNZIP] - [O-RS]y with y = 1 or 2, preferably 1; n = 1, 2 or 3, preferably 1 or 2;
[PSP]n - [SYNZIP] - [RNA-TP]x with x = 1, 2, 3, 4, 5 or 6, preferably 2, 3 or 4; n = 1, 2 or 3, preferably 1 or 2;
[IC-TP]m - [SYNZIP] - [O-RS]y with m = 1 or 2, preferably 1; y = 1 or 2, preferably 1;
[IC-TP]m - [SYNZIP] - [RNA-TP]x with m = 1 or 2, preferably 1; x = 1, 2, 3, 4, 5 or 6, preferably 2, 3 or 4;
[IC-TP]m - [PSP]n - [SYNZIP] - [O-RS]y with m = 1 or 2, preferably 1; n = 1, 2 or 3, preferably 1 or 2; y = 1 or 2, preferably 1;
[IC-TP]m - [PSP]n - [SYNZIP] - [RNA-TP]x with m = 1 or 2, preferably 1; n = 1, 2 or 3, preferably 1 or 2; x = 1, 2, 3, 4, 5 or 6, preferably 2, 3 or 4.
1.2.3.2 Bifunctional SYNZIP AFPs
Individually preferred examples thereof are:
[IC-TP]m - [O-RS]y - [SYNZIP] - [RNA-TP]x with m = 1 or 2, preferably 1; x = 1, 2, 3, 4, 5 or 6, preferably 2, 3 or 4; y = 1 or 2, preferably 1;
[IC-TP]m - [RNA-TP]x - [SYNZIP] - [O-RS]y with m = 1 or 2, preferably 1; x = 1, 2, 3, 4, 5 or 6, preferably 2, 3 or 4; y = 1 or 2, preferably 1;
[IC-TP]m - [PSP]n - [SYNZIP] - [O-RS]y - [RNA-TP]x with m = 1 or 2, preferably 1; n = 1, 2 or 3, preferably 1 or 2; x = 1, 2, 3, 4, 5 or 6, preferably 2, 3 or 4; y = 1 or 2, preferably 1;
[IC-TP]m - [PSP]n - [SYNZIP] - [RNA-TP]x - [O-RS]y with m = 1 or 2, preferably 1; n = 1, 2 or 3, preferably 1 or 2; x = 1, 2, 3, 4, 5 or 6, preferably 2, 3 or 4; y = 1 or 2, preferably 1;
[IC-TP]m - [PSP]n - [SYNZIPa] - [O-RS]y - [SYNZIPb] - [RNA-TP]x with m = 1 or 2, preferably 1; n = 1, 2 or 3, preferably 1 or 2; x = 1, 2, 3, 4, 5 or 6, preferably 2, 3 or 4; y = 1 or 2, preferably 1; and SYNZIPa and SYNZIPb identical or different, preferably identical;
[IC-TP]m - [PSP]n - [SYNZIPa] - [RNA-TP]x - [SYNZIPb] - [O-RS]y with m = 1 or 2, preferably 1; n = 1, 2 or 3, preferably 1 or 2; x = 1, 2, 3, 4, 5 or 6, preferably 2, 3 or 4; y = 1 or 2, preferably 1; and SYNZIPa and SYNZIPb identical or different, preferably identical;
[IC-TP]m - [PSP1]n - [SYNZIP] - [RNA-TP]x - [O-RS1]y - [PSP2]n - [O-RS2]y with m = 1 or 2, preferably 1; n = 1, 2 or 3, preferably 1 or 2; x = 1, 2, 3, 4, 5 or 6, preferably 2, 3 or 4; y = 1 or 2, preferably 1; PSP1 and PSP2 identical or different; and O-RS1 and O-RS2 identical or different, preferably identical.
1.2.4. Monofunctional fusion proteins
Individually preferred examples thereof are:
[SYNZIP] - [O-RS]y with y = 1 or 2, preferably 1;
[SYNZIP] - [RNA-TP]x with x = 1, 2, 3, 4, 5 or 6, preferably 2, 3 or 4.
As IC-TP and PSP segments are missing here, these fusion proteins are preferably used in combination with an AFP containing at least one IC-TP and/or PSP segment.
1.3 Examples of individual fusion proteins
Specific examples of fusion proteins of the invention, and particular combinations thereof, are listed below in Tables 1, 2 and 3. The content of Tables 1, 2 and 3 also forms part of the general disclosure of the specification, although it is not explicitly and literally repeated here in the general part. The disclosure of Tables 1 and 2 in the respective column designated “Fusion protein(s) comprising O-RS and RNA-TP segments” shall be considered as disclosed independently of the content of the other columns of Tables 1 and 2 referring to specific reporters and host cell lines.
2. Functional fragments and mutants
Described herein are fragments and mutants of particular RNA-TPs, O-RSs, IC-TPs, PSPs, TNs and SYNZIPs which are functional, i.e. have the RNA-binding activity of the parent RNA-TP, the targeting activity for intracellular structures of the parent IC-TP, the self-assembly activity of the parent PSP, the binding activity for RNA-TP of the parent TN, the enzymatic activity of the parent O-RS, or the heterodimeric coiled-coil formation ability of the parent SYNZIP, respectively. Such fragments and mutants can be characterized by a minimum degree of sequence identity as described herein. Said amino acid or nucleotide sequence identity means identity over the entire length of the thus characterized amino acid or nucleotide sequence, respectively. The percentage identity values can be determined as known in the art on the basis of BLAST alignments, blastp algorithms (protein-protein BLAST), or using the Clustal method (Higgins et al., Comput Appl. Biosci. 1989, 5(2): 151 -1 ).
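The identity thresholds (at least 60%, 70%, ... 99%) are percentages of identical positions over the full length of the characterized sequence. A minimal sketch, assuming the two sequences have already been aligned (for example by BLAST or Clustal) and are given as equal-length strings with "-" for gaps; the functions `percent_identity` and `meets_threshold` are hypothetical and do not perform the alignment themselves.

```python
def percent_identity(query_aln: str, ref_aln: str) -> float:
    """Percent identity over the full (ungapped) length of the query sequence.

    Assumes query_aln and ref_aln are a pre-computed pairwise alignment of
    equal length, with '-' marking gaps.
    """
    if len(query_aln) != len(ref_aln):
        raise ValueError("aligned sequences must have equal length")
    query_length = sum(1 for c in query_aln if c != "-")
    identical = sum(1 for q, r in zip(query_aln, ref_aln) if q == r and q != "-")
    return 100.0 * identical / query_length


def meets_threshold(query_aln: str, ref_aln: str, threshold: float) -> bool:
    """True if the identity reaches the given percentage (e.g. 60, 80, 95)."""
    return percent_identity(query_aln, ref_aln) >= threshold


# Toy example: one Ile -> Leu difference over 8 residues.
print(percent_identity("MKTAYIAK", "MKTAYLAK"))     # 87.5
print(meets_threshold("MKTAYIAK", "MKTAYLAK", 80))  # True
print(meets_threshold("MKTAYIAK", "MKTAYLAK", 90))  # False
```

Dividing by the ungapped query length is one reading of "identity over the entire length of the thus characterized sequence"; alignment tools may normalize differently.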
Fragments and mutants of particular RNA-TPs, O-RSs, IC-TPs, SYNZIPs or PSPs which are useful in the present invention retain the relevant function (binding, self-assembly or enzymatic activity, respectively) of the parent polypeptide and can be obtained, e.g., by conservative amino acid substitution, i.e. the replacement of an amino acid residue with a different amino acid residue having similar biochemical properties (e.g. charge, hydrophobicity and size) as known in the art. Typical examples are substitution of Leu by Ile or vice versa, substitution of Asp by Glu or vice versa, substitution of Asn by Gln or vice versa, and others.
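To illustrate, the substitutions distinguishing a mutant from its parent can be listed and compared against the conservative pairs named above. This is a minimal sketch: it encodes only the three example pairs given in the text (Leu/Ile, Asp/Glu, Asn/Gln) and not the "others", and the function names are hypothetical.

```python
# Only the example pairs named in the text; other conservative pairs are not enumerated.
CONSERVATIVE_PAIRS = {frozenset(p) for p in [("L", "I"), ("D", "E"), ("N", "Q")]}


def is_listed_conservative(wt: str, new: str) -> bool:
    """True if wt <-> new is one of the listed conservative substitution pairs."""
    return frozenset((wt, new)) in CONSERVATIVE_PAIRS


def substitutions(parent: str, mutant: str):
    """List (1-based position, parent residue, mutant residue) for equal-length sequences."""
    if len(parent) != len(mutant):
        raise ValueError("sequences must have equal length")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(parent, mutant)) if a != b]


for pos, a, b in substitutions("MLDN", "MIEQ"):
    print(pos, a, b, is_listed_conservative(a, b))
```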
3. Orthogonal translation, tRNAs and POI coding sequences
The term "translation system" generally refers to a set of components necessary to incorporate a naturally occurring amino acid into a growing polypeptide chain (protein). Components of a translation system can include, e.g., ribosomes, tRNAs, aminoacyl tRNA synthetases, mRNA and the like. An aminoacyl tRNA synthetase (RS) is an enzyme capable of aminoacylating a tRNA with an amino acid or an amino acid analog. An RS used in processes of the invention is capable of aminoacylating a tRNA with the corresponding ncAA, i.e. aminoacylating a tRNAncAA. The term "orthogonal" as used herein refers to an element of a translation system (e.g., an orthogonal tRNA (O-tRNA) and/or an orthogonal aminoacyl tRNA synthetase (O-RS)) that is used with reduced efficiency by a translation system of interest (e.g., a cell). "Orthogonal" refers to the inability or reduced efficiency (e.g., less than 20%, less than 10%, less than 5%, or less than 1% efficient) of an O-tRNA or an O-RS to function with the endogenous RS or endogenous tRNAs, respectively, of a translation system of interest. For example, an O-tRNA in a translation system of interest is aminoacylated by any endogenous RS of the translation system with reduced or even zero efficiency when compared to aminoacylation of an endogenous tRNA by the endogenous RS. In another example, an O-RS aminoacylates any endogenous tRNA in the translation system of interest with reduced or even zero efficiency, as compared to aminoacylation of the endogenous tRNA by an endogenous RS. Specifically, the term “orthogonal translation system” or “OT system” is used herein to refer to a translation system using an O-RS/O-tRNAncAA pair that allows for introducing ncAA residues into a growing polypeptide chain.
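The orthogonality criterion above is a ratio: the efficiency of the cross-reaction (e.g. an endogenous RS charging the O-tRNA) relative to the endogenous reference reaction, compared with a cutoff such as 20%, 10%, 5% or 1%. A minimal sketch, assuming both efficiencies are measured in the same arbitrary units; the function names and the example rates are hypothetical.

```python
def relative_efficiency(cross_rate: float, endogenous_rate: float) -> float:
    """Cross-reaction efficiency as a fraction of the endogenous reference reaction."""
    if endogenous_rate <= 0:
        raise ValueError("endogenous reference rate must be positive")
    return cross_rate / endogenous_rate


def is_orthogonal(cross_rate: float, endogenous_rate: float, cutoff: float = 0.20) -> bool:
    """True if the relative efficiency is below the chosen cutoff (default 20%)."""
    return relative_efficiency(cross_rate, endogenous_rate) < cutoff


# Hypothetical rates: O-tRNA charged by endogenous RS at 3 units vs. 100 units for its own tRNA.
print(is_orthogonal(3.0, 100.0))               # True (3% < 20%)
print(is_orthogonal(3.0, 100.0, cutoff=0.01))  # False (3% is not below 1%)
```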
O-RS/O-tRNAncAA pairs used in the invention preferably have the following properties: the O-tRNAncAA is preferentially aminoacylated with the ncAA by the O-RS. In addition, the orthogonal pair functions in the translation system of interest (e.g., the cell) such that the O-tRNAncAA is used to incorporate the ncAA residue into the growing polypeptide chain of a POI. Incorporation occurs in a site-specific manner. Specifically, the O-tRNAncAA recognizes a selector codon (e.g., an Amber, Ochre or Opal stop codon) in the mRNA coding for the POI.
The term "preferentially aminoacylates" refers to an efficiency of, e.g., about 50%, about 70%, about 75%, about 85%, about 90%, about 95%, or about 99% or more, at which an O-RS aminoacylates an O-tRNA with an unnatural amino acid compared to an endogenous tRNA or amino acid of a translation system of interest (e.g., a cell). The unnatural amino acid is then incorporated into a growing polypeptide chain with high fidelity, e.g., at greater than about 75%, greater than about 80%, greater than about 90%, greater than about 95%, or greater than about 99% efficiency for a given selector codon. tRNAs which can be aminoacylated by a fusion protein of the present invention comprising at least one O-RS segment derived from a M. mazei pyrrolysyl tRNA synthetase include, but are not limited to, the pyrrolysyl tRNA of M. mazei and functional mutants thereof wherein the anticodon is the anticodon to a selector codon, such as the CUA anticodon to the Amber stop codon TAG, the UCA anticodon to the Opal stop codon TGA, and the UUA anticodon to the Ochre stop codon TAA. Examples of such pyrrolysyl tRNAs include, but are not limited to, those encoded by the nucleotide sequence of SEQ ID NO:4 (tRNAPyl CUA), SEQ ID NO:5 (tRNAPyl UCA) or SEQ ID NO:6 (tRNAPyl UUA). Non-limiting examples of further suitable tRNAs derived from the pyrrolysyl tRNA of M. mazei are the following:
tRNAPyl CGA pyrrolysyl tRNA (for Serine codon), SEQ ID NO: 229
tRNAPyl CGG pyrrolysyl tRNA (for Proline codon), SEQ ID NO: 230
tRNAPyl UAA pyrrolysyl tRNA (for Leucine codon), SEQ ID NO: 231
tRNAPyl UAG pyrrolysyl tRNA (for Leucine codon), SEQ ID NO: 232
tRNAPyl CGG pyrrolysyl tRNA (for Arginine codon), SEQ ID NO: 233
tRNAPyl AUA pyrrolysyl tRNA (for Isoleucine codon), SEQ ID NO: 234
The term “selector codon” as used herein refers to a codon that is recognized (i.e. bound) by the O-tRNAncAA in the translation process. The term is also used for the corresponding codons in polypeptide-encoding sequences of polynucleotides which are not messenger RNAs (mRNAs), e.g. DNA plasmids. The new OT systems described herein allow for orthogonal translation of POIs in a manner that is selective for the mRNA of said POIs compared to other mRNAs present in the cytoplasm of the cell. Nevertheless, it is preferable that the selector codon is a codon of low abundance in the cell chosen for expression, for example a codon of low abundance in naturally occurring eukaryotic cells. The new OT systems bring the mRNA of the POIs, the O-RS and the tRNAncAA into proximity to one another, thus supporting the introduction of the ncAA (rather than the introduction of an amino acid of a different tRNA that might potentially bind to the selector codon) at the selector codon-encoded amino acid position of the POI. Thus, the selector codon can be a sense codon. Nevertheless, in preferred embodiments, the selector codon is a codon that is not recognized by endogenous tRNAs of the cell used for preparing the POI.
The anticodon of the O-tRNAncAA binds to a selector codon within an mRNA (the mRNA of the POI) and thus incorporates the ncAA site-specifically into the growing chain of the polypeptide (POI) encoded by said mRNA. Examples of selector codons which are useful in the new OT systems described herein include, but are not limited to:
nonsense codons, such as stop codons, e.g., Amber (UAG), Ochre (UAA), and Opal (UGA) codons;
codons consisting of more than three bases (e.g., four base codons);
codons derived from natural or unnatural base pairs; and
sense codons.
Where a selector codon is used that is a sense codon (i.e., a natural three base codon), it is preferable that the endogenous translation system of the cell used for POI expression according to a method of the present invention does not (or only scarcely) use said natural three base codon, e.g., a cell that is lacking, or has a reduced abundance of, a tRNA that recognizes the natural three base codon or a cell wherein the natural three base codon is a rare codon. The use of one or more stop codons, such as one or more of Amber, Ochre and Opal, as selector codons in the present invention is particularly preferred.
A number of selector codons can be introduced into a polynucleotide encoding a desired polypeptide (target polypeptide, POI), e.g., one or more, two or more, more than three, etc. selector codons. A POI can carry two or more ncAA residues. Said ncAA residues can be the same and encoded by the same type of selector codon, or can be different and encoded by different selector codons.
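Because each selector codon in the POI coding sequence marks one ncAA position, the ncAA sites of a construct can be read off by scanning the coding sequence in frame. A minimal sketch, assuming the sequence starts at the start codon, is in frame, and ends with a terminal stop codon that is not itself a selector (the last codon is excluded from the scan); the function name `find_selector_positions` is hypothetical.

```python
def find_selector_positions(cds: str, selectors=("TAG",)):
    """Return 1-based residue positions whose in-frame codon is a selector codon.

    Accepts DNA or mRNA (U is converted to T). The final codon is treated as the
    terminal stop codon and is not scanned.
    """
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return [i + 1 for i, codon in enumerate(codons[:-1]) if codon in selectors]


# Two Amber (TAG) sites encoding ncAA residues 3 and 5; TAA terminates translation.
print(find_selector_positions("ATGGCTTAGAAATAGTAA"))  # [3, 5]
```

Passing several selectors, e.g. `("TAG", "TGA")`, covers a POI carrying two different ncAAs encoded by different selector codons.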
An anticodon has the reverse complement sequence of the corresponding codon.
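For example, the CUA anticodon named above pairs with the Amber codon UAG. The sketch below computes the Watson-Crick anticodon (written 5'->3') for a codon (written 5'->3'); it ignores wobble pairing and modified bases, and the function name `anticodon_for` is hypothetical.

```python
# Watson-Crick complements for RNA bases.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def anticodon_for(codon: str) -> str:
    """Anticodon (5'->3') of a codon (5'->3'): the reverse complement in RNA letters."""
    rna = codon.upper().replace("T", "U")
    return rna.translate(RNA_COMPLEMENT)[::-1]


for codon in ("UAG", "UGA", "UAA"):  # Amber, Opal, Ochre
    print(codon, anticodon_for(codon))
# UAG CUA
# UGA UCA
# UAA UUA
```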
A suppressor tRNA is a tRNA (such as an 0-tRNAncAA) that alters the reading of a messenger RNA (mRNA) in a given translation system (e.g., a cell). A suppressor tRNA can read through, e.g., a stop codon, a four base codon, or a rare codon.
The O-tRNA is preferentially aminoacylated by O-RS (rather than endogenous synthetases) and is capable of decoding a selector codon, as described herein. The O-RS recognizes the O-tRNA, e.g., with an extended anticodon loop, and preferentially aminoacylates the O-tRNA with an ncAA.
The O-tRNA and the O-RS used in the methods and/or fusion proteins of the invention can be naturally occurring or can be derived by mutation of a naturally occurring tRNA and/or RS from a variety of organisms. In various embodiments, the tRNA and RS are derived from at least one organism. In another embodiment, the tRNA is derived from a naturally occurring or mutated naturally occurring tRNA from a first organism and the RS is derived from naturally occurring or mutated naturally occurring RS from a second organism.
A suitable (orthogonal) tRNA/RS pair may be selected from libraries of mutant tRNA and RS, e.g., based on the results of a library screening. Alternatively, a suitable tRNA/RS pair may be a heterologous tRNA/synthetase pair that is imported from a source species into the translation system. Preferably, the cell used as translation system is different from said source species. Methods for evolving tRNA/RS pairs are described, e.g., in WO 02/085923 and WO 02/06075. Conventional site-directed mutagenesis can be used to introduce selector codons into the coding sequence of a POI.
4. Nucleic acid molecules
The invention also relates to nucleic acid molecules (single-stranded or double-stranded DNA and RNA sequences, for example cDNA, mRNA), or combinations of such nucleic acid molecules, comprising a nucleotide sequence that encodes for at least one of the fusion proteins of the present invention, and/or a nucleotide sequence complementary thereto.
Further, the invention relates to nucleic acid molecules (single-stranded or double-stranded DNA and RNA sequences, for example cDNA, mRNA), or combinations of such nucleic acid molecules, comprising (i) a nucleotide sequence (CSPOI) that encodes at least one POI, said POI comprising one or more ncAA residues which are encoded in the CSPOI by selector codons, and (ii) a targeting nucleotide sequence (TN) as described herein, wherein an RNA molecule comprising (the RNA version of) said TN is able to interact via said TN with an RNA-targeting polypeptide (RNA-TP).
The nucleic acid molecules of the invention can in addition contain untranslated sequences of the 3'- and/or 5'-end of the coding gene region. The TN is preferably located at the 3' end of the nucleic acid molecule encoding the POI(s). For example, nucleic acid molecules of the invention encoding the POI(s) can be prepared by introducing at least one TN at (in particular 3' of) the 3' untranslated region using common cloning techniques known in the art.
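The preferred arrangement, with the TN copies placed 3' of the 3' untranslated region, can be sketched as a simple concatenation that also records where the coding sequence and the TN block lie. This is a minimal sketch: the function name `assemble_poi_transcript` is hypothetical, and the example strings are placeholders, not real UTR or stem-loop sequences.

```python
def assemble_poi_transcript(utr5: str, cds: str, utr3: str, tn: str, copies: int = 1):
    """Return (sequence, regions) for the layout 5'UTR | CDS | 3'UTR | TN x copies.

    Region coordinates are 0-based and end-exclusive.
    """
    if copies < 1:
        raise ValueError("at least one TN copy is required")
    seq = utr5 + cds + utr3 + tn * copies
    cds_start = len(utr5)
    tn_start = cds_start + len(cds) + len(utr3)
    regions = {"cds": (cds_start, cds_start + len(cds)), "tn": (tn_start, len(seq))}
    return seq, regions


# Placeholder sequences only; "GGGG" stands in for a TN such as an MS2 stem-loop.
seq, regions = assemble_poi_transcript("AAA", "ATGGCTTAGTAA", "CCC", "GGGG", copies=2)
print(regions)  # {'cds': (3, 15), 'tn': (18, 26)}
```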
The invention further relates to expression constructs or expression cassettes, in particular recombinant ones, containing the nucleic acid sequence of the nucleic acid molecule, or combination of nucleic acid molecules, of the invention as described herein under the genetic control of regulatory nucleic acid sequences. The expression cassettes of the invention thus comprise the nucleic acid sequence coding for at least one POI (plus TN) or at least one fusion protein of the invention, and/or a nucleic acid sequence complementary thereto. The invention also relates to vectors, in particular recombinant vectors, comprising at least one of these expression constructs (expression vectors). An expression cassette typically comprises a promoter sequence that is located 5' (upstream) of, and functionally linked with, the nucleic acid sequence encoding the to-be-expressed POI(s) or fusion protein(s), a terminator sequence 3' (downstream) of said encoding sequence and optionally further regulatory elements. Examples of such further regulatory elements include, but are not limited to, targeting sequences, enhancers, polyadenylation signals, selectable markers, amplification signals, replication origins and the like. Suitable regulatory sequences are described for example in Goeddel, Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, CA (1990).
In addition to these regulatory sequences, the natural regulatory sequences may still be present upstream of the actual structural genes; optionally, they may have been genetically altered so that the natural regulation is switched off and expression of the genes is increased. The nucleic acid construct can, however, also be of simpler construction, i.e. no additional regulatory signals are inserted upstream of the coding sequence and the natural promoter, with its regulation, is not removed. Instead, the natural regulatory sequence is mutated so that regulation no longer takes place and gene expression is increased.
A "functional" linkage of elements of nucleic acid molecules, such as promoter, polypeptide-encoding sequence, terminator and regulators, means that these elements are arranged such that the encoding sequence can be transcribed and the optional regulatory elements can perform their regulation of said transcription. This can be achieved by a direct linkage of the elements in one and the same nucleic acid molecule. However, such direct linkage is not necessarily required. Genetic control sequences, for example enhancer sequences, can even exert their function on the target sequence from more remote positions or even from other DNA molecules. Arrangements are preferred in which the nucleic acid sequence to be transcribed is positioned downstream (i.e. at the 3'-end) of the promoter sequence, so that the two sequences are joined together covalently. The distance between the promoter sequence and the nucleic acid sequence to be expressed can be less than 200 base pairs, less than 100 base pairs or less than 50 base pairs.
For expression in a cell, the expression cassette is advantageously inserted into an expression vector. Expression vectors are chosen according to the cell used for expression, so as to allow optimal expression of the encoding nucleotide sequences in that cell. Vectors are well known to a person skilled in the art and are described, for example, in "Cloning Vectors" (Pouwels P. H. et al., Eds., Elsevier, Amsterdam-New York-Oxford, 1985, ISBN 0 444 904018). Examples of expression vectors include, but are not limited to, plasmids, viral vectors (e.g. phages, SV40, CMV, baculovirus and adenovirus), transposons, IS elements, phasmids, cosmids, and linear or circular DNA. These vectors can be replicated autonomously in the (host) cell or can be replicated chromosomally. Expression vectors comprising at least one expression cassette of the present invention represent a further aspect of the invention.
For the expression of a POI in a cell according to the present invention, it is possible, e.g., to introduce a nucleic acid molecule which encodes the POI (e.g. an expression vector of the invention) into the cell. Alternatively, an existing gene of the cell can be modified so as to comprise selector codons at those amino acid positions where the POI is intended to carry ncAA residues. Methods for introducing (recombinant) polypeptide-encoding nucleic acid molecules into, or for modifying existing genes of, a cell are known in the art.
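The design step of placing a selector codon at the intended ncAA position can be sketched in silico as below. This is a hypothetical helper, assuming TAG (amber) as the default selector codon and a coding sequence that starts at its first codon; it only computes the mutated sequence and does not replace the wet-lab mutagenesis or genome editing referred to above.

```python
def introduce_selector_codon(cds: str, residue: int, selector: str = "TAG") -> str:
    """Return a copy of `cds` with the codon of 1-based `residue` replaced by `selector`.

    The terminal stop codon is protected from replacement.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    n_codons = len(cds) // 3
    if not 1 <= residue < n_codons:
        raise ValueError(f"residue must be between 1 and {n_codons - 1}")
    start = (residue - 1) * 3
    return cds[:start] + selector + cds[start + 3:]


# Hypothetical POI fragment Met-Ala-Glu-Lys-stop; place the ncAA at residue 3 (Glu).
wild_type = "ATGGCTGAAAAATAA"
print(introduce_selector_codon(wild_type, 3))  # ATGGCTTAGAAATAA
```

Residue numbering here counts the initiator Met as residue 1; a real design would follow whatever numbering convention the POI sequence uses.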
The term "expression" describes, in the context of the invention, the production of polypeptides encoded by the corresponding nucleic acid sequence in a cell. The term "expression" is also used for the production of tRNA molecules encoded by nucleic acid sequences in the cell.
The nucleic acid molecules of the invention, including the expression cassettes and expression vectors of the invention, can be prepared using common recombination and cloning techniques known in the art, as described for example in T. Maniatis, E.F. Fritsch and J. Sambrook, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1989); in T.J. Silhavy, M.L. Berman and L.W. Enquist, Experiments with Gene Fusions, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1984); and in Ausubel, F.M. et al., Current Protocols in Molecular Biology, Greene Publishing Assoc. and Wiley Interscience (1987).
The nucleic acid molecules, or combinations of nucleic acid molecules, of the invention, including expression cassettes and expression vectors of the invention, can be isolated, for example by methods known in the art.
An "isolated" nucleic acid molecule is separated from other nucleic acid molecules that are present in the natural source of the nucleic acid, and moreover can be essentially free of other cellular material or culture medium, when it is produced by recombinant techniques, or free of chemical precursors or other chemicals, when it is chemically synthesized.
A nucleic acid molecule according to the invention can be isolated by standard techniques of molecular biology, using the sequence information provided according to the invention. For example, cDNA can be isolated from a suitable cDNA library, using one of the concretely disclosed complete sequences or a segment thereof as a hybridization probe and standard hybridization techniques (as described for example in Sambrook, J., Fritsch, E.F. and Maniatis, T. Molecular Cloning: A Laboratory Manual. 2nd edition, Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1989). Moreover, a nucleic acid molecule comprising one of the disclosed sequences or a segment thereof can be isolated by polymerase chain reaction, using oligonucleotide primers designed on the basis of this sequence. The nucleic acid thus amplified can be cloned into a suitable vector and characterized by DNA sequence analysis. The oligonucleotides according to the invention can moreover be produced by standard methods of synthesis, e.g. with an automatic DNA synthesizer.
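How primers can be derived "on the basis of this sequence" is illustrated by the deliberately naive Python sketch below: the forward primer is the 5' end of the template and the reverse primer is the reverse complement of its 3' end. The function names are hypothetical, the melting temperature uses the Wallace rule of thumb (2 °C per A/T, 4 °C per G/C, meaningful only for short oligonucleotides), and practical primer design would additionally consider nearest-neighbor Tm, GC content, self-complementarity and added cloning sites.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in deg C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def naive_primer_pair(template: str, length: int = 20) -> tuple[str, str]:
    """Forward primer = first `length` nt; reverse primer = reverse complement of last `length` nt."""
    if len(template) < 2 * length:
        raise ValueError("template shorter than two primer lengths")
    template = template.upper()
    return template[:length], reverse_complement(template[-length:])


fwd, rev = naive_primer_pair("ATGCCCGGGTTTAAACCC", length=4)
print(fwd, rev, wallace_tm(fwd))  # ATGC GGGT 12
```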
5. ncAAs and post-translational POI modifications
The abbreviation "ncAA" refers generally to any non-canonical or non-natural amino acid, or amino acid residue, that is not among the 22 naturally occurring proteinogenic amino acids. Numerous ncAAs are well known in the art (see, e.g., Liu et al., Annu Rev Biochem 2010, 79:413-444; Lemke, ChemBioChem 2014, 15:1691-1694). The term "ncAA" also refers to amino acid derivatives, for example α-hydroxy acids (rather than α-amino acids). Such derivatives have been shown to be translationally incorporable as well. See, e.g., Ohta et al., 2008, ChemBioChem 9:2773-2778. Accordingly, the meaning of terms such as "aminoacylate" or "aminoacylation" used herein is not limited to the RS-catalyzed linkage of a tRNA and an α-amino acid but also includes the RS-catalyzed linkage of a tRNA and an ncAA derivative such as an α-hydroxy acid. Particularly preferred ncAAs for use in the present invention are those which can be further modified post-translationally, for example using click chemistry reactions. Such click reactions include strain-promoted inverse-electron-demand Diels-Alder cycloadditions (SPIEDAC; see, e.g., Devaraj et al., Angew Chem Int Ed Engl 2009, 48:7013) as well as cycloadditions of strained cycloalkynyl groups (or strained cycloalkynyl analog groups in which one or more of the ring atoms not bound by the triple bond are substituted by amino groups) with azides, nitrile oxides, nitrones and diazocarbonyl reagents (see, e.g., Sanders et al., J Am Chem Soc 2010, 133:949; Agard et al., J Am Chem Soc 2004, 126:15046), for example strain-promoted alkyne-azide cycloadditions (SPAAC). Such click reactions allow for ultrafast and bioorthogonal covalent site-specific coupling of the ncAA labeling groups of target polypeptides with suitable groups of a coupling partner molecule. Pairs of docking and labeling groups which can react via the above-mentioned click reactions are known in the art.
Examples of suitable ncAAs comprising docking groups for use in the present invention include, but are not limited to, the ncAAs ("unnatural amino acids", "UAAs") described, e.g., in WO 2012/104422 and WO 2015/107064. Optionally substituted strained alkenyl groups include, but are not limited to, optionally substituted trans-cyclooctenyl groups, such as those described in. Optionally substituted strained alkynyl groups include, but are not limited to, optionally substituted cyclooctynyl groups, such as those described in WO 2012/104422 and WO 2015/107064. Optionally substituted tetrazinyl groups include, but are not limited to, those described in WO 2012/104422 and WO 2015/107064.
The ncAAs used in the context of the present invention can be used in the form of their salts. Salts of an ncAA as described herein are acid or base addition salts, especially addition salts with physiologically tolerated acids or bases. Physiologically tolerated acid addition salts can be formed by treating the base form of an ncAA with appropriate organic or inorganic acids. ncAAs containing an acidic proton may be converted into their non-toxic metal or amine addition salt forms by treatment with appropriate organic and inorganic bases. Salts of carboxyl groups of ncAAs can be produced in a manner known in the art and comprise inorganic salts, for example sodium, calcium, ammonium, iron and zinc salts, and salts with organic bases, for example amines such as triethanolamine, arginine, lysine, piperidine, etc. ncAAs may also be used in the form of acid addition salts, for example salts with mineral acids such as hydrochloric acid or sulfuric acid, and salts with organic acids such as acetic acid and oxalic acid. The ncAAs and salts thereof which are useful in the present invention also comprise their hydrates and solvent addition forms, e.g. alcoholates and the like.
Physiologically tolerated acids or bases are in particular those which are tolerated by the translation system used for preparing the POI with ncAA residues, e.g. which are substantially non-toxic to living eukaryotic cells. ncAAs, and salts thereof, useful in the context of the present invention can be prepared by analogy to methods which are well known in the art and are described, e.g., in the various publications cited herein.
The nature of the coupling partner molecule depends on the intended use. For example, the target polypeptide may be coupled to a molecule suitable for imaging methods or may be functionalized by coupling to a bioactive molecule. For instance, in addition to the docking group, a coupling partner molecule may comprise a group selected from, but not limited to, dyes (e.g. fluorescent, luminescent, or phosphorescent dyes, such as dansyl, coumarin, fluorescein, acridine, rhodamine, silicon-rhodamine, BODIPY, or cyanine dyes), molecules able to emit fluorescence upon contact with a reagent, chromophores (e.g., phytochrome, phycobilin, bilirubin, etc.), radiolabels (e.g. radioactive forms of hydrogen, fluorine, carbon, phosphorus, sulphur, or iodine, such as tritium, ¹⁸F, ¹¹C, ¹⁴C, ³²P, ³³P, ³³S, ³⁵S, ¹¹¹In, ¹²⁵I, ¹²³I, ¹³¹I, ²¹²B, ⁹⁰Y or ¹⁸⁶Rh), MRI-sensitive spin labels, affinity tags (e.g. biotin, His-tag, Flag-tag, strep-tag, sugars, lipids, sterols, PEG linkers, benzylguanines, benzylcytosines, or cofactors), polyethylene glycol groups (e.g., a branched PEG, a linear PEG, PEGs of different molecular weights, etc.), photocrosslinkers (such as p-azidoiodoacetanilide), NMR probes, X-ray probes, pH probes, IR probes, resins, solid supports and bioactive compounds (e.g. synthetic drugs). Suitable bioactive compounds include, but are not limited to, cytotoxic compounds (e.g., cancer chemotherapeutic compounds), antiviral compounds, biological response modifiers (e.g., hormones, chemokines, cytokines, interleukins, etc.), microtubule-affecting agents, hormone modulators, and steroidal compounds. Specific examples of useful coupling partner molecules include, but are not limited to, a member of a receptor/ligand pair; a member of an antibody/antigen pair; a member of a lectin/carbohydrate pair; a member of an enzyme/substrate pair; biotin/avidin; biotin/streptavidin and digoxin/antidigoxin.
The ability of certain (labeling groups of) ncAA residues to be coupled covalently in situ to (the docking groups of) conjugation partner molecules, in particular by a click reaction as described herein, can be used for detecting a target polypeptide having such ncAA residue(s) within a eukaryotic cell or tissue expressing the target polypeptide, and for studying the distribution and fate of the target polypeptides. Specifically, the method of the present invention for preparing a POI by expression in (e.g., eukaryotic) cells can be combined with super-resolution microscopy (SRM) to detect the POI within the cell or a tissue of such cells. Several SRM methods are known in the art and can be adapted so as to utilize click chemistry for detecting a target polypeptide expressed by a eukaryotic cell of the present invention. Specific examples of such SRM methods include DNA-PAINT (DNA point accumulation for imaging in nanoscale topography; described, e.g., by Jungmann et al., Nat Methods 11:313-318, 2014), dSTORM (direct stochastic optical reconstruction microscopy) and STED (stimulated emission depletion) microscopy.
6. Translational preparation of POIs in cells
The OT systems provided by the invention allow for the translational preparation of a POI in a cell.
The cell used for preparing a POI according to the invention can be a prokaryotic cell. Alternatively, the cell used for preparing a POI according to the invention can be a eukaryotic cell. The cell used for preparing a POI according to the invention can be a separate cell such as, e.g., a single-cell microorganism or a cell line derived from cells of multicellular organisms. Alternatively, the cell used for preparing a POI according to the invention can be present in (and part of) a tissue, an organ, a body part or an entire multicellular organism. Thus, the methods of the invention for preparing a POI can be performed with a separate cell or a cell culture, or with a tissue or tissue culture, organ, body part or (entire multicellular) organism.
Eukaryotic cells are often more difficult to handle and manipulate than prokaryotes such as, e.g., E. coli, and are therefore inaccessible, or only poorly accessible, to known approaches for POI-selective orthogonal translation, such as those described in the "Background of the invention" section above. The OT system and the methods of the invention are therefore particularly advantageous when used for POI expression in eukaryotic cells (including, e.g., single- and multicellular eukaryotic organisms, and eukaryotic cell lines).
In principle, all prokaryotic or eukaryotic cells can be used for preparing a POI according to a method of the present invention. Microorganisms such as, e.g., bacteria, fungi or yeasts can be used, as well as eukaryotic cells, such as, e.g., mammalian cells, insect cells, yeast cells and plant cells. Eukaryotic cells and in particular mammalian cells are particularly preferred.
The cell used for preparing a POI according to the invention carries a POI-encoding nucleotide sequence (CSPOI) wherein the ncAA residue(s) of the POI are encoded by selector codon(s). Said CSPOI is functionally linked with one or more targeting sequences (TNs). Transcription yields an mRNA comprising the CSPOI and the TN(s). The cell further comprises one or more fusion proteins of the present invention, wherein said fusion protein(s) comprise at least one O-RS segment and at least one RNA-TP segment. Said O-RS and RNA-TP can be on separate fusion proteins (e.g. AFPs) of the invention. Alternatively, said O-RS and RNA-TP can be on one and the same fusion protein (e.g. on an RNA-TP/O-RS fusion protein or an AFP) of the invention. Via (at least one of) its TN(s), said mRNA can interact with (bind to) at least one of the RNA-TP segments of the fusion proteins of the invention in the cell. The cell further comprises one or more orthogonal tRNAncAA molecules (O-tRNAncAA) which carry the anticodon(s) to the selector codon(s) of the CSPOI. Said O-tRNAncAA molecules and one or more of the O-RS segments of the fusion proteins in the cell form one or more orthogonal O-RS/O-tRNAncAA pairs which allow for introducing the ncAA residue(s) into the amino acid sequence of the (translationally prepared) POI.
The interaction of the mRNA comprising CSPOI and TN(s) with the RNA-TP segment(s), the aminoacylation of the O-tRNAncAA with the ncAAs by the O-RS segment(s), and the translational preparation of the POI, including the introduction of the ncAA residue(s), are thought to take place in the cytoplasm, more particularly in the OT assembly (OT organelle), of the cell in the presence of the ncAAs.
The mRNA comprising CSPOI and TN(s) (mRNAPOI) can be generated from a recombinant construct (e.g. expression vector) introduced into the cell. Alternatively, one or more endogenous genes of the cell can be modified so as to comprise one or more selector codons and one or more TNs. Techniques for introducing recombinant constructs into a cell as well as methods for modifying endogenous genes of a cell are well known in the art. The tRNAncAA molecules and fusion proteins of the invention can be generated from a recombinant construct (e.g. expression vector) introduced into the cell.
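The statement that the O-tRNAncAA carries "the anticodon to the selector codon" can be made concrete with a short Python sketch that derives the 5'→3' anticodon pairing antiparallel with a 5'→3' codon. It assumes strict Watson-Crick pairing (no wobble, no modified nucleosides), and the codons shown are illustrative stop codons often used as selector codons; the text above does not fix a particular selector codon. Function names are hypothetical.

```python
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq: str) -> str:
    """Convert a DNA-style sequence to RNA letters."""
    return seq.upper().replace("T", "U")


def anticodon_for(codon: str) -> str:
    """5'->3' anticodon that pairs antiparallel with a 5'->3' codon (no wobble)."""
    return "".join(WATSON_CRICK[base] for base in reversed(to_rna(codon)))


def decodes(anticodon: str, codon: str) -> bool:
    """True if the anticodon pairs with the codon by strict Watson-Crick rules."""
    return to_rna(anticodon) == anticodon_for(codon)


for selector in ("TAG", "TGA", "TAA"):
    print(selector, "->", anticodon_for(selector))
# TAG -> CUA
# TGA -> UCA
# TAA -> UUA
```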
Using expression vectors according to the invention, recombinant cells can be produced which can be used for preparing a POI using a method of the present invention. Advantageously, the recombinant vectors according to the invention, described above, are introduced into a suitable cell and expressed.
The cell used for preparing a POI as described herein can be prepared by introducing nucleotide sequences encoding the fusion protein(s), the tRNAncAA molecule(s) and the POI into the cell. Said nucleotide sequences can be located on separate nucleic acid molecules (vectors) or on the same nucleic acid molecule (e.g., vector), in any combination, and can be introduced into the cell in combination or sequentially.
Preferably, common cloning and transfection techniques known to a person skilled in the art, for example co-precipitation, protoplast fusion, electroporation, virus-mediated gene delivery, lipofection, microinjection or others, are used for introducing the stated nucleic acid molecules into the respective cell. Suitable techniques are described for example in Current Protocols in Molecular Biology, F. Ausubel et al., Ed., Wiley Interscience, New York 1997, or Sambrook et al. Molecular Cloning: A Laboratory Manual. 2nd edition, Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1989.
For the methods of the present invention, the cell used for POI expression is grown or cultured in a manner known by a person skilled in the art. Depending on the type of cell, a liquid medium can be used for culturing. Culture can be batchwise, semi-batchwise or continuous. Nutrients can be present at the beginning of the culturing or can be supplied later, semi-continuously or continuously.
The expressed POIs can be purified by known techniques, such as, e.g., molecular sieve chromatography (gel filtration), ion exchange chromatography (e.g. Q-Sepharose chromatography) and hydrophobic interaction chromatography, and by other common protein purification techniques such as ultrafiltration, crystallization, salting-out, dialysis and native gel electrophoresis. Suitable methods are described, for example, in Cooper, T. G., Biochemische Arbeitsmethoden [Biochemical processes], Verlag Walter de Gruyter, Berlin, New York or in Scopes, R., Protein Purification, Springer Verlag, New York, Heidelberg, Berlin.
For isolating a POI, it can be advantageous to link the POI with a tag that facilitates purification. This can be achieved by introducing a corresponding tag-encoding sequence into the CSPOI. Suitable tags for protein purification are well known in the art and include, e.g., histidine tags (e.g., His6 tag) and epitopes that can be recognized as antigens by antibodies (described for example in Harlow, E. and Lane, D., 1988, Antibodies: A Laboratory Manual. Cold Spring Harbor (N.Y.) Press). These tags can serve for attaching the proteins to a solid carrier, for example a polymer matrix, which can for example be used as packing in a chromatography column, or can be used on a microtiter plate or on some other carrier.
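Introducing a tag-encoding sequence into the coding sequence can be sketched as follows in Python, for the case of a C-terminal His6 tag inserted immediately before the terminal stop codon. The His6 codon choice is one of many synonymous options (CAT and CAC both encode His), TAG is assumed to be reserved as the selector codon, and the function name is hypothetical.

```python
HIS6_CDS = "CATCACCATCACCATCAC"  # one of many synonymous His6 encodings (CAT/CAC = His)
TERMINAL_STOPS = {"TAA", "TGA"}   # assumes TAG is reserved as the selector codon


def add_c_terminal_his6(cds: str) -> str:
    """Insert a His6-encoding sequence immediately before the terminal stop codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0 or cds[-3:] not in TERMINAL_STOPS:
        raise ValueError("expected an in-frame CDS ending in TAA or TGA")
    return cds[:-3] + HIS6_CDS + cds[-3:]


tagged = add_c_terminal_his6("ATGGCTTAGAAATAA")
print(tagged)  # ATGGCTTAGAAACATCACCATCACCATCACTAA
```

One common rationale for placing the tag C-terminal to the selector codon, which is not stated in the text above: products that terminate at the selector codon (i.e. without ncAA incorporation) lack the tag, so affinity purification tends to enrich the full-length POI.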
A tag linked to a POI can also serve for detecting the POI. Tags for protein detection are well known in the art and include, e.g., fluorescent dyes, enzyme markers, which form a detectable reaction product after reaction with a substrate, and others.
For preparing a POI according to a method of the present invention, expression can be achieved by culturing the cell in the presence of one or more ncAAs corresponding to the ncAA residue(s) of the POI (said ncAAs may expediently be comprised in the culture medium) for a time sufficient to allow translation of the POI. Depending on the nucleic acid(s) encoding the POI (and optionally the fusion proteins of the invention and/or the tRNAncAA molecules), it may be necessary to induce expression by adding a compound that induces transcription, such as, e.g., arabinose, isopropyl β-D-1-thiogalactopyranoside (IPTG) or tetracycline.
After translation, the POI may optionally be recovered from the translation system. For this purpose, the POI can be recovered and purified, either partially or substantially to homogeneity, according to procedures known to and used by those of skill in the art. Unless the target polypeptide is secreted into the culture medium, recovery usually requires cell disruption. Methods of cell disruption are well known in the art and include physical disruption, e.g., by (ultrasound) sonication, liquid-shear disruption (e.g., via French press), mechanical methods (such as those utilizing blenders or grinders) or freeze-thaw cycling, as well as chemical lysis using agents which disrupt lipid-lipid, protein-protein and/or protein-lipid interactions (such as detergents), and combinations of physical disruption techniques and chemical lysis. Standard procedures for purifying polypeptides from cell lysates or culture media are also well known in the art and include, e.g., ammonium sulfate or ethanol precipitation, acid or base extraction, column chromatography, affinity column chromatography, anion or cation exchange chromatography, phosphocellulose chromatography, hydrophobic interaction chromatography, hydroxylapatite chromatography, lectin chromatography, gel electrophoresis and the like. Protein refolding steps can be used, as desired, in making correctly folded mature proteins. High performance liquid chromatography (HPLC), affinity chromatography or other suitable methods can be employed in final purification steps where high purity is desired. Antibodies made against the polypeptides of the invention can be used as purification reagents, i.e. for affinity-based purification of the polypeptides. A variety of purification/protein folding methods are well known in the art, including, e.g., those set forth in Scopes, Protein Purification, Springer, Berlin (1993); and Deutscher, Methods in Enzymology Vol. 182: Guide to Protein Purification, Academic Press (1990); and the references cited therein.
As noted, those of skill in the art will recognize that, after synthesis, expression and/or purification, polypeptides can possess a conformation different from the desired conformation of the relevant polypeptides. For example, polypeptides produced by prokaryotic systems are often optimized by exposure to chaotropic agents to achieve proper folding. During purification from, e.g., cell lysates, the expressed polypeptide is optionally denatured and then renatured. This is accomplished, e.g., by solubilizing the proteins in a chaotropic agent such as guanidine HCl. It is occasionally desirable to denature and reduce expressed polypeptides and then to cause the polypeptides to refold into the preferred conformation. For example, guanidine, urea, DTT, DTE, and/or a chaperonin can be added to a translation product of interest. Methods of reducing, denaturing and renaturing proteins are well known to those of skill in the art. Polypeptides can be refolded in a redox buffer containing, e.g., oxidized glutathione and L-arginine.
Also described are polypeptides produced by the methods of the invention. Such polypeptides can be prepared by a method of the invention that makes use of the OT system described herein.
7. Kits
The present invention also provides kits for preparing a POI having at least one non-canonical amino acid (ncAA) residue. The kit of the invention may comprise at least one expression vector for at least one fusion protein of the present invention. The fusion protein(s) encoded by the expression vector(s) in the kit may comprise at least one O-RS segment and at least one RNA-TP segment. The kit may further comprise at least one ncAA, or salt thereof, corresponding to the at least one ncAA residue of the POI. Expediently, said O-RS segment is capable of aminoacylating a tRNA with the at least one ncAA. The kit may further comprise at least one expression vector for an orthogonal tRNAncAA (O-tRNAncAA) molecule. Further components of the kit may include at least one expression vector comprising a multiple cloning site and a targeting nucleotide sequence (TN), wherein an RNA molecule comprising said TN is able to interact via said TN with an RNA-targeting polypeptide (RNA-TP). Expediently, said TN is a sequence which, when present in an RNA molecule, is able to interact with an RNA-TP segment of at least one of the fusion protein(s) encoded by the expression vector(s) comprised by the kit. The kit may further comprise at least one reporter construct encoding an easily detectable (e.g. fluorescent) reporter polypeptide having at least one non-canonical amino acid (ncAA) residue, such that the mRNA transcribed from said construct comprises a TN as described herein.
The kits of the present invention can be used in methods of the invention for preparing ncAA-residue-containing POIs as described herein.
PARTICULAR EMBODIMENTS
The present invention further provides the following non-limiting embodiments E1 to E50.
E1: An assembler fusion protein (AFP) comprising:
(a) at least one first polypeptide segment acting as assembler (AP) that is selected from:
(a1) a polypeptide segment derived from an intracellular targeting polypeptide (IC-TP segment), wherein said intracellular targeting polypeptide targets, and thus becomes locally enriched at, an intracellular structural element within or directly adjacent to the cytoplasm; and
(a2) a polypeptide segment derived from a phase separation polypeptide (PSP segment), wherein said phase separation polypeptide has the ability to undergo self-association in the cytoplasm of a cell so as to create sites of high local concentration in the cytoplasm, and
(b) at least one second polypeptide segment acting as an effector (EP) that is selected from:
(b1) an RNA-targeting polypeptide (RNA-TP) segment, and
(b2) an orthogonal aminoacyl tRNA synthetase (O-RS) segment;
wherein said polypeptide segments are functionally linked in said AFP.
E2: The AFP of E1 comprising at least two APs, preferably at least one IC-TP segment and at least one PSP segment.
E3: The AFP of E1 or E2 having one of the following structures (from the N-terminus to the C-terminus):
[IC-TP]m - [EP]o
[EP]o - [IC-TP]m
[PSP]n - [EP]o
[EP]o - [PSP]n
[IC-TP]m - [EP]o - [PSP]n
[PSP]n - [EP]o - [IC-TP]m
[IC-TP]m - [PSP]n - [EP]o
[EP]o - [PSP]n - [IC-TP]m
[PSP]n - [IC-TP]m - [EP]o
[EP]o - [IC-TP]m - [PSP]n
wherein m, n and o, independently of each other, are integers selected from 1, 2, 3, 4 [...]; and "-" designates a peptidic linkage.
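The ten structures of E3 are exactly the N-to-C orderings of the EP with one AP type (IC-TP or PSP) or with both AP types. The Python sketch below regenerates them combinatorially and expands a layout for concrete repeat counts m, n and o. Segment labels are taken from E3; the function names are hypothetical, and linkers and actual segment sequences are not modeled.

```python
from itertools import permutations

# Repeat-count symbol used in E3 for each segment type.
REPEAT = {"IC-TP": "m", "PSP": "n", "EP": "o"}


def e3_layouts() -> list[tuple[str, ...]]:
    """All N->C orders of E3: one AP type plus EP, or both AP types plus EP."""
    layouts = []
    for ap in ("IC-TP", "PSP"):
        layouts += [(ap, "EP"), ("EP", ap)]
    layouts += list(permutations(("IC-TP", "EP", "PSP")))
    return layouts


def render(layout: tuple[str, ...]) -> str:
    """Format a layout in the bracket notation of E3."""
    return " - ".join(f"[{seg}]{REPEAT[seg]}" for seg in layout)


def expand(layout: tuple[str, ...], m: int, n: int, o: int) -> list[str]:
    """Flatten a layout into individual segments for concrete repeat counts."""
    counts = {"IC-TP": m, "PSP": n, "EP": o}
    return [seg for seg in layout for _ in range(counts[seg])]


for layout in e3_layouts():
    print(render(layout))
print(expand(("IC-TP", "EP", "PSP"), m=2, n=1, o=1))  # ['IC-TP', 'IC-TP', 'EP', 'PSP']
```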
E4: The AFP of any one of E1-E3, wherein the at least one EP is selected from RNA-TP segments.
E5: The AFP of any one of E1-E3, wherein the at least one EP is selected from O-RS segments.
E6: The AFP of any one of E1-E3 comprising at least one EP selected from RNA-TP segments and at least one EP selected from O-RS segments.
E7: The AFP of any one of E1-E6 comprising at least one IC-TP segment selected from dyneins and kinesins, and fragments and mutants of dyneins and kinesins, which retain the ability to target, and become enriched at, the plus or the minus end of microtubules.
E8: The AFP of any one of E1-E6 comprising at least one IC-TP segment selected from transmembrane domains of membrane proteins, and functional fragments and mutants of transmembrane domains which retain the ability to target, and become enriched at, the cytoplasmic side of membranes, in particular membranes selected from the cell membrane, nuclear membrane and mitochondrial membrane.
E9: The AFP of any one of E1-E8 comprising at least one IC-TP segment selected from:
KIF16B(1-400) comprising the amino acid sequence of SEQ ID NO:20, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:20;
KIF13A(1-411, Δ390) comprising the amino acid sequence of SEQ ID NO:22, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:22;
TOMM20(1-70) comprising the amino acid sequence of SEQ ID NO:24, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:24;
Lck comprising the amino acid sequence of SEQ ID NO:26, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:26;
FRB-CD28 comprising the amino acid sequence of SEQ ID NO:28, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:28;
FUS-CD28 comprising the amino acid sequence of SEQ ID NO:30, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:30;
EB1 comprising the amino acid sequence of SEQ ID NO:302, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:303;
CG1 comprising the amino acid sequence of SEQ ID NO:304, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:304;
EBAG9 comprising the amino acid sequence of SEQ ID NO:292 (full length), or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to SEQ ID NO:292; or comprising the first 29 N-terminal amino acid residues of SEQ ID NO:294, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to SEQ ID NO:294;
CMP Sia Tr, comprising the amino acid sequence of SEQ ID NO:296, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:296; and
P450 2C1 targeting the cytoplasmic side of the ER membrane, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity thereto, in particular a fragment comprising the N-terminal first 27 (SEQ ID NO:298) or first 29 (SEQ ID NO:300) amino acid residues, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to SEQ ID NO:298 or 300.
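The embodiments above repeatedly recite tiers of percent sequence identity to a reference SEQ ID NO. As a minimal sketch of how such a tier could be checked, the code below assumes the two sequences are already aligned (with "-" for gaps) and counts identical residues over the ungapped reference length; the patent text quoted here does not fix the alignment method or denominator, so both are assumptions, and the sequences are hypothetical.

```python
from __future__ import annotations

# Identity tiers recited in the embodiments (percent).
THRESHOLDS = (60, 70, 80, 90, 95, 96, 97, 98, 99)


def percent_identity(aligned_query: str, aligned_ref: str) -> float:
    """Percent identity of a pre-aligned pair ('-' marks a gap).

    Assumption: identical residues are counted over the ungapped length
    of the reference sequence (the SEQ ID NO in question).
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for q, r in zip(aligned_query, aligned_ref) if r != "-" and q == r)
    ref_length = sum(1 for r in aligned_ref if r != "-")
    return 100.0 * matches / ref_length


def thresholds_met(identity: float) -> list[int]:
    """Return every recited tier that the given identity reaches."""
    return [t for t in THRESHOLDS if identity >= t]


# Hypothetical 10-residue reference and a variant with one substitution.
ident = percent_identity("MKTAYIAKQA", "MKTAYIAKQR")
print(ident, thresholds_met(ident))  # 90.0 [60, 70, 80, 90]
```

A real comparison would first produce the alignment with a dedicated tool; the sketch only covers the counting step.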
E10: The AFP of any one of E1-E9 comprising at least one PSP segment selected from intrinsically disordered proteins (IDPs), in particular prion-like domains, and functional fragments and mutants of IDPs or prion-like domains which retain the ability to undergo self-association in the cytoplasm of a cell so as to create sites of high local concentration in the cytoplasm.
E11: The AFP of any one of E1-E10 comprising at least one PSP segment selected from:
SPD5 comprising the amino acid sequence of SEQ ID NO:32, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:32;
FUS comprising the amino acid sequence of SEQ ID NO:34, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:34; and
EWSR1 comprising the amino acid sequence of SEQ ID NO:36, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:36.
E12: The AFP of any one of E1-E11 comprising at least one RNA-TP segment selected from RNA-binding segments of viral coat proteins, and functional fragments and mutants of RNA-binding segments of viral coat proteins which retain the ability to interact specifically with an RNA motif of the virus.
E13: The AFP of any one of E1-E12 comprising at least one RNA-TP segment selected from:
MCP comprising the amino acid sequence of SEQ ID NO:14, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:14;
λN22 comprising the amino acid sequence of SEQ ID NO:16, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:16; and
PCP comprising the amino acid sequence of SEQ ID NO:306, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:306.
E14: The AFP of any one of E1-E13 comprising at least one O-RS segment selected from:
Methanococcus jannaschii tyrosyl-tRNA synthetase;
Escherichia coli tyrosyl-tRNA synthetase;
Escherichia coli leucyl-tRNA synthetase;
Methanosarcina mazei pyrrolysyl-tRNA synthetase;
Methanosarcina barkeri pyrrolysyl-tRNA synthetase;
Methanosarcina acetivorans pyrrolysyl-tRNA synthetase;
Methanosarcina thermophila pyrrolysyl-tRNA synthetase;
Methanococcoides burtonii pyrrolysyl-tRNA synthetase;
Desulfitobacterium hafniense pyrrolysyl-tRNA synthetase;
and functional fragments and mutants thereof which retain aminoacyl-tRNA synthetase enzymatic activity.
E15: The AFP of any one of E1-E14 comprising at least one O-RS segment selected from:
PylRSAF comprising the amino acid sequence of SEQ ID NO:8, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:8;
PylRS^ comprising the amino acid sequence of SEQ ID NO:10, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:10;
PylRS^1 comprising the amino acid sequence of SEQ ID NO:12, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:12;
IFRS1 comprising the amino acid sequence of SEQ ID NO:224, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:224;
CbzRS comprising the amino acid sequence of SEQ ID NO:226, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:226;
CpkRS comprising the amino acid sequence of SEQ ID NO:228, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:228; and
OMeRS, comprising the amino acid sequence of SEQ ID NO:236 or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:236.
E16: An assembler fusion protein (AFP) combination comprising at least two AFPs of any one of E1-E15.
E17: The AFP combination of E16 comprising at least one first AFP comprising at least one RNA-TP segment, and at least one second AFP comprising at least one O-RS segment.
E18: A fusion protein (RNA-TP/O-RS fusion protein) comprising:
(i) at least one RNA-targeting polypeptide (RNA-TP) segment; and
(ii) at least one orthogonal aminoacyl tRNA synthetase (O-RS) segment,
wherein said polypeptide segments are functionally linked in said RNA-TP/O-RS fusion protein.
E19: The RNA-TP/O-RS fusion protein of E18 having one of the following structures (from the N-terminus to the C-terminus):
[RNA-TP]x-[O-RS]y
[O-RS]y-[RNA-TP]x
wherein x and y, independently of each other, are integers selected from 1, 2, 3, 4 and 5; and "-" designates a peptidic linkage.
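The E19 formulae define a small, finite design space. A minimal sketch enumerating it, with each layout written as an N- to C-terminal tuple of segment names (the function name is hypothetical, and " | " is used only as a display separator because the segment names themselves contain hyphens):

```python
from itertools import product


def fusion_layouts(max_copies=5):
    """Enumerate [RNA-TP]x-[O-RS]y and [O-RS]y-[RNA-TP]x, N- to C-terminus."""
    layouts = []
    for x, y in product(range(1, max_copies + 1), repeat=2):
        rna_tp = ["RNA-TP"] * x
        o_rs = ["O-RS"] * y
        layouts.append(tuple(rna_tp + o_rs))  # [RNA-TP]x-[O-RS]y
        layouts.append(tuple(o_rs + rna_tp))  # [O-RS]y-[RNA-TP]x
    return layouts


layouts = fusion_layouts()
print(len(layouts))            # 50
print(" | ".join(layouts[0]))  # RNA-TP | O-RS
```

With x and y each ranging over 1 to 5 and two segment orders, the formulae cover 2 × 5 × 5 = 50 arrangements; linkers between segments are not modelled.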
E20: The RNA-TP/O-RS fusion protein of E18 or E19 comprising at least one RNA-TP segment selected from RNA-binding segments of viral coat proteins, and functional fragments and mutants of RNA-binding segments of viral coat proteins which retain the ability to interact specifically with an RNA motif of the virus.
E21: The RNA-TP/O-RS fusion protein of any one of E18-E20 comprising at least one RNA-TP segment selected from:
MCP comprising the amino acid sequence of SEQ ID NO:14, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:14;
λN22 comprising the amino acid sequence of SEQ ID NO:16, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:16; and
PCP comprising the amino acid sequence of SEQ ID NO:306, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:306.
E22: The RNA-TP/O-RS fusion protein of any one of E18-E21 comprising at least one O-RS segment selected from:
Methanococcus jannaschii tyrosyl-tRNA synthetase;
Escherichia coli tyrosyl-tRNA synthetase;
Escherichia coli leucyl-tRNA synthetase;
Methanosarcina mazei pyrrolysyl-tRNA synthetase;
Methanosarcina barkeri pyrrolysyl-tRNA synthetase;
Methanosarcina acetivorans pyrrolysyl-tRNA synthetase;
Methanosarcina thermophila pyrrolysyl-tRNA synthetase;
Methanococcoides burtonii pyrrolysyl-tRNA synthetase;
Desulfitobacterium hafniense pyrrolysyl-tRNA synthetase;
and functional fragments and mutants thereof which retain aminoacyl-tRNA synthetase enzymatic activity.
E23: The RNA-TP/O-RS fusion protein of any one of E18-E22 comprising at least one O-RS segment selected from:
PylRSAF comprising the amino acid sequence of SEQ ID NO:8, or a functional fragment or mutant thereof having at least 60% sequence identity to the amino acid sequence of SEQ ID NO:8;
PylRS^ comprising the amino acid sequence of SEQ ID NO:10, or a functional fragment or mutant thereof having at least 60% sequence identity to the amino acid sequence of SEQ ID NO:10;
PylRS^1 comprising the amino acid sequence of SEQ ID NO:12, or a functional fragment or mutant thereof having at least 60% sequence identity to the amino acid sequence of SEQ ID NO:12;
IFRS1 comprising the amino acid sequence of SEQ ID NO:224, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:224;
CbzRS comprising the amino acid sequence of SEQ ID NO:226, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:226;
CpkRS comprising the amino acid sequence of SEQ ID NO:228, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:228; and
OMeRS, comprising the amino acid sequence of SEQ ID NO:236 or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of SEQ ID NO:236.
E24: A nucleic acid molecule, or a combination of two or more nucleic acid molecules, comprising:
(i) a nucleotide sequence that encodes at least one AFP of any one of E1-E15, or at least one AFP combination of E16 or E17, or
(ii) a nucleic acid sequence complementary to the nucleotide sequence of (i), or
(iii) both of (i) and (ii).
E25: A nucleic acid molecule, or a combination of two or more nucleic acid molecules, comprising:
(i) a nucleotide sequence that encodes at least one RNA-TP/O-RS fusion protein of any one of E18-E23, or
(ii) a nucleic acid sequence complementary to (i), or
(iii) both of (i) and (ii).
E26: An expression cassette comprising the nucleotide sequence of the nucleic acid molecule, or the combination of nucleic acid molecules, of E24 or E25.
E27: An expression vector comprising at least one expression cassette of E26.
E28: A cell comprising at least one nucleic acid molecule, or combination of nucleic acid molecules, of E24 or E25, at least one expression cassette of E26, or at least one expression vector of E27.
E29: The cell of E28 which is a eukaryotic cell.
E30: The cell of E28 which is a mammalian cell.
E31: The cell of any one of E28-E30 comprising at least one nucleic acid molecule, or combination of nucleic acid molecules, of E24, or at least one expression cassette comprising the nucleotide sequence of said nucleic acid molecule, or combination of nucleic acid molecules, or at least one expression vector comprising said expression cassette.
E32: The cell of E31 comprising a nucleotide sequence that encodes, or is complementary to a nucleotide sequence encoding, at least one AFP of any one of E1-E3 and E7-E15 comprising at least one EP selected from RNA-TP segments and at least one EP selected from O-RS segments.
E33: The cell of E31 comprising a nucleotide sequence that encodes, or is complementary to a nucleotide sequence encoding, at least one AFP of any one of E1-E3 and E7-E15 comprising at least one EP selected from RNA-TP segments, and at least one AFP of any one of E1-E3 and E7-E15 comprising at least one EP selected from O-RS segments.
E34: The cell of any one of E28-E30 comprising at least one nucleic acid molecule, or combination of nucleic acid molecules, of E25, or at least one expression cassette comprising the nucleotide sequence of said nucleic acid molecule, or combination of nucleic acid molecules, or at least one expression vector comprising said expression cassette.
E35: The cell of any one of E28-E34, wherein the cell expresses the at least one AFP, the at least one AFP combination or the at least one RNA-TP/O-RS fusion protein, respectively, that is encoded by the nucleotide sequence of said nucleic acid molecule, or combination of nucleic acid molecules.
E36: A method for preparing a polypeptide of interest (POI) comprising in its amino acid sequence one or more non-canonical amino acid (ncAA) residues, wherein the method comprises expressing the POI in a cell of any one of E31-E33 in the presence of said one or more ncAAs, wherein the cell comprises:
(i) a POI-encoding nucleotide sequence (CSPOI), wherein said one or more ncAA residues of the POI are encoded by selector codon(s);
(ii) a targeting nucleotide sequence (TN) that is functionally linked to the CSPOI and is able to interact with an RNA-TP segment of at least one of the AFPs in the cell;
(iii) one or more orthogonal tRNAncAA (O-tRNAncAA) molecules which carry the anticodon(s) complementary to the selector codon(s) of the CSPOI, wherein said O-tRNAncAA molecules, together with one or more O-RS segments of at least one of the AFPs in the cell, form one or more orthogonal O-RS/O-tRNAncAA pairs which allow for the introduction of said one or more ncAA residues into the amino acid sequence of the POI;
and wherein the method optionally further comprises recovering the expressed POI.
E37: A method for preparing a polypeptide of interest (POI) comprising in its amino acid sequence one or more non-canonical amino acid (ncAA) residues, wherein the method comprises expressing the POI in a cell of E35 in the presence of said one or more ncAAs, wherein the cell comprises:
(i) a POI-encoding nucleotide sequence (CSPOI), wherein said one or more ncAA residues of the POI are encoded by selector codon(s);
(ii) a targeting nucleotide sequence (TN) that is functionally linked to the CSPOI and is able to interact with an RNA-TP segment of at least one of the RNA-TP/O-RS fusion proteins in the cell;
(iii) one or more orthogonal tRNAncAA (O-tRNAncAA) molecules which carry the anticodon(s) complementary to the selector codon(s) of the CSPOI, wherein said O-tRNAncAA molecules, together with one or more O-RS segments of the RNA-TP/O-RS fusion proteins in the cell, form one or more orthogonal O-RS/O-tRNAncAA pairs which allow for the introduction of said one or more ncAA residues into the amino acid sequence of the POI;
and wherein the method optionally further comprises recovering the expressed POI.
E38: A method for preparing a polypeptide of interest (POI) comprising in its amino acid sequence one or more non-canonical amino acid (ncAA) residues, said method comprising the steps of:
(a) expressing in a cell one or more AFPs of any one of E1-E3 and E7-E15 comprising at least one RNA-TP segment and one or more AFPs of any one of E1-E3 and E7-E15 comprising at least one O-RS segment;
(b) expressing in said cell one or more orthogonal tRNAncAA (O-tRNAncAA) molecules, wherein
- said orthogonal tRNAncAA molecules and one or more of the O-RS segments of the AFPs in the cell form one or more orthogonal aminoacyl tRNA synthetase/tRNAncAA (O-RS/O-tRNAncAA) pairs,
- said O-RS/O-tRNAncAA pairs allow for introducing said one or more ncAA residues into the amino acid sequence of said POI,
wherein steps (a) and (b) can be performed concomitantly or sequentially in any order;
(c) then, expressing said POI in said cell in the presence of said one or more ncAAs, wherein
- the POI-encoding nucleotide sequence (CSPOI) comprises one or more selector codons encoding said one or more ncAA residues,
- said selector codons match the anticodons of said one or more O-tRNAncAA molecules;
- said CSPOI is functionally linked to a targeting nucleotide sequence (TN), thus forming a CSPOI/TN fusion sequence,
- said CSPOI/TN fusion sequence is able to interact, via its TN, with an RNA-TP segment of at least one of the AFPs in the cell;
and
(d) optionally recovering the expressed POI.
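The selectivity logic of these methods can be illustrated with a deliberately simplified, deterministic toy model: a transcript carrying the TN is assumed to be translated where ncAA-charged O-tRNAs are available, so its selector codon is read through, while a transcript without the TN terminates at the same codon. This is not the patent's implementation. Real suppression is partial and probabilistic, and the codon table, the function name and the "X" symbol for the ncAA are all hypothetical.

```python
# Standard stop codons (mRNA, 5'->3').
STOP_CODONS = {"UAA", "UAG", "UGA"}
# Tiny, deliberately incomplete codon table used only for this illustration.
SENSE_CODONS = {"AUG": "M", "GGC": "G", "AAA": "K", "UUC": "F"}


def translate_toy(mrna: str, has_tn: bool, selector: str = "UAG", ncaa: str = "X") -> str:
    """Toy translation: the selector codon is decoded as the ncAA only if
    the transcript carries a TN (i.e. is recruited near the O-RS)."""
    protein = []
    usable = len(mrna) - len(mrna) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        codon = mrna[i:i + 3]
        if codon == selector and has_tn:
            protein.append(ncaa)  # selector codon read through as the ncAA
        elif codon in STOP_CODONS:
            break  # termination (a truncated product if premature)
        else:
            protein.append(SENSE_CODONS[codon])
    return "".join(protein)


print(translate_toy("AUGGGCUAGAAAUAA", has_tn=True))   # MGXK
print(translate_toy("AUGGGCUAGAAAUAA", has_tn=False))  # MG
```

In this caricature the same reporter yields a full-length, ncAA-containing product only when it is tagged with the TN, which is the target-protein-selective behaviour the methods aim for.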
E39: A method for preparing a polypeptide of interest (POI) comprising in its amino acid sequence one or more non-canonical amino acid (ncAA) residues, said method comprising the steps of:
(a) expressing in a cell RNA-TP/O-RS fusion proteins of any one of E18-E23;
(b) expressing in said cell one or more orthogonal tRNAncAA (O-tRNAncAA) molecules, wherein
- said orthogonal tRNAncAA molecules and one or more of the O-RS segments of the RNA-TP/O-RS fusion proteins in the cell form one or more orthogonal aminoacyl tRNA synthetase/tRNAncAA (O-RS/O-tRNAncAA) pairs,
- said O-RS/O-tRNAncAA pairs allow for introducing said one or more ncAA residues into the amino acid sequence of said POI,
wherein steps (a) and (b) can be performed concomitantly or sequentially in any order;
(c) then, expressing said POI in said cell in the presence of said one or more ncAAs, wherein
- the POI-encoding nucleotide sequence (CSPOI) comprises one or more selector codons encoding said one or more ncAA residues,
- said selector codons match the anticodons of said one or more O-tRNAncAA molecules;
- said CSPOI is functionally linked to a targeting nucleotide sequence (TN), thus forming a CSPOI/TN fusion sequence,
- said CSPOI/TN fusion sequence is able to interact, via its TN, with an RNA-TP segment of at least one of the RNA-TP/O-RS fusion proteins in the cell;
and
(d) optionally recovering the expressed POI.
E40: The method of any one of E36-E39, wherein the TN is selected from viral RNA motifs bound by a viral coat protein, and functional fragments and mutants thereof which retain the ability to be bound by a viral coat protein.
E41: The method of any one of E36-E40, wherein the TN is selected from:
MS2 RNA stem-loop comprising the RNA sequence encoded by the nucleotide sequence of SEQ ID NO:17, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% nucleotide sequence identity to the nucleotide sequence of SEQ ID NO:17;
BoxB comprising the RNA sequence encoded by the nucleotide sequence of SEQ ID NO:18, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% nucleotide sequence identity to the nucleotide sequence of SEQ ID NO:18; and
PP7 RNA stem-loop, which exists in at least two different versions, comprising in particular an RNA sequence corresponding to (encoded by) the nucleotide (DNA) sequence of SEQ ID NO:289 or SEQ ID NO:290, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% nucleotide sequence identity to the nucleotide sequence of SEQ ID NO:289 or 290.
E42: The method of any one of E36-E41, wherein the selector codon(s) encoding the ncAA residue(s) of the POI are selected from Amber, Ochre and Opal stop codons.
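Since the O-tRNAncAA must carry the anticodon complementary to the selector codon, the pairing for the three stop codons named in E42 can be derived mechanically. A minimal sketch, writing both codon and anticodon 5'->3' (the anticodon is the reverse complement of the codon); the names below are illustrative only:

```python
# Stop codons named in E42, written 5'->3' as mRNA.
STOP_CODONS = {"Amber": "UAG", "Ochre": "UAA", "Opal": "UGA"}
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon: str) -> str:
    """Return the tRNA anticodon (5'->3') that base-pairs with a codon (5'->3')."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(codon.upper()))


for name, codon in STOP_CODONS.items():
    print(f"{name}: codon {codon} -> anticodon {anticodon(codon)}")
# Amber: codon UAG -> anticodon CUA
# Ochre: codon UAA -> anticodon UUA
# Opal: codon UGA -> anticodon UCA
```

Wobble pairing at the first anticodon position is ignored here; the sketch assumes strict Watson-Crick complementarity.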
E43: A nucleic acid molecule comprising:
(i) a nucleotide sequence (CSPOI) that encodes a polypeptide of interest (POI), said POI comprising one or more non-canonical amino acid (ncAA) residues which are encoded in the CSPOI by selector codons, and
(ii) a targeting nucleotide sequence (TN), wherein an RNA molecule comprising said TN is able to interact via said TN with an RNA-targeting polypeptide (RNA-TP).
E44: The nucleic acid molecule of E43, wherein the TN is selected from viral RNA motifs bound by a viral coat protein, and functional fragments and mutants thereof which retain the ability to be bound by a viral coat protein.
E45: The nucleic acid molecule of E43 or E44, wherein the TN is selected from:
MS2 RNA stem-loop comprising the RNA sequence encoded by the nucleotide sequence of SEQ ID NO:17, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% nucleotide sequence identity to the nucleotide sequence of SEQ ID NO:17;
BoxB comprising the RNA sequence encoded by the nucleotide sequence of SEQ ID NO:18, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% nucleotide sequence identity to the nucleotide sequence of SEQ ID NO:18; and
PP7 RNA stem-loop, which exists in at least two different versions, comprising in particular an RNA sequence corresponding to (encoded by) the nucleotide (DNA) sequence of SEQ ID NO:289 or SEQ ID NO:290, or a functional fragment or mutant thereof having at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% nucleotide sequence identity to the nucleotide sequence of SEQ ID NO:289 or 290.
E46: The nucleic acid molecule of any one of E43-E45, wherein the selector codon(s) encoding the ncAA residue(s) of the POI are selected from Amber, Ochre and Opal stop codons.
E47: A kit for preparing a polypeptide of interest (POI) having at least one non-canonical amino acid (ncAA) residue, the kit comprising:
at least one ncAA, or salt thereof, corresponding to the at least one ncAA residue of the POI, and
at least one expression vector of E27.
E48: The kit of E47, wherein the expression vector encodes a fusion protein comprising at least one O-RS segment and at least one RNA-TP segment.
E49: The kit of E47 or E48 further comprising at least one expression vector for an orthogonal tRNAncAA (O-tRNAncAA) molecule.
E50: The kit of any one of E47-E49 further comprising at least one expression vector comprising a multiple cloning site and a targeting nucleotide sequence (TN), wherein an RNA molecule comprising said TN is able to interact via said TN with an RNA-targeting polypeptide (RNA-TP).
Any one of the above embodiments also encompasses the following modification: the above-mentioned AP (i.e. IC-TP and PSP) segments and/or EP (RNA-TP or O-RS) segments may be further combined with synthetic protein segments that induce and control macromolecular interactions. One or more, like 2, 3, 4, 5, 6, 7, 8, 9 or 10, preferably one, such protein segment may be operably fused into a single AFP of the invention. Of particular interest in the context of the invention are SYNZIPs, which have the ability to form heterodimeric coiled-coil protein structures. SYNZIPs are pairs of synthetic peptides capable of interacting with each other and are used to induce and control macromolecular interactions. Non-limiting examples are the pairs SYNZIP1:SYNZIP2, SYNZIP3:SYNZIP4 and SYNZIP5:SYNZIP6. Particularly preferred according to the invention is the heterospecific coiled-coil pair SYNZIP2:SYNZIP1 as described by Reinke, A.W., Grant, R.A., Keating, A.E. (2010) J Am Chem Soc 132:6025-6031 (SYNZIP1: SEQ ID NO:312; SYNZIP2: SEQ ID NO:314; SYNZIP3: SEQ ID NO:316; SYNZIP4: SEQ ID NO:318), as well as functional fragments and mutants of these SYNZIP polypeptides. Said functional fragments and mutants may have at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98% or at least 99% amino acid sequence identity to the amino acid sequence of the polypeptide they are derived from.
The invention is further illustrated by the following non-limiting examples.
EXAMPLES
Methods
(A) Cell culture, transfections and feeding with ncAAs
HEK293T cells (ATCC CRL-3216) and COS-7 cells (ATCC CRL-1651) were maintained in Dulbecco's modified Eagle's medium (Life Technologies, 41965-039) supplemented with 1% penicillin-streptomycin (Sigma; 10,000 U/ml penicillin, 10 mg/ml streptomycin, 0.9% NaCl), 2 mM L-glutamine (Sigma), 1 mM sodium pyruvate (Life Technologies) and 10% FBS (Sigma). Cells were cultured at 37°C in a 5% CO2 atmosphere and passaged every 2-3 days for up to 15-20 passages.
In all cases, cells were seeded 15-20 h prior to transfection at a density resulting in 70-80% confluency at the time of transfection. Flow cytometry was performed using 24-well plates with plastic bottoms (Nunclon Delta Surface, Thermo Scientific). Immunofluorescence labeling and FISH were performed in 24-well plates with glass bottoms (Greiner Bio-One) or in four-well Lab-Tek #1.0 borosilicate coverglass chambers (Thermo Fisher).
Transfections of HEK293T cells were performed with polyethylenimine (PEI, Sigma-Aldrich) using 3 µg PEI per 1 µg DNA. COS-7 cells were transfected using the JetPrime reagent (PeqLab) according to the manufacturer's recommendations at a ratio of 1:2.
For the Amber suppression system tests, cells were transfected at a 1:1:1:1 ratio with POITAG vectors, tRNAPyl, synthetase and MCP or mock constructs. 4-6 hours after transfection, the medium was exchanged for fresh medium containing the ncAA.
Stock and working solutions of all ncAAs used were prepared as described in Nikic et al. (Nat Protoc 10(5):780-791, 2015). SCO (cyclooctyne lysine, SiChem SC-8000) was used at a final concentration of 250 µM, and 3-iodophenylalanine (Chem-Impex International Inc.) at a final concentration of 1 mM. SCO is efficiently recognized by PylRSAF (Y306A, Y384F) (see Plass et al., Angew Chem 2011, 50:3878-3881), and 3-iodophenylalanine is recognized by PylRS^ (C346A, N348A) (see Wang et al., ACS Chem Biol 2013, 8:405-415).
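Diluting an ncAA stock to its final working concentration in the medium is a C1·V1 = C2·V2 calculation. A minimal sketch follows; the stock concentration and well volume in the example are hypothetical placeholders, since the actual stocks are prepared as in Nikic et al. and are not specified here.

```python
def stock_volume_ul(final_conc_mm: float, final_volume_ml: float, stock_conc_mm: float) -> float:
    """Volume of stock (microlitres) to add, from C1*V1 = C2*V2.

    final_conc_mm:   desired ncAA concentration in the medium (mM)
    final_volume_ml: volume of medium (ml)
    stock_conc_mm:   concentration of the stock solution (mM)
    """
    if stock_conc_mm <= final_conc_mm:
        raise ValueError("stock must be more concentrated than the final medium")
    return final_conc_mm * final_volume_ml * 1000.0 / stock_conc_mm


# Hypothetical: 1 mM 3-iodophenylalanine in 0.5 ml medium from a 100 mM stock.
print(stock_volume_ul(1.0, 0.5, 100.0))  # 5.0
```

The sketch ignores the small volume added by the stock itself, which is negligible when the stock is much more concentrated than the medium.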
(B) Flow cytometry
HEK293T cells were harvested one day after transfection, resuspended in 1x PBS and passed through a 100 µm nylon mesh. Co-transfections for flow cytometry were performed at a 1:1:1:1 ratio with 1.2 µg total DNA, using:
- a reporter plasmid encoding the POI, with a stop codon at the amino acid position to be occupied by the ncAA,
- a plasmid encoding the tRNAPyl whose anticodon matches (i.e., is the reverse complement of) the stop codon in the POI-encoding sequence (hereinafter simply referred to as tRNAPyl),
- a plasmid encoding the PylRS or functional mutant thereof, respectively, and
- either a plasmid encoding an MCP fusion polypeptide or a mock plasmid.
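The DNA and PEI amounts implied by this co-transfection scheme follow directly from the stated numbers: 1.2 µg total DNA split 1:1:1:1 across the four plasmids, and 3 µg PEI per 1 µg DNA for HEK293T cells. A minimal sketch of that arithmetic (the function name is hypothetical, and treating 1.2 µg as the amount per well is an assumption):

```python
from __future__ import annotations


def transfection_amounts(total_dna_ug: float, ratio: tuple[int, ...], pei_per_ug_dna: float = 3.0):
    """Split total DNA by a mass ratio and compute the PEI mass needed.

    Returns (list of per-plasmid DNA masses in ug, PEI mass in ug).
    """
    parts = sum(ratio)
    per_plasmid = [total_dna_ug * r / parts for r in ratio]
    return per_plasmid, total_dna_ug * pei_per_ug_dna


# Reporter POI, tRNAPyl, PylRS (mutant) and MCP fusion/mock plasmid at 1:1:1:1.
per_plasmid, pei = transfection_amounts(1.2, (1, 1, 1, 1))
print([round(m, 3) for m in per_plasmid], round(pei, 3))  # [0.3, 0.3, 0.3, 0.3] 3.6
```

Each plasmid therefore contributes 0.3 µg, and 3.6 µg PEI is used for the mixture.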
Cell culture medium was exchanged for fresh medium containing the ncAA to be incorporated into the POI 4-6 h post-transfection and left until the time of harvesting.
Data acquisition and analysis were performed using an LSRFortessa SORP Cell Analyzer (Becton, Dickinson and Company) and FlowJo software (FlowJo). Cells were first gated by cell type using the forward scatter area (FSC-A) and side scatter area (SSC-A) parameters. Single cells were then identified based on SSC-A and side scatter width (SSC-W). Each FFC plot shown is the sum of three independent biological replicates, from which the mean and the SEM were calculated. At least 130,000 single cells were analyzed per condition. GFP fluorescence was acquired in the 488-530/30 channel and mCherry fluorescence in the 561-610/20 channel.
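Two small calculations underlie this paragraph: the mean and standard error of the mean (SEM) over the three biological replicates, and the emission window of each channel. The sketch below assumes SEM = sample standard deviation / √n, and reads the channel names as laser line followed by the emission filter's centre/bandwidth in nm (the usual cytometer convention). The replicate values are placeholders, not data from these examples.

```python
import math
import statistics


def mean_and_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    if len(values) < 2:
        raise ValueError("need at least two replicates")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


def bandpass_nm(filter_spec: str):
    """Convert 'centre/width' notation (e.g. '530/30') to a (low, high) range in nm."""
    centre, width = (float(v) for v in filter_spec.split("/"))
    return centre - width / 2, centre + width / 2


# Placeholder replicate values (not results from the examples).
mean, sem = mean_and_sem([1.0, 2.0, 3.0])
print(mean, round(sem, 3))      # 2.0 0.577
print(bandpass_nm("530/30"))    # (515.0, 545.0)  GFP channel, 488 nm laser
print(bandpass_nm("610/20"))    # (600.0, 620.0)  mCherry channel, 561 nm laser
```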
(C) PylRS immunostaining and imaging, fluorescence in situ hybridization (FISH)
For immunolabelling experiments, the cells were rinsed with 1x PBS, fixed in 2% paraformaldehyde in 1x PBS for 10 min at RT, rinsed again with 1x PBS and then permeabilized in 0.5% Triton X in 1x PBS for 15 min at RT. After two rinses with 1x PBS, the samples were incubated for 90 min at RT in blocking solution (3% BSA in 1x PBS) and then overnight at 4°C with 1 µg/ml primary antibody (polyclonal rat anti-PylRS, prepared as described in Nikic et al. (Angew Chem Int Ed Engl 2016, 55(52):16172-16176), and/or polyclonal rabbit anti-MCP (Merck, ABE76), and/or monoclonal rabbit anti-RPL26L1 antibody (EPR8478, Abcam, ab137046)) in blocking solution. The next day, the samples were rinsed with 1x PBS and incubated for 60 min at RT with 2 µg/ml secondary antibody (chicken anti-rat IgG(H+L) cross-adsorbed Alexa Fluor 594-conjugated antibody (Thermo Fisher Scientific, A-21471) and/or goat anti-rabbit IgG(H+L) cross-adsorbed Alexa Fluor 647-conjugated F(ab')2 (Thermo Fisher Scientific, A-21246)) in blocking solution. DNA was stained with Hoechst 33342 (1 µg/ml in 1x PBS) for 10 min at RT. If only DNA was stained, the cells were fixed and permeabilized as described above and then stained with Hoechst 33342 (1 µg/ml in 1x PBS) for 10 min at RT. Finally, the cells were rinsed twice with 1x PBS.
FISH experiments were performed one day after transfection analogously to the FISH experiments described in Nikic et al. (Angew Chem Int Ed Engl 2016, 55(52): 16172-16176). The hybridization protocol was adapted for 24-well plates from Pierce et al. (Methods Cell Biol 122:415-436, 2014).
For imaging of tRNAPyl only, the hybridization probe 5'-CTAACCCGGCTGAACGGATTTAGAGTCCATTCGATC-3' (labelled at the 5' terminus with Cy5; SEQ ID NO:1) was used at 0.25 µM. After four washes with SSC and one wash with TN buffer (0.1 M Tris-HCl, 150 mM NaCl), the cells were incubated for 1 h at RT with 3% BSA in TN buffer. Standard immunofluorescence labeling was then performed as described above.
For imaging of both tRNAPyl and the MS2 RNA stem-loop sequence, the hybridization probe for tRNAPyl (5'-CTAACCCGGCTGAACGGATTTAGAGTCCATTCGATC-3', labelled at the 5' terminus with digoxigenin; SEQ ID NO:2) was used at 0.16 µM. The hybridization probe for the MS2 RNA stem-loop sequence (5'-CTGCAGACATGGGTGATCCTCATGTTTTCTA-3', labelled at the 5' terminus with Alexa Fluor 647; SEQ ID NO:3) was used at 0.75 µM. After four washes with SSC, the cells were incubated for 1 h at RT in blocking buffer (0.1 M Tris-HCl, 150 mM NaCl, 1x blocking reagent (Sigma 11096176001)). The cells were then incubated overnight at 4°C with fluorescein-conjugated sheep anti-digoxigenin Fab (Sigma 11207741910), diluted 1:200 in blocking buffer. The next day, three 5-minute washes were performed in Tween buffer (0.1 M Tris-HCl, 150 mM NaCl, 0.5% Tween 20). DNA was stained with Hoechst 33342 (1 µg/ml in 1x PBS) for 10 min at RT.
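A FISH probe is antisense to its target, so the reverse complement of a probe gives the target sequence it is designed to hybridize to. The sketch below computes this for the tRNAPyl probe of SEQ ID NO:1, along with its GC fraction. The helper names are illustrative, and only unmodified A/C/G/T sequences are handled.

```python
# Map each DNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3' (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases, a rough proxy for hybridization stability."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

TRNA_PYL_PROBE = "CTAACCCGGCTGAACGGATTTAGAGTCCATTCGATC"  # SEQ ID NO:1
MS2_PROBE = "CTGCAGACATGGGTGATCCTCATGTTTTCTA"           # SEQ ID NO:3

target_dna = reverse_complement(TRNA_PYL_PROBE)
print(target_dna)                   # GATCGAATGGACTCTAAATCCGTTCAGCCGGGTTAG
print(target_dna.replace("T", "U")) # same target written as RNA
print(gc_fraction(TRNA_PYL_PROBE))  # 0.5
```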
Confocal images were acquired on a Leica SP8 STED 3X microscope equipped with a 63x/1.40 oil immersion objective using the following laser lines for excitation: 405 nm for Hoechst 33342, 488 nm for fluorescein and GFP, 548 nm for mOrange, 594 nm for Alexa Fluor 594, 647 nm for Alexa Fluor 647 and Cy5. Emission light was collected with HyD detectors at 420-500 nm and 605-680 nm respectively.
Ribosomal immunofluorescence images were taken on an Olympus Fluoroview FV3000 microscope equipped with a 60x/1.40 oil immersion objective using the following laser lines for excitation: 488 nm for GFP, 594 nm for Alexa Fluor 594, 640 nm for Alexa Fluor 647.
Images were processed using FIJI software.
(D) Constructs, cloning and mutagenesis
Two different fluorescent protein reporters (dual-color reporter) were cloned into a pBI-CMV1 vector (Clontech 631630), one reporter in each of the two multiple cloning sites. The CDS for one of the reporters encoded an mRNA carrying two MS2 RNA stem-loops fused to the 3' untranslated region ("MS2-tag"). The mRNA encoded by the other reporter was not MS2-tagged.
For examination of Amber suppression, the reporters GFP39TAG and mCherry185TAG were used as N-terminal fusions with an NLS. For examination of Ochre and Opal suppression, analogous constructs were prepared: GFP39TAA and mCherry185TAA for Ochre, and GFP39TGA and mCherry185TGA for Opal.
NLS::GFP39TAG::MS2-tag reporter: NLS::GFP39TAG was cloned with two copies of MS2 RNA stem-loops into the pBI-CMV1 vector as a reporter for successful Amber suppression in imaging experiments.
For examination of suppression of multiple Amber codons, pBI-CMV constructs for GFP39,149TAG and GFP39,149,182TAG were prepared, which did not contain a second (e.g. mCherry) reporter in the second multiple cloning site.
Further non-limiting examples of GFPs which are applicable in the context of the invention are:
GFP66TAG GFP with Amber site (SEQ ID NO:238)
GFP66TCG GFP with Serine site (SEQ ID NO:240)
GFP66CCG GFP with Proline site (SEQ ID NO:242)
GFP66CTA GFP with Leucine site (SEQ ID NO:244)
GFP66TTA GFP with Leucine site (SEQ ID NO:246)
GFP66ATA GFP with Isoleucine site (SEQ ID NO:248)
GFP66CGG GFP with Arginine site (SEQ ID NO:250)
GFP39TCG GFP with Serine site (SEQ ID NO:252)
GFP39CCG GFP with Proline site (SEQ ID NO:254)
GFP39CTA GFP with Leucine site (SEQ ID NO:256)
GFP39CGG GFP with Arginine site (SEQ ID NO:258)
GFP39TCG LCK-GFP with Serine site (SEQ ID NO:278)
GFP39CCG LCK-GFP with Proline site (SEQ ID NO:280)
GFP39CTA LCK-GFP with Leucine site (SEQ ID NO:282)
Extended GFP39TCG GFP with Serine site at position 39 genetically fused to GFP66CCG (SEQ ID NO:284)
Extended GFP39CCG GFP with Proline site at position 39 genetically fused to GFP66TCG (SEQ ID NO:286)
Extended GFP39CTA GFP with Leucine site at position 39 genetically fused to GFP66TCG (SEQ ID NO:288)
Further non-limiting examples of mCherrys which are applicable in the context of the invention are:
mCherry72TAG mCherry with Amber site (SEQ ID NO:260)
mCherry72TCG mCherry with Serine site (SEQ ID NO:262)
mCherry72CCG mCherry with Proline site (SEQ ID NO:264)
mCherry72CTA mCherry with Leucine site (SEQ ID NO:266)
mCherry72TTA mCherry with Leucine site (SEQ ID NO:268)
mCherry72ATA mCherry with Isoleucine site (SEQ ID NO:270)
mCherry185TCG mCherry with Serine site (SEQ ID NO:272)
mCherry185CCG mCherry with Proline site (SEQ ID NO:274)
mCherry185CTA mCherry with Leucine site (SEQ ID NO:276)
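The construct names in the GFP and mCherry lists above encode three pieces of information: the reporter, the residue position, and the codon placed at that site. The sketch below decodes a name and translates the codon with the standard genetic code. For example, CGG encodes arginine, and TAG, TAA and TGA are stop (selector) codons, shown as "*". The regular expression and the function `decode_site` are illustrative assumptions. Names with prefixes or suffixes such as "Extended", "LCK-" or "-2xPP7" are deliberately not handled.

```python
import re

# Standard genetic code in the conventional TCAG ordering ("*" = stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Hypothetical pattern for plain names such as "GFP66CGG" or "mCherry185TCG".
NAME = re.compile(r"^(GFP|mCherry)(\d+)([ACGT]{3})$")

def decode_site(construct: str):
    """Return (reporter, position, codon, one-letter amino acid or '*')."""
    m = NAME.match(construct)
    if m is None:
        raise ValueError(f"unrecognized construct name: {construct!r}")
    reporter, position, codon = m.group(1), int(m.group(2)), m.group(3)
    return reporter, position, codon, CODON_TABLE[codon]

for name in ["GFP66TAG", "GFP39CGG", "mCherry72ATA", "mCherry185TCG"]:
    print(decode_site(name))
```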
Further non-limiting examples of mCherry constructs comprising different TN loops which are applicable in the context of the invention are:
mCherry190TAG-2xPP7 mCherry with Amber site and 2x PP7 loops (SEQ ID NO:216)
mCherry190TAG-4xPP7 mCherry with Amber site and 4x PP7 loops (SEQ ID NO:218)
mCherry190TAG-6xPP7 mCherry with Amber site and 6x PP7 loops (SEQ ID NO:220)
H2B-mCherry190TAG-2xMS2 Human histone H2B type 1-J (UniProt: P06899) fused to mCherry with Amber site and 2x MS2 loops (SEQ ID NO:222)
They may be fused into the polypeptide chain of any of the AFP molecules described herein. They are preferably placed at a position within the fusion molecule that does not inhibit the function of any one of the other polypeptide segments (APs and EPs) of the AFP molecule. Examples of such epitope-tag-containing AFP molecules are given below.
Constructs for OT assemblies were prepared as follows. tRNAPyl was cloned under the control of a human U6 promoter. All other constructs were cloned under CMV promoters in the pcDNA3.1 vector (Invitrogen V86020). MCP was cloned from Addgene plasmid #31230 and FUS from Addgene plasmid #26374. In all FUS fusions, amino acids 1-478 (S108N) were used, with the C-terminal NLS region replaced by a Flag-tag. In all RS fusions, the previously reported efficient NES::PylRSAF (Y306A, Y384F) sequence was used (see, e.g., Nikic et al., Angew Chem Int Ed Engl 2016, 55(52):16172-16176). The PylRS mutant PylRSAA (N346A, C348A) was generated from wildtype PylRS by site-directed mutagenesis. The SPD5 gene was ordered from Genewiz and fused to MCP and PylRSAF via restriction cloning. KIF13A1-411 and KIF16B1-400 were cloned from human cDNA and inserted into pcDNA3.1 via restriction cloning. P390 of KIF13A1-411 was removed via site-directed mutagenesis. Fusions of KIF13A1-411,ΔP390 and KIF16B1-400 with MCP, PylRSAF, EWSR1::MCP, FUS::PylRSAF, FUS::PylRSAA, SPD5::MCP and SPD5::PylRSAF were assembled via Gibson assembly (see Gibson et al., Nat Methods 2009, 6:343-345).
Constructs for differential imaging experiments: To selectively express Nup153-EGFP149TAG and Vim116TAG-mOrange, one gene was first inserted together with an MS2-tag into pBI-CMV1 (compare Nikic et al., Angew Chem Int Ed Engl 2016, 55(52):16172-16176). The other gene was then inserted without an MS2-tag. To fuse INSR676TAG::mOrange to an MS2-tag, it replaced Vim116TAG-mOrange in the pBI vector bearing Nup153::EGFP149TAG and Vim116TAG::mOrange::MS2-tag. This yielded a bicistronic vector with INSR676TAG::mOrange in one cassette and Nup153::EGFP149TAG in the other.
Multicistronic Amber suppression vectors for COS-7 cell experiments: Because COS-7 cells have lower transfection efficiency, we generated multicistronic vectors harboring the components of an OT assembly. To assemble these vectors, one copy of tRNAPyl under the control of a human U6 promoter was first inserted into the pBI-CMV1 vector via Gibson assembly. Next, the AFP CDS KIF16B::FUS::PylRSAF was inserted via Gibson assembly, followed finally by the AFP CDS KIF16B::EWSR1::MCP. Alternatively, a previously published pcDNA3.1-based construct (see Nikic et al., Angew Chem Int Ed Engl 2016, 55(52):16172-16176) was used, which expresses NES::PylRSAF under a CMV promoter and tRNAPyl under a human U6 promoter. As a further alternative, constructs with U6-tRNAPyl, KIF16B::FUS::PylRSAF and KIF16B::EWSR1::MCP, or with NES::PylRSAF, were inserted into a pDonor vector (GeneCopoeia).
The respective sequence information on AFPs used in the following experiments can be taken from the listing of sequences given below.
EXAMPLE 1 - RNA-TP/O-RS fusion and AFPs comprising a single AP
An OT assembly (“OT organelle”, Fig. 1) was engineered with the following components:

i) An mRNA-targeting system in which two MS2 RNA stem-loops (MS2-tag) were fused to the mRNA of choice coding for the POI, creating an mRNA::ms2 fusion. The MS2-tag binds specifically to the MS2 bacteriophage coat protein (MCP) (see Bertrand et al., Mol Cell 1998, 2:437-445) and thus forms a stable and specific mRNA::ms2-MCP complex in cells. The MS2-tag was always fused to the 3' untranslated region (3' UTR) of the mRNA, so that translation yields a scar-less final POI.

ii) A tRNA/RS suppressor pair. The orthogonal tRNA/RS pair from the Methanosarcina mazei pyrrolysyl system (tRNAPyl/PylRS) was chosen because it has enabled GCE-based encoding of more than 200 ncAAs with diverse functionalities into proteins. This has been achieved in a multitude of cell types and species, including E. coli, mammalian cells and even living mice (see, e.g., Liu et al., Annu Rev Biochem 2010, 79:413-444; Lemke, ChemBioChem 2014, 15:1691-1694; Chin, Nature 2017, 550:53-60).

iii) The assembler (AP), which was the key component required to form an OT assembly. The purpose of the assembler was to create membrane-less structures in the form of a dense phase, aggregate, droplet or condensate. Within these structures, the mRNA::ms2-MCP complex is brought into close proximity to the tRNAPyl/PylRS pair.
The simplest strategy tested was the bimolecular fusion MCP::PylRS (termed B, Fig. 2). In addition, strategies expected to yield much larger assemblies were tested. All of these assembly systems consisted of an assembler fused to PylRS, co-expressed with an assembler fused to MCP. Assembler::PylRS*assembler::MCP systems were expected to form large aggregates (co-expression is denoted herein with "*").
Two assembly strategies were tested: one based on phase separation of proteins and one based on the assembly of kinesins, abbreviated herein as P and K, respectively (Fig. 2A). For each of the P and K approaches, two different molecular designs (P1, P2 and K1, K2, respectively) were tested. They are summarized as follows:
P1. Previous studies have established that the proteins fused in sarcoma (FUS) and Ewing sarcoma breakpoint region 1 (EWSR1) can form mixed droplet-like structures by phase separation. Both contain a prion-like disordered domain that facilitates phase separation into liquid, gel and solid states (see, e.g., Altmeyer et al., Nat Commun 2015, 6:8088; Patel et al., Cell 2015, 162:1066-1077). In a phase-separated state, these proteins are locally concentrated by several orders of magnitude compared to the remaining soluble fraction in the cytoplasm. FUS was fused to PylRS and EWSR1 was fused to MCP. This was expected to lead to the formation of droplets in which MCP and PylRS are highly enriched. P1 is denoted FUS::PylRS*EWSR1::MCP.
P2. The Caenorhabditis elegans protein spindle-defective protein 5 (SPD5) has been shown to phase separate into particularly large (several micron-sized) droplets (see Woodruff et al., Cell 2017, 169:1066-1077, e1010). In a phase-separated state, SPD5 is locally highly concentrated compared to the remaining soluble fraction in the cytoplasm (by several orders of magnitude). It was expected that a protein fused to SPD5 would condense into droplets. Similarly to FUS-EWSR1 droplets, PylRS fused to SPD5 and MCP fused to SPD5 were expected to be highly enriched. P2 is denoted SPD5::PylRS*SPD5::MCP.
K1. Certain kinesin truncations constitutively move towards microtubule plus ends in living cells (Soppina et al., Proc Natl Acad Sci U.S.A. 2014, 111:5562-5567). One such truncated kinesin is KIF13A1-411,ΔP390. PylRS and MCP, each fused to this kinesin truncation and co-expressed, were expected to be locally enriched owing to spatial targeting to microtubule plus ends. K1 is denoted KIF13A1-411,ΔP390::PylRS*KIF13A1-411,ΔP390::MCP.
K2. By analogy to K1, the truncated kinesin KIF16B1-400 was also tested. K2 is denoted KIF16B1-400::PylRS*KIF16B1-400::MCP.
To evaluate whether these assemblers facilitate functional orthogonal translation of an MS2-tagged mRNA, a dual-reporter construct was designed. In this construct, GFP and mCherry mutants are expressed simultaneously from two different expression cassettes on one plasmid, which keeps the mRNA ratio between them constant across all experiments. Stop codons were introduced at permissive sites into GFP at position 39 (GFP39STOP) and into mCherry at position 185 (mCherry185STOP; Fig. 2B). The corresponding green or red fluorescent protein is produced only if stop codon suppression is successful. Transfected cells were analyzed by fluorescence flow cytometry (FFC); tRNAPyl and ncAA were always present unless specifically noted otherwise. The FFC settings were adjusted so that the FFC plots show an approximately diagonal distribution when GFP and mCherry are expressed from this plasmid using the conventional cytoplasmic PylRS system, which cannot differentiate between mRNAs. A selective and functional OT organelle should selectively express mCherry when the MS2-tag is fused to the 3' UTR of the mCherry mRNA, so that a vertical line appears in the cytometry plot (Fig. 2B). Unless otherwise reported, all experiments were performed in the presence of tRNAPyl and the ncAA SCO. SCO is a widely used and well-characterized lysine derivative whose side chain carries a cyclooctyne, which can be used in a variety of click-chemistry reactions to install diverse chemical groups onto the protein. As previously reported, this ncAA is efficiently encoded by a Y306A, Y384F double mutant of PylRS (see Nikic et al., Angew Chem 2014, 53:2245-2249; Plass et al., Angew Chem 2012, 51:4166-4170; Plass et al., Angew Chem 2011, 50:3878-3881). For simplicity, this mutant is designated PylRS herein unless otherwise specified. Omission of the ncAA served as a standard negative control and led to no expression of GFP or mCherry.
The performance of each OT system was evaluated according to its selectivity and relative efficiency. Selectivity is based on the ratio r of the mean mCherry FFC signal to the mean GFP FFC signal. Final values are expressed as fold selectivity, i.e. r of the given system divided by r of the cytoplasmic PylRS system. Relative efficiency is defined as the mean mCherry signal of each system divided by the mean mCherry signal of the cytoplasmic PylRS system, which serves as the reference (defined here as 100%). All results on selectivity (dark-gray positive bars) and efficiency (light-gray negative bars) are summarized in the bar plot in Figure 2C. Selected FFC data are also shown in Figure 2D.
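The two performance metrics defined above can be written out as a short calculation. This is a minimal sketch, assuming per-cell mCherry and GFP signals are available as lists for each condition. The `FFCCondition` container and all numbers are hypothetical and illustrate only the arithmetic, not measured values.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class FFCCondition:
    """Per-cell signals of one transfection condition (hypothetical container)."""
    mcherry: list[float]  # reporter whose mRNA carries the MS2-tag
    gfp: list[float]      # reporter whose mRNA is untagged

def ratio_r(c: FFCCondition) -> float:
    """r = mean mCherry signal / mean GFP signal."""
    return mean(c.mcherry) / mean(c.gfp)

def fold_selectivity(system: FFCCondition, reference: FFCCondition) -> float:
    """r of the OT system relative to r of the cytoplasmic PylRS reference."""
    return ratio_r(system) / ratio_r(reference)

def relative_efficiency_percent(system: FFCCondition, reference: FFCCondition) -> float:
    """Mean mCherry signal relative to the reference (reference = 100%)."""
    return 100.0 * mean(system.mcherry) / mean(reference.mcherry)

# Hypothetical numbers, for illustration only.
reference = FFCCondition(mcherry=[100.0, 200.0, 300.0], gfp=[100.0, 200.0, 300.0])
ot_system = FFCCondition(mcherry=[40.0, 60.0, 80.0], gfp=[5.0, 10.0, 15.0])
print(fold_selectivity(ot_system, reference))             # 6.0
print(relative_efficiency_percent(ot_system, reference))  # 30.0
```

The example shows why the two metrics are reported separately. A system can gain selectivity (here sixfold) while losing absolute mCherry output (here 30% of the reference).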
The simplest strategy, B (MCP fused to PylRS), showed an approximately 1.5-fold selectivity gain (Fig. 2C). The OT system P1 (based on phase separation of FUS/EWSR1) had a somewhat lower selectivity gain (Fig. 2C, D). The P2 system (based on SPD5) showed an approximately twofold selectivity gain (Fig. 2C). For K1, a twofold increase in selectivity was observed (Fig. 2C), and the K2 system behaved similarly (Fig. 2C, D). Overall, the selectivity gains were relatively small, but they were robustly detected and distinguishable from a simple drop in efficiency. The selectivity effect was also robust across a titration of Amber suppression efficiencies, in which 0.48 ng, 2.4 ng, 12 ng, 60 ng or 300 ng of tRNAPyl construct was used (data not shown). This indicates that bringing the ncAA aminoacylation activity (i.e. tRNAPyl/PylRS in the presence of ncAA) into direct proximity of the target mRNA is a route to more selective codon suppression.
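The five tRNAPyl construct amounts used in the titration above form a fivefold serial dilution starting from 300 ng. The sketch below generates the series. The function name is illustrative, and the assumption that a fivefold step was intended is inferred from the numbers themselves.

```python
def serial_dilution(top: float, factor: float, steps: int) -> list[float]:
    """Amounts of a `factor`-fold serial dilution, lowest first, ending at `top`."""
    return [top / factor ** k for k in reversed(range(steps))]

amounts_ng = serial_dilution(300.0, 5.0, 5)
print([round(a, 2) for a in amounts_ng])  # [0.48, 2.4, 12.0, 60.0, 300.0]
```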
EXAMPLE 2 - AFPs comprising a combination of two APs
AFPs comprising combinations of the APs described in Example 1 were tested in an analogous manner. These were:
K1::P1 = KIF13A1-411,ΔP390::FUS::PylRS*KIF13A1-411,ΔP390::EWSR1::MCP,
K2::P1 = KIF16B1-400::FUS::PylRS*KIF16B1-400::EWSR1::MCP,
K1::P2 = KIF13A1-411,ΔP390::SPD5::PylRS*KIF13A1-411,ΔP390::SPD5::MCP,
K2::P2 = KIF16B1-400::SPD5::PylRS*KIF16B1-400::SPD5::MCP.
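The four combinations listed above follow a combinatorial scheme: each kinesin truncation (K1, K2) is combined with each phase-separation module (P1, P2). The sketch below generates the co-expressed AFP pair names from those modules. "::" denotes genetic fusion and "*" denotes co-expression, as in the text. The dictionaries and the function `afp_pair` are illustrative only.

```python
from itertools import product

# Assembler modules as named in Examples 1 and 2.
KINESINS = {"K1": "KIF13A1-411,ΔP390", "K2": "KIF16B1-400"}
PHASE_SEPARATION = {
    "P1": ("FUS", "EWSR1"),  # (partner fused to PylRS, partner fused to MCP)
    "P2": ("SPD5", "SPD5"),
}

def afp_pair(k: str, p: str) -> str:
    """Name of a co-expressed AFP pair: kinesin::PSP::PylRS * kinesin::PSP::MCP."""
    kinesin = KINESINS[k]
    rs_partner, mcp_partner = PHASE_SEPARATION[p]
    return f"{kinesin}::{rs_partner}::PylRS*{kinesin}::{mcp_partner}::MCP"

designs = {f"{k}::{p}": afp_pair(k, p) for k, p in product(KINESINS, PHASE_SEPARATION)}
for label, name in designs.items():
    print(label, "=", name)
```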
All combinations showed at least a fivefold selectivity gain, indicating orthogonal translation. The best-performing system was based on the fusion of FUS/EWSR1 with KIF16B1-400 (K2::P1) and exhibited an eightfold selectivity gain (box in Fig. 2C). This was also directly obvious from the FFC data: the bright, mCherry-positive cell population was clearly retained, whereas GFP expression was minimal (arrow in Fig. 2D).
EXAMPLE 3 - AFPs comprising a combination of APs including a membrane-targeting AP
AFPs combining two kinds of APs were tested in a manner analogous to Example 2. The first kind consisted of APs derived from the phase separation polypeptides (PSPs) FUS and EWSR1 (also termed EWS herein), optionally fused to SYNZIP segments. The second kind consisted of different APs acting as membrane- or microtubule-targeting signals: LcK, EB1, CG1, EBAG9 full length, EBAG9 1-29, CMP Sia Tr, P450 2C11 1-27 and P450 2C11 1-29.
LcK is a cell membrane-targeting signal (Resh, BBA-Mol Cell Res 1999, 1451:1-16) that post-translationally adds an amphipathic helix to the POI. For these experiments, the AFPs LcK::FUS::PylRS and LcK::EWSR1::MCP were co-expressed in HEK293T cells (see Fig. 3 and 6C). Testing this system with the same dual reporter resulted in a dramatic shift in the signal. Compared to the control PylRS, it showed strong selectivity for expression of only the MS2-tagged mCherry, with a 26-fold selectivity gain (Fig. 4 and Fig. 5). IF and FISH for MCP, PylRS and tRNA showed a clear membrane signal with occasional droplet-like structures and perfect co-localization of all components.
Without wishing to be bound by theory, it is assumed that targeting the OT system to a membrane confines the components to a 2D surface (i.e. a film). This offers even higher spatial segregation than a cytoplasmic droplet. The two assembler strategies combined here are LcK for membrane targeting and FUS/EWSR1 for droplet generation. Consistent with a cumulative effect of these strategies, the FUS/EWSR1 “assemblers” were not required for selective Amber suppression in an LcK-fused, and thus membrane-anchored, system (data not shown). Nevertheless, combining LcK targeting with FUS/EWSR1 resulted in higher selectivity. Further, swapping the MS2-tag between the fluorescent reporters swapped the selectivity in the FFC data, underlining the selective (orthogonal) translation of the MS2-tagged mRNA.
For a further LcK-based experiment, the AFP constructs LcK::FUS::SYNZIP1::PylRS and EWSR1::SYNZIP2::MCP were co-expressed in HEK293T cells (see Fig. 8A). Testing this system with the same dual reporter resulted in a dramatic shift in the signal and strong selectivity for expression of only the MS2-tagged mCherry. Upon expression, SYNZIP1 and SYNZIP2 pair and thereby recruit MCP to a plasma membrane-based OT organelle. In a comparative approach, the AFP constructs LcK::FUS::PylRS and EWSR1::SYNZIP2::MCP were co-expressed; in this case SYNZIP1 is missing, and no selectivity of translation was observed (see Fig. 8B).
EB1 is a microtubule plus end-targeting signal (Nehlig A, Molina A, Rodrigues-Ferreira S, Honore S, Nahmias C. Regulation of end-binding protein EB1 in the control of microtubule dynamics. Cell Mol Life Sci. 2017;74(13):2381-2393. doi:10.1007/s00018-017-2476-2). For these experiments, three configurations were expressed in HEK293T cells: EB1::PylRS with EB1::MCP, EB1::FUS::PylRS with EB1::EWSR1::MCP, or EB1::FUS::MCP::PylRS alone. Testing this system with the same dual reporter resulted in a shift in the signal and strong selectivity for expression of only the MS2-tagged mCherry compared to the control PylRS. See Fig. 6B.
CG1 is a nuclear membrane-targeting signal (Kim SJ, Fernandez-Martinez J, Nudelman I, et al. Integrative structure and functional anatomy of a nuclear pore complex. Nature. 2018;555(7697):475-482. doi:10.1038/nature26003). For these experiments, the AFP constructs CG1::FUS::PylRS and CG1::EWSR1::MCP were co-expressed in HEK293T cells. Testing this system with the same dual reporter resulted in a shift in the signal and strong selectivity for expression of only the MS2-tagged mCherry compared to the control PylRS. See Fig. 6E.
EBAG9 full length and EBAG9 1-29 are Golgi membrane-targeting signals (Engelsberg A, Hermosilla R, Karsten U, Schülein R, Dörken B, Rehm A. The Golgi protein RCAS1 controls cell surface expression of tumor-associated O-linked glycan antigens. J Biol Chem. 2003;278(25):22998-23007. doi:10.1074/jbc.M301361200). For these experiments, the AFP constructs EBAG9 1-29::FUS::PylRS and EBAG9 1-29::EWSR1::MCP were co-expressed in HEK293T cells. Testing this system with the same dual reporter resulted in a shift in the signal and strong selectivity for expression of only the MS2-tagged mCherry compared to the control PylRS. See Fig. 6F (left side).

CMP Sia Tr is a Golgi membrane-targeting signal (Eckhardt M, Gotza B, Gerardy-Schahn R. Membrane topology of the mammalian CMP-sialic acid transporter. J Biol Chem. 1999;274(13):8779-8787. doi:10.1074/jbc.274.13.8779). For these experiments, the AFP constructs CMP Sia Tr::FUS::PylRS and CMP Sia Tr::MCP were co-expressed in HEK293T cells. Testing this system with the same dual reporter resulted in a shift in the signal and strong selectivity for expression of only the MS2-tagged mCherry compared to the control PylRS. See Fig. 6F (right side).
P450 2C11 1-27 is an ER membrane-targeting signal (Fazal FM, Han S, Parker KR, et al. Atlas of Subcellular RNA Localization Revealed by APEX-Seq. Cell. 2019;178(2):473-490.e26. doi:10.1016/j.cell.2019.05.027). For these experiments, two configurations were expressed in HEK293T cells: P450 2C11 1-27::FUS::PylRS co-expressed with P450 2C11 1-27::EWSR1::MCP, or P450 2C11 1-29::FUS::MCP::PylRS alone. Testing this system with the same dual reporter resulted in a shift in the signal and strong selectivity for expression of only the MS2-tagged mCherry compared to the control PylRS. See Fig. 6G.
EXAMPLE 4 - Validation of the selectivity gain being specific to the interaction of the mRNA MS2-tag and MCP
To validate that the observed selectivity gain was specific to the interaction of the MCP segment with the MS2-tag of the mRNA, each OT system was characterized by expressing its RS assembler fusion without MCP. As expected, no selective orthogonal translation of MS2-tagged mRNA was observed in these cases (see Figure 6A to G). Additionally, a reporter inversion was performed by moving the MS2-tag from the mCherry to the GFP cassette in the dual-color reporter. As expected, this inverted the selectivity of the system towards dominant GFP expression (data not shown). This established that the OT systems acted selectively on the MS2-tagged RNA.
EXAMPLE 5 - Introduction of multiple ncAAs into the same POI
GCE can also be used to introduce multiple ncAAs into the same POI (see, e.g., Liu et al., Annu Rev Biochem 2010, 79:413-444; Lemke, ChemBioChem 2014, 15:1691-1694; Chin, Nature 2017, 550:53-60). However, very few publications report suppression of more than one codon (i.e. two- or three-codon suppression) in the same protein in eukaryotes, because yields typically suffer compared to single-codon suppression (see Xiao et al., Angew Chem 2013, 52:14080-14083; Schmied et al., J Am Chem Soc 2014, 136:15577-15583; Zhang et al., Biochem Biophys Res Co 2017, 489:490-496). Notably, even dual- and triple-Amber proteins were still suppressed with the OT organelle (data not shown).
EXAMPLE 6 - OT with 3-iodophenylalanine
To confirm that other ncAAs can also be translated by the OT assembly, a structurally different ncAA, 3-iodophenylalanine, was tested. It is a phenylalanine derivative rather than a lysine derivative (such as SCO), and it is encoded by a different PylRS mutant (N346A, C348A) (see Wang et al., ACS Chem Biol 2013, 8:405-415). Consistent results were also observed for this system (Fig. 2C).
EXAMPLE 7 - OT with different selector codons
Opal and Ochre codons are highly abundant in eukaryotic genomes: in the human genome, about 52% of stop codons are Opal and 28% are Ochre. For this reason, the Amber codon is by far the most commonly used codon for GCE in eukaryotes. Genomic approaches to orthogonal translation, which would remove these codons from an entire eukaryotic genome, would be even more challenging than for the Amber codon and are currently beyond the state of the art. In the OT systems of the present invention, however, orthogonal translation of these codons requires only a simple mutation in the anticodon loop of tRNAPyl, together with the corresponding codon change in the MS2-tagged POI-encoding mRNA. FFC analysis revealed that the OT systems of the invention provide freedom of choice with respect to the stop (selector) codon (Fig. 2C, E). In fact, Opal suppression turned out to be the best-performing system, showing an 11-fold selectivity increase. Ochre suppression still showed a fivefold selectivity increase at 20% efficiency.
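Switching the selector codon requires a tRNA anticodon complementary to the new codon. Each anticodon is the reverse complement of its codon, written in RNA letters. The sketch below derives the anticodon for each stop codon. It assumes plain Watson-Crick pairing, ignoring wobble and modified nucleosides. The actual mutated tRNAPyl anticodon-loop sequences are not given in this section, and the function name is illustrative.

```python
# Complement each DNA base into its RNA pairing partner (A->U, C->G, G->C, T->A).
DNA_TO_RNA_COMPLEMENT = str.maketrans("ACGT", "UGCA")

def anticodon(codon: str) -> str:
    """5'->3' anticodon of a tRNA that pairs with `codon` (given 5'->3' in DNA letters)."""
    return codon.upper().translate(DNA_TO_RNA_COMPLEMENT)[::-1]

SELECTOR_CODONS = {"Amber": "TAG", "Ochre": "TAA", "Opal": "TGA"}
for name, codon in SELECTOR_CODONS.items():
    print(f"{name}: codon {codon} -> anticodon {anticodon(codon)}")
# Amber: codon TAG -> anticodon CUA
# Ochre: codon TAA -> anticodon UUA
# Opal: codon TGA -> anticodon UCA
```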
EXAMPLE 8 - Orthogonal translation of proteins of various cellular compartments
To demonstrate the power of the OTK2::P1 system beyond “simple” reporters, differential expression of human nucleoporin 153 (Nup153) versus cytoskeletal vimentin was tested. OTK2::P1 was chosen as the best-performing Amber suppression OT system in terms of selectivity and efficiency. Nup153 localizes to the nuclear pore complex and is more than 1500 amino acids long; its mRNA is therefore approximately six-fold larger than those of the fluorescent protein reporters used above. For this experiment, a previously described C-terminal GFP fusion with an Amber mutation (Nup153::EGFP149TAG) was used. This fusion gives rise to a characteristic nuclear envelope stain in confocal images only if Amber suppression is successful (see Nikic et al., Angew Chem 2016, 55:16172-16176). Nup153::EGFP149TAG was now tagged at the mRNA level with an MS2-tag (nup153::egfp149TAG::ms2). It was co-expressed from the same plasmid with vimentin (a cytoskeletal protein) containing an Amber codon at position 116 and fused to mOrange (Vim116TAG::mOrange). Expression in HEK293T cells in the presence of the cytoplasmic PylRS resulted in production of both proteins, showing the characteristic nuclear envelope and cytoskeletal staining, respectively. With the OTK2::P1 assembly, only Nup153::GFP was visible, as a selective nuclear rim stain in confocal images of the co-transfected HEK293T cells. Consistent results were also observed in COS-7 cells. Moving the MS2-tag to vimentin inverted the effect, so that only Vim116TAG::mOrange was visible (observed in both COS-7 and HEK293T cells). This showed that the OTK2::P1 system worked for dramatically different mRNAs.
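The approximately six-fold size difference can be checked with simple arithmetic on coding-sequence lengths. This sketch assumes a reporter length of 239 residues, which is typical for EGFP. It ignores UTRs, NLS sequences and tags, and it uses the text's lower bound of 1500 residues for Nup153. The helper name is illustrative.

```python
def cds_length_nt(n_amino_acids: int) -> int:
    """Coding-sequence length in nucleotides: one codon per residue plus a stop codon."""
    return 3 * n_amino_acids + 3

NUP153_AA = 1500  # lower bound stated in the text ("more than 1500 amino acids")
GFP_AA = 239      # assumption: length of EGFP; NLS, tags and UTRs are ignored

ratio = cds_length_nt(NUP153_AA) / cds_length_nt(GFP_AA)
print(f"{ratio:.1f}")  # 6.3
```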
EXAMPLE 9 - Orthogonal translation of transmembrane proteins
Transmembrane proteins could also be selectively expressed using the OTK2::P1 assembly. Membrane protein expression adds another layer of translational complexity: ribosomes need to bind the endoplasmic reticulum during translation, where the proteins are co-translationally inserted into the membrane. In this experiment, insulin receptor 1 carrying an Amber codon at position 676 was fused to mOrange (INSR676TAG::mOrange). This fusion localizes to the plasma membrane and gives rise to a characteristic plasma membrane stain in HEK293T cells (see Nikic et al., Angew Chem 2014, 53:2245-2249). The construct was tagged with an MS2-tag in the 3' UTR and cloned together with Nup153::EGFP149TAG into one dual-cassette plasmid. It was then expressed in HEK293T cells in the presence of either the cytoplasmic PylRS system or the OTK2::P1 assembly. With the OTK2::P1 assembly, selective expression of the MS2-tagged protein and the expected plasma membrane localization of INSR676TAG::mOrange were observed (data not shown). This indicates that the OT system of the present invention can participate in even more complex membrane-associated translational processes.
EXAMPLE 10 - Spatial distribution of elements of the OT system in the cell

The spatial distribution of AFPs in cells, and particularly of PylRS, was assessed using immunofluorescence (IF). Additionally, fluorescence in situ hybridization (FISH) was used to detect tRNAPyl. The FFC experiments above used a dual-color reporter, whereas all IF/FISH experiments used a single-color NLS-GFP39TAG reporter fused to an MS2-tag (nls-gfp39TAG::ms2) to identify cells active in Amber suppression. This reporter yields a green nucleus if Amber suppression is successful and helped keep the color channels distinguishable. IF and FISH stainings showed that, in contrast to cytoplasmic PylRS, the P1 system formed small intracellular assembler::PylRS droplets (data not shown), indicating phase separation. tRNAPyl co-localized well with the highly dispersed assembler::PylRS droplets, indicating that it could partition into the assembler::PylRS phase. Additional stainings showed further co-localization with assembler::MCP (data not shown). Compared to P1, the P2 system showed larger but still multiple dispersed droplet-like structures (data not shown). Combining both assembler strategies (K1::P1, K2::P1, K1::P2, K2::P2) led to the formation of large, micron-sized organelle-like structures in the cytoplasm. In most cases, these structures were localized to a few positions, or even a single position, per cell. For the combined assemblers, mRNA::ms2, tRNAPyl, assembler::PylRS and assembler::MCP all co-localized to organelle-like structures. The combination of phase separation with spatial targeting by kinesin truncations yielded the best confinement, as determined by FISH and IF, and the highest selectivity increase. This is consistent with the hypothesis that higher spatial segregation, and thus higher local concentration of tRNAPyl, PylRS and mRNA, correlates with higher selectivity.
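The co-localization described above is reported qualitatively. A common way to quantify co-localization between two image channels is Pearson's correlation coefficient over matched pixels. The sketch below computes it for two intensity lists. This metric is a swapped-in illustration and is not stated to be the method used here. The pixel values are hypothetical, and background subtraction and region-of-interest selection are omitted.

```python
import math

def pearson_colocalization(ch1: list[float], ch2: list[float]) -> float:
    """Pearson correlation of pixel intensities in two channels (same pixels, same order)."""
    if len(ch1) != len(ch2) or len(ch1) < 2:
        raise ValueError("channels must cover the same pixels (at least two)")
    m1 = sum(ch1) / len(ch1)
    m2 = sum(ch2) / len(ch2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(ch1, ch2))
    var1 = sum((a - m1) ** 2 for a in ch1)
    var2 = sum((b - m2) ** 2 for b in ch2)
    if var1 == 0 or var2 == 0:
        raise ValueError("a channel with constant intensity has no defined correlation")
    return cov / math.sqrt(var1 * var2)

# Hypothetical pixel intensities, e.g. a PylRS IF channel vs. a tRNAPyl FISH channel.
print(round(pearson_colocalization([1, 2, 3, 4], [2, 4, 6, 8]), 3))  # 1.0
```

Values near +1 indicate that the two signals rise and fall together across pixels. Values near 0 indicate no linear relationship.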
Ribosomes were stained to see whether they co-localize to the OTK2::P1 assembly. IF staining of the ribosomal protein RPL26L1 revealed strong co-localization with the OTK2::P1 organelle (data not shown) demonstrating ribosome recruitment, tentatively due to binding to mRNA::ms2 during translation. High ribosomal mobility can also explain why it was possible to successfully express the membrane protein INSR (construct: INSR676TAG::mOrange::ms2).
Without wishing to be bound by theory, the experimental results strongly suggest that selective orthogonal translation happens in close proximity to, or potentially even inside, the OT assemblies. It appears to be carried out by a set of recruited ribosomes that are near or fully immersed in a concentrated pool of tRNAPyl. tRNAPyl itself is recruited to the OTK2::P1 assembly owing to its affinity for assembler::PylRS, and it can readily co-partition into the droplet to be aminoacylated with its cognate ncAA. Meanwhile, assembler::MCP recruits MS2-tagged mRNA. This in turn attracts ribosomes to co-partition into the dense phase formed by the dual assembler system (K2::P1 = KIF16B::FUS::PylRS and KIF16B::EWSR1::MCP). This dense phase maintains access to the other factors required for translation. Ribosomes elsewhere in the cytoplasm that are not exposed to tRNAPyl perform their canonical function and terminate translation whenever they encounter a stop codon.
EXAMPLE 11 - Further OT systems
In addition to the OT systems described in the preceding examples, a variety of other OT systems were tested and found to allow selective orthogonal translation of the reporter (i.e. the POI). These experiments are summarized in Table 1 below. Unless noted otherwise, the cytoplasmic NES-PylRS system previously described by Nikic et al. (Angew Chem Int Ed Engl 2016, 55(52):16172-16176), but carrying the corresponding AF, AA or AAAF mutations, was used as a nonspecific reference (negative control). All experiments were performed in the presence of the codon-specific tRNAPyl and of the ncAAs corresponding to the respective PylRS mutants.
Table 1: Tested OT systems [table images not reproduced]
EXAMPLE 12 - Further OT systems
In addition to the OT systems described in the preceding examples, a variety of similar OT systems differing in their mRNA targeting components were tested and found to allow selective orthogonal translation of the reporter (i.e. the POI). These experiments are summarized in Table 2 below. The results are shown in Figures 7A, B and C. The cytoplasmic NES-PylRS system previously described by Nikic et al. (Angew Chem Int Ed Engl 2016, 55(52):16172-16176) was used as a nonspecific reference (negative control). All experiments were performed in the presence of the codon-specific tRNAPyl and of the ncAAs corresponding to the respective PylRS mutants.
Table 2: Tested OT systems [table images not reproduced]
EXAMPLE 13 - Further OT fusion constructs tested
In addition to the OT systems described in the preceding examples, a variety of other OT fusion constructs were prepared, tested and found to allow selective orthogonal translation of the reporter (i.e. the POI). The tested constructs are summarized in Table 3 below. Unless noted otherwise, the cytoplasmic NES-PylRS system previously described by Nikic et al. (Angew Chem Int Ed Engl 2016, 55(52):16172-16176), but with the corresponding AF, AA or AAAF mutations, or with one of the PylRS mutants CpkRS, CbzRS, IFRS1 and OMeRS, was used as a nonspecific reference (negative control).
All experiments were performed in the presence of the codon-specific tRNAPyl and of the non-canonical amino acids corresponding to the respective PylRS mutants (for example, CpkRS with cyclopropene-L-lysine, CbzRS with Nε-benzyloxycarbonyl-L-lysine, IFRS1 with 3-iodo-L-phenylalanine and OMeRS with 4-methoxy-L-phenylalanine).
All constructs were tested with the matching reporter: ms2 loops for MCP, boxB loops for λN22 and pp7 loops for PCP.
In all fusion constructs the synthetases should be freely interchangeable.
For the SYNZIP constructs, note that SYNZIP1 forms a pair with SYNZIP2, and SYNZIP3 forms a pair with SYNZIP4. In principle, all other described SYNZIPs should work similarly (https://pubs.acs.org/doi/pdf/10.1021/ja907617a).
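The pairing rules stated above can be expressed as a simple consistency check on a planned construct. The Python sketch below is illustrative only: the function `check_design`, its arguments, and the idea of encoding a construct as a few strings are assumptions made here, and the lookup tables contain only the pairings named in this example (MCP with ms2, λN22 with boxB, PCP with pp7; SYNZIP1 with SYNZIP2, SYNZIP3 with SYNZIP4).

```python
from typing import List, Optional, Tuple

# Pairings stated in Example 13; anything not listed here is treated as unknown.
RNA_BINDER_TO_LOOP = {"MCP": "ms2", "λN22": "boxB", "PCP": "pp7"}
SYNZIP_PARTNERS = {
    "SYNZIP1": "SYNZIP2", "SYNZIP2": "SYNZIP1",
    "SYNZIP3": "SYNZIP4", "SYNZIP4": "SYNZIP3",
}


def check_design(rna_binder: str, reporter_loop: str,
                 synzips: Optional[Tuple[str, str]] = None) -> List[str]:
    """Return a list of problems with a hypothetical construct design (empty if none).

    rna_binder    -- RNA-binding protein fused into the OT system (e.g. "MCP")
    reporter_loop -- RNA loop tag carried by the reporter mRNA (e.g. "ms2")
    synzips       -- optional pair of SYNZIP peptides meant to heterodimerize
    """
    problems = []
    expected = RNA_BINDER_TO_LOOP.get(rna_binder)
    if expected is None:
        problems.append(f"unknown RNA-binding protein: {rna_binder}")
    elif reporter_loop != expected:
        problems.append(f"{rna_binder} needs {expected} loops, not {reporter_loop}")
    if synzips is not None:
        a, b = synzips
        if SYNZIP_PARTNERS.get(a) != b:
            problems.append(f"{a} does not pair with {b}")
    return problems
```

For example, `check_design("PCP", "ms2")` reports `["PCP needs pp7 loops, not ms2"]`, while `check_design("MCP", "ms2", ("SYNZIP1", "SYNZIP2"))` returns an empty list.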
Table 3: Tested OT fusion constructs [table images not reproduced]
AA: amino acid sequence
ABBREVIATIONS
Qr Symbols representing a peptidic linkage
Symbol representing a combination of polypeptides
AP polypeptide segment acting as assembler
AFP assembler fusion protein
BSA bovine serum albumin
BoxB lambda phage RNA stem-loop, specific binding site of λN22
CbzRS Methanosarcina mazei PylRS (Y306M, L309G, C348T)
CDS (en-)coding sequence
CG1 CG1 (Nup42) nucleoporin protein for targeting to nuclear membrane
CMPSiaTr CMP sialic acid transporter for targeting to Golgi membrane
CpkRS Methanosarcina mazei PylRS (A302S
EB1 EB1 protein for targeting to microtubule plus ends
EBAG9 Receptor-binding cancer antigen expressed on SiSo cells
EBAG9FL EBAG9 full length protein for targeting to Golgi membrane
EBAG9₁₋₂₉ EBAG9 amino acid residues 1-29 (N-terminal) for targeting to Golgi membrane
EGFP149TAG enhanced green fluorescent protein, amino acid position 149 encoded by Amber codon (TAG)
EP polypeptide segment acting as an effector
ER endoplasmic reticulum
EWSR1 Ewing sarcoma breakpoint region 1 (also termed EWS herein)
FBS fetal bovine serum
FFC fluorescence flow cytometry
FISH fluorescence in situ hybridization
FRB-CD28 synthetic membrane targeting domain derived from the transmembrane proteins CD4 and CD28 and from FRB (similar to mTOR)
FSC-A forward scatter area
FUS fused-in sarcoma
FUS-CD28 synthetic membrane targeting fusion polypeptide derived from CD4, FUS and CD28
GCE genetic code expansion
GFP green fluorescent protein
GFP39TAA green fluorescent protein, amino acid position 39 encoded by Ochre codon (TAA)
GFP39TAG green fluorescent protein, amino acid position 39 encoded by Amber codon (TAG)
GFP39TGA green fluorescent protein, amino acid position 39 encoded by Opal codon (TGA)
GFP39,149TAG green fluorescent protein, each of amino acid positions 39 and 149 encoded by Amber codon (TAG)
GFP39,149,182TAG green fluorescent protein, each of amino acid positions 39, 149 and 182 encoded by Amber codon (TAG)
IC-TP intracellular targeting polypeptide
IDP intrinsically disordered protein
IFRS1 Methanosarcina mazei PylRS (L305M, Y306L, L309S, N346S, C348M)
INSR insulin receptor
INSR676TAG insulin receptor, amino acid position 676 encoded by Amber codon (TAG)
iRFP near-infrared fluorescent protein
KIF13A kinesin family member 13A - Unless specified otherwise herein, “KIF13A” specifically refers to the fragment covering amino acid residues 1-411 of KIF13A wherein P390 is deleted (KIF13A₁₋₄₁₁,ΔP390).
KIF16B kinesin family member 16B - Unless specified otherwise herein, “KIF16B” specifically refers to the fragment covering amino acid residues 1-400 of KIF16B (KIF16B₁₋₄₀₀).
λN22 22 amino acid RNA-binding domain of lambda phage antiterminator protein N
LcK posttranslational modification site for plasma membrane targeting of lymphocyte-specific protein tyrosine kinase
mCherry185TAG mCherry, amino acid position 185 encoded by Amber codon (TAG)
MCP MS2 bacteriophage coat protein
MLC membrane-less compartment
MS2 Enterobacteria phage MS2
MS2-tag two MS2 RNA stem-loops fused to the 3' untranslated region of the mRNA (or coding sequence therefor)
ms2 MS2-tag
ncAA non-canonical amino acid
NLS nuclear localization sequence
Nup153 nucleoporin 153
O-RS orthogonal aminoacyl tRNA synthetase
OMeRS Methanosarcina mazei PylRS (A302T, Y384F, N346V, C348W, V401L)
OT assembly spatially enriched components of the GCE machinery in a membrane-less assembly that is able to act as an artificial orthogonally translating (OT) organelle
P450 2C1₁₋₂₇ P450 2C1 residues 1-27 (N-terminal) for targeting to ER membranes
PBS phosphate buffered saline
PCP bacteriophage PP7 coat protein, for targeting to the pp7 loop tag
PEI polyethylenimine
POI polypeptide of interest (= target polypeptide)
POITAG POI comprising an Amber-(TAG-)encoded amino acid residue (or coding sequence therefor)
PP7 pp7 loop tag from RNA bacteriophage pp7
PSP phase separation polypeptide
PylRS pyrrolysyl tRNA synthetase
PylRS AA mutant M. mazei pyrrolysyl tRNA synthetase comprising amino acid substitutions N346A and C348A
PylRS AF mutant M. mazei pyrrolysyl tRNA synthetase comprising amino acid substitutions Y306A and Y384F
PylRS AAAF mutant M. mazei pyrrolysyl tRNA synthetase comprising amino acid substitutions Y306A, N346A, C348A and Y384F
RNA-TP RNA targeting polypeptide
RS aminoacyl tRNA synthetase
RT room temperature
SCO cyclooctyne lysine
SEM standard error of the mean
SSC saline-sodium citrate (buffer)
SSC-A side scatter area
SSC-W side scatter width
SPD5 spindle-defective protein 5
SYNZIP1 Synthetic coiled coil peptide 1
SYNZIP2 Synthetic coiled coil peptide 2
SYNZIP3 Synthetic coiled coil peptide 3
SYNZIP4 Synthetic coiled coil peptide 4
TOMM20 translocase of outer mitochondrial membrane 20
TOMM20₁₋₇₀ fragment covering amino acid residues 1-70 of TOMM20
tRNAPyl tRNA that is aminoacylated with pyrrolysine or another non-canonical amino acid by a wild-type or modified PylRS and has an anticodon that, for site-specific incorporation of a (non-canonical) amino acid residue into a POI, is preferably the reverse complement of a selector codon. - The tRNAPyl used in the examples carried the anticodon against the stop codon Amber (tRNAPyl CUA), Ochre (tRNAPyl UUA) or Opal (tRNAPyl UCA), depending on which of these was used as the selector codon in the POI-encoding sequence.
3’ UTR 3’ untranslated region
Vim116TAG Vimentin, amino acid position 116 encoded by Amber codon (TAG)
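The mutant designations above (for example PylRS AF: Y306A and Y384F) use standard substitution notation: wild-type residue, 1-based position, replacement residue. A small Python sketch can apply such substitutions while checking the wild-type residue, which guards against off-by-one numbering. The function name and the toy sequence in the usage note are illustrative assumptions; only the substitution sets are copied from the PylRS AA, AF and AAAF entries, and the sketch is not run on the full-length PylRS sequence.

```python
import re
from typing import Iterable

# One substitution in standard notation: wild-type residue, 1-based position, new residue.
_SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")

# Substitution sets as given in the abbreviations list (M. mazei PylRS numbering).
PYLRS_MUTANTS = {
    "AF": {"Y306A", "Y384F"},
    "AA": {"N346A", "C348A"},
    "AAAF": {"Y306A", "N346A", "C348A", "Y384F"},
}


def apply_substitutions(protein: str, substitutions: Iterable[str]) -> str:
    """Return `protein` with each substitution (e.g. "Y306A") applied.

    Raises ValueError for malformed notation, out-of-range positions, or a
    wild-type residue that does not match the sequence (a common sign of an
    off-by-one numbering error).
    """
    residues = list(protein)
    for sub in substitutions:
        match = _SUBSTITUTION.fullmatch(sub)
        if match is None:
            raise ValueError(f"malformed substitution: {sub!r}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{sub}: position outside 1..{len(residues)}")
        if residues[position - 1] != wild_type:
            raise ValueError(
                f"{sub}: expected {wild_type} at {position}, found {residues[position - 1]}"
            )
        residues[position - 1] = new
    return "".join(residues)
```

For instance, `apply_substitutions("MKYLNC", ["Y3A", "C6A"])` returns `"MKALNA"`, and the AAAF set is exactly the union of the AF and AA sets.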
SEQUENCES
The following sections show the sequences of the polypeptides and polynucleotides described herein.
Nucleic acid sequences are stated in 5’ to 3’ orientation; protein sequences are stated from N- to C-terminus.
Sequences - Set 1
1. Hybridization probes
Hybridization probe for tRNAPyl labelled at the 5' terminus with Cy5
CTAACCCGGCTGAACGGATTTAGAGTCCATTCGATC (SEQ ID NO: 1)
Hybridization probe for tRNAPyl labelled at the 5' terminus with digoxigenin
CTAACCCGGCTGAACGGATTTAGAGTCCATTCGATC (SEQ ID NO: 2)
Hybridization probe for the MS2 RNA stem-loop sequence labelled at the 5' terminus with Alexa Fluor 647
CTGCAGACATGGGTGATCCTCATGTTTTCTA (SEQ ID NO : 3 )
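Hybridization probes are antisense to the RNA they detect. As a quick consistency check, the Python sketch below tests this against sequences from this listing: the tRNAPyl probe against the Amber tRNAPyl gene (SEQ ID NO: 4) and the MS2 probe against the MS2 stem-loop (SEQ ID NO: 17). The helper names are hypothetical, and only perfect contiguous matches are searched for; actual hybridization also tolerates mismatches and depends on temperature and salt.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA string (T stands in for U in RNA targets)."""
    return dna.translate(COMPLEMENT)[::-1]


# Sequences copied from this listing.
TRNA_PROBE = "CTAACCCGGCTGAACGGATTTAGAGTCCATTCGATC"  # SEQ ID NOs: 1 and 2
MS2_PROBE = "CTGCAGACATGGGTGATCCTCATGTTTTCTA"  # SEQ ID NO: 3
TRNA_PYL_CUA = "GGAAACCTGATCATGTAGATCGAATGGACTCTAAATCCGTTCAGCCGGGTTAGATTCCCGGGGTTTCCG"  # SEQ ID NO: 4
MS2_STEM_LOOP = "ACATGAGGATCACCCATGT"  # SEQ ID NO: 17


def probe_target_start(probe: str, target: str) -> int:
    """0-based start of the target stretch a probe would anneal to, or -1.

    The probe anneals to the stretch equal to its reverse complement; only
    perfect, contiguous matches are considered here.
    """
    return target.find(reverse_complement(probe))
```

Here `probe_target_start(TRNA_PROBE, TRNA_PYL_CUA)` returns 17, i.e. the probe covers nucleotides 18-53 of the tRNA gene. The MS2 probe is longer than one stem-loop, so the check runs the other way round: `reverse_complement(MS2_PROBE).find(MS2_STEM_LOOP)` returns 6.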
2. tRNAs
DNA sequence for tRNAPyl CUA (pyrrolysyl tRNA of Methanosarcina mazei for the Amber codon; anticodon CTA at nucleotides 31-33)
GGAAACCTGATCATGTAGATCGAATGGACTCTAAATCCGTTCAGCCGGGTTAGATTCCCGGGGTTTCCG (SEQ ID NO: 4)
DNA sequence for tRNAPyl UCA (pyrrolysyl tRNA of Methanosarcina mazei for the Opal codon; anticodon TCA at nucleotides 31-33)
GGAAACCTGATCATGTAGATCGAATGGACTTCAAATCCGTTCAGCCGGGTTAGATTCCCGGGGTTTCCG (SEQ ID NO: 5)
DNA sequence for tRNAPyl UUA (pyrrolysyl tRNA of Methanosarcina mazei for the Ochre codon; anticodon TTA at nucleotides 31-33)
GGAAACCTGATCATGTAGATCGAATGGACTTTAAATCCGTTCAGCCGGGTTAGATTCCCGGGGTTTCCG (SEQ ID NO: 6)
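The three tRNAPyl genes above are identical except at the anticodon, which should be the reverse complement of the selector codon (see the tRNAPyl entry in the abbreviations). The Python sketch below checks this directly on the listed sequences. The function names are hypothetical, and locating the anticodon by comparing variants is a heuristic that works here only because the three genes differ solely in that window.

```python
from typing import Dict, Iterable, List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


# tRNAPyl genes (SEQ ID NOs: 4-6) keyed by the stop codon each is meant to read.
TRNA_PYL = {
    "TAG": "GGAAACCTGATCATGTAGATCGAATGGACTCTAAATCCGTTCAGCCGGGTTAGATTCCCGGGGTTTCCG",  # Amber
    "TGA": "GGAAACCTGATCATGTAGATCGAATGGACTTCAAATCCGTTCAGCCGGGTTAGATTCCCGGGGTTTCCG",  # Opal
    "TAA": "GGAAACCTGATCATGTAGATCGAATGGACTTTAAATCCGTTCAGCCGGGTTAGATTCCCGGGGTTTCCG",  # Ochre
}


def variable_positions(seqs: Iterable[str]) -> List[int]:
    """0-based positions at which equal-length, pre-aligned sequences differ."""
    seqs = list(seqs)
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must have equal length")
    return [i for i in range(len(seqs[0])) if len({s[i] for s in seqs}) > 1]


def find_anticodon(trnas: Dict[str, str]) -> int:
    """0-based start of a 3-nt window that covers every variable position and, in
    each tRNA, equals the reverse complement of that tRNA's selector codon."""
    diffs = variable_positions(trnas.values())
    if not diffs:
        raise ValueError("sequences are identical")
    for start in range(max(max(diffs) - 2, 0), min(diffs) + 1):
        if all(seq[start:start + 3] == reverse_complement(codon)
               for codon, seq in trnas.items()):
            return start
    raise ValueError("no window matches every selector codon")
```

On these sequences the variants differ only at 0-based positions 30 and 31, and `find_anticodon(TRNA_PYL)` returns 30, i.e. nucleotides 31-33 read CTA, TCA and TTA for the Amber, Opal and Ochre tRNAs.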
3. O-RSs
PylRSAF (Methanosarcina mazei pyrrolysyl tRNA synthetase double mutant: Y306A, Y384F; Uniprot: Q8PWY1)
DNA:
ATGGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACC
CTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGT
TCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGT
GCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTG
ACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATG
CCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAA
TTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGT
ATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTT
CAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATC
AGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAA
ATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGC
TTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAA
CTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTAT
CTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCC
GACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAAC
CTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTG
TTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGAT
CGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGAC
TTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 7)
Protein:
MACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVWNSRSSRTAR ALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKWSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSK FSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEI SLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTE LSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTREN LESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAWGPIPLDREWGIDKPWIGAGFGLERLLKVKHD FKNIKRAARSESYYNGISTNL (SEQ ID NO : 8 )
PylRSAA (Methanosarcina mazei pyrrolysyl tRNA synthetase double mutant: N346A, C348A; Uniprot: Q8PWY1)
DNA:
ATGGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGACAAAAAACCGCTGAATACC
CTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGT
TCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGT
GCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTG
ACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATG
CCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAA
TTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGT
ATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTT
CAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATC
AGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAA
ATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGC
TTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAA
CTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTGGCACCAAATCTGTATAACTAT
CTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCC
GACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAAC
CTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTG
TATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGAC
CGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGAC
TTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 9)
Protein :
MACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLWNNSRSSRTAR ALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKWSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSK FSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEI SLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTE LSKQIFRVDKNFCLRPMLAPNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTREN LESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAWGPIPLDREWGIDKPWIGAGFGLERLLKVKHD FKNIKRAARSESYYNGISTNL (SEQ ID NO: 10)
PylRSAAAF (Methanosarcina mazei pyrrolysyl tRNA synthetase quadruple mutant: Y306A, N346A, C348A, Y384F; Uniprot: Q8PWY1 )
DNA:
GCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTG
ATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCG
AAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCA
CTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACA
AAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCG
AAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTC
TCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATT
AGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAA
GCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGC
CTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATC
TATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTT
CTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTG
AGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTG
CGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGAC GGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTG GAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTT GGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGT GAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTC AAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA ( SEQ ID NO: 11)
Protein :
ACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLWNNSRSSRTARA LRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKWSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKF SPAI PVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEIS LNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTEL SKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENL ESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAWGPIPLDREWGIDKPWIGAGFGLERLLKVKHDF KNIKRAARSESYYNGISTNL (SEQ ID NO: 12)
4. RNA-TPs
MCP (coat protein of Enterobacteria phage MS2)
DNA:
GCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTC GCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAG AGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATG GAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAA GATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTAC (SEQ ID NO: 13)
Protein :
ASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNM ELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIY (SEQ ID NO: 14)
λN22 (22 amino acid RNA-binding domain of lambda phage antiterminator protein N)
DNA:
ATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAAC ( SEQ ID NO: 15)
Protein :
MDAQTRRRERRAEKQAQWKAAN (SEQ ID NO: 16)
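Several protein sequences in this listing appear to carry extraction errors (for instance, the PylRS entries show "W" where the DNA encodes "VV"), so it can help to re-derive a protein from its DNA. The Python sketch below assumes the standard genetic code and an in-frame coding sequence; the function name `translate` is illustrative. Applied to the λN22 DNA above (SEQ ID NO: 15) it reproduces SEQ ID NO: 16.

```python
from itertools import product

# Standard genetic code (NCBI table 1). Codons are enumerated in TCAG order with
# the third position varying fastest; "*" marks stop codons.
_BASES = "TCAG"
_AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)}


def translate(dna: str, stop_at_stop: bool = True) -> str:
    """Translate an in-frame coding sequence into a one-letter protein sequence.

    Whitespace is ignored, so block-formatted or space-garbled listings can be
    passed directly. Translation stops at the first stop codon unless
    `stop_at_stop` is False, in which case stops appear as "*".
    """
    seq = "".join(dna.upper().split())
    if len(seq) % 3:
        raise ValueError(f"length {len(seq)} is not a multiple of 3")
    protein = []
    for i in range(0, len(seq), 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]  # KeyError for non-ACGT characters
        if amino_acid == "*" and stop_at_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)
```

Applied to the Lck DNA (SEQ ID NO: 25), the same function returns GCVCSSNPEGTEL, matching SEQ ID NO: 26.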
5. TNs
DNA sequence of Enterobacteria phage MS2 RNA stem-loop
ACATGAGGATCACCCATGT (SEQ ID NO: 17)
DNA sequence of BoxB (lambda phage RNA stem-loop, specific binding site of λN22)
GCCCTGAAAAAGGGC (SEQ ID NO: 18)
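Both stem-loops above are short hairpins whose 5' and 3' ends are complementary. As a rough illustration, and explicitly not a folding algorithm (an RNA structure-prediction tool would be needed for that), the Python sketch below counts how many terminal base pairs close each hairpin, treating T as U and accepting only Watson-Crick pairs. The function name is hypothetical.

```python
# Watson-Crick pairs only; T stands in for U because the listing gives DNA.
_WC_PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def terminal_stem_length(seq: str) -> int:
    """Number of consecutive base pairs closing a hairpin, counted inwards from
    both ends until the first non-Watson-Crick pair. Bulges and G-U wobble
    pairs are not handled, so this is a rough check, not structure prediction."""
    seq = seq.upper()
    i, j, pairs = 0, len(seq) - 1, 0
    while i < j and (seq[i], seq[j]) in _WC_PAIRS:
        pairs += 1
        i += 1
        j -= 1
    return pairs
```

Both sequences give 5: for BoxB the five terminal pairs enclose a GAAAA loop, while for MS2 the scan stops at position 6, where the MS2 hairpin is usually drawn with an unpaired (bulged) A, so only the lower stem is counted.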
6. IC-TPs
KIF16B₁₋₄₀₀ (Homo sapiens kinesin family member 16B fragment covering amino acid residues 1-400; Uniprot: Q96L93)
DNA:
ATGGCATCGGTCAAGGTGGCCGTGAGGGTCCGGCCCATGAATCGCAGGGAAAAGGACTTGGAGGCCAAGTTCATT ATTCAGATGGAGAAAAGCAAAACGACAATCACAAACTTAAAGATACCAGAAGGAGGCACTGGGGACTCAGGAAGA GAACGGACCAAGACCTTCACCTATGACTTTTCTTTTTATTCTGCTGATACAAAAAGCCCAGATTACGTTTCACAA GAAATGGTTTTCAAAACCCTCGGCACAGATGTCGTGAAGTCTGCATTTGAAGGTTATAATGCTTGTGTCTTTGCA TATGGGCAAACTGGATCTGGAAAGTCATACACTATGATGGGAAATTCTGGAGATTCTGGCTTAATACCTCGGATC TGTGAAGGACTCTTCAGTCGGATAAATGAAACCACCAGATGGGATGAAGCTTCTTTTCGAACTGAAGTCAGCTAC TTAGAAATTTATAACGAACGTGTGAGAGATCTACTTCGGCGGAAGTCATCTAAAACCTTCAATTTGAGAGTCCGT GAGCATCCCAAAGAAGGCCCTTATGTTGAGGATTTATCCAAACATTTAGTACAGAATTATGGTGACGTAGAAGAA CTTATGGATGCGGGCAATATCAACCGGACCACCGCAGCGACTGGGATGAACGACGTCAGTAGCAGGTCTCATGCC ATCTTCACCATCAAGTTCACTCAGGCTAAATTTGATTCTGAAATGCCATGTGAAACCGTCAGTAAGATCCACTTG GTTGATCTTGCCGGAAGTGAGCGTGCAGATGCCACCGGAGCCACCGGGGTTAGGCTAAAGGAAGGGGGAAATATT AACAAGTCCCTCGTGACTCTGGGGAACGTCATTTCTGCCTTAGCTGATTTATCTCAGGATGCTGCAAATACTCTT GCAAAGAAGAAGCAAGTTTTCGTGCCTTACAGGGATTCTGTGTTGACTTGGTTGTTAAAAGATAGCCTTGGAGGA AACTCTAAAACTATCATGATTGCCACCATTTCACCTGCTGATGTCAATTATGGAGAAACCCTAAGTACTCTTCGC TATGCAAATAGAGCCAAAAACATCATCAACAAGCCTACCATTAATGAGGATGCCAACGTCAAACTTATCCGTGAG CTGCGAGCTGAAATAGCCAGACTGAAAACGCTGCTTGCTCAAGGGAATCAGATTGCCCTCTTAGACTCCCCCACA
(SEQ ID NO: 19)
Protein :
MASVKVAVRVRPMNRREKDLEAKFIIQMEKSKTTITNLKIPEGGTGDSGRERTKTFTYDFSFYSADTKSPDYVSQ EMVFKTLGTDWKSAFEGYNACVFAYGQTGSGKSYTMMGNSGDSGLIPRICEGLFSRINETTRWDEASFRTEVSY LEIYNERVRDLLRRKSSKTFNLRVREHPKEGPYVEDLSKHLVQNYGDVEELMDAGNINRTTAATGMNDVSSRSHA IFTIKFTQAKFDSEMPCETVSKIHLVDLAGSERADATGATGVRLKEGGNINKSLVTLGNVI SALADLSQDAANTL AKKKQVFVPYRDSVLTWLLKDSLGGNSKTIMIATISPADVNYGETLSTLRYANRAKNIINKPTINEDANVKLIRE LRAEIARLKTLLAQGNQIALLDSPT (SEQ ID NO: 20)
KIF13A₁₋₄₁₁,ΔP390 (Homo sapiens kinesin family member 13A fragment covering amino acid residues 1-411 wherein P390 is deleted; Uniprot: Q9H1H9)
DNA:
ATGTCGGATACCAAGGTAAAAGTTGCCGTCCGGGTCCGGCCCATGAACCGACGAGAACTGGAACTGAACACCAAG TGCGTGGTGGAGATGGAAGGGAATCAAACGGTCCTGCACCCTCCTCCTTCTAACACCAAACAGGGAGAAAGGAAA CCTCCCAAGGTATTTGCCTTTGATTATTGCTTTTGGTCCATGGATGAATCTAACACTACAAAATACGCTGGTCAA GAAGTGGTTTTCAAGTGCCTTGGGGAAGGAATTCTTGAAAAAGCCTTTCAGGGGTATAATGCGTGTATTTTTGCA TATGGACAGACAGGTTCGGGAAAATCCTTTTCCATGATGGGCCATGCTGAGCAGCTGGGCCTTATTCCAAGGCTC TGCTGTGCTTTATTTAAAAGGATCTCTTTGGAGCAAAATGAGTCACAGACCTTTAAAGTTGAAGTGTCCTATATG GAAATTTATAATGAGAAAGTTCGGGATCTTTTAGACCCCAAAGGGAGTAGACAGTCTCTTAAAGTTCGAGAACAT AAAGTTTTGGGACCATATGTAGATGGTTTATCTCAACTAGCTGTCACTAGTTTTGAGGATATTGAGTCATTGATG TCTGAGGGAAATAAGTCTCGAACGGTAGCTGCTACCAACATGAACGAAGAAAGCAGCCGCTCCCATGCTGTGTTC AACATCATAATCACACAGACACTTTATGACCTGCAGTCTGGGAATTCCGGGGAGAAAGTCAGTAAGGTCAGCTTG GTAGACCTGGCGGGTAGCGAAAGAGTATCTAAAACAGGAGCTGCAGGAGAGCGACTGAAAGAAGGCAGCAACATT AACAAATCGCTTACAACCTTGGGGTTGGTTATATCATCACTGGCTGACCAGGCAGCTGGCAAGGGTAAAAGCAAA TTTGTGCCTTATCGAGATTCAGTCCTCACTTGGCTGCTTAAGGACAACTTGGGGGGCAACAGCCAAACCTCTATG ATAGCCACAATCAGCCCAGCCGCAGACAACTATGAAGAGACCCTCTCCACATTAAGATATGCAGACCGAGCCAAA AGGATTGTGAACCATGCTGTTGTGAATGAGGACCCCAACGCAAAAGTGATCCGAGAACTGCGGGAGGAAGTCGAG AAACTGAGAGAGCAGCTCTCTCAGGCAGAGGCCATGAAGGCCGAACTGAAGGAGAAGCTCGAAGAGTCTGAAAAG CTGATAAAAGAACTAACAGTGACTTGGGAA (SEQ ID NO: 21)
Protein :
MSDTKVKVAVRVRPMNRRELELNTKCWEMEGNQTVLHPPPSNTKQGERKPPKVFAFDYCFWSMDESNTTKYAGQ EWFKCLGEGILEKAFQGYNACIFAYGQTGSGKSFSMMGHAEQLGLIPRLCCALFKRISLEQNESQTFKVEVSYM EIYNEKVRDLLDPKGSRQSLKVREHKVLGPYVDGLSQLAVTSFEDIESLMSEGNKSRTVAATNMNEESSRSHAVF NIIITQTLYDLQSGNSGEKVSKVSLVDLAGSERVSKTGAAGERLKEGSNINKSLTTLGLVISSLADQAAGKGKSK FVPYRDSVLTWLLKDNLGGNSQTSMIATISPAADNYEETLSTLRYADRAKRIVNHAWNEDPNAKVIRELREEVE KLREQLSQAEAMKAELKEKLEESEKLIKELTVTWE (SEQ ID NO: 22)
TOMM20₁₋₇₀ (Homo sapiens translocase of outer mitochondrial membrane 20 fragment covering amino acid residues 1-70; Uniprot: Q15388)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTC (SEQ ID NO: 23)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFF (SEQ ID NO : 24 )
LcK (posttranslational modification site for plasma membrane targeting of Mus musculus lymphocyte-specific protein tyrosine kinase; Uniprot: P06240)
DNA:
GGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTC (SEQ ID NO: 25)
Protein: (identical part underlined P06240)
GCVCSSNPEGTEL (SEQ ID NO:26)
FRB-CD28 (synthetic membrane targeting fusion polypeptide derived from Mus musculus CD4 (Uniprot: P06332), FRB (similar to Homo sapiens mTOR; Uniprot: P42345) and Mus musculus CD28 (Uniprot: P31041))
DNA:
ATGTGCCGAGCCATCTCTCTTAGGCGCTTGCTGCTGCTGCTGCTGCAGCTGTCACAACTCCTAGCTGTCACTCAA
GGGATGCTCGAGATGTGGCATGAAGGCCTGGAAGAGGCATCTCGTTTGTACTTTGGGGAAAGGAACGTGAAAGGC
ATGTTTGAGGTGCTGGAGCCCTTGCATGCTATGATGGAACGGGGCCCCCAGACTCTGAAGGAAACATCCTTTAAT
CAGGCCTATGGTCGAGATTTAATGGAGGCCCAAGAGTGGTGCAGGAAGTACATGAAATCAGGGAATGTCAAGGAC
CTCCTCCAAGCCTGGGACCTCTATTATCATGTGTTCCGACGAATCTCAAAGACTAGAACCGGTAAGCTTTTTTGG
GCACTGGTCGTGGTTGCTGGAGTCCTGTTTTGTTATGGCTTGCTAGTGACAGTGGCTCTTTGTGTT (SEQ ID NO: 27)
Protein :
MCRAISLRRLLLLLLQLSQLLAVTQGMLEMWHEGLEEASRLYFGERNVKGMFEVLEPLHAMMERGPQTLKETSFN QAYGRDLMEAQEWCRKYMKSGNVKDLLQAWDLYYHVFRRISKTRTGKLFWALVWAGVLFCYGLLVTVALCV
(SEQ ID NO : 28 )
FUS-CD28 (synthetic membrane targeting fusion polypeptide derived from Mus musculus CD4 (Uniprot: P06332), Homo sapiens fused-in sarcoma (Uniprot: P35637) and Mus musculus CD28 (Uniprot: P31041))
DNA:
ATGTGCCGAGCCATCTCTCTTAGGCGCTTGCTGCTGCTGCTGCTGCAGCTGTCACAACTCCTAGCTGTCACTCAA GGGATGCTCATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGG CAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACT TCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCC CAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTAC CCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTAT GGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAA AGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGA GGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAA GACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGT GGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGC CGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGA TCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACA ATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATT AATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCT AAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGG GCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTAT GGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGA GCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGT AAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGT CGTGGTGGCAGAGGAGGCGGCACCGGTAAGCTTTTTTGGGCACTGGTCGTGGTTGCTGGAGTCCTGTTTTGTTAT GGCTTGCTAGTGACAGTGGCTCTTTGTGTT (SEQ ID NO: 29)
Protein :
MCRAISLRRLLLLLLQLSQLLAVTQGMLMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDT SGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSY GQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQ DQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQG SRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSA KAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQR AGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGGTGKLFWALVWAGVLFCY GLLVTVALCV (SEQ ID NO: 30)
7. PSPs
SPD5 (Caenorhabditis elegans spindle-defective protein 5; Uniprot: P91349)
DNA:
AT G GAG GAC AAC AG C GT G C T GAAC GAG GAC AG C AAC C T G GAG C AC GT G GAG G G C C AG C C C AGAAGAAG CAT GAG C CAGCCCGTGCTGAACGTGGAGGGCGACAAGAGAACCAGCAGCACCAGCGCCACCCAGCAGCAGGTGCTGAGCGGC G C C T T C AG C AG C G C C GAC GT GAGAAG CAT C C C CAT CAT C C AGAC C T G G GAG GAGAAC AAG G C C C T GAAGAC C AAG AT C AC CAT C C T GAGAG G C GAG C T G C AGAT GT AC C AGAGAAGAT AC AG C GAG G C C AAG GAG G C C AG C C AGAAGAGA GTGAAGGAGGTGATGGACGACTACGTGGACCTGAAGCTGGGCCAGGAGAACGTGCAGGAGAAGATGGAGCAGTAC AAG C T GAT G GAG GAG GAC CTGCTGGC CAT G C AGAG C AGAAT C GAGAC C AG C GAG GAC AAC T T C G C C AGAC AGAT G AAG GAGT T C GAG G C C C AGAAG C AC G C CAT G GAG GAGAGAAT C AAG GAG C T G GAG C T GAG C G C C AC C GAC G C C AAC AACACCACCGTGGGCAGCTTCAGAGGCACCCTGGACGACATCCTGAAGAAGAACGACCCCGACTTCACCCTGACC AGCGGCTACGAGGAGAGAAAGATCAACGACCTGGAGGCCAAGCTGCTGAGCGAGATCGACAAGGTGGCCGAGCTG GAG GAC C AC AT C C AG C AG C T GAGAC AG GAG C T G GAC GAC C AGAG C G C C AGAC T G G C C GAC AG C GAGAAC GT GAGA GCCCAGCTGGAGGCCGCCACCGGCCAGGGCATCCTGGGCGCCGCCGGCAACGCCATGGTGCCCAACAGCACCTTC AT GAT C G G C AAC G G C AGAGAGAG C C AGAC C AGAGAC C AG C T GAAC T AC AT C GAC GAC C T G GAGAC C AAG C T G G C C GAC G C C AAGAAG GAGAAC GAC AAG G C C AGAC AG GCCCTGGTG GAGT AC AT GAAC AAGT G C AG C AAG C T G GAG C AC GAGAT C AGAAC CAT G GT GAAGAAC AG C AC C T T C GAC AG C AG C AG CAT GCTGCTGGGCGGC C AGAC C AG C GAC GAG C T GAAGAT C C AGAT C G G C AAG GT GAAC G G C GAG C T GAAC GT G C T GAGAG C C GAGAAC AGAGAG C T GAGAAT C AGA TGCGACCAGCTGACCGGCGGCGACGGCAACCTGAGCATCAGCCTGGGCCAGAGCAGACTGATGGCCGGCATCGCC AC C AAC GAC GT G GAC AG CAT C G G C C AG G G C AAC GAGAC C G G C G G C AC C AG CAT GAGAAT C C T G C C C AGAGAGAG C CAGCTGGACGACCTGGAGGAGAGCAAGCTGCCCCTGATGGACACCAGCAGCGCCGTGAGAAACCAGCAGCAGTTC G C C AG CAT GT G G GAG GAC T T C GAGAG C GT GAAG GAC AG C C T G C AGAAC AAC C AC AAC GAC AC C C T G GAG G G C AG C TTCAACAGCAGCATGCCCCCCCCCGGCAGAGACGCCACCCAGAGCTTCCTGAGCCAGAAGAGCTTCAAGAACAGC 
CCCATCGTGATGCAGAAGCCCAAGAGCCTGCACCTGCACCTGAAGAGCCACCAGAGCGAGGGCGCCGGCGAGCAG AT C C AGAAC AAC AG C T T C AG C AC C AAGAC C G C C AG C C C C C AC GT GAG C C AGAG C C AC AT C C C CAT C C T G C AC GAC ATGCAGCAGATCCTGGACAGCAGCGCCATGTTCCTGGAGGGCCAGCACGACGTGGCCGTGAACGTGGAGCAGATG C AG GAGAAGAT GAG C C AGAT C AGAGAG GCCCTGGC C AGAC T GT T C GAGAGAC T GAAGAG C AG CGCCGCCCTGTTC GAG GAGAT C C T G GAGAGAAT G G G C AG C AG C GAC C C C AAC G C C GAC AAGAT C AAGAAGAT GAAG CTGGCCTTC GAG ACCAGCATCAACGACAAGCTGAACGTGAGCGCCATCCTGGAGGCCGCCGAGAAGGACCTGCACAACATGAGCCTG AACTTCAGCATCCTGGAGAAGAGCATCGTGAGCCAGGCCGCCGAGGCCAGCAGAAGATTCACCATCGCCCCCGAC GCCGAGGACGTGGCCAGCAGCAGCCTGCTGAACGCCAGCTACAGCCCCCTGTTCAAGTTCACCAGCAACAGCGAC AT C GT G GAGAAG C T G C AGAAC GAG GT GAG C GAG C T GAAGAAC GAG C T G GAGAT G G C C AGAAC C AGAGAC AT GAGA AGCCCCCT GAAC G G C AG C AG C G G C AGAC T GAG C GAC GT G C AGAT C AAC AC C AAC AGAAT GT T C GAG GAC C T G GAG GTGAGCGAGGCCACCCTGCAGAAGGCCAAGGAGGAGAACAGCACCCTGAAGAGCCAGTTCGCCGAGCTGGAGGCC AACCTGCACCAGGTGAACAGCAAGCTGGGCGAGGTGAGATGCGAGCTGAACGAGGCCCTGGCCAGAGTGGACGGC GAG C AG GAGAC C AGAGT GAAG G C C GAGAAC G C C C T G GAG GAG G C C AGAC AG C T GAT C AG C AG C C T GAAG C AC GAG GAGAAC GAG C T GAAGAAGAC CAT C AC C GAC AT G G G CAT GAGAC T GAAC GAG G C C AAGAAGAG C GAC GAGT T C C T G AAGAG C GAG C T GAG C AC C G C C C T G GAG GAG GAGAAGAAGAG C C AGAAC C T G G C C GAC GAG C T GAG C GAG GAG C T G AAC G G C T G GAGAAT GAGAAC C AAG GAG G C C GAGAAC AAG GT G GAG C AC G C C AG C AG C GAGAAGAG C GAGAT G C T G GAGAGAAT C GT G C AC C T G GAGAC C GAGAT G GAGAAG C T GAG C AC C AG C GAGAT C G C C G C C GAC T AC T G C AG C AC C AAGAT GAC C GAGAGAAAGAAG GAGAT C GAG C T G G C C AAGT AC AGAGAG GAC T T C GAGAAC G C C G C CAT C GT G G G C C T G GAGAGAAT C AG C AAG GAGAT C AG C GAG C T GAC C AAGAAGAC C C T GAAG G C C AAGAT CAT C C C C AG C AAC AT C AG C AG CAT C C AG CTGGTGTGC GAC GAG C T GT G C AGAAGAC T GAG C AGAGAGAGAGAG C AG C AG C AC GAGT AC G C C AAG GT GAT GAGAGAC GT GAAC GAGAAGAT C GAGAAG 
C T G C AG C T G GAGAAG GAC G C C C T G GAG C AC GAG C T GAAG AT GAT GAG C AG C AAC AAC GAGAAC GTGCCCCCCGTGGG C AC C AG C GT GAG C G G CAT G C C C AC C AAGAC C AG C AAC CAGAAGTGCGCCCAGCCCCACTACACCAGCCCCACCAGACAGCTGCTGCACGAGAGCACCATGGCCGTGGACGCC AT C GT G C AGAAG C T GAAGAAGAC C C AC AAC AT GAG C G G CAT G G G C C C C GAG C T GAAG GAGAC CAT C G G C AAC GT G AT C AAC GAGAG C AGAGT G C T GAGAGAC T T C C T G C AC C AGAAG C T GAT C C T GT T C AAG G G CAT C GAC AT GAG C AAC TGGAAGAACGAGACCGTGGACCAGCTGATCACCGACCTGGGCCAGCTGCACCAGGACAACCTGATGCTGGAGGAG C AGAT CAAGAAGT AC AAGAAG GAG C T GAAG C T GAC C AAGAG C G C CAT C C C C AC CCTGGGCGTG GAGT T C C AG GAC AGAAT C AAGAC C GAGAT C G G C AAGAT C G C C AC C GAC AT GGGCGGCGCCGT GAAG GAGAT C AGAAAGAAG ( S EQ I D NO : 3 1 )
Protein:
MEDNSVLNEDSNLEHVEGQPRRSMSQPVLNVEGDKRTSSTSATQQQVLSGAFSSADVRSIPIIQTWEENKALKTK
ITILRGELQMYQRRYSEAKEASQKRVKEVMDDYVDLKLGQENVQEKMEQYKLMEEDLLAMQSRIETSEDNFARQM
KEFEAQKHAMEERIKELELSATDANNTTVGSFRGTLDDILKKNDPDFTLTSGYEERKINDLEAKLLSEIDKVAEL
EDHIQQLRQELDDQSARLADSENVRAQLEAATGQGILGAAGNAMVPNSTFMIGNGRESQTRDQLNYIDDLETKLA
DAKKENDKARQALVEYMNKCSKLEHEIRTMVKNSTFDSSSMLLGGQTSDELKIQIGKVNGELNVLRAENRELRIR
CDQLTGGDGNLSISLGQSRLMAGIATNDVDSIGQGNETGGTSMRILPRESQLDDLEESKLPLMDTSSAVRNQQQF ASMWEDFESVKDSLQNNHNDTLEGSFNSSMPPPGRDATQSFLSQKSFKNSPIVMQKPKSLHLHLKSHQSEGAGEQ IQNNSFSTKTASPHVSQSHIPILHDMQQILDSSAMFLEGQHDVAVNVEQMQEKMSQIREALARLFERLKSSAALF EEILERMGSSDPNADKIKKMKLAFETSINDKLNVSAILEAAEKDLHNMSLNFSILEKSIVSQAAEASRRFTIAPD AEDVASSSLLNASYSPLFKFTSNSDIVEKLQNEVSELKNELEMARTRDMRSPLNGSSGRLSDVQINTNRMFEDLE VSEATLQKAKEENSTLKSQFAELEANLHQVNSKLGEVRCELNEALARVDGEQETRVKAENALEEARQLISSLKHE ENELKKTITDMGMRLNEAKKSDEFLKSELSTALEEEKKSQNLADELSEELNGWRMRTKEAENKVEHASSEKSEML ERIVHLETEMEKLSTSEIAADYCSTKMTERKKEIELAKYREDFENAAIVGLERISKEISELTKKTLKAKIIPSNI SSIQLVCDELCRRLSREREQQHEYAKVMRDVNEKIEKLQLEKDALEHELKMMSSNNENVPPVGTSVSGMPTKTSN QKCAQPHYTSPTRQLLHESTMAVDAIVQKLKKTHNMSGMGPELKETIGNVINESRVLRDFLHQKLILFKGIDMSN WKNETVDQLITDLGQLHQDNLMLEEQIKKYKKELKLTKSAIPTLGVEFQDRIKTEIGKIATDMGGAVKEIRKK
(SEQ ID NO : 32 )
FUS (Homo sapiens fused-in sarcoma; Uniprot: P35637)
DNA:
ATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTAT TCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATAT GGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATAT GGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTAT GGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCC CAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAAT CCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGT AACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGT GGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGC GGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGC AGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCAT GACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCT GTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTAC ACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCT ATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTT AATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGT GGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGAC TGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCT AAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGC AGAGGAGGC (SEQ ID NO: 33)
Protein:
MASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGY GSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYN PPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGG GGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIES VADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADF NRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAP KPDGPGGGPGGSHMGGNYGDDRRGGRGG (SEQ ID NO: 34)
EWSR1 (Homo sapiens Ewing sarcoma breakpoint region 1; UniProt: Q01844)
DNA:
ATGGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCC ACTCAAGGATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTC AGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACT GGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACC ACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCA GCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAA CCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTA CCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCG ACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGC TATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCT CCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATG GGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGG GGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGC AGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCA CCTGTAGATCCAGATGAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGAT GATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATC TACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCCACTGCCAAGGCT GCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCA ATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGA GGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGA GGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCC AATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGC TTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGA GGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGT GGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTG ATGGAACAG (SEQ ID NO: 35)
Protein:
MASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPT GYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQ PQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSS YGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNR GRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLD DLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPP MNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCP NPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFR GGRGMDRGGFGGGRRGGPGGPPGPLMEQ (SEQ ID NO: 36)
8. AFPs
EWSR1-MCP
DNA:
ATGGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCC ACTCAAGGATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTC AGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACT GGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACC ACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCA GCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAA CCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTA CCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCG ACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGC TATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCT CCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATG GGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGG GGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGC AGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCA CCTGTAGATCCAGATGAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGAT GATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATC TACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCCACTGCCAAGGCT GCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCA ATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGA GGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGA GGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCC AATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGC TTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGA GGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGT GGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTG ATGGAACAGGATTACAAGGATGACGACGATAAGGGTACCGAGCAGAAGCTGATCTCAGAGGAGGACCTGGGCGCC 
CCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGA ACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCA CAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTG CCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAG CTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGC ATCTACTAA (SEQ ID NO: 37)
Protein:
MASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPT GYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQ PQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSS YGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNR GRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLD DLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPP MNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCP NPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFR GGRGMDRGGFGGGRRGGPGGPPGPLMEQDYKDDDDKGTEQKLISEEDLGAPGSAGSAAGSGASNFTQFVLVDNGG TGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCE LIVKAMQGLLKDGNPIPSAIAANSGIY (SEQ ID NO: 38)
FUS-MCP
DNA:
ATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTAT TCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATAT GGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATAT GGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTAT GGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCC CAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAAT CCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGT AACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGT GGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGC GGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGC AGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCAT GACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCT GTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTAC ACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCT ATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTT AATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGT GGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGAC TGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCT AAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGC AGAGGAGGCGATTACAAGGATGACGACGATAAGGGTACCGAGCAGAAGCTGATCTCAGAGGAGGACCTGGGCGCC CCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGA ACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCA CAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTG CCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAG CTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGC ATCTACTAA (SEQ ID NO: 39)
Protein:
MASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGY GSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYN PPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGG GGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIES VADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADF NRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAP KPDGPGGGPGGSHMGGNYGDDRRGGRGGDYKDDDDKGTEQKLISEEDLGAPGSAGSAAGSGASNFTQFVLVDNGG TGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCE LIVKAMQGLLKDGNPIPSAIAANSGIY (SEQ ID NO: 40)
FUS-PylRSAF
DNA:
ATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTAT
TCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATAT GGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATAT GGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTAT GGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCC CAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAAT CCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGT AACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGT GGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGC GGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGC AGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCAT GACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCT GTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTAC ACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCT ATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTT AATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGT GGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGAC TGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCT AAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGC AGAGGAGGCGATTACAAGGATGACGACGATAAGGGTACCGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCC GGCGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAAC TTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGT CAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAAT ATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTA AAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGTACCGGCGCCCCCGGCTCCGCC GGCTCCGCCGCCGGCTCCGGCGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGAT AAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAA 
CACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGC TCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAG GATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACC CGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCA CAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTG AGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACA AGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTG AATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGT AAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGC TTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGC ATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCA CCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCG TGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCA GGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTG GGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTG GGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTG CTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACT AACCTGTAA (SEQ ID NO: 41)
Protein:
MASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGY GSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYN PPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGG GGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIES VADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADF NRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAP KPDGPGGGPGGSHMGGNYGDDRRGGRGGDYKDDDDKGTGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSN FANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLL KDGNPIPSAIAANSGIYGTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIK HHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPT RTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPIT SMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITR FFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGP CYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVV GPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL (SEQ ID NO: 42)
MCP-PylRSAF
DNA:
ATGGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAAC TTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGT CAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAAT ATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTA AAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGTACCGGCGCCCCCGGCTCCGCC GGCTCCGCCGCCGGCTCCGGCGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGAT AAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAA CACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGC TCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAG GATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACC CGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCA CAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTG AGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACA AGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTG AATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGT AAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGC TTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGC ATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCA CCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCG TGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCA GGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTG GGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTG GGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTG CTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACT AACCTGTAA (SEQ ID NO: 43)
Protein:
MASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLN MELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIYGTGAPGSAGSAAGSGACPVPLQLPPLERLTLDD KKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDE DLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASV STSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRR KKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLA PNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIV GDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGIST NL (SEQ ID NO: 44)
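Each DNA entry above is paired with the protein entry that follows it, so a reader can check a pair by translating the coding sequence and comparing the result with the listed protein (whitespace removed, since the listing prints residues in space-separated blocks). The sketch below is a minimal, hypothetical helper, not part of the disclosure: it assumes the standard genetic code (NCBI translation table 1) and that each DNA entry starts in frame at its ATG. The names `translate` and `matches_listing` are illustrative only. For example, the PylRSAF coding fragment GGCGATCATCTGGTTGTGAACAATAGCCGC translates to GDHLVVNNSR.

```python
# Assumption: standard genetic code (NCBI translation table 1). Codons are
# enumerated in TCAG order for the first, second and third base, so the
# amino acid for (i, j, k) sits at index 16*i + 4*j + k.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence into one-letter amino acid codes.

    Whitespace is ignored because the listing groups bases into blocks.
    Translation stops at the first stop codon ('*').
    """
    seq = "".join(dna.split()).upper()
    if len(seq) % 3 != 0:
        raise ValueError(f"sequence length {len(seq)} is not a multiple of 3")
    protein = []
    for pos in range(0, len(seq), 3):
        residue = CODON_TABLE[seq[pos:pos + 3]]  # KeyError on non-ACGT characters
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def matches_listing(dna: str, protein: str) -> bool:
    """Hypothetical check: does the DNA entry translate exactly to the protein entry?"""
    return translate(dna) == "".join(protein.split())


if __name__ == "__main__":
    # Fragment of the PylRSAF coding sequence from the listing above.
    print(translate("GGCGATCATCTGGTTGTGAACAATAGCCGC"))  # GDHLVVNNSR
```

Applied to a full entry, the DNA text (including its terminal stop codon) and the protein text can be passed directly to `matches_listing`; translation halts at the stop codon, so the trailing TAA does not need to be trimmed.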
SPD5-MCP
DNA:
ATGGAGGACAACAGCGTGCTGAACGAGGACAGCAACCTGGAGCACGTGGAGGGCCAGCCCAGAAGAAGCATGAGC CAGCCCGTGCTGAACGTGGAGGGCGACAAGAGAACCAGCAGCACCAGCGCCACCCAGCAGCAGGTGCTGAGCGGC GCCTTCAGCAGCGCCGACGTGAGAAGCATCCCCATCATCCAGACCTGGGAGGAGAACAAGGCCCTGAAGACCAAG ATCACCATCCTGAGAGGCGAGCTGCAGATGTACCAGAGAAGATACAGCGAGGCCAAGGAGGCCAGCCAGAAGAGA GTGAAGGAGGTGATGGACGACTACGTGGACCTGAAGCTGGGCCAGGAGAACGTGCAGGAGAAGATGGAGCAGTAC AAGCTGATGGAGGAGGACCTGCTGGCCATGCAGAGCAGAATCGAGACCAGCGAGGACAACTTCGCCAGACAGATG AAGGAGTTCGAGGCCCAGAAGCACGCCATGGAGGAGAGAATCAAGGAGCTGGAGCTGAGCGCCACCGACGCCAAC AACACCACCGTGGGCAGCTTCAGAGGCACCCTGGACGACATCCTGAAGAAGAACGACCCCGACTTCACCCTGACC AGCGGCTACGAGGAGAGAAAGATCAACGACCTGGAGGCCAAGCTGCTGAGCGAGATCGACAAGGTGGCCGAGCTG GAGGACCACATCCAGCAGCTGAGACAGGAGCTGGACGACCAGAGCGCCAGACTGGCCGACAGCGAGAACGTGAGA GCCCAGCTGGAGGCCGCCACCGGCCAGGGCATCCTGGGCGCCGCCGGCAACGCCATGGTGCCCAACAGCACCTTC ATGATCGGCAACGGCAGAGAGAGCCAGACCAGAGACCAGCTGAACTACATCGACGACCTGGAGACCAAGCTGGCC GACGCCAAGAAGGAGAACGACAAGGCCAGACAGGCCCTGGTGGAGTACATGAACAAGTGCAGCAAGCTGGAGCAC GAGATCAGAACCATGGTGAAGAACAGCACCTTCGACAGCAGCAGCATGCTGCTGGGCGGCCAGACCAGCGACGAG CTGAAGATCCAGATCGGCAAGGTGAACGGCGAGCTGAACGTGCTGAGAGCCGAGAACAGAGAGCTGAGAATCAGA TGCGACCAGCTGACCGGCGGCGACGGCAACCTGAGCATCAGCCTGGGCCAGAGCAGACTGATGGCCGGCATCGCC ACCAACGACGTGGACAGCATCGGCCAGGGCAACGAGACCGGCGGCACCAGCATGAGAATCCTGCCCAGAGAGAGC CAGCTGGACGACCTGGAGGAGAGCAAGCTGCCCCTGATGGACACCAGCAGCGCCGTGAGAAACCAGCAGCAGTTC GCCAGCATGTGGGAGGACTTCGAGAGCGTGAAGGACAGCCTGCAGAACAACCACAACGACACCCTGGAGGGCAGC TTCAACAGCAGCATGCCCCCCCCCGGCAGAGACGCCACCCAGAGCTTCCTGAGCCAGAAGAGCTTCAAGAACAGC CCCATCGTGATGCAGAAGCCCAAGAGCCTGCACCTGCACCTGAAGAGCCACCAGAGCGAGGGCGCCGGCGAGCAG ATCCAGAACAACAGCTTCAGCACCAAGACCGCCAGCCCCCACGTGAGCCAGAGCCACATCCCCATCCTGCACGAC ATGCAGCAGATCCTGGACAGCAGCGCCATGTTCCTGGAGGGCCAGCACGACGTGGCCGTGAACGTGGAGCAGATG CAGGAGAAGATGAGCCAGATCAGAGAGGCCCTGGCCAGACTGTTCGAGAGACTGAAGAGCAGCGCCGCCCTGTTC GAGGAGATCCTGGAGAGAATGGGCAGCAGCGACCCCAACGCCGACAAGATCAAGAAGATGAAGCTGGCCTTCGAG ACCAGCATCAACGACAAGCTGAACGTGAGCGCCATCCTGGAGGCCGCCGAGAAGGACCTGCACAACATGAGCCTG 
AACTTCAGCATCCTGGAGAAGAGCATCGTGAGCCAGGCCGCCGAGGCCAGCAGAAGATTCACCATCGCCCCCGAC GCCGAGGACGTGGCCAGCAGCAGCCTGCTGAACGCCAGCTACAGCCCCCTGTTCAAGTTCACCAGCAACAGCGAC ATCGTGGAGAAGCTGCAGAACGAGGTGAGCGAGCTGAAGAACGAGCTGGAGATGGCCAGAACCAGAGACATGAGA AGCCCCCTGAACGGCAGCAGCGGCAGACTGAGCGACGTGCAGATCAACACCAACAGAATGTTCGAGGACCTGGAG GTGAGCGAGGCCACCCTGCAGAAGGCCAAGGAGGAGAACAGCACCCTGAAGAGCCAGTTCGCCGAGCTGGAGGCC AACCTGCACCAGGTGAACAGCAAGCTGGGCGAGGTGAGATGCGAGCTGAACGAGGCCCTGGCCAGAGTGGACGGC GAGCAGGAGACCAGAGTGAAGGCCGAGAACGCCCTGGAGGAGGCCAGACAGCTGATCAGCAGCCTGAAGCACGAG GAGAACGAGCTGAAGAAGACCATCACCGACATGGGCATGAGACTGAACGAGGCCAAGAAGAGCGACGAGTTCCTG AAGAGCGAGCTGAGCACCGCCCTGGAGGAGGAGAAGAAGAGCCAGAACCTGGCCGACGAGCTGAGCGAGGAGCTG AACGGCTGGAGAATGAGAACCAAGGAGGCCGAGAACAAGGTGGAGCACGCCAGCAGCGAGAAGAGCGAGATGCTG GAGAGAATCGTGCACCTGGAGACCGAGATGGAGAAGCTGAGCACCAGCGAGATCGCCGCCGACTACTGCAGCACC AAGATGACCGAGAGAAAGAAGGAGATCGAGCTGGCCAAGTACAGAGAGGACTTCGAGAACGCCGCCATCGTGGGC CTGGAGAGAATCAGCAAGGAGATCAGCGAGCTGACCAAGAAGACCCTGAAGGCCAAGATCATCCCCAGCAACATC AGCAGCATCCAGCTGGTGTGCGACGAGCTGTGCAGAAGACTGAGCAGAGAGAGAGAGCAGCAGCACGAGTACGCC AAGGTGATGAGAGACGTGAACGAGAAGATCGAGAAGCTGCAGCTGGAGAAGGACGCCCTGGAGCACGAGCTGAAG ATGATGAGCAGCAACAACGAGAACGTGCCCCCCGTGGGCACCAGCGTGAGCGGCATGCCCACCAAGACCAGCAAC CAGAAGTGCGCCCAGCCCCACTACACCAGCCCCACCAGACAGCTGCTGCACGAGAGCACCATGGCCGTGGACGCC ATCGTGCAGAAGCTGAAGAAGACCCACAACATGAGCGGCATGGGCCCCGAGCTGAAGGAGACCATCGGCAACGTG ATCAACGAGAGCAGAGTGCTGAGAGACTTCCTGCACCAGAAGCTGATCCTGTTCAAGGGCATCGACATGAGCAAC TGGAAGAACGAGACCGTGGACCAGCTGATCACCGACCTGGGCCAGCTGCACCAGGACAACCTGATGCTGGAGGAG CAGATCAAGAAGTACAAGAAGGAGCTGAAGCTGACCAAGAGCGCCATCCCCACCCTGGGCGTGGAGTTCCAGGAC AGAATCAAGACCGAGATCGGCAAGATCGCCACCGACATGGGCGGCGCCGTGAAGGAGATCAGAAAGAAGGGTACC GAGCAGAAGCTGATCTCAGAGGAGGACCTGGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCGCTTCT AACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAAC GGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCT GCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTA 
ACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGA AACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACTAA (SEQ ID NO: 45)
Protein:
MEDNSVLNEDSNLEHVEGQPRRSMSQPVLNVEGDKRTSSTSATQQQVLSGAFSSADVRSIPIIQTWEENKALKTK
ITILRGELQMYQRRYSEAKEASQKRVKEVMDDYVDLKLGQENVQEKMEQYKLMEEDLLAMQSRIETSEDNFARQM
KEFEAQKHAMEERIKELELSATDANNTTVGSFRGTLDDILKKNDPDFTLTSGYEERKINDLEAKLLSEIDKVAEL
EDHIQQLRQELDDQSARLADSENVRAQLEAATGQGILGAAGNAMVPNSTFMIGNGRESQTRDQLNYIDDLETKLA
DAKKENDKARQALVEYMNKCSKLEHEIRTMVKNSTFDSSSMLLGGQTSDELKIQIGKVNGELNVLRAENRELRIR
CDQLTGGDGNLSISLGQSRLMAGIATNDVDSIGQGNETGGTSMRILPRESQLDDLEESKLPLMDTSSAVRNQQQF
ASMWEDFESVKDSLQNNHNDTLEGSFNSSMPPPGRDATQSFLSQKSFKNSPIVMQKPKSLHLHLKSHQSEGAGEQ
IQNNSFSTKTASPHVSQSHIPILHDMQQILDSSAMFLEGQHDVAVNVEQMQEKMSQIREALARLFERLKSSAALF
EEILERMGSSDPNADKIKKMKLAFETSINDKLNVSAILEAAEKDLHNMSLNFSILEKSIVSQAAEASRRFTIAPD
AEDVASSSLLNASYSPLFKFTSNSDIVEKLQNEVSELKNELEMARTRDMRSPLNGSSGRLSDVQINTNRMFEDLE
VSEATLQKAKEENSTLKSQFAELEANLHQVNSKLGEVRCELNEALARVDGEQETRVKAENALEEARQLISSLKHE
ENELKKTITDMGMRLNEAKKSDEFLKSELSTALEEEKKSQNLADELSEELNGWRMRTKEAENKVEHASSEKSEML
ERIVHLETEMEKLSTSEIAADYCSTKMTERKKEIELAKYREDFENAAIVGLERISKEISELTKKTLKAKIIPSNI
SSIQLVCDELCRRLSREREQQHEYAKVMRDVNEKIEKLQLEKDALEHELKMMSSNNENVPPVGTSVSGMPTKTSN
QKCAQPHYTSPTRQLLHESTMAVDAIVQKLKKTHNMSGMGPELKETIGNVINESRVLRDFLHQKLILFKGIDMSN
WKNETVDQLITDLGQLHQDNLMLEEQIKKYKKELKLTKSAIPTLGVEFQDRIKTEIGKIATDMGGAVKEIRKKGT
EQKLISEEDLGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSS AQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIY (SEQ ID NO: 46)
SPD5-PylRSAF
DNA:
ATGGAGGACAACAGCGTGCTGAACGAGGACAGCAACCTGGAGCACGTGGAGGGCCAGCCCAGAAGAAGCATGAGC CAGCCCGTGCTGAACGTGGAGGGCGACAAGAGAACCAGCAGCACCAGCGCCACCCAGCAGCAGGTGCTGAGCGGC GCCTTCAGCAGCGCCGACGTGAGAAGCATCCCCATCATCCAGACCTGGGAGGAGAACAAGGCCCTGAAGACCAAG ATCACCATCCTGAGAGGCGAGCTGCAGATGTACCAGAGAAGATACAGCGAGGCCAAGGAGGCCAGCCAGAAGAGA GTGAAGGAGGTGATGGACGACTACGTGGACCTGAAGCTGGGCCAGGAGAACGTGCAGGAGAAGATGGAGCAGTAC AAGCTGATGGAGGAGGACCTGCTGGCCATGCAGAGCAGAATCGAGACCAGCGAGGACAACTTCGCCAGACAGATG AAGGAGTTCGAGGCCCAGAAGCACGCCATGGAGGAGAGAATCAAGGAGCTGGAGCTGAGCGCCACCGACGCCAAC AACACCACCGTGGGCAGCTTCAGAGGCACCCTGGACGACATCCTGAAGAAGAACGACCCCGACTTCACCCTGACC AGCGGCTACGAGGAGAGAAAGATCAACGACCTGGAGGCCAAGCTGCTGAGCGAGATCGACAAGGTGGCCGAGCTG GAGGACCACATCCAGCAGCTGAGACAGGAGCTGGACGACCAGAGCGCCAGACTGGCCGACAGCGAGAACGTGAGA GCCCAGCTGGAGGCCGCCACCGGCCAGGGCATCCTGGGCGCCGCCGGCAACGCCATGGTGCCCAACAGCACCTTC ATGATCGGCAACGGCAGAGAGAGCCAGACCAGAGACCAGCTGAACTACATCGACGACCTGGAGACCAAGCTGGCC GACGCCAAGAAGGAGAACGACAAGGCCAGACAGGCCCTGGTGGAGTACATGAACAAGTGCAGCAAGCTGGAGCAC GAGATCAGAACCATGGTGAAGAACAGCACCTTCGACAGCAGCAGCATGCTGCTGGGCGGCCAGACCAGCGACGAG CTGAAGATCCAGATCGGCAAGGTGAACGGCGAGCTGAACGTGCTGAGAGCCGAGAACAGAGAGCTGAGAATCAGA TGCGACCAGCTGACCGGCGGCGACGGCAACCTGAGCATCAGCCTGGGCCAGAGCAGACTGATGGCCGGCATCGCC ACCAACGACGTGGACAGCATCGGCCAGGGCAACGAGACCGGCGGCACCAGCATGAGAATCCTGCCCAGAGAGAGC CAGCTGGACGACCTGGAGGAGAGCAAGCTGCCCCTGATGGACACCAGCAGCGCCGTGAGAAACCAGCAGCAGTTC GCCAGCATGTGGGAGGACTTCGAGAGCGTGAAGGACAGCCTGCAGAACAACCACAACGACACCCTGGAGGGCAGC TTCAACAGCAGCATGCCCCCCCCCGGCAGAGACGCCACCCAGAGCTTCCTGAGCCAGAAGAGCTTCAAGAACAGC
CCCATCGTGATGCAGAAGCCCAAGAGCCTGCACCTGCACCTGAAGAGCCACCAGAGCGAGGGCGCCGGCGAGCAG ATCCAGAACAACAGCTTCAGCACCAAGACCGCCAGCCCCCACGTGAGCCAGAGCCACATCCCCATCCTGCACGAC ATGCAGCAGATCCTGGACAGCAGCGCCATGTTCCTGGAGGGCCAGCACGACGTGGCCGTGAACGTGGAGCAGATG CAGGAGAAGATGAGCCAGATCAGAGAGGCCCTGGCCAGACTGTTCGAGAGACTGAAGAGCAGCGCCGCCCTGTTC GAGGAGATCCTGGAGAGAATGGGCAGCAGCGACCCCAACGCCGACAAGATCAAGAAGATGAAGCTGGCCTTCGAG ACCAGCATCAACGACAAGCTGAACGTGAGCGCCATCCTGGAGGCCGCCGAGAAGGACCTGCACAACATGAGCCTG AACTTCAGCATCCTGGAGAAGAGCATCGTGAGCCAGGCCGCCGAGGCCAGCAGAAGATTCACCATCGCCCCCGAC GCCGAGGACGTGGCCAGCAGCAGCCTGCTGAACGCCAGCTACAGCCCCCTGTTCAAGTTCACCAGCAACAGCGAC ATCGTGGAGAAGCTGCAGAACGAGGTGAGCGAGCTGAAGAACGAGCTGGAGATGGCCAGAACCAGAGACATGAGA AGCCCCCTGAACGGCAGCAGCGGCAGACTGAGCGACGTGCAGATCAACACCAACAGAATGTTCGAGGACCTGGAG GTGAGCGAGGCCACCCTGCAGAAGGCCAAGGAGGAGAACAGCACCCTGAAGAGCCAGTTCGCCGAGCTGGAGGCC AACCTGCACCAGGTGAACAGCAAGCTGGGCGAGGTGAGATGCGAGCTGAACGAGGCCCTGGCCAGAGTGGACGGC GAGCAGGAGACCAGAGTGAAGGCCGAGAACGCCCTGGAGGAGGCCAGACAGCTGATCAGCAGCCTGAAGCACGAG GAGAACGAGCTGAAGAAGACCATCACCGACATGGGCATGAGACTGAACGAGGCCAAGAAGAGCGACGAGTTCCTG AAGAGCGAGCTGAGCACCGCCCTGGAGGAGGAGAAGAAGAGCCAGAACCTGGCCGACGAGCTGAGCGAGGAGCTG AACGGCTGGAGAATGAGAACCAAGGAGGCCGAGAACAAGGTGGAGCACGCCAGCAGCGAGAAGAGCGAGATGCTG GAGAGAATCGTGCACCTGGAGACCGAGATGGAGAAGCTGAGCACCAGCGAGATCGCCGCCGACTACTGCAGCACC AAGATGACCGAGAGAAAGAAGGAGATCGAGCTGGCCAAGTACAGAGAGGACTTCGAGAACGCCGCCATCGTGGGC CTGGAGAGAATCAGCAAGGAGATCAGCGAGCTGACCAAGAAGACCCTGAAGGCCAAGATCATCCCCAGCAACATC AGCAGCATCCAGCTGGTGTGCGACGAGCTGTGCAGAAGACTGAGCAGAGAGAGAGAGCAGCAGCACGAGTACGCC AAGGTGATGAGAGACGTGAACGAGAAGATCGAGAAGCTGCAGCTGGAGAAGGACGCCCTGGAGCACGAGCTGAAG
ATGATGAGCAGCAACAACGAGAACGTGCCCCCCGTGGGCACCAGCGTGAGCGGCATGCCCACCAAGACCAGCAAC CAGAAGTGCGCCCAGCCCCACTACACCAGCCCCACCAGACAGCTGCTGCACGAGAGCACCATGGCCGTGGACGCC ATCGTGCAGAAGCTGAAGAAGACCCACAACATGAGCGGCATGGGCCCCGAGCTGAAGGAGACCATCGGCAACGTG ATCAACGAGAGCAGAGTGCTGAGAGACTTCCTGCACCAGAAGCTGATCCTGTTCAAGGGCATCGACATGAGCAAC TGGAAGAACGAGACCGTGGACCAGCTGATCACCGACCTGGGCCAGCTGCACCAGGACAACCTGATGCTGGAGGAG CAGATCAAGAAGTACAAGAAGGAGCTGAAGCTGACCAAGAGCGCCATCCCCACCCTGGGCGTGGAGTTCCAGGAC AGAATCAAGACCGAGATCGGCAAGATCGCCACCGACATGGGCGGCGCCGTGAAGGAGATCAGAAAGAAGGGTACC GGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAA CGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGA ACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTG GTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGT TGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAA GTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAAC ACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTT TCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGC AATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGAT CGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGC GAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTG GAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAG TATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGT CTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAA ATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAAC TTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGC
ATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAA CTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGT TTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTAT TACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 47)
Protein:
MEDNSVLNEDSNLEHVEGQPRRSMSQPVLNVEGDKRTSSTSATQQQVLSGAFSSADVRSIPIIQTWEENKALKTK ITILRGELQMYQRRYSEAKEASQKRVKEVMDDYVDLKLGQENVQEKMEQYKLMEEDLLAMQSRIETSEDNFARQM KEFEAQKHAMEERIKELELSATDANNTTVGSFRGTLDDILKKNDPDFTLTSGYEERKINDLEAKLLSEIDKVAEL EDHIQQLRQELDDQSARLADSENVRAQLEAATGQGILGAAGNAMVPNSTFMIGNGRESQTRDQLNYIDDLETKLA DAKKENDKARQALVEYMNKCSKLEHEIRTMVKNSTFDSSSMLLGGQTSDELKIQIGKVNGELNVLRAENRELRIR CDQLTGGDGNLSISLGQSRLMAGIATNDVDSIGQGNETGGTSMRILPRESQLDDLEESKLPLMDTSSAVRNQQQF ASMWEDFESVKDSLQNNHNDTLEGSFNSSMPPPGRDATQSFLSQKSFKNSPIVMQKPKSLHLHLKSHQSEGAGEQ IQNNSFSTKTASPHVSQSHIPILHDMQQILDSSAMFLEGQHDVAVNVEQMQEKMSQIREALARLFERLKSSAALF EEILERMGSSDPNADKIKKMKLAFETSINDKLNVSAILEAAEKDLHNMSLNFSILEKSIVSQAAEASRRFTIAPD AEDVASSSLLNASYSPLFKFTSNSDIVEKLQNEVSELKNELEMARTRDMRSPLNGSSGRLSDVQINTNRMFEDLE VSEATLQKAKEENSTLKSQFAELEANLHQVNSKLGEVRCELNEALARVDGEQETRVKAENALEEARQLISSLKHE ENELKKTITDMGMRLNEAKKSDEFLKSELSTALEEEKKSQNLADELSEELNGWRMRTKEAENKVEHASSEKSEML ERIVHLETEMEKLSTSEIAADYCSTKMTERKKEIELAKYREDFENAAIVGLERISKEISELTKKTLKAKIIPSNI SSIQLVCDELCRRLSREREQQHEYAKVMRDVNEKIEKLQLEKDALEHELKMMSSNNENVPPVGTSVSGMPTKTSN QKCAQPHYTSPTRQLLHESTMAVDAIVQKLKKTHNMSGMGPELKETIGNVINESRVLRDFLHQKLILFKGIDMSN WKNETVDQLITDLGQLHQDNLMLEEQIKKYKKELKLTKSAIPTLGVEFQDRIKTEIGKIATDMGGAVKEIRKKGT GAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHL VVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLEN TEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTD RLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLE YIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLN FCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAG FGLERLLKVKHDFKNIKRAARSESYYNGISTNL (SEQ ID NO: 48)
KIF16B-FUS-PylRSAF
DNA:
ATGGCATCGGTCAAGGTGGCCGTGAGGGTCCGGCCCATGAATCGCAGGGAAAAGGACTTGGAGGCCAAGTTCATT ATTCAGATGGAGAAAAGCAAAACGACAATCACAAACTTAAAGATACCAGAAGGAGGCACTGGGGACTCAGGAAGA GAACGGACCAAGACCTTCACCTATGACTTTTCTTTTTATTCTGCTGATACAAAAAGCCCAGATTACGTTTCACAA GAAATGGTTTTCAAAACCCTCGGCACAGATGTCGTGAAGTCTGCATTTGAAGGTTATAATGCTTGTGTCTTTGCA TATGGGCAAACTGGATCTGGAAAGTCATACACTATGATGGGAAATTCTGGAGATTCTGGCTTAATACCTCGGATC TGTGAAGGACTCTTCAGTCGGATAAATGAAACCACCAGATGGGATGAAGCTTCTTTTCGAACTGAAGTCAGCTAC TTAGAAATTTATAACGAACGTGTGAGAGATCTACTTCGGCGGAAGTCATCTAAAACCTTCAATTTGAGAGTCCGT GAGCATCCCAAAGAAGGCCCTTATGTTGAGGATTTATCCAAACATTTAGTACAGAATTATGGTGACGTAGAAGAA CTTATGGATGCGGGCAATATCAACCGGACCACCGCAGCGACTGGGATGAACGACGTCAGTAGCAGGTCTCATGCC ATCTTCACCATCAAGTTCACTCAGGCTAAATTTGATTCTGAAATGCCATGTGAAACCGTCAGTAAGATCCACTTG GTTGATCTTGCCGGAAGTGAGCGTGCAGATGCCACCGGAGCCACCGGGGTTAGGCTAAAGGAAGGGGGAAATATT AACAAGTCCCTCGTGACTCTGGGGAACGTCATTTCTGCCTTAGCTGATTTATCTCAGGATGCTGCAAATACTCTT GCAAAGAAGAAGCAAGTTTTCGTGCCTTACAGGGATTCTGTGTTGACTTGGTTGTTAAAAGATAGCCTTGGAGGA AACTCTAAAACTATCATGATTGCCACCATTTCACCTGCTGATGTCAATTATGGAGAAACCCTAAGTACTCTTCGC TATGCAAATAGAGCCAAAAACATCATCAACAAGCCTACCATTAATGAGGATGCCAACGTCAAACTTATCCGTGAG CTGCGAGCTGAAATAGCCAGACTGAAAACGCTGCTTGCTCAAGGGAATCAGATTGCCCTCTTAGACTCCCCCACA TATACAGATATTGAAATGAACAGATTGGGAAAGGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATG
GCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCC
CAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGC
CAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGC
TCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGC
CAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAG
AGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCC
CCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAAC
TATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGT
GGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGC
GGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGA
GGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGAC
TCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTG
GCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACA
GACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATT
GACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAAT
CGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGC
AGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGG
AAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAA
CCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGA
GGAGGCGATTACAAGGATGACGACGATAAGGGTACCGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGC
GCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTG
ATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCG
AAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCA
CTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACA
AAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCG
AAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTC
TCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATT
AGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAA
GCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGC
CTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATC
TATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTT
CTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTG
AGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTG
CGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGAC
GGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTG
GAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTT
GGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGT
GAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTC
AAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 49)
Protein:
MASVKVAVRVRPMNRREKDLEAKFIIQMEKSKTTITNLKIPEGGTGDSGRERTKTFTYDFSFYSADTKSPDYVSQ EMVFKTLGTDVVKSAFEGYNACVFAYGQTGSGKSYTMMGNSGDSGLIPRICEGLFSRINETTRWDEASFRTEVSY LEIYNERVRDLLRRKSSKTFNLRVREHPKEGPYVEDLSKHLVQNYGDVEELMDAGNINRTTAATGMNDVSSRSHA IFTIKFTQAKFDSEMPCETVSKIHLVDLAGSERADATGATGVRLKEGGNINKSLVTLGNVISALADLSQDAANTL AKKKQVFVPYRDSVLTWLLKDSLGGNSKTIMIATISPADVNYGETLSTLRYANRAKNIINKPTINEDANVKLIRE LRAEIARLKTLLAQGNQIALLDSPTYTDIEMNRLGKGAPGSAGSAAGSGMASNDYTQQATQSYGAYPTQPGQGYS QQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYG QQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGN YGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGR GGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYT DRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGG SGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGR GGDYKDDDDKGTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRS KIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMP KSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQ ASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGF LEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESD GKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDR EWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL (SEQ ID NO: 50)
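Each DNA entry in this listing is paired with the protein it encodes (for example, SEQ ID NO: 49 and SEQ ID NO: 50). Printed sequences reproduced from scanned documents can contain character-recognition errors, so each pair can be cross-checked by translating the DNA. The Python sketch below makes several assumptions: the standard genetic code applies, the reading frame starts at the first base, and the protein entry omits the terminal stop codon. The helper names `translate` and `encodes` are hypothetical illustrations and are not part of the original disclosure.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def clean(seq: str) -> str:
    """Remove the spaces and line breaks used to print sequences in blocks."""
    return "".join(seq.split()).upper()


def translate(dna: str) -> str:
    """Translate a coding sequence from its first base, stopping at the first stop codon."""
    dna = clean(dna)
    if len(dna) % 3:
        raise ValueError(f"length {len(dna)} is not a multiple of 3")
    protein = []
    for i in range(0, len(dna), 3):
        aa = CODON_TABLE[dna[i:i + 3]]  # raises KeyError on non-ACGT characters
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def encodes(dna: str, protein: str) -> bool:
    """True if the DNA entry translates exactly to the printed protein entry."""
    return translate(dna) == clean(protein)


if __name__ == "__main__":
    # First 36 nt of SEQ ID NO: 49 and the first 12 residues of SEQ ID NO: 50.
    print(translate("ATGGCATCGGTCAAGGTGGCCGTGAGGGTCCGGCCC"))  # MASVKVAVRVRP
```

Because `translate` stops at the first in-frame stop codon, it can be applied directly to a complete DNA entry ending in TAA. When `encodes` returns False, a transcription error in one of the two printed entries is a likely cause.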
KIF16B-VSV-G-FUS-PylRSAF
DNA:
ATGGCATCGGTCAAGGTGGCCGTGAGGGTCCGGCCCATGAATCGCAGGGAAAAGGACTTGGAGGCCAAGTTCATT ATTCAGATGGAGAAAAGCAAAACGACAATCACAAACTTAAAGATACCAGAAGGAGGCACTGGGGACTCAGGAAGA GAACGGACCAAGACCTTCACCTATGACTTTTCTTTTTATTCTGCTGATACAAAAAGCCCAGATTACGTTTCACAA GAAATGGTTTTCAAAACCCTCGGCACAGATGTCGTGAAGTCTGCATTTGAAGGTTATAATGCTTGTGTCTTTGCA TATGGGCAAACTGGATCTGGAAAGTCATACACTATGATGGGAAATTCTGGAGATTCTGGCTTAATACCTCGGATC TGTGAAGGACTCTTCAGTCGGATAAATGAAACCACCAGATGGGATGAAGCTTCTTTTCGAACTGAAGTCAGCTAC TTAGAAATTTATAACGAACGTGTGAGAGATCTACTTCGGCGGAAGTCATCTAAAACCTTCAATTTGAGAGTCCGT GAGCATCCCAAAGAAGGCCCTTATGTTGAGGATTTATCCAAACATTTAGTACAGAATTATGGTGACGTAGAAGAA CTTATGGATGCGGGCAATATCAACCGGACCACCGCAGCGACTGGGATGAACGACGTCAGTAGCAGGTCTCATGCC ATCTTCACCATCAAGTTCACTCAGGCTAAATTTGATTCTGAAATGCCATGTGAAACCGTCAGTAAGATCCACTTG GTTGATCTTGCCGGAAGTGAGCGTGCAGATGCCACCGGAGCCACCGGGGTTAGGCTAAAGGAAGGGGGAAATATT AACAAGTCCCTCGTGACTCTGGGGAACGTCATTTCTGCCTTAGCTGATTTATCTCAGGATGCTGCAAATACTCTT GCAAAGAAGAAGCAAGTTTTCGTGCCTTACAGGGATTCTGTGTTGACTTGGTTGTTAAAAGATAGCCTTGGAGGA AACTCTAAAACTATCATGATTGCCACCATTTCACCTGCTGATGTCAATTATGGAGAAACCCTAAGTACTCTTCGC TATGCAAATAGAGCCAAAAACATCATCAACAAGCCTACCATTAATGAGGATGCCAACGTCAAACTTATCCGTGAG CTGCGAGCTGAAATAGCCAGACTGAAAACGCTGCTTGCTCAAGGGAATCAGATTGCCCTCTTAGACTCCCCCACA TATACAGATATTGAAATGAACAGATTGGGAAAGGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATG GCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCC CAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGC CAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGC TCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGC CAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAG AGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCC CCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAAC TATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGT GGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGC 
GGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGA GGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGAC TCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTG GCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACA GACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATT GACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAAT CGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGC AGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGG AAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAA CCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGA GGTGGTGCGATCGCAGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAG CTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGG ATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCG TGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGT AAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACA AGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCT AAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCT ACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGC GCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACA AAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTT CGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAAC TATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATT CTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTG GATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTG CCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAG TTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTT 
CTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATG CACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCT TGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCA CGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 51)
Protein:
MASVKVAVRVRPMNRREKDLEAKFIIQMEKSKTTITNLKIPEGGTGDSGRERTKTFTYDFSFYSADTKSPDYVSQ EMVFKTLGTDVVKSAFEGYNACVFAYGQTGSGKSYTMMGNSGDSGLIPRICEGLFSRINETTRWDEASFRTEVSY LEIYNERVRDLLRRKSSKTFNLRVREHPKEGPYVEDLSKHLVQNYGDVEELMDAGNINRTTAATGMNDVSSRSHA IFTIKFTQAKFDSEMPCETVSKIHLVDLAGSERADATGATGVRLKEGGNINKSLVTLGNVISALADLSQDAANTL AKKKQVFVPYRDSVLTWLLKDSLGGNSKTIMIATISPADVNYGETLSTLRYANRAKNIINKPTINEDANVKLIRE LRAEIARLKTLLAQGNQIALLDSPTYTDIEMNRLGKGAPGSAGSAAGSGMASNDYTQQATQSYGAYPTQPGQGYS QQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYG QQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGN YGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGR GGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYT DRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGG SGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGR GGAIAGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMA CGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAP KPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALT KSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPI LIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEE FTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKP WIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL (SEQ ID NO: 52)
ATGGCATCGGTCAAGGTGGCCGTGAGGGTCCGGCCCATGAATCGCAGGGAAAAGGACTTGGAGGCCAAGTTCATT ATTCAGATGGAGAAAAGCAAAACGACAATCACAAACTTAAAGATACCAGAAGGAGGCACTGGGGACTCAGGAAGA GAACGGACCAAGACCTTCACCTATGACTTTTCTTTTTATTCTGCTGATACAAAAAGCCCAGATTACGTTTCACAA GAAATGGTTTTCAAAACCCTCGGCACAGATGTCGTGAAGTCTGCATTTGAAGGTTATAATGCTTGTGTCTTTGCA TATGGGCAAACTGGATCTGGAAAGTCATACACTATGATGGGAAATTCTGGAGATTCTGGCTTAATACCTCGGATC TGTGAAGGACTCTTCAGTCGGATAAATGAAACCACCAGATGGGATGAAGCTTCTTTTCGAACTGAAGTCAGCTAC TTAGAAATTTATAACGAACGTGTGAGAGATCTACTTCGGCGGAAGTCATCTAAAACCTTCAATTTGAGAGTCCGT GAGCATCCCAAAGAAGGCCCTTATGTTGAGGATTTATCCAAACATTTAGTACAGAATTATGGTGACGTAGAAGAA CTTATGGATGCGGGCAATATCAACCGGACCACCGCAGCGACTGGGATGAACGACGTCAGTAGCAGGTCTCATGCC ATCTTCACCATCAAGTTCACTCAGGCTAAATTTGATTCTGAAATGCCATGTGAAACCGTCAGTAAGATCCACTTG GTTGATCTTGCCGGAAGTGAGCGTGCAGATGCCACCGGAGCCACCGGGGTTAGGCTAAAGGAAGGGGGAAATATT AACAAGTCCCTCGTGACTCTGGGGAACGTCATTTCTGCCTTAGCTGATTTATCTCAGGATGCTGCAAATACTCTT GCAAAGAAGAAGCAAGTTTTCGTGCCTTACAGGGATTCTGTGTTGACTTGGTTGTTAAAAGATAGCCTTGGAGGA AACTCTAAAACTATCATGATTGCCACCATTTCACCTGCTGATGTCAATTATGGAGAAACCCTAAGTACTCTTCGC TATGCAAATAGAGCCAAAAACATCATCAACAAGCCTACCATTAATGAGGATGCCAACGTCAAACTTATCCGTGAG CTGCGAGCTGAAATAGCCAGACTGAAAACGCTGCTTGCTCAAGGGAATCAGATTGCCCTCTTAGACTCCCCCACA TATACAGATATTGAAATGAACAGATTGGGAAAGGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATG GCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCC CAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGC CAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGC TCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGC CAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAG AGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCC CCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAAC TATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGT GGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGC 
GGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGA GGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGAC TCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTG GCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACA GACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATT GACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAAT CGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGC AGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGG AAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAA CCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGA GGAGGCGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATGGCGTGCCCGGTGCCGCTGCAGCTGCCG CCGCTGGAACGCCTGACCCTGGATGACAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGT CGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGC GATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACC TGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTG AAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCA CTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAG GAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTG GTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCC CAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAA CTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTG GGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATT CCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAA AACTTCTGTCTGCGCCCTATGCTGGCACCAAATCTGTATAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGAT CCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACC ATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAAC 
CACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGC GACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATC GGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCT GAATCGTATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 53)
Protein:
MASVKVAVRVRPMNRREKDLEAKFIIQMEKSKTTITNLKIPEGGTGDSGRERTKTFTYDFSFYSADTKSPDYVSQ EMVFKTLGTDVVKSAFEGYNACVFAYGQTGSGKSYTMMGNSGDSGLIPRICEGLFSRINETTRWDEASFRTEVSY LEIYNERVRDLLRRKSSKTFNLRVREHPKEGPYVEDLSKHLVQNYGDVEELMDAGNINRTTAATGMNDVSSRSHA IFTIKFTQAKFDSEMPCETVSKIHLVDLAGSERADATGATGVRLKEGGNINKSLVTLGNVISALADLSQDAANTL AKKKQVFVPYRDSVLTWLLKDSLGGNSKTIMIATISPADVNYGETLSTLRYANRAKNIINKPTINEDANVKLIRE LRAEIARLKTLLAQGNQIALLDSPTYTDIEMNRLGKGAPGSAGSAAGSGMASNDYTQQATQSYGAYPTQPGQGYS QQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYG QQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGN YGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGR GGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYT DRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGG SGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGR GGGAPGSAGSAAGSGMACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACG DHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKP LENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKS QTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILI PLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFT MLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWI GAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL (SEQ ID NO: 54)
ATGGCATCGGTCAAGGTGGCCGTGAGGGTCCGGCCCATGAATCGCAGGGAAAAGGACTTGGAGGCCAAGTTCATT ATTCAGATGGAGAAAAGCAAAACGACAATCACAAACTTAAAGATACCAGAAGGAGGCACTGGGGACTCAGGAAGA GAACGGACCAAGACCTTCACCTATGACTTTTCTTTTTATTCTGCTGATACAAAAAGCCCAGATTACGTTTCACAA GAAATGGTTTTCAAAACCCTCGGCACAGATGTCGTGAAGTCTGCATTTGAAGGTTATAATGCTTGTGTCTTTGCA TATGGGCAAACTGGATCTGGAAAGTCATACACTATGATGGGAAATTCTGGAGATTCTGGCTTAATACCTCGGATC TGTGAAGGACTCTTCAGTCGGATAAATGAAACCACCAGATGGGATGAAGCTTCTTTTCGAACTGAAGTCAGCTAC TTAGAAATTTATAACGAACGTGTGAGAGATCTACTTCGGCGGAAGTCATCTAAAACCTTCAATTTGAGAGTCCGT GAGCATCCCAAAGAAGGCCCTTATGTTGAGGATTTATCCAAACATTTAGTACAGAATTATGGTGACGTAGAAGAA CTTATGGATGCGGGCAATATCAACCGGACCACCGCAGCGACTGGGATGAACGACGTCAGTAGCAGGTCTCATGCC ATCTTCACCATCAAGTTCACTCAGGCTAAATTTGATTCTGAAATGCCATGTGAAACCGTCAGTAAGATCCACTTG GTTGATCTTGCCGGAAGTGAGCGTGCAGATGCCACCGGAGCCACCGGGGTTAGGCTAAAGGAAGGGGGAAATATT
AACAAGTCCCTCGTGACTCTGGGGAACGTCATTTCTGCCTTAGCTGATTTATCTCAGGATGCTGCAAATACTCTT
GCAAAGAAGAAGCAAGTTTTCGTGCCTTACAGGGATTCTGTGTTGACTTGGTTGTTAAAAGATAGCCTTGGAGGA
AACTCTAAAACTATCATGATTGCCACCATTTCACCTGCTGATGTCAATTATGGAGAAACCCTAAGTACTCTTCGC
TATGCAAATAGAGCCAAAAACATCATCAACAAGCCTACCATTAATGAGGATGCCAACGTCAAACTTATCCGTGAG
CTGCGAGCTGAAATAGCCAGACTGAAAACGCTGCTTGCTCAAGGGAATCAGATTGCCCTCTTAGACTCCCCCACA
TATACAGATATTGAAATGAACAGATTGGGAAAGGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATG
GCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCC
CAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGC
CAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGC
TCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGC
CAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAG
AGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCC
CCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAAC
TATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGT
GGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGC
GGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGA
GGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGAC
TCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTG
GCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACA
GACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATT
GACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAAT
CGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGC
AGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGG
AAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAA
CCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGA
GGAGGCGATTACAAGGATGACGACGATAAGGGTACCGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGC
GCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTG
ATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCG
AAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCA
CTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACA
AAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCG
AAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTC
TCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATT
AGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAA
GCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGC
CTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATC
TATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTT
CTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTG
AGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTG
CGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGAC
GGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTG
GAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTT
GGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGT
GAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTC
AAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 55)
Protein:
MASVKVAVRVRPMNRREKDLEAKFIIQMEKSKTTITNLKIPEGGTGDSGRERTKTFTYDFSFYSADTKSPDYVSQ EMVFKTLGTDVVKSAFEGYNACVFAYGQTGSGKSYTMMGNSGDSGLIPRICEGLFSRINETTRWDEASFRTEVSY LEIYNERVRDLLRRKSSKTFNLRVREHPKEGPYVEDLSKHLVQNYGDVEELMDAGNINRTTAATGMNDVSSRSHA IFTIKFTQAKFDSEMPCETVSKIHLVDLAGSERADATGATGVRLKEGGNINKSLVTLGNVISALADLSQDAANTL AKKKQVFVPYRDSVLTWLLKDSLGGNSKTIMIATISPADVNYGETLSTLRYANRAKNIINKPTINEDANVKLIRE LRAEIARLKTLLAQGNQIALLDSPTYTDIEMNRLGKGAPGSAGSAAGSGMASNDYTQQATQSYGAYPTQPGQGYS QQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYG QQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGN YGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGR GGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYT DRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGG SGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGR GGDYKDDDDKGTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRS KIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMP KSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQ ASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGF LEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESD GKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDR EWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL (SEQ ID NO: 56)
KIF16B-EWSR1-MCP
DNA:
ATGGCATCGGTCAAGGTGGCCGTGAGGGTCCGGCCCATGAATCGCAGGGAAAAGGACTTGGAGGCCAAGTTCATT ATTCAGATGGAGAAAAGCAAAACGACAATCACAAACTTAAAGATACCAGAAGGAGGCACTGGGGACTCAGGAAGA GAACGGACCAAGACCTTCACCTATGACTTTTCTTTTTATTCTGCTGATACAAAAAGCCCAGATTACGTTTCACAA GAAATGGTTTTCAAAACCCTCGGCACAGATGTCGTGAAGTCTGCATTTGAAGGTTATAATGCTTGTGTCTTTGCA TATGGGCAAACTGGATCTGGAAAGTCATACACTATGATGGGAAATTCTGGAGATTCTGGCTTAATACCTCGGATC TGTGAAGGACTCTTCAGTCGGATAAATGAAACCACCAGATGGGATGAAGCTTCTTTTCGAACTGAAGTCAGCTAC TTAGAAATTTATAACGAACGTGTGAGAGATCTACTTCGGCGGAAGTCATCTAAAACCTTCAATTTGAGAGTCCGT GAGCATCCCAAAGAAGGCCCTTATGTTGAGGATTTATCCAAACATTTAGTACAGAATTATGGTGACGTAGAAGAA CTTATGGATGCGGGCAATATCAACCGGACCACCGCAGCGACTGGGATGAACGACGTCAGTAGCAGGTCTCATGCC ATCTTCACCATCAAGTTCACTCAGGCTAAATTTGATTCTGAAATGCCATGTGAAACCGTCAGTAAGATCCACTTG GTTGATCTTGCCGGAAGTGAGCGTGCAGATGCCACCGGAGCCACCGGGGTTAGGCTAAAGGAAGGGGGAAATATT AACAAGTCCCTCGTGACTCTGGGGAACGTCATTTCTGCCTTAGCTGATTTATCTCAGGATGCTGCAAATACTCTT GCAAAGAAGAAGCAAGTTTTCGTGCCTTACAGGGATTCTGTGTTGACTTGGTTGTTAAAAGATAGCCTTGGAGGA AACTCTAAAACTATCATGATTGCCACCATTTCACCTGCTGATGTCAATTATGGAGAAACCCTAAGTACTCTTCGC TATGCAAATAGAGCCAAAAACATCATCAACAAGCCTACCATTAATGAGGATGCCAACGTCAAACTTATCCGTGAG CTGCGAGCTGAAATAGCCAGACTGAAAACGCTGCTTGCTCAAGGGAATCAGATTGCCCTCTTAGACTCCCCCACA ATGGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCC ACTCAAGGATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTC AGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACT GGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACC ACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCA GCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAA CCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTA CCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCG ACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGC TATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCT 
CCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATG GGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGG GGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGC AGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCA CCTGTAGATCCAGATGAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGAT GATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATC TACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCT GCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCA ATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGA GGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGA GGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCC AATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGC TTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGA GGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGT GGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTG ATGGAACAGGATTACAAGGATGACGACGATAAGGGTACCGAGCAGAAGCTGATCTCAGAGGAGGACCTGGGCGCC CCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGA ACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCA CAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTG CCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAG CTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGC ATCTACTAA (SEQ ID NO: 57)
Protein:
MASVKVAVRVRPMNRREKDLEAKFIIQMEKSKTTITNLKIPEGGTGDSGRERTKTFTYDFSFYSADTKSPDYVSQ EMVFKTLGTDVVKSAFEGYNACVFAYGQTGSGKSYTMMGNSGDSGLIPRICEGLFSRINETTRWDEASFRTEVSY LEIYNERVRDLLRRKSSKTFNLRVREHPKEGPYVEDLSKHLVQNYGDVEELMDAGNINRTTAATGMNDVSSRSHA IFTIKFTQAKFDSEMPCETVSKIHLVDLAGSERADATGATGVRLKEGGNINKSLVTLGNVISALADLSQDAANTL AKKKQVFVPYRDSVLTWLLKDSLGGNSKTIMIATISPADVNYGETLSTLRYANRAKNIINKPTINEDANVKLIRE LRAEIARLKTLLAQGNQIALLDSPTMASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDV SYTQAQTTATYGQTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYP AYGQQPAATAPTRPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQP TSYDQSSYSQQNTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSM GVYGQESGGFSGPGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGP PVDPDEDSDNSAIYVQGLNDSVTLDDLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKA AVEWFDGKDFQGSKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPR GPRGSRGNPSGGGNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGR GGLMDRGGPGGMFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPPGPLMEQDYKDDDDKGTEQKLISEEDLGA PGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEV PKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIY (SEQ ID NO: 58)
ATGGCATCGGTCAAGGTGGCCGTGAGGGTCCGGCCCATGAATCGCAGGGAAAAGGACTTGGAGGCCAAGTTCATT ATTCAGATGGAGAAAAGCAAAACGACAATCACAAACTTAAAGATACCAGAAGGAGGCACTGGGGACTCAGGAAGA GAACGGACCAAGACCTTCACCTATGACTTTTCTTTTTATTCTGCTGATACAAAAAGCCCAGATTACGTTTCACAA GAAATGGTTTTCAAAACCCTCGGCACAGATGTCGTGAAGTCTGCATTTGAAGGTTATAATGCTTGTGTCTTTGCA TATGGGCAAACTGGATCTGGAAAGTCATACACTATGATGGGAAATTCTGGAGATTCTGGCTTAATACCTCGGATC TGTGAAGGACTCTTCAGTCGGATAAATGAAACCACCAGATGGGATGAAGCTTCTTTTCGAACTGAAGTCAGCTAC TTAGAAATTTATAACGAACGTGTGAGAGATCTACTTCGGCGGAAGTCATCTAAAACCTTCAATTTGAGAGTCCGT GAGCATCCCAAAGAAGGCCCTTATGTTGAGGATTTATCCAAACATTTAGTACAGAATTATGGTGACGTAGAAGAA CTTATGGATGCGGGCAATATCAACCGGACCACCGCAGCGACTGGGATGAACGACGTCAGTAGCAGGTCTCATGCC ATCTTCACCATCAAGTTCACTCAGGCTAAATTTGATTCTGAAATGCCATGTGAAACCGTCAGTAAGATCCACTTG GTTGATCTTGCCGGAAGTGAGCGTGCAGATGCCACCGGAGCCACCGGGGTTAGGCTAAAGGAAGGGGGAAATATT AACAAGTCCCTCGTGACTCTGGGGAACGTCATTTCTGCCTTAGCTGATTTATCTCAGGATGCTGCAAATACTCTT GCAAAGAAGAAGCAAGTTTTCGTGCCTTACAGGGATTCTGTGTTGACTTGGTTGTTAAAAGATAGCCTTGGAGGA AACTCTAAAACTATCATGATTGCCACCATTTCACCTGCTGATGTCAATTATGGAGAAACCCTAAGTACTCTTCGC TATGCAAATAGAGCCAAAAACATCATCAACAAGCCTACCATTAATGAGGATGCCAACGTCAAACTTATCCGTGAG CTGCGAGCTGAAATAGCCAGACTGAAAACGCTGCTTGCTCAAGGGAATCAGATTGCCCTCTTAGACTCCCCCACA TATACAGATATTGAAATGAACAGATTGGGAAAGGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATG GCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCC CAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGC CAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGC TCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGC CAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAG AGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCC CCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAAC TATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGT GGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGC 
GGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGA GGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGAC TCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTG GCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACA GACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATT GACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAAT CGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGC AGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGG AAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAA CCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGA GGAGGCGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCA AACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCA CAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCC GGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGT CGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGA GCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAA TGGAAAGCTGCAAACCCACCGCTCGATTACAAGGATGACGACGATAAGGGTACCGGCGCCCCCGGCTCCGCCGGC TCCGCCGCCGGCTCCGGCGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAA AAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACAC CACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCT TCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGAT CTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGT ACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAG CCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGC ACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGC ATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAAT 
CCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAA AAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTT TTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATC GACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCA AATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGT TATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGT TGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGC GACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGC CCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTG AAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAAC CTGTAA (SEQ ID NO: 59)
Protein:
MASVKVAVRVRPMNRREKDLEAKFIIQMEKSKTTITNLKIPEGGTGDSGRERTKTFTYDFSFYSADTKSPDYVSQ EMVFKTLGTDVVKSAFEGYNACVFAYGQTGSGKSYTMMGNSGDSGLIPRICEGLFSRINETTRWDEASFRTEVSY LEIYNERVRDLLRRKSSKTFNLRVREHPKEGPYVEDLSKHLVQNYGDVEELMDAGNINRTTAATGMNDVSSRSHA IFTIKFTQAKFDSEMPCETVSKIHLVDLAGSERADATGATGVRLKEGGNINKSLVTLGNVISALADLSQDAANTL AKKKQVFVPYRDSVLTWLLKDSLGGNSKTIMIATISPADVNYGETLSTLRYANRAKNIINKPTINEDANVKLIRE LRAEIARLKTLLAQGNQIALLDSPTYTDIEMNRLGKGAPGSAGSAAGSGMASNDYTQQATQSYGAYPTQPGQGYS QQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYG QQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGN YGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGR GGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYT DRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGG SGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGR GGATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGA GAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQ WKAANPPLDYKDDDDKGTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKH HEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTR TKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITS MSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRF FVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPC YRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVG PIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL (SEQ ID NO: 60)
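Several entries share recognizable modules. For example, SEQ ID NO: 60 contains the FLAG epitope DYKDDDDK and four copies of a peptide beginning MDAQTRRRERRAEKQAQWKAAN. In the printed listing, the last copy is split across a block boundary. The Python sketch below locates every occurrence of a motif after the print-layout whitespace has been removed. It assumes exact string matching with no mismatches is sufficient, and `find_motif` is a hypothetical helper rather than anything described in the original document.

```python
def clean(seq: str) -> str:
    """Remove the spaces and line breaks used to print sequences in blocks."""
    return "".join(seq.split()).upper()


def find_motif(protein: str, motif: str) -> list[int]:
    """Return 1-based start positions of every (possibly overlapping) exact match."""
    seq, motif = clean(protein), clean(motif)
    if not motif:
        raise ValueError("motif must be non-empty")
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start + 1)  # convert 0-based index to 1-based residue number
        start = seq.find(motif, start + 1)
    return hits


if __name__ == "__main__":
    # Toy input in which the second copy of the motif spans a printed block boundary.
    demo = "GGDYKDDDDKGT GAPGSAGSAAGSG DYKDD DDK"
    print(find_motif(demo, "DYKDDDDK"))  # [3, 26]
```

If SEQ ID NO: 60 is passed in with the repeated peptide as the motif, the sketch should report all four copies, including the one interrupted by a space in the printed listing. Searching the raw printed text without first removing whitespace would miss that copy.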
KIF16B-FUS-MCP-PylRSAF
DNA:
ATGGCATCGGTCAAGGTGGCCGTGAGGGTCCGGCCCATGAATCGCAGGGAAAAGGACTTGGAGGCCAAGTTCATT ATTCAGATGGAGAAAAGCAAAACGACAATCACAAACTTAAAGATACCAGAAGGAGGCACTGGGGACTCAGGAAGA GAACGGACCAAGACCTTCACCTATGACTTTTCTTTTTATTCTGCTGATACAAAAAGCCCAGATTACGTTTCACAA GAAATGGTTTTCAAAACCCTCGGCACAGATGTCGTGAAGTCTGCATTTGAAGGTTATAATGCTTGTGTCTTTGCA TATGGGCAAACTGGATCTGGAAAGTCATACACTATGATGGGAAATTCTGGAGATTCTGGCTTAATACCTCGGATC TGTGAAGGACTCTTCAGTCGGATAAATGAAACCACCAGATGGGATGAAGCTTCTTTTCGAACTGAAGTCAGCTAC TTAGAAATTTATAACGAACGTGTGAGAGATCTACTTCGGCGGAAGTCATCTAAAACCTTCAATTTGAGAGTCCGT GAGCATCCCAAAGAAGGCCCTTATGTTGAGGATTTATCCAAACATTTAGTACAGAATTATGGTGACGTAGAAGAA CTTATGGATGCGGGCAATATCAACCGGACCACCGCAGCGACTGGGATGAACGACGTCAGTAGCAGGTCTCATGCC ATCTTCACCATCAAGTTCACTCAGGCTAAATTTGATTCTGAAATGCCATGTGAAACCGTCAGTAAGATCCACTTG GTTGATCTTGCCGGAAGTGAGCGTGCAGATGCCACCGGAGCCACCGGGGTTAGGCTAAAGGAAGGGGGAAATATT AACAAGTCCCTCGTGACTCTGGGGAACGTCATTTCTGCCTTAGCTGATTTATCTCAGGATGCTGCAAATACTCTT GCAAAGAAGAAGCAAGTTTTCGTGCCTTACAGGGATTCTGTGTTGACTTGGTTGTTAAAAGATAGCCTTGGAGGA AACTCTAAAACTATCATGATTGCCACCATTTCACCTGCTGATGTCAATTATGGAGAAACCCTAAGTACTCTTCGC TATGCAAATAGAGCCAAAAACATCATCAACAAGCCTACCATTAATGAGGATGCCAACGTCAAACTTATCCGTGAG CTGCGAGCTGAAATAGCCAGACTGAAAACGCTGCTTGCTCAAGGGAATCAGATTGCCCTCTTAGACTCCCCCACA TATACAGATATTGAAATGAACAGATTGGGAAAGGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATG GCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCC CAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGC CAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGC TCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGC CAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAG AGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCC CCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAAC TATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGT GGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGC 
GGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGA GGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGAC TCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTG GCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACA GACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATT GACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAAT CGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGC AGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGG AAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAA CCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGA GGAGGCGATTACAAGGATGACGACGATAAGGGTACCGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGC GCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTC GCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAG AGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATG GAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAA GATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGTACCGGCGCCCCCGGCTCCGCCGGC TCCGCCGCCGGCTCCGGCGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAA AAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACAC CACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCT TCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGAT CTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGT ACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAG CCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGC ACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGC ATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAAT CCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAA 
AAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTT TTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATC GACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCA AATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGT TATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGT TGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGC GACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGC CCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTG AAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAAC CTGTAA (SEQ ID NO: 61)
Protein: MASVKVAVRVRPMNRREKDLEAKFIIQMEKSKTTITNLKIPEGGTGDSGRERTKTFTYDFSFYSADTKSPDYVSQ EMVFKTLGTDVVKSAFEGYNACVFAYGQTGSGKSYTMMGNSGDSGLIPRICEGLFSRINETTRWDEASFRTEVSY LEIYNERVRDLLRRKSSKTFNLRVREHPKEGPYVEDLSKHLVQNYGDVEELMDAGNINRTTAATGMNDVSSRSHA IFTIKFTQAKFDSEMPCETVSKIHLVDLAGSERADATGATGVRLKEGGNINKSLVTLGNVISALADLSQDAANTL AKKKQVFVPYRDSVLTWLLKDSLGGNSKTIMIATISPADVNYGETLSTLRYANRAKNIINKPTINEDANVKLIRE LRAEIARLKTLLAQGNQIALLDSPTYTDIEMNRLGKGAPGSAGSAAGSGMASNDYTQQATQSYGAYPTQPGQGYS QQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYG QQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGN YGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGR GGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYT DRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGG SGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGR GGDYKDDDDKGTGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQ SSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIYGTGAPGSAG SAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRS SRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQ PSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLN PKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGI DNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSG CTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLL KVKHDFKNIKRAARSESYYNGISTNL (SEQ ID NO: 62)
KIF16B-PylRSAF
DNA:
ATGGCATCGGTCAAGGTGGCCGTGAGGGTCCGGCCCATGAATCGCAGGGAAAAGGACTTGGAGGCCAAGTTCATT ATTCAGATGGAGAAAAGCAAAACGACAATCACAAACTTAAAGATACCAGAAGGAGGCACTGGGGACTCAGGAAGA GAACGGACCAAGACCTTCACCTATGACTTTTCTTTTTATTCTGCTGATACAAAAAGCCCAGATTACGTTTCACAA GAAATGGTTTTCAAAACCCTCGGCACAGATGTCGTGAAGTCTGCATTTGAAGGTTATAATGCTTGTGTCTTTGCA TATGGGCAAACTGGATCTGGAAAGTCATACACTATGATGGGAAATTCTGGAGATTCTGGCTTAATACCTCGGATC TGTGAAGGACTCTTCAGTCGGATAAATGAAACCACCAGATGGGATGAAGCTTCTTTTCGAACTGAAGTCAGCTAC TTAGAAATTTATAACGAACGTGTGAGAGATCTACTTCGGCGGAAGTCATCTAAAACCTTCAATTTGAGAGTCCGT GAGCATCCCAAAGAAGGCCCTTATGTTGAGGATTTATCCAAACATTTAGTACAGAATTATGGTGACGTAGAAGAA CTTATGGATGCGGGCAATATCAACCGGACCACCGCAGCGACTGGGATGAACGACGTCAGTAGCAGGTCTCATGCC ATCTTCACCATCAAGTTCACTCAGGCTAAATTTGATTCTGAAATGCCATGTGAAACCGTCAGTAAGATCCACTTG GTTGATCTTGCCGGAAGTGAGCGTGCAGATGCCACCGGAGCCACCGGGGTTAGGCTAAAGGAAGGGGGAAATATT AACAAGTCCCTCGTGACTCTGGGGAACGTCATTTCTGCCTTAGCTGATTTATCTCAGGATGCTGCAAATACTCTT GCAAAGAAGAAGCAAGTTTTCGTGCCTTACAGGGATTCTGTGTTGACTTGGTTGTTAAAAGATAGCCTTGGAGGA AACTCTAAAACTATCATGATTGCCACCATTTCACCTGCTGATGTCAATTATGGAGAAACCCTAAGTACTCTTCGC TATGCAAATAGAGCCAAAAACATCATCAACAAGCCTACCATTAATGAGGATGCCAACGTCAAACTTATCCGTGAG CTGCGAGCTGAAATAGCCAGACTGAAAACGCTGCTTGCTCAAGGGAATCAGATTGCCCTCTTAGACTCCCCCACA TATACAGATATTGAAATGAACAGATTGGGAAAGGGCGCCCCCGGCTCCGCCGGCTCCGCCGGCTCCGCCGCCGGC TCCGGCATGGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTG AATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTT AGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACA GCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAA TTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAA GCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGA AGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATT AGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCC CCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGAC 
GAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTG CAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGAT CGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGAT ACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCT AACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAA GAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGT GAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGT ATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCG CTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAA CACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA
(SEQ ID NO: 63)
Protein:
MASVKVAVRVRPMNRREKDLEAKFIIQMEKSKTTITNLKIPEGGTGDSGRERTKTFTYDFSFYSADTKSPDYVSQ EMVFKTLGTDVVKSAFEGYNACVFAYGQTGSGKSYTMMGNSGDSGLIPRICEGLFSRINETTRWDEASFRTEVSY LEIYNERVRDLLRRKSSKTFNLRVREHPKEGPYVEDLSKHLVQNYGDVEELMDAGNINRTTAATGMNDVSSRSHA IFTIKFTQAKFDSEMPCETVSKIHLVDLAGSERADATGATGVRLKEGGNINKSLVTLGNVISALADLSQDAANTL AKKKQVFVPYRDSVLTWLLKDSLGGNSKTIMIATISPADVNYGETLSTLRYANRAKNIINKPTINEDANVKLIRE LRAEIARLKTLLAQGNQIALLDSPTYTDIEMNRLGKGAPGSAGSAGSAAGSGMACPVPLQLPPLERLTLDDKKPL NTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNK FLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSI SSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDL QQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLA NYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSC MVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL
(SEQ ID NO: 64)
KIF16B-MCP
DNA:
ATGGCATCGGTCAAGGTGGCCGTGAGGGTCCGGCCCATGAATCGCAGGGAAAAGGACTTGGAGGCCAAGTTCATT ATTCAGATGGAGAAAAGCAAAACGACAATCACAAACTTAAAGATACCAGAAGGAGGCACTGGGGACTCAGGAAGA GAACGGACCAAGACCTTCACCTATGACTTTTCTTTTTATTCTGCTGATACAAAAAGCCCAGATTACGTTTCACAA GAAATGGTTTTCAAAACCCTCGGCACAGATGTCGTGAAGTCTGCATTTGAAGGTTATAATGCTTGTGTCTTTGCA TATGGGCAAACTGGATCTGGAAAGTCATACACTATGATGGGAAATTCTGGAGATTCTGGCTTAATACCTCGGATC TGTGAAGGACTCTTCAGTCGGATAAATGAAACCACCAGATGGGATGAAGCTTCTTTTCGAACTGAAGTCAGCTAC TTAGAAATTTATAACGAACGTGTGAGAGATCTACTTCGGCGGAAGTCATCTAAAACCTTCAATTTGAGAGTCCGT GAGCATCCCAAAGAAGGCCCTTATGTTGAGGATTTATCCAAACATTTAGTACAGAATTATGGTGACGTAGAAGAA CTTATGGATGCGGGCAATATCAACCGGACCACCGCAGCGACTGGGATGAACGACGTCAGTAGCAGGTCTCATGCC ATCTTCACCATCAAGTTCACTCAGGCTAAATTTGATTCTGAAATGCCATGTGAAACCGTCAGTAAGATCCACTTG GTTGATCTTGCCGGAAGTGAGCGTGCAGATGCCACCGGAGCCACCGGGGTTAGGCTAAAGGAAGGGGGAAATATT AACAAGTCCCTCGTGACTCTGGGGAACGTCATTTCTGCCTTAGCTGATTTATCTCAGGATGCTGCAAATACTCTT GCAAAGAAGAAGCAAGTTTTCGTGCCTTACAGGGATTCTGTGTTGACTTGGTTGTTAAAAGATAGCCTTGGAGGA AACTCTAAAACTATCATGATTGCCACCATTTCACCTGCTGATGTCAATTATGGAGAAACCCTAAGTACTCTTCGC TATGCAAATAGAGCCAAAAACATCATCAACAAGCCTACCATTAATGAGGATGCCAACGTCAAACTTATCCGTGAG CTGCGAGCTGAAATAGCCAGACTGAAAACGCTGCTTGCTCAAGGGAATCAGATTGCCCTCTTAGACTCCCCCACA TATACAGATATTGAAATGAACAGATTGGGAAAGGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATG GCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTC GCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAG AGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATG GAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAA GATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACTAA (SEQ ID NO: 65)
Protein:
MASVKVAVRVRPMNRREKDLEAKFIIQMEKSKTTITNLKIPEGGTGDSGRERTKTFTYDFSFYSADTKSPDYVSQ EMVFKTLGTDVVKSAFEGYNACVFAYGQTGSGKSYTMMGNSGDSGLIPRICEGLFSRINETTRWDEASFRTEVSY LEIYNERVRDLLRRKSSKTFNLRVREHPKEGPYVEDLSKHLVQNYGDVEELMDAGNINRTTAATGMNDVSSRSHA IFTIKFTQAKFDSEMPCETVSKIHLVDLAGSERADATGATGVRLKEGGNINKSLVTLGNVISALADLSQDAANTL AKKKQVFVPYRDSVLTWLLKDSLGGNSKTIMIATISPADVNYGETLSTLRYANRAKNIINKPTINEDANVKLIRE LRAEIARLKTLLAQGNQIALLDSPTYTDIEMNRLGKGAPGSAGSAAGSGMASNFTQFVLVDNGGTGDVTVAPSNF ANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLK DGNPIPSAIAANSGIY (SEQ ID NO: 66)
KIF16B-MCP-PylRSAF
DNA:
ATGGCATCGGTCAAGGTGGCCGTGAGGGTCCGGCCCATGAATCGCAGGGAAAAGGACTTGGAGGCCAAGTTCATT ATTCAGATGGAGAAAAGCAAAACGACAATCACAAACTTAAAGATACCAGAAGGAGGCACTGGGGACTCAGGAAGA GAACGGACCAAGACCTTCACCTATGACTTTTCTTTTTATTCTGCTGATACAAAAAGCCCAGATTACGTTTCACAA GAAATGGTTTTCAAAACCCTCGGCACAGATGTCGTGAAGTCTGCATTTGAAGGTTATAATGCTTGTGTCTTTGCA TATGGGCAAACTGGATCTGGAAAGTCATACACTATGATGGGAAATTCTGGAGATTCTGGCTTAATACCTCGGATC TGTGAAGGACTCTTCAGTCGGATAAATGAAACCACCAGATGGGATGAAGCTTCTTTTCGAACTGAAGTCAGCTAC TTAGAAATTTATAACGAACGTGTGAGAGATCTACTTCGGCGGAAGTCATCTAAAACCTTCAATTTGAGAGTCCGT GAGCATCCCAAAGAAGGCCCTTATGTTGAGGATTTATCCAAACATTTAGTACAGAATTATGGTGACGTAGAAGAA CTTATGGATGCGGGCAATATCAACCGGACCACCGCAGCGACTGGGATGAACGACGTCAGTAGCAGGTCTCATGCC ATCTTCACCATCAAGTTCACTCAGGCTAAATTTGATTCTGAAATGCCATGTGAAACCGTCAGTAAGATCCACTTG GTTGATCTTGCCGGAAGTGAGCGTGCAGATGCCACCGGAGCCACCGGGGTTAGGCTAAAGGAAGGGGGAAATATT AACAAGTCCCTCGTGACTCTGGGGAACGTCATTTCTGCCTTAGCTGATTTATCTCAGGATGCTGCAAATACTCTT GCAAAGAAGAAGCAAGTTTTCGTGCCTTACAGGGATTCTGTGTTGACTTGGTTGTTAAAAGATAGCCTTGGAGGA AACTCTAAAACTATCATGATTGCCACCATTTCACCTGCTGATGTCAATTATGGAGAAACCCTAAGTACTCTTCGC TATGCAAATAGAGCCAAAAACATCATCAACAAGCCTACCATTAATGAGGATGCCAACGTCAAACTTATCCGTGAG CTGCGAGCTGAAATAGCCAGACTGAAAACGCTGCTTGCTCAAGGGAATCAGATTGCCCTCTTAGACTCCCCCACA TATACAGATATTGAAATGAACAGATTGGGAAAGGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATG GCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTC GCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAG AGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATG GAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAA GATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGTACCGGCGCCCCCGGCTCCGCCGGC TCCGCCGCCGGCTCCGGCGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAA AAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACAC CACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCT TCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGAT 
CTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGT ACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAG CCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGC ACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGC ATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAAT CCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAA AAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTT TTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATC GACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCA AATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGT TATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGT TGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGC GACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGC CCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTG AAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAAC CTGTAA (SEQ ID NO: 67)
Protein:
MASVKVAVRVRPMNRREKDLEAKFIIQMEKSKTTITNLKIPEGGTGDSGRERTKTFTYDFSFYSADTKSPDYVSQ EMVFKTLGTDVVKSAFEGYNACVFAYGQTGSGKSYTMMGNSGDSGLIPRICEGLFSRINETTRWDEASFRTEVSY LEIYNERVRDLLRRKSSKTFNLRVREHPKEGPYVEDLSKHLVQNYGDVEELMDAGNINRTTAATGMNDVSSRSHA IFTIKFTQAKFDSEMPCETVSKIHLVDLAGSERADATGATGVRLKEGGNINKSLVTLGNVISALADLSQDAANTL AKKKQVFVPYRDSVLTWLLKDSLGGNSKTIMIATISPADVNYGETLSTLRYANRAKNIINKPTINEDANVKLIRE LRAEIARLKTLLAQGNQIALLDSPTYTDIEMNRLGKGAPGSAGSAAGSGMASNFTQFVLVDNGGTGDVTVAPSNF ANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLK DGNPIPSAIAANSGIYGTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKH HEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTR TKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITS MSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRF FVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPC YRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVG PIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL (SEQ ID NO: 68)
KIF16B-SPD5-PylRSAF
DNA:
ATGGCATCGGTCAAGGTGGCCGTGAGGGTCCGGCCCATGAATCGCAGGGAAAAGGACTTGGAGGCCAAGTTCATT ATTCAGATGGAGAAAAGCAAAACGACAATCACAAACTTAAAGATACCAGAAGGAGGCACTGGGGACTCAGGAAGA GAACGGACCAAGACCTTCACCTATGACTTTTCTTTTTATTCTGCTGATACAAAAAGCCCAGATTACGTTTCACAA GAAATGGTTTTCAAAACCCTCGGCACAGATGTCGTGAAGTCTGCATTTGAAGGTTATAATGCTTGTGTCTTTGCA TATGGGCAAACTGGATCTGGAAAGTCATACACTATGATGGGAAATTCTGGAGATTCTGGCTTAATACCTCGGATC TGTGAAGGACTCTTCAGTCGGATAAATGAAACCACCAGATGGGATGAAGCTTCTTTTCGAACTGAAGTCAGCTAC TTAGAAATTTATAACGAACGTGTGAGAGATCTACTTCGGCGGAAGTCATCTAAAACCTTCAATTTGAGAGTCCGT GAGCATCCCAAAGAAGGCCCTTATGTTGAGGATTTATCCAAACATTTAGTACAGAATTATGGTGACGTAGAAGAA CTTATGGATGCGGGCAATATCAACCGGACCACCGCAGCGACTGGGATGAACGACGTCAGTAGCAGGTCTCATGCC ATCTTCACCATCAAGTTCACTCAGGCTAAATTTGATTCTGAAATGCCATGTGAAACCGTCAGTAAGATCCACTTG GTTGATCTTGCCGGAAGTGAGCGTGCAGATGCCACCGGAGCCACCGGGGTTAGGCTAAAGGAAGGGGGAAATATT AACAAGTCCCTCGTGACTCTGGGGAACGTCATTTCTGCCTTAGCTGATTTATCTCAGGATGCTGCAAATACTCTT GCAAAGAAGAAGCAAGTTTTCGTGCCTTACAGGGATTCTGTGTTGACTTGGTTGTTAAAAGATAGCCTTGGAGGA AACTCTAAAACTATCATGATTGCCACCATTTCACCTGCTGATGTCAATTATGGAGAAACCCTAAGTACTCTTCGC TATGCAAATAGAGCCAAAAACATCATCAACAAGCCTACCATTAATGAGGATGCCAACGTCAAACTTATCCGTGAG CTGCGAGCTGAAATAGCCAGACTGAAAACGCTGCTTGCTCAAGGGAATCAGATTGCCCTCTTAGACTCCCCCACA ATGGAGGACAACAGCGTGCTGAACGAGGACAGCAACCTGGAGCACGTGGAGGGCCAGCCCAGAAGAAGCATGAGC CAGCCCGTGCTGAACGTGGAGGGCGACAAGAGAACCAGCAGCACCAGCGCCACCCAGCAGCAGGTGCTGAGCGGC GCCTTCAGCAGCGCCGACGTGAGAAGCATCCCCATCATCCAGACCTGGGAGGAGAACAAGGCCCTGAAGACCAAG ATCACCATCCTGAGAGGCGAGCTGCAGATGTACCAGAGAAGATACAGCGAGGCCAAGGAGGCCAGCCAGAAGAGA GTGAAGGAGGTGATGGACGACTACGTGGACCTGAAGCTGGGCCAGGAGAACGTGCAGGAGAAGATGGAGCAGTAC AAGCTGATGGAGGAGGACCTGCTGGCCATGCAGAGCAGAATCGAGACCAGCGAGGACAACTTCGCCAGACAGATG AAGGAGTTCGAGGCCCAGAAGCACGCCATGGAGGAGAGAATCAAGGAGCTGGAGCTGAGCGCCACCGACGCCAAC
AACACCACCGTGGGCAGCTTCAGAGGCACCCTGGACGACATCCTGAAGAAGAACGACCCCGACTTCACCCTGACC AGCGGCTACGAGGAGAGAAAGATCAACGACCTGGAGGCCAAGCTGCTGAGCGAGATCGACAAGGTGGCCGAGCTG GAGGACCACATCCAGCAGCTGAGACAGGAGCTGGACGACCAGAGCGCCAGACTGGCCGACAGCGAGAACGTGAGA GCCCAGCTGGAGGCCGCCACCGGCCAGGGCATCCTGGGCGCCGCCGGCAACGCCATGGTGCCCAACAGCACCTTC ATGATCGGCAACGGCAGAGAGAGCCAGACCAGAGACCAGCTGAACTACATCGACGACCTGGAGACCAAGCTGGCC GACGCCAAGAAGGAGAACGACAAGGCCAGACAGGCCCTGGTGGAGTACATGAACAAGTGCAGCAAGCTGGAGCAC GAGATCAGAACCATGGTGAAGAACAGCACCTTCGACAGCAGCAGCATGCTGCTGGGCGGCCAGACCAGCGACGAG CTGAAGATCCAGATCGGCAAGGTGAACGGCGAGCTGAACGTGCTGAGAGCCGAGAACAGAGAGCTGAGAATCAGA TGCGACCAGCTGACCGGCGGCGACGGCAACCTGAGCATCAGCCTGGGCCAGAGCAGACTGATGGCCGGCATCGCC ACCAACGACGTGGACAGCATCGGCCAGGGCAACGAGACCGGCGGCACCAGCATGAGAATCCTGCCCAGAGAGAGC CAGCTGGACGACCTGGAGGAGAGCAAGCTGCCCCTGATGGACACCAGCAGCGCCGTGAGAAACCAGCAGCAGTTC GCCAGCATGTGGGAGGACTTCGAGAGCGTGAAGGACAGCCTGCAGAACAACCACAACGACACCCTGGAGGGCAGC TTCAACAGCAGCATGCCCCCCCCCGGCAGAGACGCCACCCAGAGCTTCCTGAGCCAGAAGAGCTTCAAGAACAGC CCCATCGTGATGCAGAAGCCCAAGAGCCTGCACCTGCACCTGAAGAGCCACCAGAGCGAGGGCGCCGGCGAGCAG ATCCAGAACAACAGCTTCAGCACCAAGACCGCCAGCCCCCACGTGAGCCAGAGCCACATCCCCATCCTGCACGAC ATGCAGCAGATCCTGGACAGCAGCGCCATGTTCCTGGAGGGCCAGCACGACGTGGCCGTGAACGTGGAGCAGATG CAGGAGAAGATGAGCCAGATCAGAGAGGCCCTGGCCAGACTGTTCGAGAGACTGAAGAGCAGCGCCGCCCTGTTC GAGGAGATCCTGGAGAGAATGGGCAGCAGCGACCCCAACGCCGACAAGATCAAGAAGATGAAGCTGGCCTTCGAG ACCAGCATCAACGACAAGCTGAACGTGAGCGCCATCCTGGAGGCCGCCGAGAAGGACCTGCACAACATGAGCCTG AACTTCAGCATCCTGGAGAAGAGCATCGTGAGCCAGGCCGCCGAGGCCAGCAGAAGATTCACCATCGCCCCCGAC GCCGAGGACGTGGCCAGCAGCAGCCTGCTGAACGCCAGCTACAGCCCCCTGTTCAAGTTCACCAGCAACAGCGAC ATCGTGGAGAAGCTGCAGAACGAGGTGAGCGAGCTGAAGAACGAG
CTGGAGATGGCCAGAACCAGAGACATGAGA AGCCCCCTGAACGGCAGCAGCGGCAGACTGAGCGACGTGCAGATCAACACCAACAGAATGTTCGAGGACCTGGAG GTGAGCGAGGCCACCCTGCAGAAGGCCAAGGAGGAGAACAGCACCCTGAAGAGCCAGTTCGCCGAGCTGGAGGCC AACCTGCACCAGGTGAACAGCAAGCTGGGCGAGGTGAGATGCGAGCTGAACGAGGCCCTGGCCAGAGTGGACGGC GAGCAGGAGACCAGAGTGAAGGCCGAGAACGCCCTGGAGGAGGCCAGACAGCTGATCAGCAGCCTGAAGCACGAG GAGAACGAGCTGAAGAAGACCATCACCGACATGGGCATGAGACTGAACGAGGCCAAGAAGAGCGACGAGTTCCTG AAGAGCGAGCTGAGCACCGCCCTGGAGGAGGAGAAGAAGAGCCAGAACCTGGCCGACGAGCTGAGCGAGGAGCTG AACGGCTGGAGAATGAGAACCAAGGAGGCCGAGAACAAGGTGGAGCACGCCAGCAGCGAGAAGAGCGAGATGCTG GAGAGAATCGTGCACCTGGAGACCGAGATGGAGAAGCTGAGCACCAGCGAGATCGCCGCCGACTACTGCAGCACC AAGATGACCGAGAGAAAGAAGGAGATCGAGCTGGCCAAGTACAGAGAGGACTTCGAGAACGCCGCCATCGTGGGC CTGGAGAGAATCAGCAAGGAGATCAGCGAGCTGACCAAGAAGACCCTGAAGGCCAAGATCATCCCCAGCAACATC AGCAGCATCCAGCTGGTGTGCGACGAGCTGTGCAGAAGACTGAGCAGAGAGAGAGAGCAGCAGCACGAGTACGCC AAGGTGATGAGAGACGTGAACGAGAAGATCGAGAAGCTGCAGCTGGAGAAGGACGCCCTGGAGCACGAGCTGAAG ATGATGAGCAGCAACAACGAGAACGTGCCCCCCGTGGGCACCAGCGTGAGCGGCATGCCCACCAAGACCAGCAAC CAGAAGTGCGCCCAGCCCCACTACACCAGCCCCACCAGACAGCTGCTGCACGAGAGCACCATGGCCGTGGACGCC ATCGTGCAGAAGCTGAAGAAGACCCACAACATGAGCGGCATGGGCCCCGAGCTGAAGGAGACCATCGGCAACGTG ATCAACGAGAGCAGAGTGCTGAGAGACTTCCTGCACCAGAAGCTGATCCTGTTCAAGGGCATCGACATGAGCAAC TGGAAGAACGAGACCGTGGACCAGCTGATCACCGACCTGGGCCAGCTGCACCAGGACAACCTGATGCTGGAGGAG CAGATCAAGAAGTACAAGAAGGAGCTGAAGCTGACCAAGAGCGCCATCCCCACCCTGGGCGTGGAGTTCCAGGAC AGAATCAAGACCGAGATCGGCAAGATCGCCACCGACATGGGCGGCGCCGTGAAGGAGATCAGAAAGAAGGGTACC GGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAA
CGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGA ACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTG GTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGT TGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAA GTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAAC ACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTT TCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGC AATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGAT CGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGC GAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTG GAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAG TATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGT CTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAA ATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAAC TTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGC ATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAA CTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGT TTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTAT TACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 69)
Protein:
MASVKVAVRVRPMNRREKDLEAKFIIQMEKSKTTITNLKIPEGGTGDSGRERTKTFTYDFSFYSADTKSPDYVSQ EMVFKTLGTDVVKSAFEGYNACVFAYGQTGSGKSYTMMGNSGDSGLIPRICEGLFSRINETTRWDEASFRTEVSY LEIYNERVRDLLRRKSSKTFNLRVREHPKEGPYVEDLSKHLVQNYGDVEELMDAGNINRTTAATGMNDVSSRSHA IFTIKFTQAKFDSEMPCETVSKIHLVDLAGSERADATGATGVRLKEGGNINKSLVTLGNVISALADLSQDAANTL AKKKQVFVPYRDSVLTWLLKDSLGGNSKTIMIATISPADVNYGETLSTLRYANRAKNIINKPTINEDANVKLIRE LRAEIARLKTLLAQGNQIALLDSPTMEDNSVLNEDSNLEHVEGQPRRSMSQPVLNVEGDKRTSSTSATQQQVLSG AFSSADVRSIPIIQTWEENKALKTKITILRGELQMYQRRYSEAKEASQKRVKEVMDDYVDLKLGQENVQEKMEQY KLMEEDLLAMQSRIETSEDNFARQMKEFEAQKHAMEERIKELELSATDANNTTVGSFRGTLDDILKKNDPDFTLT SGYEERKINDLEAKLLSEIDKVAELEDHIQQLRQELDDQSARLADSENVRAQLEAATGQGILGAAGNAMVPNSTF MIGNGRESQTRDQLNYIDDLETKLADAKKENDKARQALVEYMNKCSKLEHEIRTMVKNSTFDSSSMLLGGQTSDE LKIQIGKVNGELNVLRAENRELRIRCDQLTGGDGNLSISLGQSRLMAGIATNDVDSIGQGNETGGTSMRILPRES QLDDLEESKLPLMDTSSAVRNQQQFASMWEDFESVKDSLQNNHNDTLEGSFNSSMPPPGRDATQSFLSQKSFKNS PIVMQKPKSLHLHLKSHQSEGAGEQIQNNSFSTKTASPHVSQSHIPILHDMQQILDSSAMFLEGQHDVAVNVEQM QEKMSQIREALARLFERLKSSAALFEEILERMGSSDPNADKIKKMKLAFETSINDKLNVSAILEAAEKDLHNMSL NFSILEKSIVSQAAEASRRFTIAPDAEDVASSSLLNASYSPLFKFTSNSDIVEKLQNEVSELKNELEMARTRDMR SPLNGSSGRLSDVQINTNRMFEDLEVSEATLQKAKEENSTLKSQFAELEANLHQVNSKLGEVRCELNEALARVDG EQETRVKAENALEEARQLISSLKHEENELKKTITDMGMRLNEAKKSDEFLKSELSTALEEEKKSQNLADELSEEL NGWRMRTKEAENKVEHASSEKSEMLERIVHLETEMEKLSTSEIAADYCSTKMTERKKEIELAKYREDFENAAIVG LERISKEISELTKKTLKAKIIPSNISSIQLVCDELCRRLSREREQQHEYAKVMRDVNEKIEKLQLEKDALEHELK MMSSNNENVPPVGTSVSGMPTKTSNQKCAQPHYTSPTRQLLHESTMAVDAIVQKLKKTHNMSGMGPELKETIGNV INESRVLRDFLHQKLILFKGIDMSNWKNETVDQLITDLGQLHQDNLMLEEQIKKYKKELKLTKSAIPTLGVEFQD RIKTEIGKIATDMGGAVKEIRKKGTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTG TIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVK VVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKG NTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKL EREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIK
IFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLE LSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL (SEQ ID NO: 70)
KIF16B-SPD5-MCP
DNA:
ATGGCATCGGTCAAGGTGGCCGTGAGGGTCCGGCCCATGAATCGCAGGGAAAAGGACTTGGAGGCCAAGTTCATT ATTCAGATGGAGAAAAGCAAAACGACAATCACAAACTTAAAGATACCAGAAGGAGGCACTGGGGACTCAGGAAGA GAACGGACCAAGACCTTCACCTATGACTTTTCTTTTTATTCTGCTGATACAAAAAGCCCAGATTACGTTTCACAA GAAATGGTTTTCAAAACCCTCGGCACAGATGTCGTGAAGTCTGCATTTGAAGGTTATAATGCTTGTGTCTTTGCA TATGGGCAAACTGGATCTGGAAAGTCATACACTATGATGGGAAATTCTGGAGATTCTGGCTTAATACCTCGGATC TGTGAAGGACTCTTCAGTCGGATAAATGAAACCACCAGATGGGATGAAGCTTCTTTTCGAACTGAAGTCAGCTAC TTAGAAATTTATAACGAACGTGTGAGAGATCTACTTCGGCGGAAGTCATCTAAAACCTTCAATTTGAGAGTCCGT GAGCATCCCAAAGAAGGCCCTTATGTTGAGGATTTATCCAAACATTTAGTACAGAATTATGGTGACGTAGAAGAA CTTATGGATGCGGGCAATATCAACCGGACCACCGCAGCGACTGGGATGAACGACGTCAGTAGCAGGTCTCATGCC ATCTTCACCATCAAGTTCACTCAGGCTAAATTTGATTCTGAAATGCCATGTGAAACCGTCAGTAAGATCCACTTG GTTGATCTTGCCGGAAGTGAGCGTGCAGATGCCACCGGAGCCACCGGGGTTAGGCTAAAGGAAGGGGGAAATATT AACAAGTCCCTCGTGACTCTGGGGAACGTCATTTCTGCCTTAGCTGATTTATCTCAGGATGCTGCAAATACTCTT GCAAAGAAGAAGCAAGTTTTCGTGCCTTACAGGGATTCTGTGTTGACTTGGTTGTTAAAAGATAGCCTTGGAGGA AACTCTAAAACTATCATGATTGCCACCATTTCACCTGCTGATGTCAATTATGGAGAAACCCTAAGTACTCTTCGC TATGCAAATAGAGCCAAAAACATCATCAACAAGCCTACCATTAATGAGGATGCCAACGTCAAACTTATCCGTGAG CTGCGAGCTGAAATAGCCAGACTGAAAACGCTGCTTGCTCAAGGGAATCAGATTGCCCTCTTAGACTCCCCCACA ATGGAGGACAACAGCGTGCTGAACGAGGACAGCAACCTGGAGCACGTGGAGGGCCAGCCCAGAAGAAGCATGAGC CAGCCCGTGCTGAACGTGGAGGGCGACAAGAGAACCAGCAGCACCAGCGCCACCCAGCAGCAGGTGCTGAGCGGC GCCTTCAGCAGCGCCGACGTGAGAAGCATCCCCATCATCCAGACCTGGGAGGAGAACAAGGCCCTGAAGACCAAG ATCACCATCCTGAGAGGCGAGCTGCAGATGTACCAGAGAAGATACAGCGAGGCCAAGGAGGCCAGCCAGAAGAGA GTGAAGGAGGTGATGGACGACTACGTGGACCTGAAGCTGGGCCAGGAGAACGTGCAGGAGAAGATGGAGCAGTAC AAGCTGATGGAGGAGGACCTGCTGGCCATGCAGAGCAGAATCGAGACCAGCGAGGACAACTTCGCCAGACAGATG AAGGAGTTCGAGGCCCAGAAGCACGCCATGGAGGAGAGAATCAAGGAGCTGGAGCTGAGCGCCACCGACGCCAAC
AACACCACCGTGGGCAGCTTCAGAGGCACCCTGGACGACATCCTGAAGAAGAACGACCCCGACTTCACCCTGACC AGCGGCTACGAGGAGAGAAAGATCAACGACCTGGAGGCCAAGCTGCTGAGCGAGATCGACAAGGTGGCCGAGCTG GAGGACCACATCCAGCAGCTGAGACAGGAGCTGGACGACCAGAGCGCCAGACTGGCCGACAGCGAGAACGTGAGA GCCCAGCTGGAGGCCGCCACCGGCCAGGGCATCCTGGGCGCCGCCGGCAACGCCATGGTGCCCAACAGCACCTTC ATGATCGGCAACGGCAGAGAGAGCCAGACCAGAGACCAGCTGAACTACATCGACGACCTGGAGACCAAGCTGGCC GACGCCAAGAAGGAGAACGACAAGGCCAGACAGGCCCTGGTGGAGTACATGAACAAGTGCAGCAAGCTGGAGCAC GAGATCAGAACCATGGTGAAGAACAGCACCTTCGACAGCAGCAGCATGCTGCTGGGCGGCCAGACCAGCGACGAG CTGAAGATCCAGATCGGCAAGGTGAACGGCGAGCTGAACGTGCTGAGAGCCGAGAACAGAGAGCTGAGAATCAGA TGCGACCAGCTGACCGGCGGCGACGGCAACCTGAGCATCAGCCTGGGCCAGAGCAGACTGATGGCCGGCATCGCC ACCAACGACGTGGACAGCATCGGCCAGGGCAACGAGACCGGCGGCACCAGCATGAGAATCCTGCCCAGAGAGAGC CAGCTGGACGACCTGGAGGAGAGCAAGCTGCCCCTGATGGACACCAGCAGCGCCGTGAGAAACCAGCAGCAGTTC GCCAGCATGTGGGAGGACTTCGAGAGCGTGAAGGACAGCCTGCAGAACAACCACAACGACACCCTGGAGGGCAGC TTCAACAGCAGCATGCCCCCCCCCGGCAGAGACGCCACCCAGAGCTTCCTGAGCCAGAAGAGCTTCAAGAACAGC CCCATCGTGATGCAGAAGCCCAAGAGCCTGCACCTGCACCTGAAGAGCCACCAGAGCGAGGGCGCCGGCGAGCAG ATCCAGAACAACAGCTTCAGCACCAAGACCGCCAGCCCCCACGTGAGCCAGAGCCACATCCCCATCCTGCACGAC ATGCAGCAGATCCTGGACAGCAGCGCCATGTTCCTGGAGGGCCAGCACGACGTGGCCGTGAACGTGGAGCAGATG CAGGAGAAGATGAGCCAGATCAGAGAGGCCCTGGCCAGACTGTTCGAGAGACTGAAGAGCAGCGCCGCCCTGTTC GAGGAGATCCTGGAGAGAATGGGCAGCAGCGACCCCAACGCCGACAAGATCAAGAAGATGAAGCTGGCCTTCGAG ACCAGCATCAACGACAAGCTGAACGTGAGCGCCATCCTGGAGGCCGCCGAGAAGGACCTGCACAACATGAGCCTG AACTTCAGCATCCTGGAGAAGAGCATCGTGAGCCAGGCCGCCGAGGCCAGCAGAAGATTCACCATCGCCCCCGAC GCCGAGGACGTGGCCAGCAGCAGCCTGCTGAACGCCAGCTACAGCCCCCTGTTCAAGTTCACCAGCAACAGCGAC ATCGTGGAGAAGCTGCAGAACGAGGTGAGCGAGCTGAAGAACGAG
CTGGAGATGGCCAGAACCAGAGACATGAGA AGCCCCCTGAACGGCAGCAGCGGCAGACTGAGCGACGTGCAGATCAACACCAACAGAATGTTCGAGGACCTGGAG GTGAGCGAGGCCACCCTGCAGAAGGCCAAGGAGGAGAACAGCACCCTGAAGAGCCAGTTCGCCGAGCTGGAGGCC AACCTGCACCAGGTGAACAGCAAGCTGGGCGAGGTGAGATGCGAGCTGAACGAGGCCCTGGCCAGAGTGGACGGC GAGCAGGAGACCAGAGTGAAGGCCGAGAACGCCCTGGAGGAGGCCAGACAGCTGATCAGCAGCCTGAAGCACGAG GAGAACGAGCTGAAGAAGACCATCACCGACATGGGCATGAGACTGAACGAGGCCAAGAAGAGCGACGAGTTCCTG AAGAGCGAGCTGAGCACCGCCCTGGAGGAGGAGAAGAAGAGCCAGAACCTGGCCGACGAGCTGAGCGAGGAGCTG AACGGCTGGAGAATGAGAACCAAGGAGGCCGAGAACAAGGTGGAGCACGCCAGCAGCGAGAAGAGCGAGATGCTG GAGAGAATCGTGCACCTGGAGACCGAGATGGAGAAGCTGAGCACCAGCGAGATCGCCGCCGACTACTGCAGCACC AAGATGACCGAGAGAAAGAAGGAGATCGAGCTGGCCAAGTACAGAGAGGACTTCGAGAACGCCGCCATCGTGGGC CTGGAGAGAATCAGCAAGGAGATCAGCGAGCTGACCAAGAAGACCCTGAAGGCCAAGATCATCCCCAGCAACATC AGCAGCATCCAGCTGGTGTGCGACGAGCTGTGCAGAAGACTGAGCAGAGAGAGAGAGCAGCAGCACGAGTACGCC AAGGTGATGAGAGACGTGAACGAGAAGATCGAGAAGCTGCAGCTGGAGAAGGACGCCCTGGAGCACGAGCTGAAG ATGATGAGCAGCAACAACGAGAACGTGCCCCCCGTGGGCACCAGCGTGAGCGGCATGCCCACCAAGACCAGCAAC CAGAAGTGCGCCCAGCCCCACTACACCAGCCCCACCAGACAGCTGCTGCACGAGAGCACCATGGCCGTGGACGCC ATCGTGCAGAAGCTGAAGAAGACCCACAACATGAGCGGCATGGGCCCCGAGCTGAAGGAGACCATCGGCAACGTG ATCAACGAGAGCAGAGTGCTGAGAGACTTCCTGCACCAGAAGCTGATCCTGTTCAAGGGCATCGACATGAGCAAC TGGAAGAACGAGACCGTGGACCAGCTGATCACCGACCTGGGCCAGCTGCACCAGGACAACCTGATGCTGGAGGAG CAGATCAAGAAGTACAAGAAGGAGCTGAAGCTGACCAAGAGCGCCATCCCCACCCTGGGCGTGGAGTTCCAGGAC AGAATCAAGACCGAGATCGGCAAGATCGCCACCGACATGGGCGGCGCCGTGAAGGAGATCAGAAAGAAGGGTACC GAGCAGAAGCTGATCTCAGAGGAGGACCTGGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCGCTTCT
AACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAAC GGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCT GCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTA ACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGA AACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACTAA (SEQ ID NO: 71)
Protein:
MASVKVAVRVRPMNRREKDLEAKFIIQMEKSKTTITNLKIPEGGTGDSGRERTKTFTYDFSFYSADTKSPDYVSQ EMVFKTLGTDVVKSAFEGYNACVFAYGQTGSGKSYTMMGNSGDSGLIPRICEGLFSRINETTRWDEASFRTEVSY LEIYNERVRDLLRRKSSKTFNLRVREHPKEGPYVEDLSKHLVQNYGDVEELMDAGNINRTTAATGMNDVSSRSHA IFTIKFTQAKFDSEMPCETVSKIHLVDLAGSERADATGATGVRLKEGGNINKSLVTLGNVISALADLSQDAANTL AKKKQVFVPYRDSVLTWLLKDSLGGNSKTIMIATISPADVNYGETLSTLRYANRAKNIINKPTINEDANVKLIRE LRAEIARLKTLLAQGNQIALLDSPTMEDNSVLNEDSNLEHVEGQPRRSMSQPVLNVEGDKRTSSTSATQQQVLSG AFSSADVRSIPIIQTWEENKALKTKITILRGELQMYQRRYSEAKEASQKRVKEVMDDYVDLKLGQENVQEKMEQY KLMEEDLLAMQSRIETSEDNFARQMKEFEAQKHAMEERIKELELSATDANNTTVGSFRGTLDDILKKNDPDFTLT SGYEERKINDLEAKLLSEIDKVAELEDHIQQLRQELDDQSARLADSENVRAQLEAATGQGILGAAGNAMVPNSTF MIGNGRESQTRDQLNYIDDLETKLADAKKENDKARQALVEYMNKCSKLEHEIRTMVKNSTFDSSSMLLGGQTSDE LKIQIGKVNGELNVLRAENRELRIRCDQLTGGDGNLSISLGQSRLMAGIATNDVDSIGQGNETGGTSMRILPRES QLDDLEESKLPLMDTSSAVRNQQQFASMWEDFESVKDSLQNNHNDTLEGSFNSSMPPPGRDATQSFLSQKSFKNS PIVMQKPKSLHLHLKSHQSEGAGEQIQNNSFSTKTASPHVSQSHIPILHDMQQILDSSAMFLEGQHDVAVNVEQM QEKMSQIREALARLFERLKSSAALFEEILERMGSSDPNADKIKKMKLAFETSINDKLNVSAILEAAEKDLHNMSL NFSILEKSIVSQAAEASRRFTIAPDAEDVASSSLLNASYSPLFKFTSNSDIVEKLQNEVSELKNELEMARTRDMR SPLNGSSGRLSDVQINTNRMFEDLEVSEATLQKAKEENSTLKSQFAELEANLHQVNSKLGEVRCELNEALARVDG EQETRVKAENALEEARQLISSLKHEENELKKTITDMGMRLNEAKKSDEFLKSELSTALEEEKKSQNLADELSEEL NGWRMRTKEAENKVEHASSEKSEMLERIVHLETEMEKLSTSEIAADYCSTKMTERKKEIELAKYREDFENAAIVG LERISKEISELTKKTLKAKIIPSNISSIQLVCDELCRRLSREREQQHEYAKVMRDVNEKIEKLQLEKDALEHELK MMSSNNENVPPVGTSVSGMPTKTSNQKCAQPHYTSPTRQLLHESTMAVDAIVQKLKKTHNMSGMGPELKETIGNV INESRVLRDFLHQKLILFKGIDMSNWKNETVDQLITDLGQLHQDNLMLEEQIKKYKKELKLTKSAIPTLGVEFQD RIKTEIGKIATDMGGAVKEIRKKGTEQKLISEEDLGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFAN GIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDG NPIPSAIAANSGIY (SEQ ID NO: 72)
KIF16B-SPD5-MCP-PylRSAF
DNA:
ATGGCATCGGTCAAGGTGGCCGTGAGGGTCCGGCCCATGAATCGCAGGGAAAAGGACTTGGAGGCCAAGTTCATT ATTCAGATGGAGAAAAGCAAAACGACAATCACAAACTTAAAGATACCAGAAGGAGGCACTGGGGACTCAGGAAGA GAACGGACCAAGACCTTCACCTATGACTTTTCTTTTTATTCTGCTGATACAAAAAGCCCAGATTACGTTTCACAA GAAATGGTTTTCAAAACCCTCGGCACAGATGTCGTGAAGTCTGCATTTGAAGGTTATAATGCTTGTGTCTTTGCA TATGGGCAAACTGGATCTGGAAAGTCATACACTATGATGGGAAATTCTGGAGATTCTGGCTTAATACCTCGGATC TGTGAAGGACTCTTCAGTCGGATAAATGAAACCACCAGATGGGATGAAGCTTCTTTTCGAACTGAAGTCAGCTAC TTAGAAATTTATAACGAACGTGTGAGAGATCTACTTCGGCGGAAGTCATCTAAAACCTTCAATTTGAGAGTCCGT GAGCATCCCAAAGAAGGCCCTTATGTTGAGGATTTATCCAAACATTTAGTACAGAATTATGGTGACGTAGAAGAA CTTATGGATGCGGGCAATATCAACCGGACCACCGCAGCGACTGGGATGAACGACGTCAGTAGCAGGTCTCATGCC ATCTTCACCATCAAGTTCACTCAGGCTAAATTTGATTCTGAAATGCCATGTGAAACCGTCAGTAAGATCCACTTG GTTGATCTTGCCGGAAGTGAGCGTGCAGATGCCACCGGAGCCACCGGGGTTAGGCTAAAGGAAGGGGGAAATATT AACAAGTCCCTCGTGACTCTGGGGAACGTCATTTCTGCCTTAGCTGATTTATCTCAGGATGCTGCAAATACTCTT GCAAAGAAGAAGCAAGTTTTCGTGCCTTACAGGGATTCTGTGTTGACTTGGTTGTTAAAAGATAGCCTTGGAGGA AACTCTAAAACTATCATGATTGCCACCATTTCACCTGCTGATGTCAATTATGGAGAAACCCTAAGTACTCTTCGC TATGCAAATAGAGCCAAAAACATCATCAACAAGCCTACCATTAATGAGGATGCCAACGTCAAACTTATCCGTGAG CTGCGAGCTGAAATAGCCAGACTGAAAACGCTGCTTGCTCAAGGGAATCAGATTGCCCTCTTAGACTCCCCCACA ATGGAGGACAACAGCGTGCTGAACGAGGACAGCAACCTGGAGCACGTGGAGGGCCAGCCCAGAAGAAGCATGAGC CAGCCCGTGCTGAACGTGGAGGGCGACAAGAGAACCAGCAGCACCAGCGCCACCCAGCAGCAGGTGCTGAGCGGC GCCTTCAGCAGCGCCGACGTGAGAAGCATCCCCATCATCCAGACCTGGGAGGAGAACAAGGCCCTGAAGACCAAG ATCACCATCCTGAGAGGCGAGCTGCAGATGTACCAGAGAAGATACAGCGAGGCCAAGGAGGCCAGCCAGAAGAGA GTGAAGGAGGTGATGGACGACTACGTGGACCTGAAGCTGGGCCAGGAGAACGTGCAGGAGAAGATGGAGCAGTAC AAG C T GAT G GAG GAG GAC CTGCTGGC CAT G C AGAG C AGAAT C GAGAC C AG C GAG GAC AAC T T C G C C AGAC AGAT G AAG GAGT T C GAG G C C C AGAAG C AC G C CAT G GAG GAGAGAAT C AAG GAG C T G GAG C T GAG C G C C AC C GAC G C C AAC AACACCACCGTGGGCAGCTTCAGAGGCACCCTGGACGACATCCTGAAGAAGAACGACCCCGACTTCACCCTGACC AGCGGCTACGAGGAGAGAAAGATCAACGACCTGGAGGCCAAGCTGCTGAGCGAGATCGACAAGGTGGCCGAGCTG GAG GAC C AC AT C C AG C AG C 
T GAGAC AG GAG C T G GAC GAC C AGAG C G C C AGAC T G G C C GAC AG C GAGAAC GT GAGA GCCCAGCTGGAGGCCGCCACCGGCCAGGGCATCCTGGGCGCCGCCGGCAACGCCATGGTGCCCAACAGCACCTTC AT GAT C G G C AAC G G C AGAGAGAG C C AGAC C AGAGAC C AG C T GAAC T AC AT C GAC GAC C T G GAGAC C AAG C T G G C C GAC G C C AAGAAG GAGAAC GAC AAG G C C AGAC AG GCCCTGGTG GAGT AC AT GAAC AAGT G C AG C AAG C T G GAG C AC GAGAT C AGAAC CAT G GT GAAGAAC AG C AC C T T C GAC AG C AG C AG CAT GCTGCTGGGCGGC C AGAC C AG C GAC GAG C T GAAGAT C C AGAT C G G C AAG GT GAAC G G C GAG C T GAAC GT G C T GAGAG C C GAGAAC AGAGAG C T GAGAAT C AGA TGCGACCAGCTGACCGGCGGCGACGGCAACCTGAGCATCAGCCTGGGCCAGAGCAGACTGATGGCCGGCATCGCC AC C AAC GAC GT G GAC AG CAT C G G C C AG G G C AAC GAGAC C G G C G G C AC C AG CAT GAGAAT C C T G C C C AGAGAGAG C CAGCTGGACGACCTGGAGGAGAGCAAGCTGCCCCTGATGGACACCAGCAGCGCCGTGAGAAACCAGCAGCAGTTC G C C AG CAT GT G G GAG GAC T T C GAGAG C GT GAAG GAC AG C C T G C AGAAC AAC C AC AAC GAC AC C C T G GAG G G C AG C TTCAACAGCAGCATGCCCCCCCCCGGCAGAGACGCCACCCAGAGCTTCCTGAGCCAGAAGAGCTTCAAGAACAGC CCCATCGTGATGCAGAAGCCCAAGAGCCTGCACCTGCACCTGAAGAGCCACCAGAGCGAGGGCGCCGGCGAGCAG AT C C AGAAC AAC AG C T T C AG C AC C AAGAC C G C C AG C C C C C AC GT GAG C C AGAG C C AC AT C C C CAT C C T G C AC GAC ATGCAGCAGATCCTGGACAGCAGCGCCATGTTCCTGGAGGGCCAGCACGACGTGGCCGTGAACGTGGAGCAGATG C AG GAGAAGAT GAG C C AGAT C AGAGAG GCCCTGGC C AGAC T GT T C GAGAGAC T GAAGAG C AG CGCCGCCCTGTTC GAG GAGAT C C T G GAGAGAAT G G G C AG C AG C GAC C C C AAC G C C GAC AAGAT C AAGAAGAT GAAG CTGGCCTTC GAG ACCAGCATCAACGACAAGCTGAACGTGAGCGCCATCCTGGAGGCCGCCGAGAAGGACCTGCACAACATGAGCCTG AACTTCAGCATCCTGGAGAAGAGCATCGTGAGCCAGGCCGCCGAGGCCAGCAGAAGATTCACCATCGCCCCCGAC GCCGAGGACGTGGCCAGCAGCAGCCTGCTGAACGCCAGCTACAGCCCCCTGTTCAAGTTCACCAGCAACAGCGAC AT C GT G GAGAAG C T G C AGAAC GAG GT GAG C GAG C T GAAGAAC GAG C T G GAGAT G G C C AGAAC C AGAGAC AT GAGA AGCCCCCT GAAC G G C AG C AG C G G C AGAC T GAG C GAC GT G C AGAT C AAC AC C AAC AGAAT GT T C GAG GAC C T G GAG 
GTGAGCGAGGCCACCCTGCAGAAGGCCAAGGAGGAGAACAGCACCCTGAAGAGCCAGTTCGCCGAGCTGGAGGCC AACCTGCACCAGGTGAACAGCAAGCTGGGCGAGGTGAGATGCGAGCTGAACGAGGCCCTGGCCAGAGTGGACGGC GAG C AG GAGAC C AGAGT GAAG G C C GAGAAC G C C C T G GAG GAG G C C AGAC AG C T GAT C AG C AG C C T GAAG C AC GAG GAGAAC GAG C T GAAGAAGAC CAT C AC C GAC AT G G G CAT GAGAC T GAAC GAG G C C AAGAAGAG C GAC GAGT T C C T G AAGAG C GAG C T GAG C AC C G C C C T G GAG GAG GAGAAGAAGAG C C AGAAC C T G G C C GAC GAG C T GAG C GAG GAG C T G AAC G G C T G GAGAAT GAGAAC C AAG GAG G C C GAGAAC AAG GT G GAG C AC G C C AG C AG C GAGAAGAG C GAGAT G C T G GAGAGAAT C GT G C AC C T G GAGAC C GAGAT G GAGAAG C T GAG C AC C AG C GAGAT C G C C G C C GAC T AC T G C AG C AC C AAGAT GAC C GAGAGAAAGAAG GAGAT C GAG C T G G C C AAGT AC AGAGAG GAC T T C GAGAAC G C C G C CAT C GT G G G C C T G GAGAGAAT C AG C AAG GAGAT C AG C GAG C T GAC C AAGAAGAC C C T GAAG G C C AAGAT CAT C C C C AG C AAC AT C AG C AG CAT C C AG CTGGTGTGC GAC GAG C T GT G C AGAAGAC T GAG C AGAGAGAGAGAG C AG C AG C AC GAGT AC G C C AAG GT GAT GAGAGAC GT GAAC GAGAAGAT C GAGAAG C T G C AG C T G GAGAAG GAC G C C C T G GAG C AC GAG C T GAAG AT GAT GAG C AG C AAC AAC GAGAAC GTGCCCCCCGTGGG C AC C AG C GT GAG C G G CAT G C C C AC C AAGAC C AG C AAC CAGAAGTGCGCCCAGCCCCACTACACCAGCCCCACCAGACAGCTGCTGCACGAGAGCACCATGGCCGTGGACGCC AT C GT G C AGAAG C T GAAGAAGAC C C AC AAC AT GAG C G G CAT G G G C C C C GAG C T GAAG GAGAC CAT C G G C AAC GT G AT C AAC GAGAG C AGAGT G C T GAGAGAC T T C C T G C AC C AGAAG C T GAT C C T GT T C AAG G G CAT C GAC AT GAG C AAC TGGAAGAACGAGACCGTGGACCAGCTGATCACCGACCTGGGCCAGCTGCACCAGGACAACCTGATGCTGGAGGAG C AGAT CAAGAAGT AC AAGAAG GAG C T GAAG C T GAC C AAGAG C G C CAT C C C C AC CCTGGGCGTG GAGT T C C AG GAC AGAAT C AAGAC C GAGAT C G G C AAGAT C G C C AC C GAC AT GGGCGGCGCCGT GAAG GAGAT C AGAAAGAAG G GT AC C GAGCAGAAGCTGATCTCAGAGGAGGACCTGGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCGCTTCT AACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAAC 
GGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCT GCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTA ACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGA AACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGTACCGGCGCCCCCGGCTCCGCCGGCTCCGCC GCCGGCTCCGGCGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCG CTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAG GTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGT ACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAAC AAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAA AAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCT GGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGC AT TAG C AGT AT TAG C AC C G GT G C C AC CGCTAGCGCCCTGGT T AAAG G C AAT AC C AAT C C GAT T AC AAG CAT GT C T GCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAA GACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGAC CTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTG GATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAAT GATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTG GCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGT AAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACT CGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGC TGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATC CCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTA AAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA
(SEQ ID NO: 73)
Protein:
MASVKVAVRVRPMNRREKDLEAKFIIQMEKSKTTITNLKIPEGGTGDSGRERTKTFTYDFSFYSADTKSPDYVSQ EMVFKTLGTDVVKSAFEGYNACVFAYGQTGSGKSYTMMGNSGDSGLIPRICEGLFSRINETTRWDEASFRTEVSY LEIYNERVRDLLRRKSSKTFNLRVREHPKEGPYVEDLSKHLVQNYGDVEELMDAGNINRTTAATGMNDVSSRSHA IFTIKFTQAKFDSEMPCETVSKIHLVDLAGSERADATGATGVRLKEGGNINKSLVTLGNVISALADLSQDAANTL AKKKQVFVPYRDSVLTWLLKDSLGGNSKTIMIATISPADVNYGETLSTLRYANRAKNIINKPTINEDANVKLIRE LRAEIARLKTLLAQGNQIALLDSPTMEDNSVLNEDSNLEHVEGQPRRSMSQPVLNVEGDKRTSSTSATQQQVLSG AFSSADVRSIPIIQTWEENKALKTKITILRGELQMYQRRYSEAKEASQKRVKEVMDDYVDLKLGQENVQEKMEQY KLMEEDLLAMQSRIETSEDNFARQMKEFEAQKHAMEERIKELELSATDANNTTVGSFRGTLDDILKKNDPDFTLT SGYEERKINDLEAKLLSEIDKVAELEDHIQQLRQELDDQSARLADSENVRAQLEAATGQGILGAAGNAMVPNSTF MIGNGRESQTRDQLNYIDDLETKLADAKKENDKARQALVEYMNKCSKLEHEIRTMVKNSTFDSSSMLLGGQTSDE LKIQIGKVNGELNVLRAENRELRIRCDQLTGGDGNLSISLGQSRLMAGIATNDVDSIGQGNETGGTSMRILPRES QLDDLEESKLPLMDTSSAVRNQQQFASMWEDFESVKDSLQNNHNDTLEGSFNSSMPPPGRDATQSFLSQKSFKNS PIVMQKPKSLHLHLKSHQSEGAGEQIQNNSFSTKTASPHVSQSHIPILHDMQQILDSSAMFLEGQHDVAVNVEQM QEKMSQIREALARLFERLKSSAALFEEILERMGSSDPNADKIKKMKLAFETSINDKLNVSAILEAAEKDLHNMSL NFSILEKSIVSQAAEASRRFTIAPDAEDVASSSLLNASYSPLFKFTSNSDIVEKLQNEVSELKNELEMARTRDMR SPLNGSSGRLSDVQINTNRMFEDLEVSEATLQKAKEENSTLKSQFAELEANLHQVNSKLGEVRCELNEALARVDG EQETRVKAENALEEARQLISSLKHEENELKKTITDMGMRLNEAKKSDEFLKSELSTALEEEKKSQNLADELSEEL NGWRMRTKEAENKVEHASSEKSEMLERIVHLETEMEKLSTSEIAADYCSTKMTERKKEIELAKYREDFENAAIVG LERISKEISELTKKTLKAKIIPSNISSIQLVCDELCRRLSREREQQHEYAKVMRDVNEKIEKLQLEKDALEHELK MMSSNNENVPPVGTSVSGMPTKTSNQKCAQPHYTSPTRQLLHESTMAVDAIVQKLKKTHNMSGMGPELKETIGNV INESRVLRDFLHQKLILFKGIDMSNWKNETVDQLITDLGQLHQDNLMLEEQIKKYKKELKLTKSAIPTLGVEFQD RIKTEIGKIATDMGGAVKEIRKKGTEQKLISEEDLGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFAN GIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDG NPIPSAIAANSGIYGTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHE VSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTK KAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMS
APVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFV DRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYR KESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPI PLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL (SEQ ID NO: 74)
KIF16B-SPD5-4xAN22-PylRSAF
DNA:
ATGGCATCGGTCAAGGTGGCCGTGAGGGTCCGGCCCATGAATCGCAGGGAAAAGGACTTGGAGGCCAAGTTCATT ATTCAGATGGAGAAAAGCAAAACGACAATCACAAACTTAAAGATACCAGAAGGAGGCACTGGGGACTCAGGAAGA GAACGGACCAAGACCTTCACCTATGACTTTTCTTTTTATTCTGCTGATACAAAAAGCCCAGATTACGTTTCACAA GAAATGGTTTTCAAAACCCTCGGCACAGATGTCGTGAAGTCTGCATTTGAAGGTTATAATGCTTGTGTCTTTGCA TATGGGCAAACTGGATCTGGAAAGTCATACACTATGATGGGAAATTCTGGAGATTCTGGCTTAATACCTCGGATC TGTGAAGGACTCTTCAGTCGGATAAATGAAACCACCAGATGGGATGAAGCTTCTTTTCGAACTGAAGTCAGCTAC TTAGAAATTTATAACGAACGTGTGAGAGATCTACTTCGGCGGAAGTCATCTAAAACCTTCAATTTGAGAGTCCGT GAGCATCCCAAAGAAGGCCCTTATGTTGAGGATTTATCCAAACATTTAGTACAGAATTATGGTGACGTAGAAGAA CTTATGGATGCGGGCAATATCAACCGGACCACCGCAGCGACTGGGATGAACGACGTCAGTAGCAGGTCTCATGCC ATCTTCACCATCAAGTTCACTCAGGCTAAATTTGATTCTGAAATGCCATGTGAAACCGTCAGTAAGATCCACTTG GTTGATCTTGCCGGAAGTGAGCGTGCAGATGCCACCGGAGCCACCGGGGTTAGGCTAAAGGAAGGGGGAAATATT AACAAGTCCCTCGTGACTCTGGGGAACGTCATTTCTGCCTTAGCTGATTTATCTCAGGATGCTGCAAATACTCTT GCAAAGAAGAAGCAAGTTTTCGTGCCTTACAGGGATTCTGTGTTGACTTGGTTGTTAAAAGATAGCCTTGGAGGA AACTCTAAAACTATCATGATTGCCACCATTTCACCTGCTGATGTCAATTATGGAGAAACCCTAAGTACTCTTCGC TAT G C AAAT AGAG C C AAAAAC AT CAT C AAC AAG C C T AC CAT T AAT GAG GAT G C C AAC GT C AAAC T TAT C C GT GAG CTGCGAGCTGAAATAGCCAGACTGAAAACGCTGCTTGCTCAAGGGAATCAGATTGCCCTCTTAGACTCCCCCACA AT G GAG GAC AAC AG C GT G C T GAAC GAG GAC AG C AAC C T G GAG C AC GT G GAG G G C C AG C C C AGAAGAAG CAT GAG C CAGCCCGTGCTGAACGTGGAGGGCGACAAGAGAACCAGCAGCACCAGCGCCACCCAGCAGCAGGTGCTGAGCGGC G C C T T C AG C AG C G C C GAC GT GAGAAG CAT C C C CAT CAT C C AGAC C T G G GAG GAGAAC AAG G C C C T GAAGAC C AAG AT C AC CAT C C T GAGAG G C GAG C T G C AGAT GT AC C AGAGAAGAT AC AG C GAG G C C AAG GAG G C C AG C C AGAAGAGA GTGAAGGAGGTGATGGACGACTACGTGGACCTGAAGCTGGGCCAGGAGAACGTGCAGGAGAAGATGGAGCAGTAC AAG C T GAT G GAG GAG GAC CTGCTGGC CAT G C AGAG C AGAAT C GAGAC C AG C GAG GAC AAC T T C G C C AGAC AGAT G AAG GAGT T C GAG G C C C AGAAG C AC G C CAT G GAG GAGAGAAT C AAG GAG C T G GAG C T GAG C G C C AC C GAC G C C AAC 
AACACCACCGTGGGCAGCTTCAGAGGCACCCTGGACGACATCCTGAAGAAGAACGACCCCGACTTCACCCTGACC AGCGGCTACGAGGAGAGAAAGATCAACGACCTGGAGGCCAAGCTGCTGAGCGAGATCGACAAGGTGGCCGAGCTG GAG GAC C AC AT C C AG C AG C T GAGAC AG GAG C T G GAC GAC C AGAG C G C C AGAC T G G C C GAC AG C GAGAAC GT GAGA GCCCAGCTGGAGGCCGCCACCGGCCAGGGCATCCTGGGCGCCGCCGGCAACGCCATGGTGCCCAACAGCACCTTC AT GAT C G G C AAC G G C AGAGAGAG C C AGAC C AGAGAC C AG C T GAAC T AC AT C GAC GAC C T G GAGAC C AAG C T G G C C GAC G C C AAGAAG GAGAAC GAC AAG G C C AGAC AG GCCCTGGTG GAGT AC AT GAAC AAGT G C AG C AAG C T G GAG C AC GAGAT C AGAAC CAT G GT GAAGAAC AG C AC C T T C GAC AG C AG C AG CAT GCTGCTGGGCGGC C AGAC C AG C GAC GAG C T GAAGAT C C AGAT C G G C AAG GT GAAC G G C GAG C T GAAC GT G C T GAGAG C C GAGAAC AGAGAG C T GAGAAT C AGA TGCGACCAGCTGACCGGCGGCGACGGCAACCTGAGCATCAGCCTGGGCCAGAGCAGACTGATGGCCGGCATCGCC AC C AAC GAC GT G GAC AG CAT C G G C C AG G G C AAC GAGAC C G G C G G C AC C AG CAT GAGAAT C C T G C C C AGAGAGAG C CAGCTGGACGACCTGGAGGAGAGCAAGCTGCCCCTGATGGACACCAGCAGCGCCGTGAGAAACCAGCAGCAGTTC G C C AG CAT GT G G GAG GAC T T C GAGAG C GT GAAG GAC AG C C T G C AGAAC AAC C AC AAC GAC AC C C T G GAG G G C AG C TTCAACAGCAGCATGCCCCCCCCCGGCAGAGACGCCACCCAGAGCTTCCTGAGCCAGAAGAGCTTCAAGAACAGC CCCATCGTGATGCAGAAGCCCAAGAGCCTGCACCTGCACCTGAAGAGCCACCAGAGCGAGGGCGCCGGCGAGCAG AT C C AGAAC AAC AG C T T C AG C AC C AAGAC C G C C AG C C C C C AC GT GAG C C AGAG C C AC AT C C C CAT C C T G C AC GAC ATGCAGCAGATCCTGGACAGCAGCGCCATGTTCCTGGAGGGCCAGCACGACGTGGCCGTGAACGTGGAGCAGATG C AG GAGAAGAT GAG C C AGAT C AGAGAG GCCCTGGC C AGAC T GT T C GAGAGAC T GAAGAG C AG CGCCGCCCTGTTC GAG GAGAT C C T G GAGAGAAT G G G C AG C AG C GAC C C C AAC G C C GAC AAGAT C AAGAAGAT GAAG CTGGCCTTC GAG ACCAGCATCAACGACAAGCTGAACGTGAGCGCCATCCTGGAGGCCGCCGAGAAGGACCTGCACAACATGAGCCTG AACTTCAGCATCCTGGAGAAGAGCATCGTGAGCCAGGCCGCCGAGGCCAGCAGAAGATTCACCATCGCCCCCGAC GCCGAGGACGTGGCCAGCAGCAGCCTGCTGAACGCCAGCTACAGCCCCCTGTTCAAGTTCACCAGCAACAGCGAC AT C GT G GAGAAG C T G C AGAAC GAG GT GAG C GAG C T GAAGAAC GAG 
C T G GAGAT G G C C AGAAC C AGAGAC AT GAGA AGCCCCCT GAAC G G C AG C AG C G G C AGAC T GAG C GAC GT G C AGAT C AAC AC C AAC AGAAT GT T C GAG GAC C T G GAG GTGAGCGAGGCCACCCTGCAGAAGGCCAAGGAGGAGAACAGCACCCTGAAGAGCCAGTTCGCCGAGCTGGAGGCC AACCTGCACCAGGTGAACAGCAAGCTGGGCGAGGTGAGATGCGAGCTGAACGAGGCCCTGGCCAGAGTGGACGGC GAG C AG GAGAC C AGAGT GAAG G C C GAGAAC G C C C T G GAG GAG G C C AGAC AG C T GAT C AG C AG C C T GAAG C AC GAG GAGAAC GAG C T GAAGAAGAC CAT C AC C GAC AT G G G CAT GAGAC T GAAC GAG G C C AAGAAGAG C GAC GAGT T C C T G AAGAG C GAG C T GAG C AC C G C C C T G GAG GAG GAGAAGAAGAG C C AGAAC C T G G C C GAC GAG C T GAG C GAG GAG C T G AAC G G C T G GAGAAT GAGAAC C AAG GAG G C C GAGAAC AAG GT G GAG C AC G C C AG C AG C GAGAAGAG C GAGAT G C T G GAGAGAAT C GT G C AC C T G GAGAC C GAGAT G GAGAAG C T GAG C AC C AG C GAGAT C G C C G C C GAC T AC T G C AG C AC C AAGAT GAC C GAGAGAAAGAAG GAGAT C GAG C T G G C C AAGT AC AGAGAG GAC T T C GAGAAC G C C G C CAT C GT G G G C C T G GAGAGAAT C AG C AAG GAGAT C AG C GAG C T GAC C AAGAAGAC C C T GAAG G C C AAGAT CAT C C C C AG C AAC AT C AG C AG CAT C C AG CTGGTGTGC GAC GAG C T GT G C AGAAGAC T GAG C AGAGAGAGAGAG C AG C AG C AC GAGT AC G C C AAG GT GAT GAGAGAC GT GAAC GAGAAGAT C GAGAAG C T G C AG C T G GAGAAG GAC G C C C T G GAG C AC GAG C T GAAG AT GAT GAG C AG C AAC AAC GAGAAC GTGCCCCCCGTGGG C AC C AG C GT GAG C G G CAT G C C C AC C AAGAC C AG C AAC CAGAAGTGCGCCCAGCCCCACTACACCAGCCCCACCAGACAGCTGCTGCACGAGAGCACCATGGCCGTGGACGCC AT C GT G C AGAAG C T GAAGAAGAC C C AC AAC AT GAG C G G CAT G G G C C C C GAG C T GAAG GAGAC CAT C G G C AAC GT G AT C AAC GAGAG C AGAGT G C T GAGAGAC T T C C T G C AC C AGAAG C T GAT C C T GT T C AAG G G CAT C GAC AT GAG C AAC TGGAAGAACGAGACCGTGGACCAGCTGATCACCGACCTGGGCCAGCTGCACCAGGACAACCTGATGCTGGAGGAG C AGAT CAAGAAGT AC AAGAAG GAG C T GAAG C T GAC C AAGAG C G C CAT C C C C AC CCTGGGCGTG GAGT T C C AG GAC AGAAT C AAGAC C GAGAT C G G C AAGAT C G C C AC C GAC AT GGGCGGCGCCGT GAAG GAGAT C AGAAAGAAGT C C G GA 
TATCCCTATGATGTGCCGGATTATGCTTCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAA CAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGC GGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCA AACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCA CAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCC GGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGT CGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGGTACCGGCGCCCCCGGCTCCGCCGGCTCC GCCGCCGGCTCCGGCGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAA CCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCAC GAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCT CGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTG AACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACT AAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCG TCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACC AGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATG TCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCG AAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAA GACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTC GTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGAC AATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAAT CTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTAT CGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGT ACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGAC AGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCA ATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAA GTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTG TAA (SEQ ID NO : 75 )
Protein:
MASVKVAVRVRPMNRREKDLEAKFIIQMEKSKTTITNLKIPEGGTGDSGRERTKTFTYDFSFYSADTKSPDYVSQ EMVFKTLGTDVVKSAFEGYNACVFAYGQTGSGKSYTMMGNSGDSGLIPRICEGLFSRINETTRWDEASFRTEVSY LEIYNERVRDLLRRKSSKTFNLRVREHPKEGPYVEDLSKHLVQNYGDVEELMDAGNINRTTAATGMNDVSSRSHA IFTIKFTQAKFDSEMPCETVSKIHLVDLAGSERADATGATGVRLKEGGNINKSLVTLGNVISALADLSQDAANTL AKKKQVFVPYRDSVLTWLLKDSLGGNSKTIMIATISPADVNYGETLSTLRYANRAKNIINKPTINEDANVKLIRE LRAEIARLKTLLAQGNQIALLDSPTMEDNSVLNEDSNLEHVEGQPRRSMSQPVLNVEGDKRTSSTSATQQQVLSG AFSSADVRSIPIIQTWEENKALKTKITILRGELQMYQRRYSEAKEASQKRVKEVMDDYVDLKLGQENVQEKMEQY KLMEEDLLAMQSRIETSEDNFARQMKEFEAQKHAMEERIKELELSATDANNTTVGSFRGTLDDILKKNDPDFTLT SGYEERKINDLEAKLLSEIDKVAELEDHIQQLRQELDDQSARLADSENVRAQLEAATGQGILGAAGNAMVPNSTF MIGNGRESQTRDQLNYIDDLETKLADAKKENDKARQALVEYMNKCSKLEHEIRTMVKNSTFDSSSMLLGGQTSDE LKIQIGKVNGELNVLRAENRELRIRCDQLTGGDGNLSISLGQSRLMAGIATNDVDSIGQGNETGGTSMRILPRES QLDDLEESKLPLMDTSSAVRNQQQFASMWEDFESVKDSLQNNHNDTLEGSFNSSMPPPGRDATQSFLSQKSFKNS PIVMQKPKSLHLHLKSHQSEGAGEQIQNNSFSTKTASPHVSQSHIPILHDMQQILDSSAMFLEGQHDVAVNVEQM QEKMSQIREALARLFERLKSSAALFEEILERMGSSDPNADKIKKMKLAFETSINDKLNVSAILEAAEKDLHNMSL NFSILEKSIVSQAAEASRRFTIAPDAEDVASSSLLNASYSPLFKFTSNSDIVEKLQNEVSELKNELEMARTRDMR SPLNGSSGRLSDVQINTNRMFEDLEVSEATLQKAKEENSTLKSQFAELEANLHQVNSKLGEVRCELNEALARVDG EQETRVKAENALEEARQLISSLKHEENELKKTITDMGMRLNEAKKSDEFLKSELSTALEEEKKSQNLADELSEEL NGWRMRTKEAENKVEHASSEKSEMLERIVHLETEMEKLSTSEIAADYCSTKMTERKKEIELAKYREDFENAAIVG LERISKEISELTKKTLKAKIIPSNISSIQLVCDELCRRLSREREQQHEYAKVMRDVNEKIEKLQLEKDALEHELK MMSSNNENVPPVGTSVSGMPTKTSNQKCAQPHYTSPTRQLLHESTMAVDAIVQKLKKTHNMSGMGPELKETIGNV INESRVLRDFLHQKLILFKGIDMSNWKNETVDQLITDLGQLHQDNLMLEEQIKKYKKELKLTKSAIPTLGVEFQD RIKTEIGKIATDMGGAVKEIRKKSGYPYDVPDYASTMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAG GLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGA GAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLGTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKK PLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDL NKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVST
SISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKK DLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPN LANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGD SCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL
(SEQ ID NO: 76)
KIF13A-FUS-PylRSAF
DNA:
ATGTCGGATACCAAGGTAAAAGTTGCCGTCCGGGTCCGGCCCATGAACCGACGAGAACTGGAACTGAACACCAAG
TGCGTGGTGGAGATGGAAGGGAATCAAACGGTCCTGCACCCTCCTCCTTCTAACACCAAACAGGGAGAAAGGAAA CCTCCCAAGGTATTTGCCTTTGATTATTGCTTTTGGTCCATGGATGAATCTAACACTACAAAATACGCTGGTCAA GAAGTGGTTTTCAAGTGCCTTGGGGAAGGAATTCTTGAAAAAGCCTTTCAGGGGTATAATGCGTGTATTTTTGCA TATGGACAGACAGGTTCGGGAAAATCCTTTTCCATGATGGGCCATGCTGAGCAGCTGGGCCTTATTCCAAGGCTC TGCTGTGCTTTATTTAAAAGGATCTCTTTGGAGCAAAATGAGTCACAGACCTTTAAAGTTGAAGTGTCCTATATG GAAATTTATAATGAGAAAGTTCGGGATCTTTTAGACCCCAAAGGGAGTAGACAGTCTCTTAAAGTTCGAGAACAT AAAGTTTTGGGACCATATGTAGATGGTTTATCTCAACTAGCTGTCACTAGTTTTGAGGATATTGAGTCATTGATG TCTGAGGGAAATAAGTCTCGAACGGTAGCTGCTACCAACATGAACGAAGAAAGCAGCCGCTCCCATGCTGTGTTC AACATCATAATCACACAGACACTTTATGACCTGCAGTCTGGGAATTCCGGGGAGAAAGTCAGTAAGGTCAGCTTG GTAGACCTGGCGGGTAGCGAAAGAGTATCTAAAACAGGAGCTGCAGGAGAGCGACTGAAAGAAGGCAGCAACATT AACAAATCGCTTACAACCTTGGGGTTGGTTATATCATCACTGGCTGACCAGGCAGCTGGCAAGGGTAAAAGCAAA TTTGTGCCTTATCGAGATTCAGTCCTCACTTGGCTGCTTAAGGACAACTTGGGGGGCAACAGCCAAACCTCTATG ATAGCCACAATCAGCCCAGCCGCAGACAACTATGAAGAGACCCTCTCCACATTAAGATATGCAGACCGAGCCAAA AGGATTGTGAACCATGCTGTTGTGAATGAGGACCCCAACGCAAAAGTGATCCGAGAACTGCGGGAGGAAGTCGAG AAACTGAGAGAGCAGCTCTCTCAGGCAGAGGCCATGAAGGCCGAACTGAAGGAGAAGCTCGAAGAGTCTGAAAAG CTGATAAAAGAACTAACAGTGACTTGGGAATATACAGATATTGAAATGAACAGATTGGGAAAGGGCGCCCCCGGC TCCGCCGGCTCCGCCGCCGGCTCCGGCATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCC TACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTAT AGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTAT GGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTAC GGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGT TCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAA AGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGT GGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGT GGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGC GGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCC 
AGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGT GGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGC CTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAA ACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTT GATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTC TCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCC ATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGC GGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGG AATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGT AACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGATTACAAGGATGACGACGATAAGGGTACCGGCGCCCCC GGCTCCGCCGGCTCCGCCGCCGGCTCCGGCGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACC CTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCAT AAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAAC AATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTG TCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGC GCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCA GCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCA GCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAAT CCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAG GTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTG TCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAA ATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAG CGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCT ATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAG ATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAA ATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTC 
AAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGT GCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTG GAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGT ATTTCTACTAACCTGTAA (SEQ ID NO: 77)
Protein:
MSDTKVKVAVRVRPMNRRELELNTKCVVEMEGNQTVLHPPPSNTKQGERKPPKVFAFDYCFWSMDESNTTKYAGQ
EVVFKCLGEGILEKAFQGYNACIFAYGQTGSGKSFSMMGHAEQLGLIPRLCCALFKRISLEQNESQTFKVEVSYM EIYNEKVRDLLDPKGSRQSLKVREHKVLGPYVDGLSQLAVTSFEDIESLMSEGNKSRTVAATNMNEESSRSHAVF NIIITQTLYDLQSGNSGEKVSKVSLVDLAGSERVSKTGAAGERLKEGSNINKSLTTLGLVISSLADQAAGKGKSK FVPYRDSVLTWLLKDNLGGNSQTSMIATISPAADNYEETLSTLRYADRAKRIVNHAVVNEDPNAKVIRELREEVE KLREQLSQAEAMKAELKEKLEESEKLIKELTVTWEYTDIEMNRLGKGAPGSAGSAAGSGMASNDYTQQATQSYGA YPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSY GQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSG GGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEP RGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKK TGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGP MGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGG NYGDDRRGGRGGDYKDDDDKGTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIH KIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVS APTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTN PITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLERE ITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFE IGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSS AVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL (SEQ ID NO: 78)
ATGTCGGATACCAAGGTAAAAGTTGCCGTCCGGGTCCGGCCCATGAACCGACGAGAACTGGAACTGAACACCAAG TGCGTGGTGGAGATGGAAGGGAATCAAACGGTCCTGCACCCTCCTCCTTCTAACACCAAACAGGGAGAAAGGAAA CCTCCCAAGGTATTTGCCTTTGATTATTGCTTTTGGTCCATGGATGAATCTAACACTACAAAATACGCTGGTCAA GAAGTGGTTTTCAAGTGCCTTGGGGAAGGAATTCTTGAAAAAGCCTTTCAGGGGTATAATGCGTGTATTTTTGCA TATGGACAGACAGGTTCGGGAAAATCCTTTTCCATGATGGGCCATGCTGAGCAGCTGGGCCTTATTCCAAGGCTC TGCTGTGCTTTATTTAAAAGGATCTCTTTGGAGCAAAATGAGTCACAGACCTTTAAAGTTGAAGTGTCCTATATG GAAATTTATAATGAGAAAGTTCGGGATCTTTTAGACCCCAAAGGGAGTAGACAGTCTCTTAAAGTTCGAGAACAT AAAGTTTTGGGACCATATGTAGATGGTTTATCTCAACTAGCTGTCACTAGTTTTGAGGATATTGAGTCATTGATG TCTGAGGGAAATAAGTCTCGAACGGTAGCTGCTACCAACATGAACGAAGAAAGCAGCCGCTCCCATGCTGTGTTC AACATCATAATCACACAGACACTTTATGACCTGCAGTCTGGGAATTCCGGGGAGAAAGTCAGTAAGGTCAGCTTG GTAGACCTGGCGGGTAGCGAAAGAGTATCTAAAACAGGAGCTGCAGGAGAGCGACTGAAAGAAGGCAGCAACATT AACAAATCGCTTACAACCTTGGGGTTGGTTATATCATCACTGGCTGACCAGGCAGCTGGCAAGGGTAAAAGCAAA TTTGTGCCTTATCGAGATTCAGTCCTCACTTGGCTGCTTAAGGACAACTTGGGGGGCAACAGCCAAACCTCTATG ATAGCCACAATCAGCCCAGCCGCAGACAACTATGAAGAGACCCTCTCCACATTAAGATATGCAGACCGAGCCAAA AGGATTGTGAACCATGCTGTTGTGAATGAGGACCCCAACGCAAAAGTGATCCGAGAACTGCGGGAGGAAGTCGAG AAACTGAGAGAGCAGCTCTCTCAGGCAGAGGCCATGAAGGCCGAACTGAAGGAGAAGCTCGAAGAGTCTGAAAAG CTGATAAAAGAACTAACAGTGACTTGGGAATATACAGATATTGAAATGAACAGATTGGGAAAGGGCGCCCCCGGC TCCGCCGGCTCCGCCGCCGGCTCCGGCATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCC TACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTAT AGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTAT GGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTAC GGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGT TCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAA AGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGT GGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGT GGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGC 
GGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCC AGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGT GGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGC CTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAA ACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTT GATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTC TCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCC ATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGC GGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGG AATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGT AACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGC ATGGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGACAAAAAACCGCTGAATACC CTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGT
TCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGT
GCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTG
ACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATG
CCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAA
TTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGT
ATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTT
CAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATC
AGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAA
ATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGC
TTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAA
CTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTGGCACCAAATCTGTATAACTAT
CTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCC
GACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAAC
CTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTG
TATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGAC
CGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGAC
TTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 79)
Protein:
MSDTKVKVAVRVRPMNRRELELNTKCVVEMEGNQTVLHPPPSNTKQGERKPPKVFAFDYCFWSMDESNTTKYAGQ EVVFKCLGEGILEKAFQGYNACIFAYGQTGSGKSFSMMGHAEQLGLIPRLCCALFKRISLEQNESQTFKVEVSYM EIYNEKVRDLLDPKGSRQSLKVREHKVLGPYVDGLSQLAVTSFEDIESLMSEGNKSRTVAATNMNEESSRSHAVF NIIITQTLYDLQSGNSGEKVSKVSLVDLAGSERVSKTGAAGERLKEGSNINKSLTTLGLVISSLADQAAGKGKSK FVPYRDSVLTWLLKDNLGGNSQTSMIATISPAADNYEETLSTLRYADRAKRIVNHAVVNEDPNAKVIRELREEVE KLREQLSQAEAMKAELKEKLEESEKLIKELTVTWEYTDIEMNRLGKGAPGSAGSAAGSGMASNDYTQQATQSYGA YPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSY GQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSG GGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEP RGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKK TGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGP MGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGG NYGDDRRGGRGGGAPGSAGSAAGSGMACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSR SKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAM PKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPV QASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRG FLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLYNYLRKLDRALPDPIKIFEIGPCYRKES DGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLD REWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL (SEQ ID NO: 80)
ATGTCGGATACCAAGGTAAAAGTTGCCGTCCGGGTCCGGCCCATGAACCGACGAGAACTGGAACTGAACACCAAG TGCGTGGTGGAGATGGAAGGGAATCAAACGGTCCTGCACCCTCCTCCTTCTAACACCAAACAGGGAGAAAGGAAA CCTCCCAAGGTATTTGCCTTTGATTATTGCTTTTGGTCCATGGATGAATCTAACACTACAAAATACGCTGGTCAA GAAGTGGTTTTCAAGTGCCTTGGGGAAGGAATTCTTGAAAAAGCCTTTCAGGGGTATAATGCGTGTATTTTTGCA TATGGACAGACAGGTTCGGGAAAATCCTTTTCCATGATGGGCCATGCTGAGCAGCTGGGCCTTATTCCAAGGCTC TGCTGTGCTTTATTTAAAAGGATCTCTTTGGAGCAAAATGAGTCACAGACCTTTAAAGTTGAAGTGTCCTATATG GAAATTTATAATGAGAAAGTTCGGGATCTTTTAGACCCCAAAGGGAGTAGACAGTCTCTTAAAGTTCGAGAACAT AAAGTTTTGGGACCATATGTAGATGGTTTATCTCAACTAGCTGTCACTAGTTTTGAGGATATTGAGTCATTGATG TCTGAGGGAAATAAGTCTCGAACGGTAGCTGCTACCAACATGAACGAAGAAAGCAGCCGCTCCCATGCTGTGTTC AACATCATAATCACACAGACACTTTATGACCTGCAGTCTGGGAATTCCGGGGAGAAAGTCAGTAAGGTCAGCTTG GTAGACCTGGCGGGTAGCGAAAGAGTATCTAAAACAGGAGCTGCAGGAGAGCGACTGAAAGAAGGCAGCAACATT AACAAATCGCTTACAACCTTGGGGTTGGTTATATCATCACTGGCTGACCAGGCAGCTGGCAAGGGTAAAAGCAAA TTTGTGCCTTATCGAGATTCAGTCCTCACTTGGCTGCTTAAGGACAACTTGGGGGGCAACAGCCAAACCTCTATG ATAGCCACAATCAGCCCAGCCGCAGACAACTATGAAGAGACCCTCTCCACATTAAGATATGCAGACCGAGCCAAA AGGATTGTGAACCATGCTGTTGTGAATGAGGACCCCAACGCAAAAGTGATCCGAGAACTGCGGGAGGAAGTCGAG AAACTGAGAGAGCAGCTCTCTCAGGCAGAGGCCATGAAGGCCGAACTGAAGGAGAAGCTCGAAGAGTCTGAAAAG CTGATAAAAGAACTAACAGTGACTTGGGAATATACAGATATTGAAATGAACAGATTGGGAAAGGGCGCCCCCGGC TCCGCCGGCTCCGCCGCCGGCTCCGGCATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCC TACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTAT AGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTAT GGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTAC GGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGT TCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAA AGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGT GGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGT GGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGC 
GGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCC AGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGT GGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGC CTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAA ACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTT GATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTC TCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCC ATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGC GGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGG AATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGT AACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGATTACAAGGATGACGACGATAAGGGTACCGGCGCCCCC GGCTCCGCCGGCTCCGCCGCCGGCTCCGGCGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACC CTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCAT AAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAAC AATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTG TCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGC GCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCA GCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCA GCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAAT CCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAG GTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTG TCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAA ATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAG CGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCT ATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAG ATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCCCAA 
ATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTC AAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGT GCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTG GAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGT ATTTCTACTAACCTGTAA (SEQ ID NO: 81)
Protein:
MSDTKVKVAVRVRPMNRRELELNTKCVVEMEGNQTVLHPPPSNTKQGERKPPKVFAFDYCFWSMDESNTTKYAGQ EVVFKCLGEGILEKAFQGYNACIFAYGQTGSGKSFSMMGHAEQLGLIPRLCCALFKRISLEQNESQTFKVEVSYM EIYNEKVRDLLDPKGSRQSLKVREHKVLGPYVDGLSQLAVTSFEDIESLMSEGNKSRTVAATNMNEESSRSHAVF NIIITQTLYDLQSGNSGEKVSKVSLVDLAGSERVSKTGAAGERLKEGSNINKSLTTLGLVISSLADQAAGKGKSK FVPYRDSVLTWLLKDNLGGNSQTSMIATISPAADNYEETLSTLRYADRAKRIVNHAVVNEDPNAKVIRELREEVE KLREQLSQAEAMKAELKEKLEESEKLIKELTVTWEYTDIEMNRLGKGAPGSAGSAAGSGMASNDYTQQATQSYGA YPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSY GQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSG GGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEP RGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKK TGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGP MGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGG NYGDDRRGGRGGDYKDDDDKGTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIH KIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVS APTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTN PITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLERE ITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFE IGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSS AVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL (SEQ ID NO: 82)
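Each protein entry in this listing appears to correspond to the translation of the DNA entry directly above it under the standard genetic code. The following Python sketch is an illustration only, not part of the described method: it ignores the whitespace between the 75-nucleotide blocks, reads codons in frame from the first nucleotide and stops at the first stop codon. The names `CODON_TABLE`, `translate` and `SEQ81_START` are hypothetical helpers introduced here.

```python
import itertools

# Standard genetic code; codons are enumerated in TCAG order with the first base varying slowest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence, stopping at the first stop codon.

    Whitespace (e.g. the spaces between 75-nt blocks in the listing) is ignored.
    """
    seq = "".join(dna.split()).upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    residues = []
    for i in range(0, len(seq), 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":  # a stop codon ends the open reading frame
            break
        residues.append(aa)
    return "".join(residues)


# First two 75-nt blocks of SEQ ID NO: 81 (N-terminal KIF13A portion).
SEQ81_START = (
    "ATGTCGGATACCAAGGTAAAAGTTGCCGTCCGGGTCCGGCCCATGAACCGACGAGAACTGGAACTGAACACCAAG "
    "TGCGTGGTGGAGATGGAAGGGAATCAAACGGTCCTGCACCCTCCTCCTTCTAACACCAAACAGGGAGAAAGGAAA"
)

if __name__ == "__main__":
    print(translate(SEQ81_START))
    # MSDTKVKVAVRVRPMNRRELELNTKCVVEMEGNQTVLHPPPSNTKQGERK
```

Applied to the first 150 nucleotides of SEQ ID NO: 81, the sketch returns the first 50 residues of SEQ ID NO: 82. Translating the DNA is also a quick way to check individual residues in a protein entry; for example, the codons TGC GTG GTG at nucleotides 76 to 84 encode Cys-Val-Val.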
KIF13A-EWSR1-MCP
DNA:
ATGTCGGATACCAAGGTAAAAGTTGCCGTCCGGGTCCGGCCCATGAACCGACGAGAACTGGAACTGAACACCAAG TGCGTGGTGGAGATGGAAGGGAATCAAACGGTCCTGCACCCTCCTCCTTCTAACACCAAACAGGGAGAAAGGAAA CCTCCCAAGGTATTTGCCTTTGATTATTGCTTTTGGTCCATGGATGAATCTAACACTACAAAATACGCTGGTCAA GAAGTGGTTTTCAAGTGCCTTGGGGAAGGAATTCTTGAAAAAGCCTTTCAGGGGTATAATGCGTGTATTTTTGCA TATGGACAGACAGGTTCGGGAAAATCCTTTTCCATGATGGGCCATGCTGAGCAGCTGGGCCTTATTCCAAGGCTC TGCTGTGCTTTATTTAAAAGGATCTCTTTGGAGCAAAATGAGTCACAGACCTTTAAAGTTGAAGTGTCCTATATG GAAATTTATAATGAGAAAGTTCGGGATCTTTTAGACCCCAAAGGGAGTAGACAGTCTCTTAAAGTTCGAGAACAT AAAGTTTTGGGACCATATGTAGATGGTTTATCTCAACTAGCTGTCACTAGTTTTGAGGATATTGAGTCATTGATG TCTGAGGGAAATAAGTCTCGAACGGTAGCTGCTACCAACATGAACGAAGAAAGCAGCCGCTCCCATGCTGTGTTC AACATCATAATCACACAGACACTTTATGACCTGCAGTCTGGGAATTCCGGGGAGAAAGTCAGTAAGGTCAGCTTG GTAGACCTGGCGGGTAGCGAAAGAGTATCTAAAACAGGAGCTGCAGGAGAGCGACTGAAAGAAGGCAGCAACATT AACAAATCGCTTACAACCTTGGGGTTGGTTATATCATCACTGGCTGACCAGGCAGCTGGCAAGGGTAAAAGCAAA TTTGTGCCTTATCGAGATTCAGTCCTCACTTGGCTGCTTAAGGACAACTTGGGGGGCAACAGCCAAACCTCTATG ATAGCCACAATCAGCCCAGCCGCAGACAACTATGAAGAGACCCTCTCCACATTAAGATATGCAGACCGAGCCAAA AGGATTGTGAACCATGCTGTTGTGAATGAGGACCCCAACGCAAAAGTGATCCGAGAACTGCGGGAGGAAGTCGAG AAACTGAGAGAGCAGCTCTCTCAGGCAGAGGCCATGAAGGCCGAACTGAAGGAGAAGCTCGAAGAGTCTGAAAAG CTGATAAAAGAACTAACAGTGACTTGGGAAATGGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAG CAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACCACCCAGGCATATGGGCAACAAAGC TATGGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGCC TATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTC CAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCTCAG TCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAG GATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTAGGATAT GGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTAC CCTCCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACACCTATGGG CAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTAC 
CCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGCAGCAGAGT TCATTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGACCAGGAGAG AACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATGAGCAGAGGT GGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGACCCATG GATGAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGATGAAGACTCTGACAACAGTGCAATTTATGTA CAAGGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAGATGAAC AAGAGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGATGCCACAGTG TCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGGAGCAAACTT AAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAGGGCAGAGGC ATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATGGGAGGCCGT GGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGAAACGTC CAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGAACAGAGTGC AACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGATCGTGGCAGA GGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATGTTCAGAGGT GGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGAGGAAGACGA GGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGATTACAAGGATGACGACGATAAGGGTACCGAGCAG AAGCTGATCTCAGAGGAGGACCTGGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCGCTTCTAACTTT ACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATC GCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAG AATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATT CCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCG ATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACTAA (SEQ ID NO: 83)
Protein:
MSDTKVKVAVRVRPMNRRELELNTKCVVEMEGNQTVLHPPPSNTKQGERKPPKVFAFDYCFWSMDESNTTKYAGQ
EVVFKCLGEGILEKAFQGYNACIFAYGQTGSGKSFSMMGHAEQLGLIPRLCCALFKRISLEQNESQTFKVEVSYM
EIYNEKVRDLLDPKGSRQSLKVREHKVLGPYVDGLSQLAVTSFEDIESLMSEGNKSRTVAATNMNEESSRSHAVF NIIITQTLYDLQSGNSGEKVSKVSLVDLAGSERVSKTGAAGERLKEGSNINKSLTTLGLVISSLADQAAGKGKSK
FVPYRDSVLTWLLKDNLGGNSQTSMIATISPAADNYEETLSTLRYADRAKRIVNHAVVNEDPNAKVIRELREEVE
KLREQLSQAEAMKAELKEKLEESEKLIKELTVTWEMASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQS
YGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQ
SAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSY
PPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQS
SFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPM
DEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATV
SYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGR
GGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGR
GGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPPGPLMEQDYKDDDDKGTEQ
KLISEEDLGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQ
NRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIY (SEQ ID NO: 84)
KIF13A-SPD5-PylRSAF
DNA:
ATGTCGGATACCAAGGTAAAAGTTGCCGTCCGGGTCCGGCCCATGAACCGACGAGAACTGGAACTGAACACCAAG TGCGTGGTGGAGATGGAAGGGAATCAAACGGTCCTGCACCCTCCTCCTTCTAACACCAAACAGGGAGAAAGGAAA CCTCCCAAGGTATTTGCCTTTGATTATTGCTTTTGGTCCATGGATGAATCTAACACTACAAAATACGCTGGTCAA GAAGTGGTTTTCAAGTGCCTTGGGGAAGGAATTCTTGAAAAAGCCTTTCAGGGGTATAATGCGTGTATTTTTGCA TATGGACAGACAGGTTCGGGAAAATCCTTTTCCATGATGGGCCATGCTGAGCAGCTGGGCCTTATTCCAAGGCTC TGCTGTGCTTTATTTAAAAGGATCTCTTTGGAGCAAAATGAGTCACAGACCTTTAAAGTTGAAGTGTCCTATATG GAAATTTATAATGAGAAAGTTCGGGATCTTTTAGACCCCAAAGGGAGTAGACAGTCTCTTAAAGTTCGAGAACAT AAAGTTTTGGGACCATATGTAGATGGTTTATCTCAACTAGCTGTCACTAGTTTTGAGGATATTGAGTCATTGATG TCTGAGGGAAATAAGTCTCGAACGGTAGCTGCTACCAACATGAACGAAGAAAGCAGCCGCTCCCATGCTGTGTTC AACATCATAATCACACAGACACTTTATGACCTGCAGTCTGGGAATTCCGGGGAGAAAGTCAGTAAGGTCAGCTTG GTAGACCTGGCGGGTAGCGAAAGAGTATCTAAAACAGGAGCTGCAGGAGAGCGACTGAAAGAAGGCAGCAACATT AACAAATCGCTTACAACCTTGGGGTTGGTTATATCATCACTGGCTGACCAGGCAGCTGGCAAGGGTAAAAGCAAA TTTGTGCCTTATCGAGATTCAGTCCTCACTTGGCTGCTTAAGGACAACTTGGGGGGCAACAGCCAAACCTCTATG ATAGCCACAATCAGCCCAGCCGCAGACAACTATGAAGAGACCCTCTCCACATTAAGATATGCAGACCGAGCCAAA AGGATTGTGAACCATGCTGTTGTGAATGAGGACCCCAACGCAAAAGTGATCCGAGAACTGCGGGAGGAAGTCGAG AAACTGAGAGAGCAGCTCTCTCAGGCAGAGGCCATGAAGGCCGAACTGAAGGAGAAGCTCGAAGAGTCTGAAAAG CTGATAAAAGAACTAACAGTGACTTGGGAAATGGAGGACAACAGCGTGCTGAACGAGGACAGCAACCTGGAGCAC GTGGAGGGCCAGCCCAGAAGAAGCATGAGCCAGCCCGTGCTGAACGTGGAGGGCGACAAGAGAACCAGCAGCACC AGCGCCACCCAGCAGCAGGTGCTGAGCGGCGCCTTCAGCAGCGCCGACGTGAGAAGCATCCCCATCATCCAGACC TGGGAGGAGAACAAGGCCCTGAAGACCAAGATCACCATCCTGAGAGGCGAGCTGCAGATGTACCAGAGAAGATAC AGCGAGGCCAAGGAGGCCAGCCAGAAGAGAGTGAAGGAGGTGATGGACGACTACGTGGACCTGAAGCTGGGCCAG GAGAACGTGCAGGAGAAGATGGAGCAGTACAAGCTGATGGAGGAGGACCTGCTGGCCATGCAGAGCAGAATCGAG ACCAGCGAGGACAACTTCGCCAGACAGATGAAGGAGTTCGAGGCCCAGAAGCACGCCATGGAGGAGAGAATCAAG GAGCTGGAGCTGAGCGCCACCGACGCCAACAACACCACCGTGGGCAGCTTCAGAGGCACCCTGGACGACATCCTG AAGAAGAACGACCCCGACTTCACCCTGACCAGCGGCTACGAGGAGAGAAAGATCAACGACCTGGAGGCCAAGCTG CTGAGCGAGATCGACAAGGTGGCCGAGCTGGAGGACCACATCCAGCAGCTGAGACAGGAGCTGGACGACCAGAGC 
GCCAGACTGGCCGACAGCGAGAACGTGAGAGCCCAGCTGGAGGCCGCCACCGGCCAGGGCATCCTGGGCGCCGCC GGCAACGCCATGGTGCCCAACAGCACCTTCATGATCGGCAACGGCAGAGAGAGCCAGACCAGAGACCAGCTGAAC TACATCGACGACCTGGAGACCAAGCTGGCCGACGCCAAGAAGGAGAACGACAAGGCCAGACAGGCCCTGGTGGAG TACATGAACAAGTGCAGCAAGCTGGAGCACGAGATCAGAACCATGGTGAAGAACAGCACCTTCGACAGCAGCAGC ATGCTGCTGGGCGGCCAGACCAGCGACGAGCTGAAGATCCAGATCGGCAAGGTGAACGGCGAGCTGAACGTGCTG AGAGCCGAGAACAGAGAGCTGAGAATCAGATGCGACCAGCTGACCGGCGGCGACGGCAACCTGAGCATCAGCCTG GGCCAGAGCAGACTGATGGCCGGCATCGCCACCAACGACGTGGACAGCATCGGCCAGGGCAACGAGACCGGCGGC ACCAGCATGAGAATCCTGCCCAGAGAGAGCCAGCTGGACGACCTGGAGGAGAGCAAGCTGCCCCTGATGGACACC AGCAGCGCCGTGAGAAACCAGCAGCAGTTCGCCAGCATGTGGGAGGACTTCGAGAGCGTGAAGGACAGCCTGCAG AACAACCACAACGACACCCTGGAGGGCAGCTTCAACAGCAGCATGCCCCCCCCCGGCAGAGACGCCACCCAGAGC TTCCTGAGCCAGAAGAGCTTCAAGAACAGCCCCATCGTGATGCAGAAGCCCAAGAGCCTGCACCTGCACCTGAAG AGCCACCAGAGCGAGGGCGCCGGCGAGCAGATCCAGAACAACAGCTTCAGCACCAAGACCGCCAGCCCCCACGTG AGCCAGAGCCACATCCCCATCCTGCACGACATGCAGCAGATCCTGGACAGCAGCGCCATGTTCCTGGAGGGCCAG CACGACGTGGCCGTGAACGTGGAGCAGATGCAGGAGAAGATGAGCCAGATCAGAGAGGCCCTGGCCAGACTGTTC GAGAGACTGAAGAGCAGCGCCGCCCTGTTCGAGGAGATCCTGGAGAGAATGGGCAGCAGCGACCCCAACGCCGAC AAGATCAAGAAGATGAAGCTGGCCTTCGAGACCAGCATCAACGACAAGCTGAACGTGAGCGCCATCCTGGAGGCC
GCCGAGAAGGACCTGCACAACATGAGCCTGAACTTCAGCATCCTGGAGAAGAGCATCGTGAGCCAGGCCGCCGAG
GCCAGCAGAAGATTCACCATCGCCCCCGACGCCGAGGACGTGGCCAGCAGCAGCCTGCTGAACGCCAGCTACAGC
CCCCTGTTCAAGTTCACCAGCAACAGCGACATCGTGGAGAAGCTGCAGAACGAGGTGAGCGAGCTGAAGAACGAG
CTGGAGATGGCCAGAACCAGAGACATGAGAAGCCCCCTGAACGGCAGCAGCGGCAGACTGAGCGACGTGCAGATC
AACACCAACAGAATGTTCGAGGACCTGGAGGTGAGCGAGGCCACCCTGCAGAAGGCCAAGGAGGAGAACAGCACC
CTGAAGAGCCAGTTCGCCGAGCTGGAGGCCAACCTGCACCAGGTGAACAGCAAGCTGGGCGAGGTGAGATGCGAG
CTGAACGAGGCCCTGGCCAGAGTGGACGGCGAGCAGGAGACCAGAGTGAAGGCCGAGAACGCCCTGGAGGAGGCC
AGACAGCTGATCAGCAGCCTGAAGCACGAGGAGAACGAGCTGAAGAAGACCATCACCGACATGGGCATGAGACTG
AACGAGGCCAAGAAGAGCGACGAGTTCCTGAAGAGCGAGCTGAGCACCGCCCTGGAGGAGGAGAAGAAGAGCCAG
AACCTGGCCGACGAGCTGAGCGAGGAGCTGAACGGCTGGAGAATGAGAACCAAGGAGGCCGAGAACAAGGTGGAG
CACGCCAGCAGCGAGAAGAGCGAGATGCTGGAGAGAATCGTGCACCTGGAGACCGAGATGGAGAAGCTGAGCACC
AGCGAGATCGCCGCCGACTACTGCAGCACCAAGATGACCGAGAGAAAGAAGGAGATCGAGCTGGCCAAGTACAGA
GAGGACTTCGAGAACGCCGCCATCGTGGGCCTGGAGAGAATCAGCAAGGAGATCAGCGAGCTGACCAAGAAGACC
CTGAAGGCCAAGATCATCCCCAGCAACATCAGCAGCATCCAGCTGGTGTGCGACGAGCTGTGCAGAAGACTGAGC
AGAGAGAGAGAGCAGCAGCACGAGTACGCCAAGGTGATGAGAGACGTGAACGAGAAGATCGAGAAGCTGCAGCTG
GAGAAGGACGCCCTGGAGCACGAGCTGAAGATGATGAGCAGCAACAACGAGAACGTGCCCCCCGTGGGCACCAGC
GTGAGCGGCATGCCCACCAAGACCAGCAACCAGAAGTGCGCCCAGCCCCACTACACCAGCCCCACCAGACAGCTG
CTGCACGAGAGCACCATGGCCGTGGACGCCATCGTGCAGAAGCTGAAGAAGACCCACAACATGAGCGGCATGGGC
CCCGAGCTGAAGGAGACCATCGGCAACGTGATCAACGAGAGCAGAGTGCTGAGAGACTTCCTGCACCAGAAGCTG
ATCCTGTTCAAGGGCATCGACATGAGCAACTGGAAGAACGAGACCGTGGACCAGCTGATCACCGACCTGGGCCAG
CTGCACCAGGACAACCTGATGCTGGAGGAGCAGATCAAGAAGTACAAGAAGGAGCTGAAGCTGACCAAGAGCGCC
ATCCCCACCCTGGGCGTGGAGTTCCAGGACAGAATCAAGACCGAGATCGGCAAGATCGCCACCGACATGGGCGGC
GCCGTGAAGGAGATCAGAAAGAAGGGTACCGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCGCGTGC
CCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCT
GCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATC
TATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGT
CACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCC
AATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCC
GTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCG
GCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACC
GGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCA
GCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAAT
TCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCC
GAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAG
ATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAA
CAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAA
CTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAA
GAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGC
ATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGAC
ACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGG
GGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAAC
ATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 85)
Protein:
MSDTKVKVAVRVRPMNRRELELNTKCVVEMEGNQTVLHPPPSNTKQGERKPPKVFAFDYCFWSMDESNTTKYAGQ
EVVFKCLGEGILEKAFQGYNACIFAYGQTGSGKSFSMMGHAEQLGLIPRLCCALFKRISLEQNESQTFKVEVSYM
EIYNEKVRDLLDPKGSRQSLKVREHKVLGPYVDGLSQLAVTSFEDIESLMSEGNKSRTVAATNMNEESSRSHAVF
NIIITQTLYDLQSGNSGEKVSKVSLVDLAGSERVSKTGAAGERLKEGSNINKSLTTLGLVISSLADQAAGKGKSK
FVPYRDSVLTWLLKDNLGGNSQTSMIATISPAADNYEETLSTLRYADRAKRIVNHAVVNEDPNAKVIRELREEVE
KLREQLSQAEAMKAELKEKLEESEKLIKELTVTWEMEDNSVLNEDSNLEHVEGQPRRSMSQPVLNVEGDKRTSST
SATQQQVLSGAFSSADVRSIPIIQTWEENKALKTKITILRGELQMYQRRYSEAKEASQKRVKEVMDDYVDLKLGQ
ENVQEKMEQYKLMEEDLLAMQSRIETSEDNFARQMKEFEAQKHAMEERIKELELSATDANNTTVGSFRGTLDDIL
KKNDPDFTLTSGYEERKINDLEAKLLSEIDKVAELEDHIQQLRQELDDQSARLADSENVRAQLEAATGQGILGAA
GNAMVPNSTFMIGNGRESQTRDQLNYIDDLETKLADAKKENDKARQALVEYMNKCSKLEHEIRTMVKNSTFDSSS
MLLGGQTSDELKIQIGKVNGELNVLRAENRELRIRCDQLTGGDGNLSISLGQSRLMAGIATNDVDSIGQGNETGG
TSMRILPRESQLDDLEESKLPLMDTSSAVRNQQQFASMWEDFESVKDSLQNNHNDTLEGSFNSSMPPPGRDATQS
FLSQKSFKNSPIVMQKPKSLHLHLKSHQSEGAGEQIQNNSFSTKTASPHVSQSHIPILHDMQQILDSSAMFLEGQ
HDVAVNVEQMQEKMSQIREALARLFERLKSSAALFEEILERMGSSDPNADKIKKMKLAFETSINDKLNVSAILEA AEKDLHNMSLNFSILEKSIVSQAAEASRRFTIAPDAEDVASSSLLNASYSPLFKFTSNSDIVEKLQNEVSELKNE
LEMARTRDMRSPLNGSSGRLSDVQINTNRMFEDLEVSEATLQKAKEENSTLKSQFAELEANLHQVNSKLGEVRCE
LNEALARVDGEQETRVKAENALEEARQLISSLKHEENELKKTITDMGMRLNEAKKSDEFLKSELSTALEEEKKSQ
NLADELSEELNGWRMRTKEAENKVEHASSEKSEMLERIVHLETEMEKLSTSEIAADYCSTKMTERKKEIELAKYR
EDFENAAIVGLERISKEISELTKKTLKAKIIPSNISSIQLVCDELCRRLSREREQQHEYAKVMRDVNEKIEKLQL
EKDALEHELKMMSSNNENVPPVGTSVSGMPTKTSNQKCAQPHYTSPTRQLLHESTMAVDAIVQKLKKTHNMSGMG
PELKETIGNVINESRVLRDFLHQKLILFKGIDMSNWKNETVDQLITDLGQLHQDNLMLEEQIKKYKKELKLTKSA
IPTLGVEFQDRIKTEIGKIATDMGGAVKEIRKKGTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLIS
ATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKA
NEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSIST
GATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYA
EERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRK
LDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGD
TLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL (SEQ ID NO: 86)
KIF13A-SPD5-MCP
DNA:
ATGTCGGATACCAAGGTAAAAGTTGCCGTCCGGGTCCGGCCCATGAACCGACGAGAACTGGAACTGAACACCAAG TGCGTGGTGGAGATGGAAGGGAATCAAACGGTCCTGCACCCTCCTCCTTCTAACACCAAACAGGGAGAAAGGAAA CCTCCCAAGGTATTTGCCTTTGATTATTGCTTTTGGTCCATGGATGAATCTAACACTACAAAATACGCTGGTCAA GAAGTGGTTTTCAAGTGCCTTGGGGAAGGAATTCTTGAAAAAGCCTTTCAGGGGTATAATGCGTGTATTTTTGCA TATGGACAGACAGGTTCGGGAAAATCCTTTTCCATGATGGGCCATGCTGAGCAGCTGGGCCTTATTCCAAGGCTC TGCTGTGCTTTATTTAAAAGGATCTCTTTGGAGCAAAATGAGTCACAGACCTTTAAAGTTGAAGTGTCCTATATG GAAATTTATAATGAGAAAGTTCGGGATCTTTTAGACCCCAAAGGGAGTAGACAGTCTCTTAAAGTTCGAGAACAT AAAGTTTTGGGACCATATGTAGATGGTTTATCTCAACTAGCTGTCACTAGTTTTGAGGATATTGAGTCATTGATG TCTGAGGGAAATAAGTCTCGAACGGTAGCTGCTACCAACATGAACGAAGAAAGCAGCCGCTCCCATGCTGTGTTC AACATCATAATCACACAGACACTTTATGACCTGCAGTCTGGGAATTCCGGGGAGAAAGTCAGTAAGGTCAGCTTG GTAGACCTGGCGGGTAGCGAAAGAGTATCTAAAACAGGAGCTGCAGGAGAGCGACTGAAAGAAGGCAGCAACATT AACAAATCGCTTACAACCTTGGGGTTGGTTATATCATCACTGGCTGACCAGGCAGCTGGCAAGGGTAAAAGCAAA TTTGTGCCTTATCGAGATTCAGTCCTCACTTGGCTGCTTAAGGACAACTTGGGGGGCAACAGCCAAACCTCTATG ATAGCCACAATCAGCCCAGCCGCAGACAACTATGAAGAGACCCTCTCCACATTAAGATATGCAGACCGAGCCAAA AGGATTGTGAACCATGCTGTTGTGAATGAGGACCCCAACGCAAAAGTGATCCGAGAACTGCGGGAGGAAGTCGAG AAACTGAGAGAGCAGCTCTCTCAGGCAGAGGCCATGAAGGCCGAACTGAAGGAGAAGCTCGAAGAGTCTGAAAAG CTGATAAAAGAACTAACAGTGACTTGGGAAATGGAGGACAACAGCGTGCTGAACGAGGACAGCAACCTGGAGCAC GTGGAGGGCCAGCCCAGAAGAAGCATGAGCCAGCCCGTGCTGAACGTGGAGGGCGACAAGAGAACCAGCAGCACC AGCGCCACCCAGCAGCAGGTGCTGAGCGGCGCCTTCAGCAGCGCCGACGTGAGAAGCATCCCCATCATCCAGACC TGGGAGGAGAACAAGGCCCTGAAGACCAAGATCACCATCCTGAGAGGCGAGCTGCAGATGTACCAGAGAAGATAC AGCGAGGCCAAGGAGGCCAGCCAGAAGAGAGTGAAGGAGGTGATGGACGACTACGTGGACCTGAAGCTGGGCCAG GAGAACGTGCAGGAGAAGATGGAGCAGTACAAGCTGATGGAGGAGGACCTGCTGGCCATGCAGAGCAGAATCGAG ACCAGCGAGGACAACTTCGCCAGACAGATGAAGGAGTTCGAGGCCCAGAAGCACGCCATGGAGGAGAGAATCAAG GAGCTGGAGCTGAGCGCCACCGACGCCAACAACACCACCGTGGGCAGCTTCAGAGGCACCCTGGACGACATCCTG AAGAAGAACGACCCCGACTTCACCCTGACCAGCGGCTACGAGGAGAGAAAGATCAACGACCTGGAGGCCAAGCTG CTGAGCGAGATCGACAAGGTGGCCGAGCTGGAGGACCACATCCAGCAGCTGAGACAGGAGCTGGACGACCAGAGC 
GCCAGACTGGCCGACAGCGAGAACGTGAGAGCCCAGCTGGAGGCCGCCACCGGCCAGGGCATCCTGGGCGCCGCC GGCAACGCCATGGTGCCCAACAGCACCTTCATGATCGGCAACGGCAGAGAGAGCCAGACCAGAGACCAGCTGAAC TACATCGACGACCTGGAGACCAAGCTGGCCGACGCCAAGAAGGAGAACGACAAGGCCAGACAGGCCCTGGTGGAG TACATGAACAAGTGCAGCAAGCTGGAGCACGAGATCAGAACCATGGTGAAGAACAGCACCTTCGACAGCAGCAGC ATGCTGCTGGGCGGCCAGACCAGCGACGAGCTGAAGATCCAGATCGGCAAGGTGAACGGCGAGCTGAACGTGCTG AGAGCCGAGAACAGAGAGCTGAGAATCAGATGCGACCAGCTGACCGGCGGCGACGGCAACCTGAGCATCAGCCTG GGCCAGAGCAGACTGATGGCCGGCATCGCCACCAACGACGTGGACAGCATCGGCCAGGGCAACGAGACCGGCGGC ACCAGCATGAGAATCCTGCCCAGAGAGAGCCAGCTGGACGACCTGGAGGAGAGCAAGCTGCCCCTGATGGACACC AGCAGCGCCGTGAGAAACCAGCAGCAGTTCGCCAGCATGTGGGAGGACTTCGAGAGCGTGAAGGACAGCCTGCAG AACAACCACAACGACACCCTGGAGGGCAGCTTCAACAGCAGCATGCCCCCCCCCGGCAGAGACGCCACCCAGAGC TTCCTGAGCCAGAAGAGCTTCAAGAACAGCCCCATCGTGATGCAGAAGCCCAAGAGCCTGCACCTGCACCTGAAG AGCCACCAGAGCGAGGGCGCCGGCGAGCAGATCCAGAACAACAGCTTCAGCACCAAGACCGCCAGCCCCCACGTG AGCCAGAGCCACATCCCCATCCTGCACGACATGCAGCAGATCCTGGACAGCAGCGCCATGTTCCTGGAGGGCCAG CACGACGTGGCCGTGAACGTGGAGCAGATGCAGGAGAAGATGAGCCAGATCAGAGAGGCCCTGGCCAGACTGTTC GAGAGACTGAAGAGCAGCGCCGCCCTGTTCGAGGAGATCCTGGAGAGAATGGGCAGCAGCGACCCCAACGCCGAC AAGATCAAGAAGATGAAGCTGGCCTTCGAGACCAGCATCAACGACAAGCTGAACGTGAGCGCCATCCTGGAGGCC GCCGAGAAGGACCTGCACAACATGAGCCTGAACTTCAGCATCCTGGAGAAGAGCATCGTGAGCCAGGCCGCCGAG GCCAGCAGAAGATTCACCATCGCCCCCGACGCCGAGGACGTGGCCAGCAGCAGCCTGCTGAACGCCAGCTACAGC CCCCTGTTCAAGTTCACCAGCAACAGCGACATCGTGGAGAAGCTGCAGAACGAGGTGAGCGAGCTGAAGAACGAG CTGGAGATGGCCAGAACCAGAGACATGAGAAGCCCCCTGAACGGCAGCAGCGGCAGACTGAGCGACGTGCAGATC AACACCAACAGAATGTTCGAGGACCTGGAGGTGAGCGAGGCCACCCTGCAGAAGGCCAAGGAGGAGAACAGCACC CTGAAGAGCCAGTTCGCCGAGCTGGAGGCCAACCTGCACCAGGTGAACAGCAAGCTGGGCGAGGTGAGATGCGAG CTGAACGAGGCCCTGGCCAGAGTGGACGGCGAGCAGGAGACCAGAGTGAAGGCCGAGAACGCCCTGGAGGAGGCC AGACAGCTGATCAGCAGCCTGAAGCACGAGGAGAACGAGCTGAAGAAGACCATCACCGACATGGGCATGAGACTG AACGAGGCCAAGAAGAGCGACGAGTTCCTGAAGAGCGAGCTGAGCACCGCCCTGGAGGAGGAGAAGAAGAGCCAG AACCTGGCCGACGAGCTGAGCGAGGAGCTGAACGGCTGGAGAATGAGAACCAAGGAGGCCGAGAACAAGGTGGAG 
CACGCCAGCAGCGAGAAGAGCGAGATGCTGGAGAGAATCGTGCACCTGGAGACCGAGATGGAGAAGCTGAGCACC AGCGAGATCGCCGCCGACTACTGCAGCACCAAGATGACCGAGAGAAAGAAGGAGATCGAGCTGGCCAAGTACAGA GAGGACTTCGAGAACGCCGCCATCGTGGGCCTGGAGAGAATCAGCAAGGAGATCAGCGAGCTGACCAAGAAGACC CTGAAGGCCAAGATCATCCCCAGCAACATCAGCAGCATCCAGCTGGTGTGCGACGAGCTGTGCAGAAGACTGAGC AGAGAGAGAGAGCAGCAGCACGAGTACGCCAAGGTGATGAGAGACGTGAACGAGAAGATCGAGAAGCTGCAGCTG GAGAAGGACGCCCTGGAGCACGAGCTGAAGATGATGAGCAGCAACAACGAGAACGTGCCCCCCGTGGGCACCAGC GTGAGCGGCATGCCCACCAAGACCAGCAACCAGAAGTGCGCCCAGCCCCACTACACCAGCCCCACCAGACAGCTG CTGCACGAGAGCACCATGGCCGTGGACGCCATCGTGCAGAAGCTGAAGAAGACCCACAACATGAGCGGCATGGGC CCCGAGCTGAAGGAGACCATCGGCAACGTGATCAACGAGAGCAGAGTGCTGAGAGACTTCCTGCACCAGAAGCTG ATCCTGTTCAAGGGCATCGACATGAGCAACTGGAAGAACGAGACCGTGGACCAGCTGATCACCGACCTGGGCCAG CTGCACCAGGACAACCTGATGCTGGAGGAGCAGATCAAGAAGTACAAGAAGGAGCTGAAGCTGACCAAGAGCGCC ATCCCCACCCTGGGCGTGGAGTTCCAGGACAGAATCAAGACCGAGATCGGCAAGATCGCCACCGACATGGGCGGC GCCGTGAAGGAGATCAGAAAGAAGGGTACCGAGCAGAAGCTGATCTCAGAGGAGGACCTGGGCGCCCCCGGCTCC GCCGGCTCCGCCGCCGGCTCCGGCGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGAC GTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTAC AAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGC GCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTT AAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACTAA
(SEQ ID NO: 87)
Protein:
MSDTKVKVAVRVRPMNRRELELNTKCVVEMEGNQTVLHPPPSNTKQGERKPPKVFAFDYCFWSMDESNTTKYAGQ EVVFKCLGEGILEKAFQGYNACIFAYGQTGSGKSFSMMGHAEQLGLIPRLCCALFKRISLEQNESQTFKVEVSYM EIYNEKVRDLLDPKGSRQSLKVREHKVLGPYVDGLSQLAVTSFEDIESLMSEGNKSRTVAATNMNEESSRSHAVF NIIITQTLYDLQSGNSGEKVSKVSLVDLAGSERVSKTGAAGERLKEGSNINKSLTTLGLVISSLADQAAGKGKSK FVPYRDSVLTWLLKDNLGGNSQTSMIATISPAADNYEETLSTLRYADRAKRIVNHAVVNEDPNAKVIRELREEVE KLREQLSQAEAMKAELKEKLEESEKLIKELTVTWEMEDNSVLNEDSNLEHVEGQPRRSMSQPVLNVEGDKRTSST SATQQQVLSGAFSSADVRSIPIIQTWEENKALKTKITILRGELQMYQRRYSEAKEASQKRVKEVMDDYVDLKLGQ ENVQEKMEQYKLMEEDLLAMQSRIETSEDNFARQMKEFEAQKHAMEERIKELELSATDANNTTVGSFRGTLDDIL KKNDPDFTLTSGYEERKINDLEAKLLSEIDKVAELEDHIQQLRQELDDQSARLADSENVRAQLEAATGQGILGAA GNAMVPNSTFMIGNGRESQTRDQLNYIDDLETKLADAKKENDKARQALVEYMNKCSKLEHEIRTMVKNSTFDSSS MLLGGQTSDELKIQIGKVNGELNVLRAENRELRIRCDQLTGGDGNLSISLGQSRLMAGIATNDVDSIGQGNETGG TSMRILPRESQLDDLEESKLPLMDTSSAVRNQQQFASMWEDFESVKDSLQNNHNDTLEGSFNSSMPPPGRDATQS FLSQKSFKNSPIVMQKPKSLHLHLKSHQSEGAGEQIQNNSFSTKTASPHVSQSHIPILHDMQQILDSSAMFLEGQ HDVAVNVEQMQEKMSQIREALARLFERLKSSAALFEEILERMGSSDPNADKIKKMKLAFETSINDKLNVSAILEA AEKDLHNMSLNFSILEKSIVSQAAEASRRFTIAPDAEDVASSSLLNASYSPLFKFTSNSDIVEKLQNEVSELKNE LEMARTRDMRSPLNGSSGRLSDVQINTNRMFEDLEVSEATLQKAKEENSTLKSQFAELEANLHQVNSKLGEVRCE LNEALARVDGEQETRVKAENALEEARQLISSLKHEENELKKTITDMGMRLNEAKKSDEFLKSELSTALEEEKKSQ NLADELSEELNGWRMRTKEAENKVEHASSEKSEMLERIVHLETEMEKLSTSEIAADYCSTKMTERKKEIELAKYR EDFENAAIVGLERISKEISELTKKTLKAKIIPSNISSIQLVCDELCRRLSREREQQHEYAKVMRDVNEKIEKLQL EKDALEHELKMMSSNNENVPPVGTSVSGMPTKTSNQKCAQPHYTSPTRQLLHESTMAVDAIVQKLKKTHNMSGMG PELKETIGNVINESRVLRDFLHQKLILFKGIDMSNWKNETVDQLITDLGQLHQDNLMLEEQIKKYKKELKLTKSA IPTLGVEFQDRIKTEIGKIATDMGGAVKEIRKKGTEQKLISEEDLGAPGSAGSAAGSGASNFTQFVLVDNGGTGD VTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIV KAMQGLLKDGNPIPSAIAANSGIY (SEQ ID NO: 88)
KIF13A-FUS-MCP-PylRSAF
DNA:
ATGTCGGATACCAAGGTAAAAGTTGCCGTCCGGGTCCGGCCCATGAACCGACGAGAACTGGAACTGAACACCAAG TGCGTGGTGGAGATGGAAGGGAATCAAACGGTCCTGCACCCTCCTCCTTCTAACACCAAACAGGGAGAAAGGAAA CCTCCCAAGGTATTTGCCTTTGATTATTGCTTTTGGTCCATGGATGAATCTAACACTACAAAATACGCTGGTCAA GAAGTGGTTTTCAAGTGCCTTGGGGAAGGAATTCTTGAAAAAGCCTTTCAGGGGTATAATGCGTGTATTTTTGCA TATGGACAGACAGGTTCGGGAAAATCCTTTTCCATGATGGGCCATGCTGAGCAGCTGGGCCTTATTCCAAGGCTC TGCTGTGCTTTATTTAAAAGGATCTCTTTGGAGCAAAATGAGTCACAGACCTTTAAAGTTGAAGTGTCCTATATG GAAATTTATAATGAGAAAGTTCGGGATCTTTTAGACCCCAAAGGGAGTAGACAGTCTCTTAAAGTTCGAGAACAT AAAGTTTTGGGACCATATGTAGATGGTTTATCTCAACTAGCTGTCACTAGTTTTGAGGATATTGAGTCATTGATG TCTGAGGGAAATAAGTCTCGAACGGTAGCTGCTACCAACATGAACGAAGAAAGCAGCCGCTCCCATGCTGTGTTC AACATCATAATCACACAGACACTTTATGACCTGCAGTCTGGGAATTCCGGGGAGAAAGTCAGTAAGGTCAGCTTG GTAGACCTGGCGGGTAGCGAAAGAGTATCTAAAACAGGAGCTGCAGGAGAGCGACTGAAAGAAGGCAGCAACATT AACAAATCGCTTACAACCTTGGGGTTGGTTATATCATCACTGGCTGACCAGGCAGCTGGCAAGGGTAAAAGCAAA TTTGTGCCTTATCGAGATTCAGTCCTCACTTGGCTGCTTAAGGACAACTTGGGGGGCAACAGCCAAACCTCTATG ATAGCCACAATCAGCCCAGCCGCAGACAACTATGAAGAGACCCTCTCCACATTAAGATATGCAGACCGAGCCAAA AGGATTGTGAACCATGCTGTTGTGAATGAGGACCCCAACGCAAAAGTGATCCGAGAACTGCGGGAGGAAGTCGAG AAACTGAGAGAGCAGCTCTCTCAGGCAGAGGCCATGAAGGCCGAACTGAAGGAGAAGCTCGAAGAGTCTGAAAAG CTGATAAAAGAACTAACAGTGACTTGGGAATATACAGATATTGAAATGAACAGATTGGGAAAGGGCGCCCCCGGC TCCGCCGGCTCCGCCGCCGGCTCCGGCATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCC TACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTAT AGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTAT GGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTAC GGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGT TCTCAGAGCAGCAGCTATGGGCAG
CCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAA AGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGT GGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGT GGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGC GGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCC AGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGT GGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGC CTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAA ACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTT GATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTC TCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCC ATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGC GGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGG AATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGT AACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGATTACAAGGATGACGACGATAAGGGTACCGGCGCCCCC GGCTCCGCCGGCTCCGCCGCCGGCTCCGGCGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACT GGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAG GCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCT AAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTT ATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATC TACGGTACCGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCGCGTGCCCGGTGCCGCTGCAGCTGCCG CCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGT CGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGC GATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACC
TGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTG AAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCA CTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAG GAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTG GTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCC CAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAA CTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTG GGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATT CCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAA AACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGAT CCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACC ATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAAC CACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGC GACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATC GGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCC GAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 89)
Protein:
MSDTKVKVAVRVRPMNRRELELNTKCVVEMEGNQTVLHPPPSNTKQGERKPPKVFAFDYCFWSMDESNTTKYAGQ EVVFKCLGEGILEKAFQGYNACIFAYGQTGSGKSFSMMGHAEQLGLIPRLCCALFKRISLEQNESQTFKVEVSYM EIYNEKVRDLLDPKGSRQSLKVREHKVLGPYVDGLSQLAVTSFEDIESLMSEGNKSRTVAATNMNEESSRSHAVF NIIITQTLYDLQSGNSGEKVSKVSLVDLAGSERVSKTGAAGERLKEGSNINKSLTTLGLVISSLADQAAGKGKSK FVPYRDSVLTWLLKDNLGGNSQTSMIATISPAADNYEETLSTLRYADRAKRIVNHAVVNEDPNAKVIRELREEVE KLREQLSQAEAMKAELKEKLEESEKLIKELTVTWEYTDIEMNRLGKGAPGSAGSAAGSGMASNDYTQQATQSYGA YPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSY GQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSG GGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEP RGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKK TGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGP MGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGG NYGDDRRGGRGGDYKDDDDKGTGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQ AYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGI YGTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACG DHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKP LENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKS QTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILI PLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFT MLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWI GAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL (SEQ ID NO: 90)
KIF13A-PylRSAF
DNA:
ATGTCGGATACCAAGGTAAAAGTTGCCGTCCGGGTCCGGCCCATGAACCGACGAGAACTGGAACTGAACACCAAG TGCGTGGTGGAGATGGAAGGGAATCAAACGGTCCTGCACCCTCCTCCTTCTAACACCAAACAGGGAGAAAGGAAA CCTCCCAAGGTATTTGCCTTTGATTATTGCTTTTGGTCCATGGATGAATCTAACACTACAAAATACGCTGGTCAA GAAGTGGTTTTCAAGTGCCTTGGGGAAGGAATTCTTGAAAAAGCCTTTCAGGGGTATAATGCGTGTATTTTTGCA TATGGACAGACAGGTTCGGGAAAATCCTTTTCCATGATGGGCCATGCTGAGCAGCTGGGCCTTATTCCAAGGCTC TGCTGTGCTTTATTTAAAAGGATCTCTTTGGAGCAAAATGAGTCACAGACCTTTAAAGTTGAAGTGTCCTATATG GAAATTTATAATGAGAAAGTTCGGGATCTTTTAGACCCCAAAGGGAGTAGACAGTCTCTTAAAGTTCGAGAACAT AAAGTTTTGGGACCATATGTAGATGGTTTATCTCAACTAGCTGTCACTAGTTTTGAGGATATTGAGTCATTGATG TCTGAGGGAAATAAGTCTCGAACGGTAGCTGCTACCAACATGAACGAAGAAAGCAGCCGCTCCCATGCTGTGTTC AACATCATAATCACACAGACACTTTATGACCTGCAGTCTGGGAATTCCGGGGAGAAAGTCAGTAAGGTCAGCTTG GTAGACCTGGCGGGTAGCGAAAGAGTATCTAAAACAGGAGCTGCAGGAGAGCGACTGAAAGAAGGCAGCAACATT AACAAATCGCTTACAACCTTGGGGTTGGTTATATCATCACTGGCTGACCAGGCAGCTGGCAAGGGTAAAAGCAAA TTTGTGCCTTATCGAGATTCAGTCCTCACTTGGCTGCTTAAGGACAACTTGGGGGGCAACAGCCAAACCTCTATG ATAGCCACAATCAGCCCAGCCGCAGACAACTATGAAGAGACCCTCTCCACATTAAGATATGCAGACCGAGCCAAA AGGATTGTGAACCATGCTGTTGTGAATGAGGACCCCAACGCAAAAGTGATCCGAGAACTGCGGGAGGAAGTCGAG AAACTGAGAGAGCAGCTCTCTCAGGCAGAGGCCATGAAGGCCGAACTGAAGGAGAAGCTCGAAGAGTCTGAAAAG CTGATAAAAGAACTAACAGTGACTTGGGAATATACAGATATTGAAATGAACAGATTGGGAAAGGGCGCCCCCGGC TCCGCCGGCTCCGCCGCCGGCTCCGGCATGGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACC CTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCAT AAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAAC AATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTG TCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGC GCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCA GCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCA GCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAAT CCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAG 
GTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTG TCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAA ATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAG CGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCT ATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAG ATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAA ATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTC AAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGT GCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTG GAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGT ATTTCTACTAACCTGTAA (SEQ ID NO: 91)
Protein:
MSDTKVKVAVRVRPMNRRELELNTKCVVEMEGNQTVLHPPPSNTKQGERKPPKVFAFDYCFWSMDESNTTKYAGQ EVVFKCLGEGILEKAFQGYNACIFAYGQTGSGKSFSMMGHAEQLGLIPRLCCALFKRISLEQNESQTFKVEVSYM EIYNEKVRDLLDPKGSRQSLKVREHKVLGPYVDGLSQLAVTSFEDIESLMSEGNKSRTVAATNMNEESSRSHAVF NIIITQTLYDLQSGNSGEKVSKVSLVDLAGSERVSKTGAAGERLKEGSNINKSLTTLGLVISSLADQAAGKGKSK FVPYRDSVLTWLLKDNLGGNSQTSMIATISPAADNYEETLSTLRYADRAKRIVNHAVVNEDPNAKVIRELREEVE KLREQLSQAEAMKAELKEKLEESEKLIKELTVTWEYTDIEMNRLGKGAPGSAGSAAGSGMACPVPLQLPPLERLT LDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRV SDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVP ASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELL SRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRP MLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDF KIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNG ISTNL (SEQ ID NO: 92)
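Each construct is listed as a DNA coding sequence followed by its protein translation, so the two entries can be checked against each other. The sketch below is an illustration and is not part of the application. It is a minimal Python translator that assumes the standard genetic code (the listing does not name a codon table) and ignores the block spacing used in the listing. Run on the first 93 nucleotides of SEQ ID NO: 91, it reproduces the N-terminus of SEQ ID NO: 92, including the TGC GTG GTG GAG codons, which encode C-V-V-E.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str, to_stop: bool = True) -> str:
    """Translate a coding sequence in reading frame 1.

    Whitespace (e.g. the block spacing of the listing) is ignored.
    With to_stop=True, translation ends at the first stop codon, and the
    stop is not included. Otherwise stop codons appear as '*'.
    """
    seq = "".join(dna.split()).upper()
    if len(seq) % 3 != 0:
        raise ValueError(f"length {len(seq)} is not a multiple of 3")
    protein = []
    for i in range(0, len(seq), 3):
        aa = CODON_TABLE[seq[i:i + 3]]  # KeyError for non-ACGT characters
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    # First 93 nt of the KIF13A coding sequence in SEQ ID NO: 91.
    kif13a_start = (
        "ATGTCGGATACCAAGGTAAAAGTTGCCGTCCGGGTCCGGCCCATGAACCGACGAGAACTGGAACTGAACACCAAG "
        "TGCGTGGTGGAGATGGAA"
    )
    print(translate(kif13a_start))  # MSDTKVKVAVRVRPMNRRELELNTKCVVEME
```

The same function can be applied to any DNA entry and the result compared with the protein entry that follows it. With `to_stop=True`, the terminal TAA is dropped, which matches the protein listings: they end without a stop symbol.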
KIF13A-MCP
DNA:
ATGTCGGATACCAAGGTAAAAGTTGCCGTCCGGGTCCGGCCCATGAACCGACGAGAACTGGAACTGAACACCAAG TGCGTGGTGGAGATGGAAGGGAATCAAACGGTCCTGCACCCTCCTCCTTCTAACACCAAACAGGGAGAAAGGAAA CCTCCCAAGGTATTTGCCTTTGATTATTGCTTTTGGTCCATGGATGAATCTAACACTACAAAATACGCTGGTCAA GAAGTGGTTTTCAAGTGCCTTGGGGAAGGAATTCTTGAAAAAGCCTTTCAGGGGTATAATGCGTGTATTTTTGCA TATGGACAGACAGGTTCGGGAAAATCCTTTTCCATGATGGGCCATGCTGAGCAGCTGGGCCTTATTCCAAGGCTC TGCTGTGCTTTATTTAAAAGGATCTCTTTGGAGCAAAATGAGTCACAGACCTTTAAAGTTGAAGTGTCCTATATG GAAATTTATAATGAGAAAGTTCGGGATCTTTTAGACCCCAAAGGGAGTAGACAGTCTCTTAAAGTTCGAGAACAT AAAGTTTTGGGACCATATGTAGATGGTTTATCTCAACTAGCTGTCACTAGTTTTGAGGATATTGAGTCATTGATG TCTGAGGGAAATAAGTCTCGAACGGTAGCTGCTACCAACATGAACGAAGAAAGCAGCCGCTCCCATGCTGTGTTC AACATCATAATCACACAGACACTTTATGACCTGCAGTCTGGGAATTCCGGGGAGAAAGTCAGTAAGGTCAGCTTG GTAGACCTGGCGGGTAGCGAAAGAGTATCTAAAACAGGAGCTGCAGGAGAGCGACTGAAAGAAGGCAGCAACATT AACAAATCGCTTACAACCTTGGGGTTGGTTATATCATCACTGGCTGACCAGGCAGCTGGCAAGGGTAAAAGCAAA TTTGTGCCTTATCGAGATTCAGTCCTCACTTGGCTGCTTAAGGACAACTTGGGGGGCAACAGCCAAACCTCTATG ATAGCCACAATCAGCCCAGCCGCAGACAACTATGAAGAGACCCTCTCCACATTAAGATATGCAGACCGAGCCAAA AGGATTGTGAACCATGCTGTTGTGAATGAGGACCCCAACGCAAAAGTGATCCGAGAACTGCGGGAGGAAGTCGAG AAACTGAGAGAGCAGCTCTCTCAGGCAGAGGCCATGAAGGCCGAACTGAAGGAGAAGCTCGAAGAGTCTGAAAAG CTGATAAAAGAACTAACAGTGACTTGGGAATATACAGATATTGAAATGAACAGATTGGGAAAGGGCGCCCCCGGC TCCGCCGGCTCCGCCGCCGGCTCCGGCATGGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACT GGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAG GCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCT AAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTT ATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATC TACTAA (SEQ ID NO: 93)
Protein:
MSDTKVKVAVRVRPMNRRELELNTKCVVEMEGNQTVLHPPPSNTKQGERKPPKVFAFDYCFWSMDESNTTKYAGQ EVVFKCLGEGILEKAFQGYNACIFAYGQTGSGKSFSMMGHAEQLGLIPRLCCALFKRISLEQNESQTFKVEVSYM EIYNEKVRDLLDPKGSRQSLKVREHKVLGPYVDGLSQLAVTSFEDIESLMSEGNKSRTVAATNMNEESSRSHAVF NIIITQTLYDLQSGNSGEKVSKVSLVDLAGSERVSKTGAAGERLKEGSNINKSLTTLGLVISSLADQAAGKGKSK FVPYRDSVLTWLLKDNLGGNSQTSMIATISPAADNYEETLSTLRYADRAKRIVNHAVVNEDPNAKVIRELREEVE KLREQLSQAEAMKAELKEKLEESEKLIKELTVTWEYTDIEMNRLGKGAPGSAGSAAGSGMASNFTQFVLVDNGGT GDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCEL IVKAMQGLLKDGNPIPSAIAANSGIY (SEQ ID NO: 94)
TOMM20-EWSR1-MCP
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC
CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG
AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAGTTCTTCATGGCGTCCACGGAT
TACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCA
CAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCT
CAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCA
ACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTC
ACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAG
CCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACA
GGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCC
ATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAG
AGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGC
AGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGC
CAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAG
GAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGG
GGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGA
GGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGAT
GAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTC
TTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAA
ACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTT
GATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGG
GGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCT
GGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCC
CGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGA
AACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCC
TTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGAT
CGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATG
GACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGATTAC
AAGGATGACGACGATAAGGGTACCGAGCAGAAGCTGATCTCAGAGGAGGACCTGGGCGCCCCCGGCTCCGCCGGC
TCCGCCGCCGGCTCCGGCGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACT
GTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTA
ACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGG
CGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCA
ATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACTAA (SEQ ID NO: 95)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASTD YSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTP TAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSST GGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQS SYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRG GFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADF FKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMR GGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCG NQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGM DRGGFGGGRRGGPGGPPGPLMEQDYKDDDDKGTEQKLISEEDLGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVT VAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKA MQGLLKDGNPIPSAIAANSGIY (SEQ ID NO: 96)
TOMM20-EWSR1-HA-MCP
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAGTTCTTCATGGCGTCCACGGAT TACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCA CAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCT CAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCA ACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTC ACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAG CCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACA GGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCC ATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAG AGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGC AGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGC CAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAG GAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGG GGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGA GGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGAT GAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTC TTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAA ACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTT GATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGG GGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCT GGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCC CGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGA AACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCC TTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGAT 
CGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATG GACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGCGATC GCATATCCCTATGATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCT AACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAAC GGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCT GCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTA ACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGA AACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACTAA (SEQ ID NO: 97)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASTD YSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTP TAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSST GGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQS SYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRG GFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADF FKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMR GGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCG NQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGM DRGGFGGGRRGGPGGPPGPLMEQAIAYPYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFAN GIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDG NPIPSAIAANSGIY (SEQ ID NO: 98)
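Several construct names refer to epitope tags (HA, V5), and the protein entries share a recurring linker, GAPGSAGSAAGSG. The sketch below is a small motif scanner for locating such stretches in any protein entry. It assumes Python 3.9+. The FLAG, myc, HA and V5 sequences in it are the commonly used epitope-tag sequences. Attaching those names to stretches in this listing is our own labeling: the listing itself names only the HA and V5 tags, and only in the construct titles.

```python
# Motifs to look for. The tag names are illustrative labels (standard
# epitope-tag sequences); "linker" is the repeated GAPGSAGSAAGSG stretch.
MOTIFS = {
    "FLAG": "DYKDDDDK",
    "myc": "EQKLISEEDL",
    "HA": "YPYDVPDYA",
    "V5": "GKPIPNPLLGLDST",
    "linker": "GAPGSAGSAAGSG",
}


def find_motifs(protein: str, motifs: dict[str, str] = MOTIFS) -> dict[str, list[int]]:
    """Return 0-based start positions of every (possibly overlapping) hit.

    Whitespace in the input is ignored, so listing text can be pasted as-is.
    Motifs with no hit are left out of the result.
    """
    seq = "".join(protein.split()).upper()
    hits: dict[str, list[int]] = {}
    for name, motif in motifs.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)  # allow overlapping matches
        if positions:
            hits[name] = positions
    return hits


if __name__ == "__main__":
    # Fragment around the HA tag of SEQ ID NO: 98.
    fragment = "GGPGGPPGPLMEQAIAYPYDVPDYAGAPGSAGSAAGSGASNF"
    print(find_motifs(fragment))  # {'HA': [16], 'linker': [25]}
```

Positions refer to the pasted fragment, not the full-length protein. To get full-length coordinates, pass the whole protein entry.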
TOMM20-FUS-PylRSAF
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAACACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGATTAC AAGGATGACGACGATAAGGGTACCGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCGCGTGCCCGGTG CCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACT GGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATT GAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCAC 
AAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAG GACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCT CGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATT CCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCC ACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCA GCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGC AAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAA CGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAA TCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATT TTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGAC CGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACAT CTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATC ACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTG GATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATC GACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAA CGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 99)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND
YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG
YGSSQSSQSSYGQQSSYPGYGQQPAPSNTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY
GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG
GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF
KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG
NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP
GGGPGGSHMGGNYGDDRRGGRGGDYKDDDDKGTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISAT
GLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANE
DQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGA
TASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEE
RENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLD
RALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTL
DVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL (SEQ ID NO: 100)
TOMM20-FUS-V5-PylRSAF
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAACACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT
CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC
GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT
GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC
GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT
AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC
AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT
GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT
GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC
AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT
GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT
CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA
GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC
GCAGGCAAGCCTATTCCCAACCCCCTGCTGGGCCTGGATAGCACCGGAGCACCAGGAAGTGCTGGTTCTGCTGCT
GGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTG
AATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTT
AGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACA
GCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAA
TTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAA
GCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGA
AGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATT
AGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCC
CCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGAC
GAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTG
CAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGAT
CGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGAT
ACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCT
AACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAA
GAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGT
GAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGT
ATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCG
CTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAA
CACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA
(SEQ ID NO: 101)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSNTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGAIAGKPIPNPLLGLDSTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPL NTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNK FLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSI SSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDL QQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLA NYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSC MVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL
(SEQ ID NO: 102)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAGTTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGGCGCC CCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATGGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGC CTGACCCTGGATGACAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACC ATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTT GTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGC 
CGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTC GTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACT GAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCT GTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAAT ACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGT CTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAA CTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAA CGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTAT ATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTG CGCCCTATGCTGGCACCAAATCTGTATAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATC TTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTT GCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATT GACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTG TCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTC GGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTAT AACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 103)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGGAPGSAGSAAGSGMACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGT IHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKV VSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGN TNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLE REITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLYNYLRKLDRALPDPIKI FEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLEL SSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL (SEQ ID NO: 104)
TOMM20-FUS-V5-PylRSAA
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT
CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT
TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC
TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT
CCCAGCAACACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC
AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT
GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT
CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC
GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT
GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC
GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT
AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC
AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT
GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT
GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC
AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT
GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT
CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA
GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC
GCAGGCAAGCCTATTCCCAACCCCCTGCTGGGCCTGGATAGCACCGGAGCACCAGGAAGTGCTGGTTCTGCTGCT
GGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGACAAAAAACCGCTG
AATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTT
AGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACA
GCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAA
TTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAA
GCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGA
AGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATT
AGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCC
CCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGAC
GAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTG
CAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGAT
CGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGAT
ACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTGGCACCAAATCTGTAT
AACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAA
GAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGT
GAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGT
ATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCG
CTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAA
CACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTGTAA
(SEQ ID NO: 105)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSNTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGAIAGKPIPNPLLGLDSTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPL NTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNK FLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSI SSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDL QQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLY NYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSC MVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL
(SEQ ID NO: 106)
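The PylRS variants in this listing differ from one another at a few positions in the C-terminal region. For example, SEQ ID NO: 102 (PylRSAF) reads ...MLAPNLANYLRK..., ...TMLNFCQMG... and ...SCMVFGDTL..., whereas SEQ ID NO: 106 (PylRSAA) reads ...MLAPNLYNYLRK..., ...TMLAFAQMG... and ...SCMVYGDTL.... The sketch below lists such point substitutions. It assumes the two stretches are already aligned and contain no insertions or deletions. Positions are counted along the pasted stretch and are not full-length residue numbers, which the listing does not give.

```python
def substitutions(reference: str, variant: str, offset: int = 1) -> list[tuple[int, str, str]]:
    """List point substitutions between two equal-length, ungapped sequences.

    Positions are numbered from `offset` (1 by default) along the compared
    stretch. They match full-length residue numbers only if the stretch
    starts at residue `offset` of the full protein.
    """
    ref = "".join(reference.split()).upper()
    var = "".join(variant.split()).upper()
    if len(ref) != len(var):
        raise ValueError("sequences must be the same length (no indels)")
    return [
        (i + offset, r, v)
        for i, (r, v) in enumerate(zip(ref, var))
        if r != v
    ]


if __name__ == "__main__":
    # Stretches from SEQ ID NO: 102 (reference) and SEQ ID NO: 106 (variant).
    pairs = [
        ("MLAPNLANYLRK", "MLAPNLYNYLRK"),
        ("TMLNFCQMG", "TMLAFAQMG"),
        ("SCMVFGDTL", "SCMVYGDTL"),
    ]
    for ref, var in pairs:
        print(ref, var, substitutions(ref, var))
```

A simple position-by-position comparison is enough here because the variants share the same length. Sequences with insertions or deletions would need a proper pairwise alignment first.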
TOMM20-FUS-PylRS*7^
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAACACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGATTAC AAGGATGACGACGATAAGGGTACCGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCGCGTGCCCGGTG CCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACT GGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATT GAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCAC 
AAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAG GACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCT CGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATT CCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCC ACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCA GCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGC AAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAA CGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAA TCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATT TTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGAC CGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACAT CTGGAGGAGTTTACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATC ACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTG GATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATC GACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAA CGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 107)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSNTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGDYKDDDDKGTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISAT GLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANE DQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGA TASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEE RENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLD RALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTL DVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL (SEQ ID NO: 108)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC
CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG
AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT
TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT
CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT
TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC
TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT
CCCAGCAACACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC
AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT
GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT
CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC
GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT
GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC
GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT
AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC
AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT
GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT
GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC
AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT
GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT
CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA
GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC
GCAGGCAAGCCTATTCCCAACCCCCTGCTGGGCCTGGATAGCACCGGAGCACCAGGAAGTGCTGGTTCTGCTGCT
GGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTG
AATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTT
AGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACA
GCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAA
TTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAA
GCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGA
AGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATT
AGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCC
CCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGAC
GAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTG
CAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGAT
CGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGAT
ACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCT
AACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAA
GAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGT
GAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGT
ATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCG
CTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAA
CACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA
(SEQ ID NO: 109)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND
YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG
YGSSQSSQSSYGQQSSYPGYGQQPAPSNTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY
GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG
GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF
KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG
NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP
GGGPGGSHMGGNYGDDRRGGRGGAIAGKPIPNPLLGLDSTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPL
NTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNK FLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSI SSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDL QQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLA NYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSC MVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL
(SEQ ID NO: 110)
TOMM20-EWSR1-λN22
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAGTTCTTCATGGCGTCCACGGAT TACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCA CAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCT CAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCA ACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTC ACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAG CCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACA GGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCC ATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAG AGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGC AGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGC CAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAG GAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGG GGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGA GGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGAT GAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTC TTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAA ACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTT GATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGG GGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCT GGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCC CGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGA AACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCC TTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGAT 
CGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATG GACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGCGATC GCAGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGAGCAGAAGCTGATCTCAGAGGAGGACCTGCTA GCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCA CCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACA CGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCT GGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCT GAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGA GCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAA GCTGCAAACCCACCGCTCGAGTCTAGAGGGCCCGTTTAA (SEQ ID NO: 111)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASTD
YSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTP
TAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSST
GGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQS
SYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRG
GFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADF
FKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMR
GGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCG
NQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGM
DRGGFGGGRRGGPGGPPGPLMEQAIAGAPGSAGSAAGSGEQKLISEEDLLATMDAQTRRRERRAEKQAQWKAANP
PLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRA EKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLESRGPV (SEQ ID NO: 112)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAGTTCTTCATGGCGTCCACGGAT TACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCA CAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCT CAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCA ACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTC ACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAG CCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACA GGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCC ATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAG AGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGC AGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGC CAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAG GAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGG GGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGA GGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGAT GAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTC TTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAA ACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTT GATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGG GGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCT GGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCC CGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGA AACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCC TTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGAT 
CGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATG GACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGCGATC GCAGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGAGCAGAAGCTGATCTCAGAGGAGGACCTGCTA GCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCA CCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACA CGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCT GGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCT GAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGA GCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAA GCTGCAAACCCACCGCTCGAGTCTAGAGGGCCCGTTTAA (SEQ ID NO: 113)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASTD
YSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTP
TAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSST
GGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQS
SYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRG
GFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADF
FKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMR
GGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCG
NQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGM
DRGGFGGGRRGGPGGPPGPLMEQAIAGAPGSAGSAAGSGEQKLISEEDLLATMDAQTRRRERRAEKQAQWKAANP
PLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRA
EKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLESRGPV (SEQ ID NO: 114)
TOMM20-3xMCP-PylRSAF
DNA: ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAGTTCTTCTATACAGATATTGAA ATGAACAGATTGGGAAAGGAGCAGAAGCTGATCTCAGAGGAGGACCTGGGCGCCCCCGGCTCCGCCGGCTCCGCC GCCGGCTCCGGCGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCC CCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGT AGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCG TACTTAAATATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAA GGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGTGCCCCAGGCTCC GCAGGAAGCGCAGCGGGGTCCGGAGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGAC GTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTAC AAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGC GCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTT AAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGC GCACCTGGTAGTGCTGGTTCTGCTGCTGGATCAGGTGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGC GGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGT TCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAG GTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGC GAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCC GGCATCTACGGTACCGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCGCGTGCCCGGTGCCGCTGCAG CTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGG ATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCG TGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGT AAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACA AGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCT AAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCT 
ACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGC GCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACA AAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTT CGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAAC TATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATT CTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTG GATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTG CCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAG TTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTT CTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATG CACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCT TGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCA CGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 115)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFYTDIE
MNRLGKEQKLISEEDLGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTC
SVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIYGAPGS
AGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKG
AWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIYGAPGSAGSAAGSGASNFTQFVLVDNG
GTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDC
ELIVKAMQGLLKDGNPIPSAIAANSGIYGTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLW
MSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQT
SVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATAS
ALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEEREN
YLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRAL
PDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVM
HGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL (SEQ ID NO: 116)
TOMM20-FUS-3xMCP-PylRSAF
DNA: ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAACACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGAGCAG AAGCTGATCTCAGAGGAGGACCTGGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCGCTTCTAACTTT ACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATC GCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAG AATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATT 
CCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCG ATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGTGCCCCAGGCTCCGCAGGAAGCGCAGCGGGGTCCGGA GCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTC GCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAG AGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATG GAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAA GATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGCGCACCTGGTAGTGCTGGTTCTGCT GCTGGATCAGGTGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCC CCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGT AGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCG TACTTAAATATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAA GGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGTACCGGCGCCCCC GGCTCCGCCGGCTCCGCCGCCGGCTCCGGCGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACC CTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCAT AAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAAC AATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTG TCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGC GCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCA GCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCA GCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAAT CCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAG GTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTG TCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAA ATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAG CGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCT ATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAG 
ATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAA ATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTC AAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGT GCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTG GAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGT ATTTCTACTAACCTGTAA (SEQ ID NO: 117)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSNTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGEQKLISEEDLGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGI AEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNP IPSAIAANSGIYGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQ SSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIYGAPGSAGSA AGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRS YLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIYGTGAPGSAGSAAGSGACPVPLQLPPLERLT LDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRV SDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVP ASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELL SRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRP MLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDF KIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNG ISTNL (SEQ ID NO: 118)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAACACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGAGCAG AAGCTGATCTCAGAGGAGGACCTGGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCGCTTCTAACTTT ACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATC GCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAG AATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATT 
CCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCG ATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGTGCCCCAGGCTCCGCAGGAAGCGCAGCGGGGTCCGGA GCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTC GCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAG AGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATG GAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAA GATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGCGCACCTGGTAGTGCTGGTTCTGCT GCTGGATCAGGTGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCC CCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGT AGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCG TACTTAAATATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAA GGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGTACCGGCGCCCCC GGCTCCGCCGGCTCCGCCGCCGGCTCCGGCGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACC CTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCAT AAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAAC AATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTG TCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGC GCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCA GCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCA GCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAAT CCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAG GTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTG TCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAA ATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAG CGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCT ATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAG 
ATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCCCAA ATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTC AAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGT GCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTG GAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGT ATTTCTACTAACCTGTAA (SEQ ID NO: 119)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSNTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGEQKLISEEDLGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGI AEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNP IPSAIAANSGIYGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQ SSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIYGAPGSAGSA AGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRS YLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIYGTGAPGSAGSAAGSGACPVPLQLPPLERLT LDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRV SDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVP ASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELL SRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRP MLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDF KIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNG ISTNL (SEQ ID NO: 120)
TOMM20-FUS-4xλN22-PylRSAF
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAACACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC
GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT
GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC
GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT
AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC
AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT
GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT
GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC
AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT
GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT
CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA
GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCCACC
ATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTC
GACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGA
CGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCT
GGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAA
CAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGC
GGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCA
AACCCACCGCTCGATTACAAGGATGACGACGATAAGGGTACCGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGC
TCCGGCGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAAT
ACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGC
CGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCA
CGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTC
CTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCA
ATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGC
AAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGC
AGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCG
GTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAA
ATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAA
CAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGT
GGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACC
GAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAAC
TATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAG
TCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAG
AACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATG
GTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTG
GATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACAC
GACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 121)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSNTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRR RERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAG GLATMDAQTRRRERRAEKQAQWKAANPPLDYKDDDDKGTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLN TLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKF LTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSIS SISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQ QIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLAN YLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCM VFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL
(SEQ ID NO: 122)
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC
CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG
AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT
TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT
CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT
TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC
TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT
CCCAGCAACACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC
AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT
GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT
CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC
GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT
GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC
GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT
AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC
AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT
GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT
GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC
AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT
GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT
CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA
GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCCACC
ATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTC
GACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGA
CGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCT
GGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAA
CAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGC
GGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCA
AACCCACCGCTCGATTACAAGGATGACGACGATAAGGGTACCGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGC
TCCGGCGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAAT
ACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGC
CGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCA
CGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTC
CTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCA
ATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGC
AAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGC
AGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCG
GTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAA
ATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAA
CAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGT
GGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACC
GAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAAC
TATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAG
TCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAG
AACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATG
GTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTG
GATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACAC
GACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 123)
Protein :
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND
YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG
YGSSQSSQSSYGQQSSYPGYGQQPAPSNTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY
GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG
GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF
KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRR RERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAG GLATMDAQTRRRERRAEKQAQWKAANPPLDYKDDDDKGTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLN TLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKF LTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSIS SISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQ QIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLAN YLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCM VFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL
(SEQ ID NO: 124)
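Each DNA entry in this listing is meant to encode the protein printed beneath it. The following is a minimal Python sketch for spot-checking such a pair, assuming the standard genetic code (NCBI translation table 1) and treating the spaces in the printed sequences as line wrapping only; the helper names are illustrative and not part of the disclosure. For example, the codons CTG GTT GTG AAC AAT AGC in the PylRS-encoding portion of the DNA entries translate to LVVNNS.

```python
# Sketch: translate a coding sequence with the standard genetic code
# (NCBI table 1) and compare it with a printed protein entry.
# Helper names are illustrative assumptions, not from the source.

BASES = "TCAG"
# Amino acids for all 64 codons, ordered by first, second, third base (TCAG).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate from the first base, stopping at the first stop codon.

    Raises KeyError on characters other than A, C, G, T.
    """
    seq = "".join(dna.split()).upper()  # drop line-wrap whitespace
    protein = []
    for pos in range(0, len(seq) - 2, 3):  # ignore a trailing partial codon
        residue = CODON_TABLE[seq[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def matches_listing(dna: str, listed_protein: str) -> bool:
    """True if the translated DNA equals the printed protein, ignoring spaces."""
    return translate(dna) == "".join(listed_protein.split())


print(translate("ATGGGCTGCGTGTGCAGCAGCAACCCCGAG"))  # MGCVCSSNPE
print(translate("CTG GTT GTG AAC AAT AGC"))         # LVVNNS
```

Applied to a complete DNA/protein pair, a `False` from `matches_listing` flags a position where the printed protein and the translated DNA disagree, which is a quick way to catch transcription errors in long entries.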
LcK-EWSR1-MCP
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCGTCCACGGATTACAGTACCTATAGCCAAGCT GCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACCACCCAGGCATATGGG CAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGG CAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTGCCCCCCAGGCATACAGC CAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTAT GCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACA AGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGC CTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCT CCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAAC ACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCC ACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGG CAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGA CCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATG AGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGT GGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGATGAAGACTCTGACAACAGTGCA ATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTT AAGATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGAT GCCACAGTGTCCTATGAAGACCCACCCACTGCCAAGGCTGCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGG AGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAG GGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATG GGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGA GGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGA ACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGAT CGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATG TTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGA GGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGATTACAAGGATGACGACGATAAGGGT 
ACCGAGCAGAAGCTGATCTCAGAGGAGGACCTGGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCGCT TCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCT AACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGC TCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAA CTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGAT GGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACTAA (SEQ ID NO: 125)
Protein :
MGCVCSSNPEGTELASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYG QTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPT RPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQN TYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSG PGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSA IYVQGLNDSVTLDDLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQG SKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGG GNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGM FRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPPGPLMEQDYKDDDDKGTEQKLISEEDLGAPGSAGSAAGSGA SNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNME LTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIY (SEQ ID NO: 126)
LCK-EWSR1-4XAn22
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCGTCCACGGATTACAGTACCTATAGCCAAGCT GCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACCACCCAGGCATATGGG CAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGG CAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTGCCCCCCAGGCATACAGC CAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTAT GCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACA AGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGC CTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCT CCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAAC ACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCC ACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGG CAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGA CCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATG AGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGT GGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGATGAAGACTCTGACAACAGTGCA ATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTT AAGATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGAT GCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGG AGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAG GGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATG GGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGA GGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGA ACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGAT CGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATG TTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGA GGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGATTACAAGGATGACGACGATAAGGGT 
ACCGAGCAGAAGCTGATCTCAGAGGAGGACCTGCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGC GCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCC GGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGG AAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACC ATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTC GACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGA CGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGAGTCTAGAGGGCCCGTTAAA CCCGCTGATCAGCCTCGACTGTGCCTTCTAGTTGCCAGCCATCTGTTGTTTGCCCCTCCCCCGTGCCTTCCTTGA
(SEQ ID NO: 127)
Protein :
MGCVCSSNPEGTELASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYG QTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPT RPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQN TYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSG PGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSA IYVQGLNDSVTLDDLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQG SKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGG GNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGM FRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPPGPLMEQDYKDDDDKGTEQKLISEEDLLATMDAQTRRRERR AEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLAT MDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLESRGPVK PADQPRLCLLVASHLLFAPPPCLP (SEQ ID NO: 128)
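The LCK-EWSR1-4XAn22 protein (SEQ ID NO: 128) carries four tandem copies of the 22-residue segment MDAQTRRRERRAEKQAQWKAAN, separated by a Gly/Ala-rich spacer. This segment appears to be the lambda N22 peptide from bacteriophage lambda N protein, although that is an interpretation and is not stated in this section. Below is a minimal Python sketch that locates such repeats in a printed entry while ignoring the line-wrap spaces. The constant and function names are illustrative, and the test fragment is a shortened, artificial example rather than a listed sequence.

```python
# Sketch: find tandem copies of a short peptide motif in a printed
# protein entry. Names are illustrative; the motif is the 22-residue
# segment repeated in the 4XAn22 constructs of this listing.

LAMBDA_N22 = "MDAQTRRRERRAEKQAQWKAAN"


def motif_positions(protein: str, motif: str) -> list[int]:
    """Return 1-based start positions of non-overlapping motif copies."""
    seq = "".join(protein.split())  # remove line-wrap whitespace
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start + 1)
        start = seq.find(motif, start + len(motif))
    return positions


# Artificial two-copy fragment; one copy straddles a line-wrap space.
fragment = "GGLAT MDAQTRRRERRAEKQAQWKAAN PPLDGAGAGLAT MDAQTRRRERR AEKQAQWKAAN PPL"
print(motif_positions(fragment, LAMBDA_N22))  # [6, 40]
```

Run on the full SEQ ID NO: 128 entry, this returns four start positions, consistent with the "4X" in the construct name.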
LcK-FUS-PylRSAF
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCCTCAAACGATTATACCCAACAAGCAACCCAA AGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGT TACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAG AACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCC CAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAACACCTCGGGAAGT TACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGT GGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAAC AGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGT GGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGAC CGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGT GGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTC AATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATC TTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAG ACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCA ACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAAT CCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGG CGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGT GGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAAC TTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCT CACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGATTACAAGGATGACGACGATAAGGGT ACCGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTG GAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACC GGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCAT CTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAA CGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTG AAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAA 
AACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCC GTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAA GGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACC GATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAG AGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAA CTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTG GAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTC TGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATC AAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTG AACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTG GGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTG GAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCG GGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCC TATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 129)
Protein :
MGCVCSSNPEGTELASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQ NTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSNTSGSYGSSSQSSSYGQPQSGSYSQQPSYG GQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQD RGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTI FVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGN PIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMN FSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGDYKDDDDKGTGAPGSAGSAAGSGACPVPLQLPPL ERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCK RCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQES VSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELE SELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNF CLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHL GIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSES YYNGISTNL (SEQ ID NO: 130)
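Several fusions in this section carry a FLAG epitope (DYKDDDDK), a c-Myc epitope (EQKLISEEDL), or both, between the fused domains. For example, in SEQ ID NO: 130 DYKDDDDK is followed by GT and the GAPGSAGSAAGSGA linker. The following is a minimal Python sketch for locating these standard tag sequences in a printed entry. The dictionary holds only these two tags and is an illustrative assumption; other tags would have to be added by hand.

```python
import re

# Standard epitope-tag sequences; the selection of tags is illustrative.
EPITOPE_TAGS = {
    "FLAG": "DYKDDDDK",
    "c-Myc": "EQKLISEEDL",
}


def find_tags(protein: str) -> dict[str, list[int]]:
    """Map each tag name to the 1-based start positions where it occurs."""
    seq = "".join(protein.split())  # remove line-wrap whitespace
    return {
        name: [m.start() + 1 for m in re.finditer(re.escape(motif), seq)]
        for name, motif in EPITOPE_TAGS.items()
    }


print(find_tags("GGRGG DYKDDDDKGT EQKLISEEDLGAPG"))
# {'FLAG': [6], 'c-Myc': [16]}
```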
LcK-FUS-MCP-PylRSAF
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCCTCAAACGATTATACCCAACAAGCAACCCAA AGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGT TACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAG AACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCC CAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGT TACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGT GGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAAC AGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGT GGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGAC CGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGT GGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTC AATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATC TTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAG ACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCA ACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAAT CCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGG CGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGT GGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAAC TTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCT CACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGATTACAAGGATGACGACGATAAGGGT ACCGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCGCTTCTAACTTTACTCAGTTCGTTCTCGTCGAC AATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAAC TCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAA GTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCCACGAATTCC GACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCA AACTCCGGCATCTACGGTACCGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCGCGTGCCCGGTGCCG 
CTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGT CTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAG ATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAA TATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGAC CAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGT GCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCT GTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACC GCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCA CTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAA CCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGT GAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCC CCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTC CGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGT GCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTG GAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACC GATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGAT GTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGAC AAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGT GCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 131)
Protein :
MGCVCSSNPEGTELASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQ
NTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYG
GQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQD
RGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTI
FVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGN
PIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMN
FSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGDYKDDDDKGTGAPGSAGSAAGSGASNFTQFVLVD
NGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNS
DCELIVKAMQGLLKDGNPIPSAIAANSGIYGTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATG
LWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANED
QTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGAT
ASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEER
ENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDR
ALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLD
VMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL (SEQ ID NO: 132)
LCK-FUS-3xMCP-PylRSAF
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCCTCAAACGATTATACCCAACAAGCAACCCAA AGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGT TACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAG AACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCC CAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAACACCTCGGGAAGT TACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGT GGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAAC AGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGT GGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGAC CGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGT GGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTC AATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATC TTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAG ACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCA ACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAAT CCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGG CGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGT GGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAAC TTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCT CACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGAGCAGAAGCTGATCTCAGAGGAGGAC CTGGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCGCTTCTAACTTTACTCAGTTCGTTCTCGTCGAC AATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAAC TCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAA GTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCCACGAATTCC GACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCA AACTCCGGCATCTACGGTGCCCCAGGCTCCGCAGGAAGCGCAGCGGGGTCCGGAGCTTCTAACTTTACTCAGTTC 
GTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGG ATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAA TACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTC GCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCA GCAATCGCAGCAAACTCCGGCATCTACGGCGCACCTGGTAGTGCTGGTTCTGCTGCTGGATCAGGTGCTTCTAAC TTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGG ATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCG CAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACC ATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAAC CCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGTACCGGCGCCCCCGGCTCCGCCGGCTCCGCCGCC GGCTCCGGCGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTG AATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTT AGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACA GCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAA TTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAA GCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGA AGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATT AGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCC CCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGAC GAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTG CAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGAT CGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGAT ACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCT AACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAA GAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGT GAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGT 
ATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCG CTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAA CACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 133)
Protein :
MGCVCSSNPEGTELASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQ NTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSNTSGSYGSSSQSSSYGQPQSGSYSQQPSYG GQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQD RGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTI FVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGN PIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMN FSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGEQKLISEEDLGAPGSAGSAAGSGASNFTQFVLVD NGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNS DCELIVKAMQGLLKDGNPIPSAIAANSGIYGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEW ISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPS AIAANSGIYGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSA QNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIYGTGAPGSAGSAA GSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRT ARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSG SKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKD EISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDND TELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTR ENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVK HDFKNIKRAARSESYYNGISTNL (SEQ ID NO: 134)
LcK-FUS-PylRSAAAF
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCCTCAAACGATTATACCCAACAAGCAACCCAA AGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGT TACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAG AACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCC CAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAACACCTCGGGAAGT TACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGT GGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAAC AGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGT GGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGAC CGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGT GGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTC AATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATC TTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAG ACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCA ACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAAT CCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGG CGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGT GGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAAC TTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCT CACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGATTACAAGGATGACGACGATAAGGGT ACCGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTG GAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACC GGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCAT CTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAA CGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTG AAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAA 
AACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCC GTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAA GGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACC GATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAG AGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAA CTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTG GAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTC TGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATC AAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTG GCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTG GGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTG GAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCG GGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCC TATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 135)
Protein :
MGCVCSSNPEGTELASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQ NTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSNTSGSYGSSSQSSSYGQPQSGSYSQQPSYG GQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQD RGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTI FVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGN PIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMN FSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGDYKDDDDKGTGAPGSAGSAAGSGACPVPLQLPPL ERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCK RCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQES VSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELE SELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNF CLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHL GIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSES YYNGISTNL (SEQ ID NO: 136)
LcK-PylRSAF
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTG GAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACC GGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCAT CTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAA CGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTG AAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAA AACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCC GTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAA GGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACC GATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAG AGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAA CTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTG GAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTC TGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATC AAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTG AACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTG GGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTG GAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCG GGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCC TATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 137)
Protein :
MGCVCSSNPEGTELACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDH LVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLE NTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQT DRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPL EYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTML NFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGA GFGLERLLKVKHDFKNIKRAARSESYYNGISTNL (SEQ ID NO: 138)
LcK-PylRSAAAF
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTG
GAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACC GGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCAT CTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAA CGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTG AAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAA AACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCC GTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAA GGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACC GATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAG AGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAA CTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTG GAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTC TGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATC AAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTG GCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTG GGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTG GAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCG GGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCC TATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 139)
Protein :
MGCVCSSNPEGTELACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDH LVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLE NTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQT DRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPL EYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTML AFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGA GFGLERLLKVKHDFKNIKRAARSESYYNGISTNL (SEQ ID NO: 140)
LcK-MCP
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGAGCAGAAGCTGATCTCAGAGGAGGACCTGGGC GCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGC GGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGT TCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAG GTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGC GAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCC GGCATCTACTAA (SEQ ID NO: 141)
Protein :
MGCVCSSNPEGTELEQKLISEEDLGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSR SQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANS GIY (SEQ ID NO: 142)
FRB-CD28-FUS-PylRSAF
DNA:
ATGTGCCGAGCCATCTCTCTTAGGCGCTTGCTGCTGCTGCTGCTGCAGCTGTCACAACTCCTAGCTGTCACTCAA GGGATGCTCGAGATGTGGCATGAAGGCCTGGAAGAGGCATCTCGTTTGTACTTTGGGGAAAGGAACGTGAAAGGC ATGTTTGAGGTGCTGGAGCCCTTGCATGCTATGATGGAACGGGGCCCCCAGACTCTGAAGGAAACATCCTTTAAT CAGGCCTATGGTCGAGATTTAATGGAGGCCCAAGAGTGGTGCAGGAAGTACATGAAATCAGGGAATGTCAAGGAC CTCCTCCAAGCCTGGGACCTCTATTATCATGTGTTCCGACGAATCTCAAAGACTAGAACCGGTAAGCTTTTTTGG GCACTGGTCGTGGTTGCTGGAGTCCTGTTTTGTTATGGCTTGCTAGTGACAGTGGCTCTTTGTGTTATCTGGGTA AGATCTGGTATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGG CAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACT TCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCC CAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTAC CCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTAT GGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAA AGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGA GGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAA GACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGT GGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGC CGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGA TCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACA ATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATT AATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCT AAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGG GCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTAT GGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGA GCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGT AAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGT CGTGGTGGCAGAGGAGGCGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATGGCGTGCCCGGTGCCG 
CTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGT CTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAG ATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAA TATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGAC CAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGT GCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCT GTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACC GCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCA CTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAA CCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGT GAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCC CCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTC CGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGT GCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTG GAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACC GATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGAT GTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGAC AAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGT GCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 143)
Protein :
MCRAISLRRLLLLLLQLSQLLAVTQGMLEMWHEGLEEASRLYFGERNVKGMFEVLEPLHAMMERGPQTLKETSFN
QAYGRDLMEAQEWCRKYMKSGNVKDLLQAWDLYYHVFRRISKTRTGKLFWALVVVAGVLFCYGLLVTVALCVIWV
RSGMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTP
QGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQ
SYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSG
GGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVT
IESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRR
ADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQC
KAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGGAPGSAGSAAGSGMACPVPLQLPPLERLTLDDKKPLNTLISATG
LWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANED
QTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGAT
ASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEER
ENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDR
ALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLD
VMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL (SEQ ID NO: 144)
FRB-CD28-FUS-PylRSAA
DNA:
ATGTGCCGAGCCATCTCTCTTAGGCGCTTGCTGCTGCTGCTGCTGCAGCTGTCACAACTCCTAGCTGTCACTCAA
GGGATGCTCGAGATGTGGCATGAAGGCCTGGAAGAGGCATCTCGTTTGTACTTTGGGGAAAGGAACGTGAAAGGC
ATGTTTGAGGTGCTGGAGCCCTTGCATGCTATGATGGAACGGGGCCCCCAGACTCTGAAGGAAACATCCTTTAAT
CAGGCCTATGGTCGAGATTTAATGGAGGCCCAAGAGTGGTGCAGGAAGTACATGAAATCAGGGAATGTCAAGGAC
CTCCTCCAAGCCTGGGACCTCTATTATCATGTGTTCCGACGAATCTCAAAGACTAGAACCGGTAAGCTTTTTTGG
GCACTGGTCGTGGTTGCTGGAGTCCTGTTTTGTTATGGCTTGCTAGTGACAGTGGCTCTTTGTGTTATCTGGGTA
AGATCTGGTATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGG
CAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACT TCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCC CAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTAC CCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTAT GGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAA AGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGA GGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAA GACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGT GGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGC CGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGA TCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACA ATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATT AATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCT AAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGG GCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTAT GGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGA GCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGT AAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGT CGTGGTGGCAGAGGAGGCGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATGGCGTGCCCGGTGCCG CTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGACAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGT CTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAG ATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAA TATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGAC CAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGT GCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCT GTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACC 
GCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCA CTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAA CCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGT GAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCC CCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTC CGTGTGGATAAAAACTTCTGTCTTCGCCCTATGCTGGCACCAAATCTGTATAACTATCTGCGCAAACTGGACCGT GCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTG GAGGAGTTTACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACC GATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGAT GTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGAC AAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGT GCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 145)
Protein :
MCRAISLRRLLLLLLQLSQLLAVTQGMLEMWHEGLEEASRLYFGERNVKGMFEVLEPLHAMMERGPQTLKETSFN
QAYGRDLMEAQEWCRKYMKSGNVKDLLQAWDLYYHVFRRISKTRTGKLFWALVVVAGVLFCYGLLVTVALCVIWV
RSGMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTP
QGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQ
SYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSG
GGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVT
IESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRR
ADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQC
KAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGGAPGSAGSAAGSGMACPVPLQLPPLERLTLDDKKPLNTLISATG
LWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANED
QTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGAT
ASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEER
ENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLYNYLRKLDR
ALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLD
VMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL (SEQ ID NO: 146)
FRB-CD28-EWSR1-MCP
DNA: ATGTGCCGAGCCATCTCTCTTAGGCGCTTGCTGCTGCTGCTGCTGCAGCTGTCACAACTCCTAGCTGTCACTCAA GGGATGCTCGAGATGTGGCATGAAGGCCTGGAAGAGGCATCTCGTTTGTACTTTGGGGAAAGGAACGTGAAAGGC ATGTTTGAGGTGCTGGAGCCCTTGCATGCTATGATGGAACGGGGCCCCCAGACTCTGAAGGAAACATCCTTTAAT CAGGCCTATGGTCGAGATTTAATGGAGGCCCAAGAGTGGTGCAGGAAGTACATGAAATCAGGGAATGTCAAGGAC CTCCTCCAAGCCTGGGACCTCTATTATCATGTGTTCCGACGAATCTCAAAGACTAGAACCGGTAAGCTTTTTTGG GCACTGGTCGTGGTTGCTGGAGTCCTGTTTTGTTATGGCTTGCTAGTGACAGTGGCTCTTTGTGTTATCTGGGTA AGATCTATGGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCC CAGCCCACTCAAGGATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACT GATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCT CCCACTGGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTAT GATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCT TATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACT AGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCC CAGGTACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACA CAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAG AGTAGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGC CAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGT AGCATGGGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGAT AACCGGGGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGA ATGGGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTA GGCCCACCTGTAGATCCAGATGAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACT CTAGATGATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATC CACATCTACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCTACTGCC AAGGCTGCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAG CCTCCAATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGT CCAGGAGGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCT 
CCAAGAGGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAG TGTCCCAATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCT GAAGGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGA GGAAGAGGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGC TTCCGTGGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGA CCTTTGATGGAACAGGATTACAAGGATGACGACGATAAGGGTACCGAGCAGAAGCTGATCTCAGAGGAGGACCTG GGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAAT GGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCG CGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTC GAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGAC TGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAAC TCCGGCATCTACTAA (SEQ ID NO: 147)
Protein :
MCRAISLRRLLLLLLQLSQLLAVTQGMLEMWHEGLEEASRLYFGERNVKGMFEVLEPLHAMMERGPQTLKETSFN QAYGRDLMEAQEWCRKYMKSGNVKDLLQAWDLYYHVFRRISKTRTGKLFWALVVVAGVLFCYGLLVTVALCVIWV RSMASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQP PTGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTET SQPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQ SSYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPD NRGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVT LDDLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKK PPMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQ CPNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGG FRGGRGMDRGGFGGGRRGGPGGPPGPLMEQDYKDDDDKGTEQKLISEEDLGAPGSAGSAAGSGASNFTQFVLVDN GGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSD CELIVKAMQGLLKDGNPIPSAIAANSGIY (SEQ ID NO: 148)
FRB-CD28-EWSR1-4XAn22
DNA:
ATGTGCCGAGCCATCTCTCTTAGGCGCTTGCTGCTGCTGCTGCTGCAGCTGTCACAACTCCTAGCTGTCACTCAA
GGGATGCTCGAGATGTGGCATGAAGGCCTGGAAGAGGCATCTCGTTTGTACTTTGGGGAAAGGAACGTGAAAGGC
ATGTTTGAGGTGCTGGAGCCCTTGCATGCTATGATGGAACGGGGCCCCCAGACTCTGAAGGAAACATCCTTTAAT CAGGCCTATGGTCGAGATTTAATGGAGGCCCAAGAGTGGTGCAGGAAGTACATGAAATCAGGGAATGTCAAGGAC CTCCTCCAAGCCTGGGACCTCTATTATCATGTGTTCCGACGAATCTCAAAGACTAGAACCGGTAAGCTTTTTTGG GCACTGGTCGTGGTTGCTGGAGTCCTGTTTTGTTATGGCTTGCTAGTGACAGTGGCTCTTTGTGTTATCTGGGTA AGATCTATGGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCC CAGCCCACTCAAGGATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACT GATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCT CCCACTGGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTAT GATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCT TATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACT AGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCC CAGGTACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACA CAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAG AGTAGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGC CAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGT AGCATGGGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGAT AACCGGGGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGA ATGGGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTA GGCCCACCTGTAGATCCAGATGAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACT CTAGATGATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATC CACATCTACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCTACTGCC AAGGCTGCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAG CCTCCAATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGT CCAGGAGGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCT CCAAGAGGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAG TGTCCCAATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCT 
GAAGGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGA GGAAGAGGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGC TTCCGTGGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGA CCTTTGATGGAACAGGATTACAAGGATGACGACGATAAGGGTACCGAGCAGAAGCTGATCTCAGAGGAGGACCTG CTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAAC CCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAA ACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGA GCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGC GCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCC GGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGG AAAGCTGCAAACCCACCGCTCGAGTCTAGAGGGCCCGTTTAA (SEQ ID NO: 149)
Protein :
MCRAISLRRLLLLLLQLSQLLAVTQGMLEMWHEGLEEASRLYFGERNVKGMFEVLEPLHAMMERGPQTLKETSFN
QAYGRDLMEAQEWCRKYMKSGNVKDLLQAWDLYYHVFRRISKTRTGKLFWALVVVAGVLFCYGLLVTVALCVIWV
RSMASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQP
PTGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTET
SQPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQ
SSYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPD
NRGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVT
LDDLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKK
PPMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQ
CPNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGG
FRGGRGMDRGGFGGGRRGGPGGPPGPLMEQDYKDDDDKGTEQKLISEEDLLATMDAQTRRRERRAEKQAQWKAAN
PPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERR
AEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLESRGPV (SEQ ID NO: 150)
FUS-CD28-FUS-PylRSAF
DNA:
ATGTGCCGAGCCATCTCTCTTAGGCGCTTGCTGCTGCTGCTGCTGCAGCTGTCACAACTCCTAGCTGTCACTCAA GGGATGCTCATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGG CAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACT TCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCC CAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTAC CCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTAT GGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAA AGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGA GGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAA GACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGT GGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGC CGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGA TCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACA ATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATT AATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCT AAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGG GCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTAT GGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGA GCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGT AAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGT CGTGGTGGCAGAGGAGGCGGCACCGGTAAGCTTTTTTGGGCACTGGTCGTGGTTGCTGGAGTCCTGTTTTGTTAT GGCTTGCTAGTGACAGTGGCTCTTTGTGTTATCTGGGTAAGATCTGGTATGGCCTCAAACGATTATACCCAACAA GCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGA CAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGC CAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGC CAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACC 
TCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCT AGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAAC CAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATG AGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGA CAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGC AGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGT GGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAAC AACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGT ATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAG GGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTC TCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGA GGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGA TTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAG AATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCA GGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGGCGCCCCCGGCTCCGCC GGCTCCGCCGCCGGCTCCGGCATGGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGAT GATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATC AAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGC CGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGAT GAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCT ACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAG GCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGT GTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATT ACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTG CTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGT CGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACC 
CGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATG GGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTA GCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGC CCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGT TCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATT GTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTT GTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGT CTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCT ACTAACCTGTAA (SEQ ID NO: 151)
Protein : MCRAISLRRLLLLLLQLSQLLAVTQGMLMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDT SGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSY GQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQ DQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQG SRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSA KAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQR AGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGGTGKLFWALVVVAGVLFCY GLLVTVALCVIWVRSGMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYG QSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQP SYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYG QQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDN NTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEF SGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCE NMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGGAPGSAGSAAGSGMACPVPLQLPPLERLTLD DKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSD EDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPAS VSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSR RKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPML APNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKI VGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGIS TNL (SEQ ID NO: 152)
FUS-CD28-FUS-PylRSAA
DNA:
ATGTGCCGAGCCATCTCTCTTAGGCGCTTGCTGCTGCTGCTGCTGCAGCTGTCACAACTCCTAGCTGTCACTCAA GGGATGCTCATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGG CAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACT TCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCC CAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTAC CCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTAT GGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAA AGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGA GGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAA GACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGT GGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGC CGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGA TCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACA ATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATT AATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCT AAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGG GCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTAT GGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGA GCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGT AAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGT CGTGGTGGCAGAGGAGGCGGCACCGGTAAGCTTTTTTGGGCACTGGTCGTGGTTGCTGGAGTCCTGTTTTGTTAT GGCTTGCTAGTGACAGTGGCTCTTTGTGTTATCTGGGTAAGATCTGGTATGGCCTCAAACGATTATACCCAACAA GCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGA CAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGC CAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGC CAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACC 
TCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCT AGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAAC CAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATG AGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGA CAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGC AGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGT GGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAAC AACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGT ATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAG GGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTC TCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGA GGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGA TTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAG AATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCA GGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGGCGCCCCCGGCTCCGCC GGCTCCGCCGCCGGCTCCGGCATGGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGAT GACAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATC AAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGC CGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGAT GAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCT ACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAG GCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGT GTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATT ACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTG CTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGT CGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACC 
CGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATG GGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTG GCACCAAATCTGTATAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGC CCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCCCAAATGGGT TCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATT GTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTT GTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGC CTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGGATCTCT ACGAACCTGTAA (SEQ ID NO: 153)
Protein :
MCRAISLRRLLLLLLQLSQLLAVTQGMLMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDT SGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSY GQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQ DQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQG SRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSA KAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQR AGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGGTGKLFWALVVVAGVLFCY GLLVTVALCVIWVRSGMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYG QSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQP SYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYG QQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDN NTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEF SGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCE NMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGGAPGSAGSAAGSGMACPVPLQLPPLERLTLD DKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSD EDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPAS VSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSR RKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPML APNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKI VGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGIS TNL (SEQ ID NO: 154)
FUS-CD28-FUS-MCP-PylRSAF
DNA:
ATGTGCCGAGCCATCTCTCTTAGGCGCTTGCTGCTGCTGCTGCTGCAGCTGTCACAACTCCTAGCTGTCACTCAA GGGATGCTCATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGG CAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACT TCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCC CAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTAC CCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTAT GGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAA AGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGA GGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAA GACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGT GGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGC CGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGA TCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACA ATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATT AATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCT AAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGG GCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTAT GGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGA GCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGT AAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGT CGTGGTGGCAGAGGAGGCGGCACCGGTAAGCTTTTTTGGGCACTGGTCGTGGTTGCTGGAGTCCTGTTTTGTTAT GGCTTGCTAGTGACAGTGGCTCTTTGTGTTATCTGGGTAAGATCTGGTATGGCCTCAAACGATTATACCCAACAA GCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGA CAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGC
CAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGC CAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACC TCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCT AGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAAC CAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATG AGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGA CAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGC AGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGT GGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAAC AACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGT ATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAG GGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTC TCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGA GGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGA TTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAG AATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCA GGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGATTACAAGGATGACGAC GATAAGGGTACCGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCGCTTCTAACTTTACTCAGTTCGTT CTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATC AGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATAC ACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCC ACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCA ATCGCAGCAAACTCCGGCATCTACGGTACCGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCGCGTGC CCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCT GCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATC
TATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGT CACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCC AATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCC GTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCG GCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACC GGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCA GCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAAT TCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCC GAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAG ATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAA CAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAA CTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAA GAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGC ATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGAC ACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGG GGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAAC ATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 155)
Protein:
MCRAISLRRLLLLLLQLSQLLAVTQGMLMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDT
SGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSY
GQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQ
DQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQG
SRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSA
KAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQR
AGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGGTGKLFWALVVVAGVLFCY
GLLVTVALCVIWVRSGMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYG
QSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQP
SYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYG
QQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDN
NTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEF
SGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCE
NMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGDYKDDDDKGTGAPGSAGSAAGSGASNFTQFV
LVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFA
TNSDCELIVKAMQGLLKDGNPIPSAIAANSGIYGTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLIS
ATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKA
NEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSIST
GATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYA
EERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRK
LDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGD
TLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL (SEQ ID NO: 156)
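Protein entries recovered from a scanned document can carry OCR confusions (for example, two consecutive valines, "VV", read as a single "W"). A simple cross-check is to translate the DNA entry and compare it with the printed protein. The sketch below assumes each DNA entry is a single in-frame open reading frame read with the standard genetic code; the helper name `translate` is illustrative and not part of the patent, and the example input is a 48-nucleotide stretch copied from SEQ ID NO: 155.

```python
from itertools import product

# Standard genetic code (assumption: no alternative code or reassigned codons).
# Codons are enumerated in TCAG order, first base varying slowest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA entry, ignoring whitespace, up to the first stop codon."""
    seq = "".join(dna.split()).upper()
    if len(seq) % 3 != 0:
        raise ValueError(f"length {len(seq)} is not a multiple of 3")
    protein = []
    for i in range(0, len(seq), 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break  # a stop codon ends the reading frame
        protein.append(aa)
    return "".join(protein)


# Stretch copied from SEQ ID NO: 155.
fragment = "AAGCTTTTTTGGGCACTGGTCGTGGTTGCTGGAGTCCTGTTTTGTTAT"
print(translate(fragment))  # KLFWALVVVAGVLFCY
```

Because whitespace is stripped before translation, a full block-formatted DNA entry can be passed in directly once its "(SEQ ID NO: ...)" label is removed; a character-by-character comparison with the matching protein entry would then flag any residual extraction errors.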
FUS-CD28-FUS-MCP-PylRSAA
DNA:
ATGTGCCGAGCCATCTCTCTTAGGCGCTTGCTGCTGCTGCTGCTGCAGCTGTCACAACTCCTAGCTGTCACTCAA GGGATGCTCATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGG CAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACT TCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCC CAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTAC CCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTAT GGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAA AGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGA GGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAA GACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGT GGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGC CGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGA TCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACA ATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATT AATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCT AAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGG GCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTAT GGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGA GCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGT AAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGT CGTGGTGGCAGAGGAGGCGGCACCGGTAAGCTTTTTTGGGCACTGGTCGTGGTTGCTGGAGTCCTGTTTTGTTAT GGCTTGCTAGTGACAGTGGCTCTTTGTGTTATCTGGGTAAGATCTGGTATGGCCTCAAACGATTATACCCAACAA GCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGA CAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGC CAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGC CAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACC 
TCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCT AGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAAC CAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATG AGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGA CAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGC AGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGT GGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAAC AACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGT ATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAG GGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTC TCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGA GGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGA TTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAG AATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCA GGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGGCGCCCCCGGCTCCGCC GGCTCCGCCGCCGGCTCCGGCATGGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGAT GACAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATC AAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGC CGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGAT GAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCT ACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAG GCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGT GTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATT ACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTG CTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGT CGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACC 
CGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATG GGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTG GCACCAAATCTGTATAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGC CCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCCCAAATGGGT TCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATT GTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTT GTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGC CTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGGATCTCT ACGAACCTGTAA (SEQ ID NO: 157)
Protein:
MCRAISLRRLLLLLLQLSQLLAVTQGMLMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDT SGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSY GQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQ DQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQG SRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSA KAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQR AGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGGTGKLFWALVVVAGVLFCY GLLVTVALCVIWVRSGMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYG QSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQP SYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYG QQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDN NTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEF SGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCE NMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGGAPGSAGSAAGSGMACPVPLQLPPLERLTLD DKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSD EDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPAS VSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSR RKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPML APNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKI VGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGIS TNL (SEQ ID NO: 158)
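The PylRS portions of SEQ ID NO: 156 (construct suffix "AF") and SEQ ID NO: 158 (suffix "AA") share the same backbone and differ only at a few residues. Assuming the two stretches copied below are already aligned (they have equal length and differ only by substitutions, with no insertions or deletions), a position-by-position comparison is enough to list the differences. The function name `substitutions` and the two segment constants are illustrative and not part of the patent; positions are counted within the copied stretch, not within the full-length enzyme.

```python
# Equal-length PylRS stretches copied from SEQ ID NO: 156 (AF) and SEQ ID NO: 158 (AA).
AF_SEGMENT = (
    "LAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTML"
    "NFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGD"
)
AA_SEGMENT = (
    "LAPNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTML"
    "AFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGD"
)


def substitutions(ref: str, alt: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, ref residue, alt residue) for every mismatch.

    Assumes the two sequences are already aligned: no insertions or deletions.
    """
    if len(ref) != len(alt):
        raise ValueError("sequences must be pre-aligned and of equal length")
    return [(i + 1, r, a) for i, (r, a) in enumerate(zip(ref, alt)) if r != a]


for pos, af_res, aa_res in substitutions(AF_SEGMENT, AA_SEGMENT):
    print(f"segment position {pos}: {af_res} (AF) -> {aa_res} (AA)")
```

The comparison reports four differences: two positions where the AF stretch carries A and F while the AA stretch carries Y, and two positions where the AA stretch carries A while the AF stretch carries N and C. A plain zip is used instead of a full alignment algorithm because the equal-length assumption holds for these stretches; sequences with indels would need a proper aligner.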
FUS-CD28-EWSR1-MCP
DNA:
ATGTGCCGAGCCATCTCTCTTAGGCGCTTGCTGCTGCTGCTGCTGCAGCTGTCACAACTCCTAGCTGTCACTCAA
GGGATGCTCATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGG
CAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACT
TCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCC
CAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTAC
CCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTAT
GGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAA
AGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGA
GGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAA
GACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGT
GGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGC
CGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGA
TCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACA
ATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATT
AATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCT
AAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGG
GCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTAT
GGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGA
GCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGT
AAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGT
CGTGGTGGCAGAGGAGGCGGCACCGGTAAGCTTTTTTGGGCACTGGTCGTGGTTGCTGGAGTCCTGTTTTGTTAT
GGCTTGCTAGTGACAGTGGCTCTTTGTGTTATCTGGGTAAGATCTATGGCGTCCACGGATTACAGTACCTATAGC
CAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACCACCCAGGCA
TATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACC
TATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTGCCCCCCAGGCA
TACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAGGCC
TCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCA
CCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAG
CCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCCATGCAGCCAGTCACT
GCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAG
CAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGCAGCTATGGGCAGCAG
CCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGC
TACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAGGAGTCTGGAGGATTT
TCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGGGGATTTGATCGTGGA
GGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAG
CCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGATGAAGACTCTGACAAC
AGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTCTTTAAGCAGTGTGGG
GTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAAACAGGAAAGCCCAAA
GGCGATGCCACAGTGTCCTATGAAGACCCACCCACTGCCAAGGCTGCCGTGGAATGGTTTGATGGGAAAGATTTT
CAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGGGGTGGTCTGCCACCC
CGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCTGGGGGACCCATGGGT
CGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCCCGAGGGAACCCCTCT
GGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGAAACCAGAACTTCGCC
TGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGT
GGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGATCGTGGTGGTCCCGGT
GGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATGGACCGAGGTGGCTTT
GGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGATTACAAGGATGACGACGAT
AAGGGTACCGAGCAGAAGCTGATCTCAGAGGAGGACCTGGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCC
GGCGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAAC
TTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGT
CAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAAT
ATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTA
AAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACTAA (SEQ ID NO: 159)
Protein:
MCRAISLRRLLLLLLQLSQLLAVTQGMLMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDT
SGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSY
GQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQ
DQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQG SRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSA KAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQR AGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGGTGKLFWALVVVAGVLFCY GLLVTVALCVIWVRSMASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTAT YGQTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATA PTRPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQ QNTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGF SGPGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDN SAIYVQGLNDSVTLDDLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDF QGSKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPS GGGNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPG GMFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPPGPLMEQDYKDDDDKGTEQKLISEEDLGAPGSAGSAAGS GASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLN MELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIY (SEQ ID NO: 160)
FUS-CD28-EWSR1-4XAn22
DNA:
ATGTGCCGAGCCATCTCTCTTAGGCGCTTGCTGCTGCTGCTGCTGCAGCTGTCACAACTCCTAGCTGTCACTCAA GGGATGCTCATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGG CAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACT TCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCC CAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTAC CCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTAT GGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAA AGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGA GGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAA GACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGT GGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGC CGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGA TCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACA ATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATT AATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCT AAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGG GCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTAT GGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGA GCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGT AAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGT CGTGGTGGCAGAGGAGGCGGCACCGGTAAGCTTTTTTGGGCACTGGTCGTGGTTGCTGGAGTCCTGTTTTGTTAT GGCTTGCTAGTGACAGTGGCTCTTTGTGTTATCTGGGTAAGATCTATGGCGTCCACGGATTACAGTACCTATAGC CAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACCACCCAGGCA TATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACC TATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTGCCCCCCAGGCA TACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAGGCC 
TCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCA CCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAG CCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCCATGCAGCCAGTCACT GCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAG CAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGCAGCTATGGGCAGCAG CCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGC TACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAGGAGTCTGGAGGATTT TCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGGGGATTTGATCGTGGA GGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAG CCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGATGAAGACTCTGACAAC AGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTCTTTAAGCAGTGTGGG GTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAAACAGGAAAGCCCAAA GGCGATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTTGATGGGAAAGATTTT CAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGGGGTGGTCTGCCACCC CGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCTGGGGGACCCATGGGT CGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCCCGAGGGAACCCCTCT GGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGAAACCAGAACTTCGCC TGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGT GGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGATCGTGGTGGTCCCGGT GGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATGGACCGAGGTGGCTTT GGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGATTACAAGGATGACGACGAT AAGGGTACCGAGCAGAAGCTGATCTCAGAGGAGGACCTGCTAGCCACCATGGACGCACAAACACGACGACGTGAG CGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCT GGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCT CAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTA GCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCA 
CCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACA CGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGAGTCTAGAGGGCCC GTTTAA (SEQ ID NO: 161)
Protein:
MCRAISLRRLLLLLLQLSQLLAVTQGMLMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDT SGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSY GQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQ DQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQG SRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSA KAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQR AGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGGTGKLFWALVVVAGVLFCY GLLVTVALCVIWVRSMASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTAT YGQTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATA PTRPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQ QNTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGF SGPGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDN SAIYVQGLNDSVTLDDLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDF QGSKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPS GGGNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPG GMFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPPGPLMEQDYKDDDDKGTEQKLISEEDLLATMDAQTRRRE RRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGL ATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLESRGPV (SEQ ID NO: 162)
FRB-CD28-FUS-MCP-PylRSAA
DNA:
ATGTGCCGAGCCATCTCTCTTAGGCGCTTGCTGCTGCTGCTGCTGCAGCTGTCACAACTCCTAGCTGTCACTCAA GGGATGCTCGAGATGTGGCATGAAGGCCTGGAAGAGGCATCTCGTTTGTACTTTGGGGAAAGGAACGTGAAAGGC ATGTTTGAGGTGCTGGAGCCCTTGCATGCTATGATGGAACGGGGCCCCCAGACTCTGAAGGAAACATCCTTTAAT CAGGCCTATGGTCGAGATTTAATGGAGGCCCAAGAGTGGTGCAGGAAGTACATGAAATCAGGGAATGTCAAGGAC CTCCTCCAAGCCTGGGACCTCTATTATCATGTGTTCCGACGAATCTCAAAGACTAGAACCGGTAAGCTTTTTTGG GCACTGGTCGTGGTTGCTGGAGTCCTGTTTTGTTATGGCTTGCTAGTGACAGTGGCTCTTTGTGTTATCTGGGTA AGATCTGGTATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGG CAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACT TCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCC CAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTAC CCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTAT GGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAA AGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGA GGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAA GACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGT GGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGC CGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGA TCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACA ATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATT AATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCT AAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGG GCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTAT GGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGA GCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGT AAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGT CGTGGTGGCAGAGGAGGCGATTACAAGGATGACGACGATAAGGGTACCGGCGCCCCCGGCTCCGCCGGCTCCGCC 
GCCGGCTCCGGCGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCC CCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGT AGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCG TACTTAAATATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAA GGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGTACCGGCGCCCCC GGCTCCGCCGGCTCCGCCGCCGGCTCCGGCGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACC CTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCAT AAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAAC AATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTG TCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGC GCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCA GCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCA GCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAAT CCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAG GTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTG TCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAA ATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAG CGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCT ATGCTGGCACCAAATCTGTATAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAG ATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCCCAA ATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTC AAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGT GCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTG GAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGG ATCTCTACGAACCTGTAA (SEQ ID NO: 163)
Protein:
MCRAISLRRLLLLLLQLSQLLAVTQGMLEMWHEGLEEASRLYFGERNVKGMFEVLEPLHAMMERGPQTLKETSFN QAYGRDLMEAQEWCRKYMKSGNVKDLLQAWDLYYHVFRRISKTRTGKLFWALVVVAGVLFCYGLLVTVALCVIWV RSGMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTP QGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQ SYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSG GGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVT IESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRR ADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQC KAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGDYKDDDDKGTGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVA PSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQ GLLKDGNPIPSAIAANSGIYGTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIH KIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVS APTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTN PITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLERE ITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLYNYLRKLDRALPDPIKIFE IGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSS AVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL (SEQ ID NO: 164)
FRB-CD28-FUS-MCP-PylRSAF
DNA:
ATGTGCCGAGCCATCTCTCTTAGGCGCTTGCTGCTGCTGCTGCTGCAGCTGTCACAACTCCTAGCTGTCACTCAA
GGGATGCTCGAGATGTGGCATGAAGGCCTGGAAGAGGCATCTCGTTTGTACTTTGGGGAAAGGAACGTGAAAGGC
ATGTTTGAGGTGCTGGAGCCCTTGCATGCTATGATGGAACGGGGCCCCCAGACTCTGAAGGAAACATCCTTTAAT
CAGGCCTATGGTCGAGATTTAATGGAGGCCCAAGAGTGGTGCAGGAAGTACATGAAATCAGGGAATGTCAAGGAC
CTCCTCCAAGCCTGGGACCTCTATTATCATGTGTTCCGACGAATCTCAAAGACTAGAACCGGTAAGCTTTTTTGG
GCACTGGTCGTGGTTGCTGGAGTCCTGTTTTGTTATGGCTTGCTAGTGACAGTGGCTCTTTGTGTTATCTGGGTA
AGATCTGGTATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGG CAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACT TCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCC CAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTAC CCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTAT GGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAA AGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGA GGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAA GACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGT GGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGC CGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGA TCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACA ATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATT AATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCT AAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGG GCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTAT GGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGA GCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGT AAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGT CGTGGTGGCAGAGGAGGCGATTACAAGGATGACGACGATAAGGGTACCGGCGCCCCCGGCTCCGCCGGCTCCGCC GCCGGCTCCGGCGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCC CCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGT AGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCG TACTTAAATATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAA GGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGTACCGGCGCCCCC GGCTCCGCCGGCTCCGCCGCCGGCTCCGGCGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACC 
CTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCAT AAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAAC AATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTG TCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGC GCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCA GCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCA GCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAAT CCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAG GTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTG TCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAA ATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAG CGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCT ATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAG ATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAA ATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTC AAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGT GCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTG GAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGT ATTTCTACTAACCTGTAA (SEQ ID NO: 165)
Protein:
MCRAISLRRLLLLLLQLSQLLAVTQGMLEMWHEGLEEASRLYFGERNVKGMFEVLEPLHAMMERGPQTLKETSFN QAYGRDLMEAQEWCRKYMKSGNVKDLLQAWDLYYHVFRRISKTRTGKLFWALVWAGVLFCYGLLVTVALCVIWV RSGMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTP QGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQ SYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSG GGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVT IESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRR ADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQC KAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGDYKDDDDKGTGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVA PSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQ GLLKDGNPIPSAIAANSGIYGTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIH KIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVS APTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTN PITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLERE ITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFE IGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSS AVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL (SEQ ID NO: 166)
FUS-MCP-PylRSAF
DNA:
ATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTAT TCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATAT GGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATAT GGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTAT GGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCC CAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAAT CCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGT AACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGT GGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGC GGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGC AGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCAT GACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCT GTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTAC ACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCT ATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTT AATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGT GGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGAC TGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCT AAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGC AGAGGAGGCGATTACAAGGATGACGACGATAAGGGTACCGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCC GGCGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAAC TTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGT CAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAAT ATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTA AAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGTACCGGCGCCCCCGGCTCCGCC GGCTCCGCCGCCGGCTCCGGCGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGAT 
AAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAA CACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGC TCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAG GATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACC CGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCA CAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTG AGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACA AGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTG AATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGT AAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGC TTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGC ATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCA CCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCG TGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCA GGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTG GGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTG GGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTG CTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACT AACCTGTAA (SEQ ID NO: 167)
Protein:
MASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGY
GSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYN
PPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGG
GGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIES
VADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADF NRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAP KPDGPGGGPGGSHMGGNYGDDRRGGRGGDYKDDDDKGTGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSN FANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLL KDGNPIPSAIAANSGIYGTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIK HHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPT RTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPIT SMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITR FFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGP CYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVV GPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL (SEQ ID NO: 168)
SPD5-MCP-PylRSAF
DNA:
ATGGAGGACAACAGCGTGCTGAACGAGGACAGCAACCTGGAGCACGTGGAGGGCCAGCCCAGAAGAAGCATGAGC CAGCCCGTGCTGAACGTGGAGGGCGACAAGAGAACCAGCAGCACCAGCGCCACCCAGCAGCAGGTGCTGAGCGGC GCCTTCAGCAGCGCCGACGTGAGAAGCATCCCCATCATCCAGACCTGGGAGGAGAACAAGGCCCTGAAGACCAAG ATCACCATCCTGAGAGGCGAGCTGCAGATGTACCAGAGAAGATACAGCGAGGCCAAGGAGGCCAGCCAGAAGAGA GTGAAGGAGGTGATGGACGACTACGTGGACCTGAAGCTGGGCCAGGAGAACGTGCAGGAGAAGATGGAGCAGTAC AAGCTGATGGAGGAGGACCTGCTGGCCATGCAGAGCAGAATCGAGACCAGCGAGGACAACTTCGCCAGACAGATG AAGGAGTTCGAGGCCCAGAAGCACGCCATGGAGGAGAGAATCAAGGAGCTGGAGCTGAGCGCCACCGACGCCAAC AACACCACCGTGGGCAGCTTCAGAGGCACCCTGGACGACATCCTGAAGAAGAACGACCCCGACTTCACCCTGACC AGCGGCTACGAGGAGAGAAAGATCAACGACCTGGAGGCCAAGCTGCTGAGCGAGATCGACAAGGTGGCCGAGCTG GAGGACCACATCCAGCAGCTGAGACAGGAGCTGGACGACCAGAGCGCCAGACTGGCCGACAGCGAGAACGTGAGA GCCCAGCTGGAGGCCGCCACCGGCCAGGGCATCCTGGGCGCCGCCGGCAACGCCATGGTGCCCAACAGCACCTTC ATGATCGGCAACGGCAGAGAGAGCCAGACCAGAGACCAGCTGAACTACATCGACGACCTGGAGACCAAGCTGGCC GACGCCAAGAAGGAGAACGACAAGGCCAGACAGGCCCTGGTGGAGTACATGAACAAGTGCAGCAAGCTGGAGCAC GAGATCAGAACCATGGTGAAGAACAGCACCTTCGACAGCAGCAGCATGCTGCTGGGCGGCCAGACCAGCGACGAG CTGAAGATCCAGATCGGCAAGGTGAACGGCGAGCTGAACGTGCTGAGAGCCGAGAACAGAGAGCTGAGAATCAGA TGCGACCAGCTGACCGGCGGCGACGGCAACCTGAGCATCAGCCTGGGCCAGAGCAGACTGATGGCCGGCATCGCC ACCAACGACGTGGACAGCATCGGCCAGGGCAACGAGACCGGCGGCACCAGCATGAGAATCCTGCCCAGAGAGAGC CAGCTGGACGACCTGGAGGAGAGCAAGCTGCCCCTGATGGACACCAGCAGCGCCGTGAGAAACCAGCAGCAGTTC GCCAGCATGTGGGAGGACTTCGAGAGCGTGAAGGACAGCCTGCAGAACAACCACAACGACACCCTGGAGGGCAGC TTCAACAGCAGCATGCCCCCCCCCGGCAGAGACGCCACCCAGAGCTTCCTGAGCCAGAAGAGCTTCAAGAACAGC CCCATCGTGATGCAGAAGCCCAAGAGCCTGCACCTGCACCTGAAGAGCCACCAGAGCGAGGGCGCCGGCGAGCAG ATCCAGAACAACAGCTTCAGCACCAAGACCGCCAGCCCCCACGTGAGCCAGAGCCACATCCCCATCCTGCACGAC ATGCAGCAGATCCTGGACAGCAGCGCCATGTTCCTGGAGGGCCAGCACGACGTGGCCGTGAACGTGGAGCAGATG CAGGAGAAGATGAGCCAGATCAGAGAGGCCCTGGCCAGACTGTTCGAGAGACTGAAGAGCAGCGCCGCCCTGTTC GAGGAGATCCTGGAGAGAATGGGCAGCAGCGACCCCAACGCCGACAAGATCAAGAAGATGAAGCTGGCCTTCGAG ACCAGCATCAACGACAAGCTGAACGTGAGCGCCATCCTGGAGGCCGCCGAGAAGGACCTGCACAACATGAGCCTG 
AACTTCAGCATCCTGGAGAAGAGCATCGTGAGCCAGGCCGCCGAGGCCAGCAGAAGATTCACCATCGCCCCCGAC GCCGAGGACGTGGCCAGCAGCAGCCTGCTGAACGCCAGCTACAGCCCCCTGTTCAAGTTCACCAGCAACAGCGAC ATCGTGGAGAAGCTGCAGAACGAGGTGAGCGAGCTGAAGAACGAGCTGGAGATGGCCAGAACCAGAGACATGAGA AGCCCCCTGAACGGCAGCAGCGGCAGACTGAGCGACGTGCAGATCAACACCAACAGAATGTTCGAGGACCTGGAG GTGAGCGAGGCCACCCTGCAGAAGGCCAAGGAGGAGAACAGCACCCTGAAGAGCCAGTTCGCCGAGCTGGAGGCC AACCTGCACCAGGTGAACAGCAAGCTGGGCGAGGTGAGATGCGAGCTGAACGAGGCCCTGGCCAGAGTGGACGGC GAGCAGGAGACCAGAGTGAAGGCCGAGAACGCCCTGGAGGAGGCCAGACAGCTGATCAGCAGCCTGAAGCACGAG GAGAACGAGCTGAAGAAGACCATCACCGACATGGGCATGAGACTGAACGAGGCCAAGAAGAGCGACGAGTTCCTG AAGAGCGAGCTGAGCACCGCCCTGGAGGAGGAGAAGAAGAGCCAGAACCTGGCCGACGAGCTGAGCGAGGAGCTG AACGGCTGGAGAATGAGAACCAAGGAGGCCGAGAACAAGGTGGAGCACGCCAGCAGCGAGAAGAGCGAGATGCTG GAGAGAATCGTGCACCTGGAGACCGAGATGGAGAAGCTGAGCACCAGCGAGATCGCCGCCGACTACTGCAGCACC AAGATGACCGAGAGAAAGAAGGAGATCGAGCTGGCCAAGTACAGAGAGGACTTCGAGAACGCCGCCATCGTGGGC CTGGAGAGAATCAGCAAGGAGATCAGCGAGCTGACCAAGAAGACCCTGAAGGCCAAGATCATCCCCAGCAACATC AGCAGCATCCAGCTGGTGTGCGACGAGCTGTGCAGAAGACTGAGCAGAGAGAGAGAGCAGCAGCACGAGTACGCC AAGGTGATGAGAGACGTGAACGAGAAGATCGAGAAGCTGCAGCTGGAGAAGGACGCCCTGGAGCACGAGCTGAAG ATGATGAGCAGCAACAACGAGAACGTGCCCCCCGTGGGCACCAGCGTGAGCGGCATGCCCACCAAGACCAGCAAC CAGAAGTGCGCCCAGCCCCACTACACCAGCCCCACCAGACAGCTGCTGCACGAGAGCACCATGGCCGTGGACGCC ATCGTGCAGAAGCTGAAGAAGACCCACAACATGAGCGGCATGGGCCCCGAGCTGAAGGAGACCATCGGCAACGTG ATCAACGAGAGCAGAGTGCTGAGAGACTTCCTGCACCAGAAGCTGATCCTGTTCAAGGGCATCGACATGAGCAAC TGGAAGAACGAGACCGTGGACCAGCTGATCACCGACCTGGGCCAGCTGCACCAGGACAACCTGATGCTGGAGGAG CAGATCAAGAAGTACAAGAAGGAGCTGAAGCTGACCAAGAGCGCCATCCCCACCCTGGGCGTGGAGTTCCAGGAC AGAATCAAGACCGAGATCGGCAAGATCGCCACCGACATGGGCGGCGCCGTGAAGGAGATCAGAAAGAAGGGTACC GAGCAGAAGCTGATCTCAGAGGAGGACCTGGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCGCTTCT AACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAAC GGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCT GCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTA 
ACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGA AACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGTACCGGCGCCCCCGGCTCCGCCGGCTCCGCC GCCGGCTCCGGCGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCG CTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAG GTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGT ACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAAC AAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAA AAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCT GGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGC ATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCT GCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAA GACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGAC CTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTG GATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAAT GATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTG GCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGT AAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACT CGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGC TGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATC CCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTA AAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA
(SEQ ID NO: 169)
Protein :
MEDNSVLNEDSNLEHVEGQPRRSMSQPVLNVEGDKRTSSTSATQQQVLSGAFSSADVRSIPIIQTWEENKALKTK ITILRGELQMYQRRYSEAKEASQKRVKEVMDDYVDLKLGQENVQEKMEQYKLMEEDLLAMQSRIETSEDNFARQM KEFEAQKHAMEERIKELELSATDANNTTVGSFRGTLDDILKKNDPDFTLTSGYEERKINDLEAKLLSEIDKVAEL EDHIQQLRQELDDQSARLADSENVRAQLEAATGQGILGAAGNAMVPNSTFMIGNGRESQTRDQLNYIDDLETKLA DAKKENDKARQALVEYMNKCSKLEHEIRTMVKNSTFDSSSMLLGGQTSDELKIQIGKVNGELNVLRAENRELRIR CDQLTGGDGNLSISLGQSRLMAGIATNDVDSIGQGNETGGTSMRILPRESQLDDLEESKLPLMDTSSAVRNQQQF ASMWEDFESVKDSLQNNHNDTLEGSFNSSMPPPGRDATQSFLSQKSFKNSPIVMQKPKSLHLHLKSHQSEGAGEQ IQNNSFSTKTASPHVSQSHIPILHDMQQILDSSAMFLEGQHDVAVNVEQMQEKMSQIREALARLFERLKSSAALF EEILERMGSSDPNADKIKKMKLAFETSINDKLNVSAILEAAEKDLHNMSLNFSILEKSIVSQAAEASRRFTIAPD AEDVASSSLLNASYSPLFKFTSNSDIVEKLQNEVSELKNELEMARTRDMRSPLNGSSGRLSDVQINTNRMFEDLE VSEATLQKAKEENSTLKSQFAELEANLHQVNSKLGEVRCELNEALARVDGEQETRVKAENALEEARQLISSLKHE ENELKKTITDMGMRLNEAKKSDEFLKSELSTALEEEKKSQNLADELSEELNGWRMRTKEAENKVEHASSEKSEML ERIVHLETEMEKLSTSEIAADYCSTKMTERKKEIELAKYREDFENAAIVGLERISKEISELTKKTLKAKIIPSNI SSIQLVCDELCRRLSREREQQHEYAKVMRDVNEKIEKLQLEKDALEHELKMMSSNNENVPPVGTSVSGMPTKTSN QKCAQPHYTSPTRQLLHESTMAVDAIVQKLKKTHNMSGMGPELKETIGNVINESRVLRDFLHQKLILFKGIDMSN WKNETVDQLITDLGQLHQDNLMLEEQIKKYKKELKLTKSAIPTLGVEFQDRIKTEIGKIATDMGGAVKEIRKKGT EQKLISEEDLGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSS AQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIYGTGAPGSAGSA AGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSR TARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPS GSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPK DEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDN DTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCT RENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKV KHDFKNIKRAARSESYYNGISTNL (SEQ ID NO: 170)
9. POIs (reporters) and controls
GFP39TAG (GFP with Amber-encoded amino acid position 39)
DNA: (Amber codon underlined)
ATGGGCCGCCTGGAAAGCACCCCGCCGAAAAAAAAACGCAAAGTGGAAGATAGCGCGAGCGATTACAAAGATGAT GATGATAAAGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTA AACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTAGGGCAAGCTGACCCTGAAGTTCATC TGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGC CGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACC ATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGC ATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGC CACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGCCAACTTCAAGATCCGCCACAACATCGAG GACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGAC AACCACTACCTGAGCACCCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAG TTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGCATCACCATCACCATCACTAA (SEQ ID NO: 171)
Protein: (X indicates non-canonical amino acid)
MGRLESTPPKKKRKVEDSASDYKDDDDKVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATXGKLTLKFI CTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNR IELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPD NHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKHHHHHH (SEQ ID NO: 172)
GFP39TAG-2xMS2 (GFP39TAG with 2 MS2 stem-loops)
DNA: (MS2 stem-loops and Amber codon underlined)
ATGGGCCGCCTGGAAAGCACCCCGCCGAAAAAAAAACGCAAAGTGGAAGATAGCGCGAGCGATTACAAAGATGAT GATGATAAAGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTA AACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTAGGGCAAGCTGACCCTGAAGTTCATC TGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGC CGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACC ATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGC ATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGC CACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGCCAACTTCAAGATCCGCCACAACATCGAG GACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGAC AACCACTACCTGAGCACCCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAG TTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGCATCACCATCACCATCACTAAGGATCC TAAGGTACCTAATTGCCTAGAAAACATGAGGATCACCCATGTCTGCAGGTCGACTCTAGAAAACATGAGGATCAC CCATGT (SEQ ID NO: 173)
Protein: (X indicates non-canonical amino acid)
MGRLESTPPKKKRKVEDSASDYKDDDDKVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATXGKLTLKFI CTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNR IELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPD NHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKHHHHHH (SEQ ID NO: 174)
mOrange
DNA:
ATGGTGAGCAAGGGCGAGGAGAATAATATGGCCATCATCAAGGAGTTCATGCGCTTCAAGGTGCGCATGGAGGGC ACCGTGAACGGCCACGAGTTCGAGATCGAGGGCGAGGGCGAGGGCCGCCCCTACGAGGGCTTTCAGACCGCTAAG CTGAAGGTGACCAAGGGCGGCCCCCTGCCCTTCGCCTGGGACATCCTGTCCCCTCTCTTCACCTACGGCTCCAAG GCCTACGTGAAGCACCCCGCCGACATCCCCGACTACTTCAAGCTGTCCTTCCCCGAGGGCTTCAAGTGGGAGCGC GTGATGAACTACGAGGACGGCGGCGTGGTGACCGTGACCCAGGACTCCTCACTGCAGGACGGCGAGTTCATCTAC AAGGTGAAGATGCGCGGCACCAACTTCCCCTCCGACGGCCCCGTGATGCAGAAGAAGACCATGGGCTGGGAGGCC TCCTCCGAGCGGATGTACCCCGAGGACGGCGCCCTGAAGGGCGAGATCAGGATGAGGCTGAAGCTGAAGGACGGC GGCCACTACACCTCCGAGGTCAAGACCACCTACAAGGCCAAGAAGTCCGTGCAGCTGCCCGGCGCCTACATCGTC GGCATCAAGCTGGACATCACCTCCCACAACGAGGACTACACCATCGTGGAACAGTACGAACGCGCCGAGGGCCGC CACTCCACCGGCGGCATGGACGAGCTGTACAAGTAA (SEQ ID NO: 175)
Protein:
MVSKGEENNMAIIKEFMRFKVRMEGTVNGHEFEIEGEGEGRPYEGFQTAKLKVTKGGPLPFAWDILSPLFTYGSK
AYVKHPADIPDYFKLSFPEGFKWERVMNYEDGGVVTVTQDSSLQDGEFIYKVKMRGTNFPSDGPVMQKKTMGWEA SSERMYPEDGALKGEIRMRLKLKDGGHYTSEVKTTYKAKKSVQLPGAYIVGIKLDITSHNEDYTIVEQYERAEGR HSTGGMDELYK (SEQ ID NO: 176)
iRFP (near-infrared fluorescent protein)
DNA:
GAAGGATCCGTCGCCAGGCAGCCTGACCTCTTGACCTGCGACGATGAGCCGATCCATATCCCCGGTGCCATCCAA CCGCATGGACTGCTGCTCGCCCTCGCCGCCGACATGACGATCGTTGCCGGCAGCGACAACCTTCCCGAACTCACC GGACTGGCGATCGGCGCCCTGATCGGCCGCTCTGCGGCCGATGTCTTCGACTCGGAGACGCACAACCGTCTGACG ATCGCCTTGGCCGAGCCCGGGGCGGCCGTCGGAGCACCGATCACTGTCGGCTTCACGATGCGAAAGGACGCAGGC TTCATCGGCTCCTGGCATCGCCATGATCAGCTCATCTTCCTCGAGCTCGAGCCTCCCCAGCGGGACGTCGCCGAG CCGCAGGCGTTCTTCCGCCGCACCAACAGCGCCATCCGCCGCCTGCAGGCCGCCGAAACCTTGGAAAGCGCCTGC GCCGCCGCGGCGCAAGAGGTGCGGAAGATTACCGGCTTCGATCGGGTGATGATCTATCGCTTCGCCTCCGACTTC AGCGGCGAAGTGATCGCAGAGGATCGGTGCGCCGAGGTCGAGTCAAAACTAGGCCTGCACTATCCTGCCTCAACC GTGCCGGCGCAGGCCCGTCGGCTCTATACCATCAACCCGGTACGGATCATTCCCGATATCAATTATCGGCCGGTG CCGGTCACCCCAGACCTCAATCCGGTCACCGGGCGGCCGATTGATCTTAGCTTCGCCATCCTGCGCAGCGTCTCG CCCGTCCATCTGGAATTCATGCGCAACATAGGCATGCACGGCACGATGTCGATCTCGATTTTGCGCGGCGAGCGA CTGTGGGGATTGATCGTTTGCCATCACCGAACGCCGTACTACGTCGATCTCGATGGCCGCCAAGCCTGCGAGCTA GTCGCCCAGGTTCTGGCCTGGCAGATCGGCGTGATGGAAGAG (SEQ ID NO: 177)
Protein:
EGSVARQPDLLTCDDEPIHIPGAIQPHGLLLALAADMTIVAGSDNLPELTGLAIGALIGRSAADVFDSETHNRLT IALAEPGAAVGAPITVGFTMRKDAGFIGSWHRHDQLIFLELEPPQRDVAEPQAFFRRTNSAIRRLQAAETLESAC AAAAQEVRKITGFDRVMIYRFASDFSGEVIAEDRCAEVESKLGLHYPASTVPAQARRLYTINPVRIIPDINYRPV PVTPDLNPVTGRPIDLSFAILRSVSPVHLEFMRNIGMHGTMSISILRGERLWGLIVCHHRTPYYVDLDGRQACEL VAQVLAWQIGVMEE (SEQ ID NO: 178)
mCherry185TAG (mCherry with Amber-encoded amino acid position 185)
DNA: (Amber codon underlined)
ATGGGCCGCCTGGAAAGCACCCCGCCGAAAAAAAAACGCAAAGTGGAAGATAGCGCGAGCGTGAGCAAGGGCGAG GAGGATAACATGGCCATCATCAAGGAGTTCATGCGCTTCAAGGTGCACATGGAGGGCTCCGTGAACGGCCACGAG TTCGAGATCGAGGGCGAGGGCGAGGGCCGCCCCTACGAGGGCACCCAGACCGCCAAGCTGAAGGTGACCAAGGGT GGCCCCCTGCCCTTCGCCTGGGACATCCTGTCCCCTCAGTTCATGTACGGCTCCAAGGCCTACGTGAAGCACCCC GCCGACATCCCCGACTACTTGAAGCTGTCCTTCCCCGAGGGCTTCAAGTGGGAGCGCGTGATGAACTTCGAGGAC GGCGGCGTGGTGACCGTGACCCAGGACTCCTCCCTGCAGGACGGCGAGTTCATCTACAAGGTGAAGCTGCGCGGC ACCAACTTCCCCTCCGACGGCCCCGTAATGCAGAAGAAGACGATGGGCTGGGAGGCCTCCTCCGAGCGGATGTAC CCCGAGGACGGCGCCCTGAAGGGCGAGATCAAGCAGAGGCTGAAGCTGAAGGACGGCGGCCACTACGACGCTGAG GTCAAGACCACCTACAAGGCCAAGTAGCCCGTGCAGCTGCCCGGCGCCTACAACGTCAACATCAAGTTGGACATC ACCTCCCACAACGAGGACTACACCATCGTGGAACAGTACGAACGCGCCGAGGGCCGCCACTCCACCGGCGGCATG GACGAGCTGTACAAGCATCATCATCATCATCATTAA (SEQ ID NO: 179)
Protein: (X indicates non-canonical amino acid)
MGRLESTPPKKKRKVEDSASVSKGEEDNMAIIKEFMRFKVHMEGSVNGHEFEIEGEGEGRPYEGTQTAKLKVTKG GPLPFAWDILSPQFMYGSKAYVKHPADIPDYLKLSFPEGFKWERVMNFEDGGVVTVTQDSSLQDGEFIYKVKLRG TNFPSDGPVMQKKTMGWEASSERMYPEDGALKGEIKQRLKLKDGGHYDAEVKTTYKAKXPVQLPGAYNVNIKLDI TSHNEDYTIVEQYERAEGRHSTGGMDELYKHHHHHH (SEQ ID NO: 180)
mCherry185TAG-2xMS2 (mCherry185TAG with 2 MS2 RNA stem-loops)
DNA: (MS2 stem-loops and Amber codon underlined)
ATGGGCCGCCTGGAAAGCACCCCGCCGAAAAAAAAACGCAAAGTGGAAGATAGCGCGAGCGTGAGCAAGGGCGAG GAGGATAACATGGCCATCATCAAGGAGTTCATGCGCTTCAAGGTGCACATGGAGGGCTCCGTGAACGGCCACGAG TTCGAGATCGAGGGCGAGGGCGAGGGCCGCCCCTACGAGGGCACCCAGACCGCCAAGCTGAAGGTGACCAAGGGT GGCCCCCTGCCCTTCGCCTGGGACATCCTGTCCCCTCAGTTCATGTACGGCTCCAAGGCCTACGTGAAGCACCCC GCCGACATCCCCGACTACTTGAAGCTGTCCTTCCCCGAGGGCTTCAAGTGGGAGCGCGTGATGAACTTCGAGGAC GGCGGCGTGGTGACCGTGACCCAGGACTCCTCCCTGCAGGACGGCGAGTTCATCTACAAGGTGAAGCTGCGCGGC ACCAACTTCCCCTCCGACGGCCCCGTAATGCAGAAGAAGACGATGGGCTGGGAGGCCTCCTCCGAGCGGATGTAC CCCGAGGACGGCGCCCTGAAGGGCGAGATCAAGCAGAGGCTGAAGCTGAAGGACGGCGGCCACTACGACGCTGAG GTCAAGACCACCTACAAGGCCAAGTAGCCCGTGCAGCTGCCCGGCGCCTACAACGTCAACATCAAGTTGGACATC ACCTCCCACAACGAGGACTACACCATCGTGGAACAGTACGAACGCGCCGAGGGCCGCCACTCCACCGGCGGCATG GACGAGCTGTACAAGCATCATCATCATCATCATTAAAGATCCTAAGGTACCTAATTGCCTAGAAAACATGAGGAT CACCCATGTCTGCAGGTCGACTCTAGAAAACATGAGGATCACCCATGT (SEQ ID NO: 181)
Protein: (X indicates non-canonical amino acid)
MGRLESTPPKKKRKVEDSASVSKGEEDNMAIIKEFMRFKVHMEGSVNGHEFEIEGEGEGRPYEGTQTAKLKVTKG GPLPFAWDILSPQFMYGSKAYVKHPADIPDYLKLSFPEGFKWERVMNFEDGGVVTVTQDSSLQDGEFIYKVKLRG TNFPSDGPVMQKKTMGWEASSERMYPEDGALKGEIKQRLKLKDGGHYDAEVKTTYKAKXPVQLPGAYNVNIKLDI TSHNEDYTIVEQYERAEGRHSTGGMDELYKHHHHHH (SEQ ID NO: 182)
mCherry185TAG-4xBoxB (mCherry185TAG with 4 BoxB stem-loops)
DNA: (BoxB stem-loops and Amber codon underlined)
ATGGGCCGCCTGGAAAGCACCCCGCCGAAAAAAAAACGCAAAGTGGAAGATAGCGCGAGCGTGAGCAAGGGCGAG GAGGATAACATGGCCATCATCAAGGAGTTCATGCGCTTCAAGGTGCACATGGAGGGCTCCGTGAACGGCCACGAG TTCGAGATCGAGGGCGAGGGCGAGGGCCGCCCCTACGAGGGCACCCAGACCGCCAAGCTGAAGGTGACCAAGGGT GGCCCCCTGCCCTTCGCCTGGGACATCCTGTCCCCTCAGTTCATGTACGGCTCCAAGGCCTACGTGAAGCACCCC GCCGACATCCCCGACTACTTGAAGCTGTCCTTCCCCGAGGGCTTCAAGTGGGAGCGCGTGATGAACTTCGAGGAC GGCGGCGTGGTGACCGTGACCCAGGACTCCTCCCTGCAGGACGGCGAGTTCATCTACAAGGTGAAGCTGCGCGGC ACCAACTTCCCCTCCGACGGCCCCGTAATGCAGAAGAAGACGATGGGCTGGGAGGCCTCCTCCGAGCGGATGTAC CCCGAGGACGGCGCCCTGAAGGGCGAGATCAAGCAGAGGCTGAAGCTGAAGGACGGCGGCCACTACGACGCTGAG GTCAAGACCACCTACAAGGCCAAGTAGCCCGTGCAGCTGCCCGGCGCCTACAACGTCAACATCAAGTTGGACATC ACCTCCCACAACGAGGACTACACCATCGTGGAACAGTACGAACGCGCCGAGGGCCGCCACTCCACCGGCGGCATG GACGAGCTGTACAAGCATCATCATCATCATCATTAAAGATCCTAAGGTACCGCCCTGAAAAAGGGCTCGAGCCCT GAAAAAGGGCAATTGCCCTGAAAAAGGGCGTCGACGCCCTGAAAAAGGGC (SEQ ID NO: 183)
Protein: (X indicates non-canonical amino acid)
MGRLESTPPKKKRKVEDSASVSKGEEDNMAIIKEFMRFKVHMEGSVNGHEFEIEGEGEGRPYEGTQTAKLKVTKG GPLPFAWDILSPQFMYGSKAYVKHPADIPDYLKLSFPEGFKWERVMNFEDGGVVTVTQDSSLQDGEFIYKVKLRG TNFPSDGPVMQKKTMGWEASSERMYPEDGALKGEIKQRLKLKDGGHYDAEVKTTYKAKXPVQLPGAYNVNIKLDI TSHNEDYTIVEQYERAEGRHSTGGMDELYKHHHHHH (SEQ ID NO: 184)
GFP39TAA-2xMS2 (GFP with Ochre-encoded amino acid position 39 and 2 MS2 RNA stem-loops)
DNA: (MS2 stem-loops and Ochre codon underlined)
ATGGGCCGCCTGGAAAGCACCCCGCCGAAAAAAAAACGCAAAGTGGAAGATAGCGCGAGCGATTACAAAGATGAT GATGATAAAGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTA AACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTAAGGCAAGCTGACCCTGAAGTTCATC TGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGC CGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACC ATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGC ATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGC CACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGCCAACTTCAAGATCCGCCACAACATCGAG GACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGAC AACCACTACCTGAGCACCCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAG TTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGCATCACCATCACCATCACTGAGGATCC TAAGGTACCTAATTGCCTAGAAAACATGAGGATCACCCATGTCTGCAGGTCGACTCTAGAAAACATGAGGATCAC CCATGT (SEQ ID NO:185)
Protein: (X indicates non-canonical amino acid)
MGRLESTPPKKKRKVEDSASDYKDDDDKVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATXGKLTLKFI CTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNR IELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPD NHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKHHHHHH (SEQ ID NO: 186)
mCherry185TAA-2xMS2 (mCherry with Ochre-encoded amino acid position 185 and 2 MS2 RNA stem-loops)
DNA: (MS2 stem-loops and Ochre codon underlined)
ATGGGCCGCCTGGAAAGCACCCCGCCGAAAAAAAAACGCAAAGTGGAAGATAGCGCGAGCGTGAGCAAGGGCGAG
GAGGATAACATGGCCATCATCAAGGAGTTCATGCGCTTCAAGGTGCACATGGAGGGCTCCGTGAACGGCCACGAG
TTCGAGATCGAGGGCGAGGGCGAGGGCCGCCCCTACGAGGGCACCCAGACCGCCAAGCTGAAGGTGACCAAGGGT
GGCCCCCTGCCCTTCGCCTGGGACATCCTGTCCCCTCAGTTCATGTACGGCTCCAAGGCCTACGTGAAGCACCCC
GCCGACATCCCCGACTACTTGAAGCTGTCCTTCCCCGAGGGCTTCAAGTGGGAGCGCGTGATGAACTTCGAGGAC
GGCGGCGTGGTGACCGTGACCCAGGACTCCTCCCTGCAGGACGGCGAGTTCATCTACAAGGTGAAGCTGCGCGGC
ACCAACTTCCCCTCCGACGGCCCCGTAATGCAGAAGAAGACGATGGGCTGGGAGGCCTCCTCCGAGCGGATGTAC CCCGAGGACGGCGCCCTGAAGGGCGAGATCAAGCAGAGGCTGAAGCTGAAGGACGGCGGCCACTACGACGCTGAG GTCAAGACCACCTACAAGGCCAAGTAACCCGTGCAGCTGCCCGGCGCCTACAACGTCAACATCAAGTTGGACATC ACCTCCCACAACGAGGACTACACCATCGTGGAACAGTACGAACGCGCCGAGGGCCGCCACTCCACCGGCGGCATG GACGAGCTGTACAAGCATCATCATCATCATCATTGAAGATCCTAAGGTACCTAATTGCCTAGAAAACATGAGGAT CACCCATGTCTGCAGGTCGACTCTAGAAAACATGAGGATCACCCATGT (SEQ ID NO: 187)
Protein: (X indicates non-canonical amino acid)
MGRLESTPPKKKRKVEDSASVSKGEEDNMAIIKEFMRFKVHMEGSVNGHEFEIEGEGEGRPYEGTQTAKLKVTKG GPLPFAWDILSPQFMYGSKAYVKHPADIPDYLKLSFPEGFKWERVMNFEDGGVVTVTQDSSLQDGEFIYKVKLRG TNFPSDGPVMQKKTMGWEASSERMYPEDGALKGEIKQRLKLKDGGHYDAEVKTTYKAKXPVQLPGAYNVNIKLDI TSHNEDYTIVEQYERAEGRHSTGGMDELYKHHHHHH (SEQ ID NO: 188)
GFP39TGA-2xMS2 (GFP with Opal-encoded amino acid position 39 and 2 MS2 RNA stem-loops)
DNA: (MS2 stem-loops and Opal codon underlined)
ATGGGCCGCCTGGAAAGCACCCCGCCGAAAAAAAAACGCAAAGTGGAAGATAGCGCGAGCGATTACAAAGATGAT GATGATAAAGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTA AACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTGAGGCAAGCTGACCCTGAAGTTCATC TGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGC CGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACC ATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGC ATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGC CACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGCCAACTTCAAGATCCGCCACAACATCGAG GACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGAC AACCACTACCTGAGCACCCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAG TTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGCATCACCATCACCATCACTAAGGATCC TAAGGTACCTAATTGCCTAGAAAACATGAGGATCACCCATGTCTGCAGGTCGACTCTAGAAAACATGAGGATCAC CCATGT (SEQ ID NO:189)
Protein: (X indicates non-canonical amino acid)
MGRLESTPPKKKRKVEDSASDYKDDDDKVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATXGKLTLKFI CTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNR IELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPD NHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKHHHHHH (SEQ ID NO: 190)
mCherry185TGA-2xMS2 (mCherry with Opal-encoded amino acid position 185 and 2 MS2 RNA stem-loops)
DNA: (MS2 stem-loops and Opal codon underlined)
ATGGGCCGCCTGGAAAGCACCCCGCCGAAAAAAAAACGCAAAGTGGAAGATAGCGCGAGCGTGAGCAAGGGCGAG GAGGATAACATGGCCATCATCAAGGAGTTCATGCGCTTCAAGGTGCACATGGAGGGCTCCGTGAACGGCCACGAG TTCGAGATCGAGGGCGAGGGCGAGGGCCGCCCCTACGAGGGCACCCAGACCGCCAAGCTGAAGGTGACCAAGGGT GGCCCCCTGCCCTTCGCCTGGGACATCCTGTCCCCTCAGTTCATGTACGGCTCCAAGGCCTACGTGAAGCACCCC GCCGACATCCCCGACTACTTGAAGCTGTCCTTCCCCGAGGGCTTCAAGTGGGAGCGCGTGATGAACTTCGAGGAC GGCGGCGTGGTGACCGTGACCCAGGACTCCTCCCTGCAGGACGGCGAGTTCATCTACAAGGTGAAGCTGCGCGGC ACCAACTTCCCCTCCGACGGCCCCGTAATGCAGAAGAAGACGATGGGCTGGGAGGCCTCCTCCGAGCGGATGTAC CCCGAGGACGGCGCCCTGAAGGGCGAGATCAAGCAGAGGCTGAAGCTGAAGGACGGCGGCCACTACGACGCTGAG GTCAAGACCACCTACAAGGCCAAGTGACCCGTGCAGCTGCCCGGCGCCTACAACGTCAACATCAAGTTGGACATC ACCTCCCACAACGAGGACTACACCATCGTGGAACAGTACGAACGCGCCGAGGGCCGCCACTCCACCGGCGGCATG GACGAGCTGTACAAGCATCATCATCATCATCATTAAAGATCCTAAGGTACCTAATTGCCTAGAAAACATGAGGAT CACCCATGTCTGCAGGTCGACTCTAGAAAACATGAGGATCACCCATGT (SEQ ID NO: 191)
Protein: (X indicates non-canonical amino acid)
MGRLESTPPKKKRKVEDSASVSKGEEDNMAIIKEFMRFKVHMEGSVNGHEFEIEGEGEGRPYEGTQTAKLKVTKG GPLPFAWDILSPQFMYGSKAYVKHPADIPDYLKLSFPEGFKWERVMNFEDGGVVTVTQDSSLQDGEFIYKVKLRG TNFPSDGPVMQKKTMGWEASSERMYPEDGALKGEIKQRLKLKDGGHYDAEVKTTYKAKXPVQLPGAYNVNIKLDI TSHNEDYTIVEQYERAEGRHSTGGMDELYKHHHHHH (SEQ ID NO: 192)
Nup153 (Homo sapiens Nucleoporin 153; UniProt: P49790)
DNA:
ATGGCGTCTGGTGCTGGCGGTGTTGGTGGAGGAGGTGGGGGTAAAATTCGTACTCGTCGCTGTCATCAAGGTCCG
ATTAAACCGTATCAGCAGGGACGTCAGCAACATCAGGGTATTCTGAGCCGTGTGACCGAAAGCGTGAAAAACATT GTGCCGGGTTGGCTGCAACGTTATTTCAACAAAAATGAGGATGTGTGTTCGTGTTCTACCGATACCAGTGAAGTT CCTCGTTGGCCGGAAAACAAAGAAGATCACCTGGTGTATGCCGATGAAGAATCGAGCAATATCACCGATGGCCGT ATTACTCCTGAACCGGCGGTTAGTAACACTGAAGAACCGTCAACCACAAGCACAGCATCGAACTATCCAGATGTC CTGACTCGCCCTTCTCTGCACCGTTCTCACCTGAACTTTAGCATGCTGGAATCACCAGCTCTGCATTGTCAGCCG TCTACCAGTAGTGCCTTCCCGATTGGCTCTAGTGGCTTTTCGCTGGTCAAAGAGATCAAAGACTCGACCTCTCAA CATGACGATGATAACATTAGCACGACCTCGGGTTTTAGTAGCCGTGCCTCCGATAAAGACATTACCGTGAGCAAA AACACCTCTCTGCCGCCTCTGTGGAGTCCTGAAGCCGAACGCTCTCATAGTCTGTCTCAGCACACAGCCACCAGT TCCAAAAAACCAGCCTTCAACCTGAGCGCCTTTGGTACACTGTCACCGAGCCTGGGAAATTCCTCTATCCTGAAA ACATCACAGCTGGGCGATAGTCCGTTTTATCCGGGCAAAACGACGTATGGTGGTGCCGCTGCTGCTGTTCGCCAG TCTAAACTGCGTAACACTCCGTATCAAGCTCCAGTCCGTCGCCAAATGAAAGCAAAACAACTGTCGGCCCAGTCT TATGGTGTGACAAGCTCTACAGCTCGTCGTATCCTGCAAAGTCTGGAGAAAATGTCATCTCCGCTGGCAGATGCC AAACGTATTCCGTCCATTGTGAGCAGTCCGCTGAATAGCCCGCTGGACCGTAGTGGGATCGATATCACCGACTTC CAAGCCAAACGTGAGAAAGTGGATAGCCAGTATCCGCCTGTACAACGTCTGATGACCCCGAAACCGGTTTCAATT GCCACGAATCGTAGCGTGTATTTCAAACCGTCACTGACCCCTAGTGGTGAGTTTCGTAAAACAAATCAGCGTATC GACAACAAATGCTCTACCGGGTATGAAAAAAACATGACGCCGGGACAGAATCGTGAACAACGTGAATCTGGCTTC TCTTATCCGAACTTTAGTCTGCCGGCAGCAAATGGTCTGAGTAGCGGTGTAGGAGGTGGTGGGGGCAAAATGCGC CGTGAACGTCACGCCTTTGTGGCCTCTAAACCTCTGGAAGAAGAAGAGATGGAGGTTCCTGTACTGCCGAAAATC AGTCTGCCTATCACCTCTTCAAGTCTGCCGACCTTCAACTTTTCTAGTCCGGAAATCACAACCTCTAGCCCGTCA CCGATTAATAGCAGTCAAGCACTGACGAATAAAGTCCAAATGACCTCACCGAGTTCTACGGGTTCTCCGATGTTC AAATTCTCTAGTCCTATCGTGAAATCAACCGAAGCGAACGTCCTGCCTCCTTCTAGTATTGGGTTCACCTTTAGC GTCCCAGTGGCCAAAACAGCTGAACTGAGCGGTAGCAGTAGTACTCTGGAACCGATTATCAGCTCAAGCGCCCAT CATGTCACTACCGTGAATAGCACAAACTGTAAAAAAACGCCGCCTGAGGACTGTGAAGGACCGTTTCGTCCTGCC GAAATCCTGAAAGAAGGTTCCGTCCTGGACATTCTGAAATCTCCGGGATTTGCCTCTCCTAAAATCGACTCTGTT GCCGCTCAACCAACTGCCACATCACCGGTGGTTTATACTCGTCCGGCGATTAGCAGTTTTAGCAGTAGTGGCATC GGTTTTGGTGAATCCCTGAAAGCTGGCTCATCTTGGCAGTGTGACACCTGCCTGCTGCAAAACAAAGTGACCGAT 
AACAAATGTATTGCCTGTCAGGCCGCCAAACTGTCTCCTCGTGATACAGCCAAACAGACCGGCATCGAAACCCCT AATAAAAGCGGGAAAACGACCCTGTCAGCAAGTGGTACGGGATTTGGGGACAAATTCAAACCTGTGATCGGCACA TGGGACTGTGACACTTGTCTGGTACAGAACAAACCAGAAGCGATCAAATGTGTGGCCTGTGAAACGCCTAAACCT GGAACATGTGTGAAACGTGCCCTGACTCTGACTGTTGTGTCAGAAAGCGCCGAAACCATGACGGCAAGCAGCTCA TCCTGTACTGTGACTACCGGGACTCTGGGATTTGGTGACAAATTCAAACGCCCGATTGGTTCCTGGGAATGCTCC GTGTGTTGTGTGAGCAATAATGCCGAGGACAACAAATGTGTGTCCTGTATGAGCGAGAAACCTGGCAGCTCTGTT CCTGCTAGCAGCTCTAGCACAGTTCCTGTTAGTCTGCCTAGTGGTGGTTCTCTGGGTCTGGAAAAATTCAAAAAA CCTGAAGGAAGCTGGGATTGTGAGCTGTGCCTGGTACAGAATAAAGCGGATAGCACGAAATGTCTGGCCTGTGAG TCAGCCAAACCAGGGACTAAAAGCGGCTTTAAAGGCTTCGACACGTCGAGCAGTTCTAGTAACAGCGCCGCCTCA TCATCTTTCAAATTTGGGGTGAGCAGCTCCTCTAGTGGTCCTAGTCAAACACTGACCTCTACCGGAAACTTCAAA TTCGGCGATCAGGGTGGCTTCAAAATTGGTGTCTCCTCTGATTCGGGTAGCATTAACCCGATGAGTGAGGGGTTC AAATTCAGCAAACCAATTGGCGATTTCAAATTCGGTGTGTCGTCTGAATCCAAACCTGAAGAAGTCAAAAAAGAC AGCAAAAACGACAATTTCAAATTCGGCCTGTCTAGTGGTCTGTCTAATCCGGTTAGCCTGACCCCGTTTCAGTTC GGGGTGTCTAATCTGGGTCAGGAAGAGAAAAAAGAGGAGCTGCCTAAAAGTTCATCTGCCGGGTTCAGTTTTGGT ACAGGCGTGATCAATAGCACTCCAGCACCAGCCAATACAATCGTGACGAGCGAGAACAAATCGAGCTTCAACCTG GGGACAATCGAAACGAAAAGCGCCAGTGTAGCGCCATTCACGTGTAAAACCTCCGAGGCAAAAAAAGAAGAGATG CCGGCCACAAAAGGTGGATTCTCATTCGGCAACGTGGAACCGGCTAGCCTGCCATCAGCAAGCGTGTTTGTACTG GGCCGTACCGAGGAGAAACAGCAGGAACCTGTTACTAGCACCAGTCTGGTCTTTGGTAAAAAAGCCGACAACGAA GAACCGAAATGTCAGCCAGTGTTCAGCTTCGGCAATAGCGAACAGACGAAAGACGAAAACAGCAGCAAATCGACG TTCAGCTTCAGTATGACGAAACCGAGCGAAAAAGAAAGTGAGCAGCCAGCAAAAGCAACGTTCGCCTTTGGAGCA CAGACATCAACCACAGCCGATCAAGGAGCAGCGAAACCAGTTTTCAGTTTTCTGAATAACAGCTCAAGCAGCAGT TCTACACCAGCAACCTCAGCAGGTGGTGGGATCTTTGGATCAAGCACCTCATCCAGCAATCCGCCAGTGGCAACA TTCGTGTTTGGCCAGAGCAGTAATCCGGTGTCATCTTCAGCATTTGGGAATACCGCCGAGAGTAGCACATCACAG TCTCTGCTGTTCTCACAGGACTCTAAACTGGCAACCACCTCTTCTACTGGTACAGCGGTTACCCCGTTTGTGTTC GGTCCGGGAGCATCATCCAATAATACCACGACGTCGGGCTTTGGGTTTGGTGCCACGACAACAAGCAGTAGCGCT GGTAGCAGCTTTGTCTTTGGCACAGGTCCTTCAGCACCTTCTGCTTCACCAGCTTTCGGAGCCAATCAGACTCCG 
ACATTCGGACAGTCACAGGGTGCCTCTCAACCAAATCCTCCGGGTTTTGGCAGTATTAGCAGTAGTACCGCCCTG TTCCCGACCGGTAGTCAACCGGCACCGCCAACATTTGGAACGGTTAGCAGTAGTAGTCAGCCTCCGGTGTTTGGA CAACAACCGAGCCAGAGCGCCTTCGGATCAGGAACGACCCCTAATAGTAGCAGTGCCTTCCAGTTCGGTAGCAGT ACCACCAACTTCAACTTCACGAACAATAGCCCGTCAGGTGTGTTCACGTTTGGCGCCAATTCTTCTACCCCAGCG GCAAGTGCTCAACCTTCAGGCTCAGGTGGATTTCCTTTCAACCAGTCACCAGCAGCGTTTACTGTTGGTTCTAAC GGGAAAAACGTTTTCAGTAGCAGCGGCACCTCGTTTTCTGGTCGTAAAATCAAAACGGCCGTTCGTCGCCGTAAA
(SEQ ID NO: 193)
Protein:
MASGAGGVGGGGGGKIRTRRCHQGPIKPYQQGRQQHQGILSRVTESVKNIVPGWLQRYFNKNEDVCSCSTDTSEV PRWPENKEDHLVYADEESSNITDGRITPEPAVSNTEEPSTTSTASNYPDVLTRPSLHRSHLNFSMLESPALHCQP STSSAFPIGSSGFSLVKEIKDSTSQHDDDNISTTSGFSSRASDKDITVSKNTSLPPLWSPEAERSHSLSQHTATS SKKPAFNLSAFGTLSPSLGNSSILKTSQLGDSPFYPGKTTYGGAAAAVRQSKLRNTPYQAPVRRQMKAKQLSAQS YGVTSSTARRILQSLEKMSSPLADAKRIPSIVSSPLNSPLDRSGIDITDFQAKREKVDSQYPPVQRLMTPKPVSI ATNRSVYFKPSLTPSGEFRKTNQRIDNKCSTGYEKNMTPGQNREQRESGFSYPNFSLPAANGLSSGVGGGGGKMR RERHAFVASKPLEEEEMEVPVLPKISLPITSSSLPTFNFSSPEITTSSPSPINSSQALTNKVQMTSPSSTGSPMF KFSSPIVKSTEANVLPPSSIGFTFSVPVAKTAELSGSSSTLEPIISSSAHHVTTVNSTNCKKTPPEDCEGPFRPA EILKEGSVLDILKSPGFASPKIDSVAAQPTATSPVVYTRPAISSFSSSGIGFGESLKAGSSWQCDTCLLQNKVTD NKCIACQAAKLSPRDTAKQTGIETPNKSGKTTLSASGTGFGDKFKPVIGTWDCDTCLVQNKPEAIKCVACETPKP GTCVKRALTLTVVSESAETMTASSSSCTVTTGTLGFGDKFKRPIGSWECSVCCVSNNAEDNKCVSCMSEKPGSSV PASSSSTVPVSLPSGGSLGLEKFKKPEGSWDCELCLVQNKADSTKCLACESAKPGTKSGFKGFDTSSSSSNSAAS SSFKFGVSSSSSGPSQTLTSTGNFKFGDQGGFKIGVSSDSGSINPMSEGFKFSKPIGDFKFGVSSESKPEEVKKD SKNDNFKFGLSSGLSNPVSLTPFQFGVSNLGQEEKKEELPKSSSAGFSFGTGVINSTPAPANTIVTSENKSSFNL GTIETKSASVAPFTCKTSEAKKEEMPATKGGFSFGNVEPASLPSASVFVLGRTEEKQQEPVTSTSLVFGKKADNE EPKCQPVFSFGNSEQTKDENSSKSTFSFSMTKPSEKESEQPAKATFAFGAQTSTTADQGAAKPVFSFLNNSSSSS STPATSAGGGIFGSSTSSSNPPVATFVFGQSSNPVSSSAFGNTAESSTSQSLLFSQDSKLATTSSTGTAVTPFVF GPGASSNNTTTSGFGFGATTTSSSAGSSFVFGTGPSAPSASPAFGANQTPTFGQSQGASQPNPPGFGSISSSTAL FPTGSQPAPPTFGTVSSSSQPPVFGQQPSQSAFGSGTTPNSSSAFQFGSSTTNFNFTNNSPSGVFTFGANSSTPA ASAQPSGSGGFPFNQSPAAFTVGSNGKNVFSSSGTSFSGRKIKTAVRRRK (SEQ ID NO: 194)
Vim116TAG (Homo sapiens vimentin with an amber-encoded amino acid at position 116; UniProt: P08670)
DNA: (Amber codon underlined)
ATGTCCACCAGGTCCGTGTCCTCGTCCTCCTACCGCAGGATGTTCGGCGGCCCGGGCACCGCGAGCCGGCCGAGC TCCAGCCGGAGCTACGTGACTACGTCCACCCGCACCTACAGCCTGGGCAGCGCGCTGCGCCCCAGCACCAGCCGC AGCCTCTACGCCTCGTCCCCGGGCGGCGTGTATGCCACGCGCTCCTCTGCCGTGCGCCTGCGGAGCAGCGTGCCC GGGGTGCGGCTCCTGCAGGACTCGGTGGACTTCTCGCTGGCCGACGCCATCAACACCGAGTTCAAGAACACCCGC ACCAACGAGAAGGTGGAGCTGCAGGAGCTGAATGACCGCTTCGCCTAGTACATCGACAAGGTGCGCTTCCTGGAG CAGCAGAATAAGATCCTGCTGGCCGAGCTCGAGCAGCTCAAGGGCCAAGGCAAGTCGCGCCTGGGGGACCTCTAC GAGGAGGAGATGCGGGAGCTGCGCCGGCAGGTGGACCAGCTAACCAACGACAAAGCCCGCGTCGAGGTGGAGCGC GACAACCTGGCCGAGGACATCATGCGCCTCCGGGAGAAATTGCAGGAGGAGATGCTTCAGAGAGAGGAAGCCGAA AACACCCTGCAATCTTTCAGACAGGATGTTGACAATGCGTCTCTGGCACGTCTTGACCTTGAACGCAAAGTGGAA TCTTTGCAAGAAGAGATTGCCTTTTTGAAGAAACTCCACGAAGAGGAAATCCAGGAGCTGCAGGCTCAGATTCAG GAACAGCATGTCCAAATCGATGTGGATGTTTCCAAGCCTGACCTCACGGCTGCCCTGCGTGACGTACGTCAGCAA TATGAAAGTGTGGCTGCCAAGAACCTGCAGGAGGCAGAAGAATGGTACAAATCCAAGTTTGCTGACCTCTCTGAG GCTGCCAACCGGAACAATGACGCCCTGCGCCAGGCAAAGCAGGAGTCCACTGAGTACCGGAGACAGGTGCAGTCC CTCACCTGTGAAGTGGATGCCCTTAAAGGAACCAATGAGTCCCTGGAACGCCAGATGCGTGAAATGGAAGAGAAC TTTGCCGTTGAAGCTGCTAACTACCAAGACACTATTGGCCGCCTGCAGGATGAGATTCAGAATATGAAGGAGGAA ATGGCTCGTCACCTTCGTGAATACCAAGACCTGCTCAATGTTAAGATGGCCCTTGACATTGAGATTGCCACCTAC AGGAAGCTGCTGGAAGGCGAGGAGAGCAGGATTTCTCTGCCTCTTCCAAACTTTTCCTCCCTGAACCTGAGGGAA ACTAATCTGGATTCACTCCCTCTGGTTGATACCCACTCAAAAAGGACACTTCTGATTAAGACGGTTGAAACTAGA GATGGACAGGTTATCAACGAAACTTCTCAGCATCACGATGACCTTGAA (SEQ ID NO: 195)
Protein: (X indicates non-canonical amino acid)
MSTRSVSSSSYRRMFGGPGTASRPSSSRSYVTTSTRTYSLGSALRPSTSRSLYASSPGGVYATRSSAVRLRSSVP GVRLLQDSVDFSLADAINTEFKNTRTNEKVELQELNDRFAXYIDKVRFLEQQNKILLAELEQLKGQGKSRLGDLY EEEMRELRRQVDQLTNDKARVEVERDNLAEDIMRLREKLQEEMLQREEAENTLQSFRQDVDNASLARLDLERKVE SLQEEIAFLKKLHEEEIQELQAQIQEQHVQIDVDVSKPDLTAALRDVRQQYESVAAKNLQEAEEWYKSKFADLSE AANRNNDALRQAKQESTEYRRQVQSLTCEVDALKGTNESLERQMREMEENFAVEAANYQDTIGRLQDEIQNMKEE MARHLREYQDLLNVKMALDIEIATYRKLLEGEESRISLPLPNFSSLNLRETNLDSLPLVDTHSKRTLLIKTVETR DGQVINETSQHHDDLE (SEQ ID NO: 196)
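Every entry in this listing uses the same encoding: an in-frame amber (TAG) codon in the DNA appears as X in the protein. The Python sketch below reproduces that mapping, assuming the standard genetic code (NCBI translation table 1), translation from the first base, and termination at the first TAA or TGA. The names `translate_amber` and `amber_sites` are illustrative helpers, not part of the source.

```python
"""Translate an amber-suppression construct, reading in-frame TAG as the ncAA."""

# Standard genetic code (NCBI table 1); codons enumerated with bases in TCAG order.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: _AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(_BASES)
    for j, b2 in enumerate(_BASES)
    for k, b3 in enumerate(_BASES)
}


def translate_amber(cds: str, ncaa: str = "X") -> str:
    """Translate a coding sequence starting at its first base.

    An in-frame TAG (amber) codon is written as `ncaa`, mirroring the
    "X indicates non-canonical amino acid" convention of the listing.
    Translation ends at the first TAA or TGA, or at the last complete codon.
    Whitespace (such as the 75-nt block separators) is ignored.
    """
    seq = "".join(cds.split()).upper()
    protein = []
    for start in range(0, len(seq) - 2, 3):
        codon = seq[start:start + 3]
        if codon == "TAG":
            protein.append(ncaa)  # amber codon suppressed: ncAA incorporated
            continue
        residue = CODON_TABLE[codon]
        if residue == "*":
            break  # TAA or TGA still terminates translation
        protein.append(residue)
    return "".join(protein)


def amber_sites(cds: str) -> list[int]:
    """1-based residue numbers at which the ncAA (X) is incorporated."""
    return [i + 1 for i, aa in enumerate(translate_amber(cds)) if aa == "X"]


if __name__ == "__main__":
    print(translate_amber("ATG GCC TAG TAC TAA"))  # MAXY
    print(amber_sites("ATG GCC TAG TAC TAA"))  # [3]
```

Applied to the Vim116TAG DNA (SEQ ID NO: 195), `amber_sites` should return `[116]`, the position of X in SEQ ID NO: 196. Reading TAG as a sense codon models only the intended full-length product, not truncation at the amber codon.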
INSR676TAG (Homo sapiens insulin receptor with an amber-encoded amino acid at position 676; UniProt: P06213)
DNA: (Amber codon underlined)
ATGGGCACCGGGGGCCGGCGGGGGGCGGCGGCCGCGCCGCTGCTGGTGGCGGTGGCCGCGCTGCTACTGGGCGCC
GCGGGCCACCTGTACCCCGGAGAGGTGTGTCCCGGCATGGATATCCGGAACAACCTCACTAGGTTGCATGAGCTG
GAGAATTGCTCTGTCATCGAAGGACACTTGCAGATACTCTTGATGTTCAAAACGAGGCCCGAAGATTTCCGAGAC CTCAGTTTCCCCAAACTCATCATGATCACTGATTACTTGCTGCTCTTCCGGGTCTATGGGCTCGAGAGCCTGAAG
GACCTGTTCCCCAACCTCACGGTCATCCGGGGATCACGACTGTTCTTTAACTACGCGCTGGTCATCTTCGAGATG
GTTCACCTCAAGGAACTCGGCCTCTACAACCTGATGAACATCACCCGGGGTTCTGTCCGCATCGAGAAGAACAAT
GAGCTCTGTTACTTGGCCACTATCGACTGGTCCCGTATCCTGGATTCCGTGGAGGATAATTACATCGTGTTGAAC
AAAGATGACAACGAGGAGTGTGGAGACATCTGTCCGGGTACCGCGAAGGGCAAGACCAACTGCCCCGCCACCGTC
ATCAACGGGCAGTTTGTCGAACGATGTTGGACTCATAGTCACTGCCAGAAAGTTTGCCCGACCATCTGTAAGTCA
CACGGCTGCACCGCCGAAGGCCTCTGTTGCCACAGCGAGTGCCTGGGCAACTGTTCTCAGCCCGACGACCCCACC
AAGTGCGTGGCCTGCCGCAACTTCTACCTGGATGGCAGGTGTGTGGAGACCTGCCCGCCCCCGTACTACCACTTC
CAGGACTGGCGCTGTGTGAACTTCAGCTTCTGCCAGGACCTGCACCACAAATGCAAGAACTCGCGGAGGCAGGGC
TGCCACCAGTACGTCATTCACAACAACAAGTGCATCCCTGAGTGTCCCTCCGGGTACACGATGAATTCCAGCAAC
TTGCTGTGCACCCCATGCCTGGGTCCCTGTCCCAAGGTGTGCCACCTCCTAGAAGGCGAGAAGACCATCGACTCG
GTGACGTCTGCCCAGGAGCTCCGAGGATGCACCGTCATCAACGGGAGTCTGATCATCAACATTCGAGGAGGCAAC
AATCTGGCAGCTGAGCTAGAAGCCAACCTCGGCCTCATTGAAGAAATTTCAGGGTATCTAAAAATCCGCCGATCC
TACGCTCTGGTGTCACTTTCCTTCTTCCGGAAGTTACGTCTGATTCGAGGAGAGACCTTGGAAATTGGGAACTAC
TCCTTCTATGCCTTGGACAACCAGAACCTAAGGCAGCTCTGGGACTGGAGCAAACACAACCTCACCATCACTCAG
GGGAAACTCTTCTTCCACTATAACCCCAAACTCTGCTTGTCAGAAATCCACAAGATGGAAGAAGTTTCAGGAACC
AAGGGGCGCCAGGAGAGAAACGACATTGCCCTGAAGACCAATGGGGACCAGGCATCCTGTGAAAATGAGTTACTT
AAATTTTCTTACATTCGGACATCTTTTGACAAGATCTTGCTGAGATGGGAGCCGTACTGGCCCCCCGACTTCCGA
GACCTCTTGGGGTTCATGCTGTTCTACAAAGAGGCCCCTTATCAGAATGTGACGGAGTTCGACGGGCAGGATGCA
TGTGGTTCCAACAGTTGGACGGTGGTAGACATTGACCCACCCCTGAGGTCCAACGACCCCAAATCACAGAACCAC
CCAGGGTGGCTGATGCGGGGTCTCAAGCCCTGGACCCAGTATGCCATCTTTGTGAAGACCCTGGTCACCTTTTCG
GATGAACGCCGGACCTATGGGGCCAAGAGTGACATCATTTATGTCCAGACAGATGCCACCAACCCCTCTGTGCCC
CTGGATCCAATCTCAGTGTCTAACTCATCATCCCAGATTATTCTGAAGTGGAAACCACCCTCCGACCCCAATGGC
AACATCACCCACTACCTGGTTTTCTGGGAGAGGCAGGCGGAAGACAGTGAGCTGTTCGAGCTGGATTATTGCCTC
TAGGGGCTGAAGCTGCCCTCGAGGACCTGGTCTCCACCATTCGAGTCTGAAGATTCTCAGAAGCACAACCAGAGT
GAGTATGAGGATTCGGCCGGCGAATGCTGCTCCTGTCCAAAGACAGACTCTCAGATCCTGAAGGAGCTGGAGGAG
TCCTCGTTTAGGAAGACGTTTGAGGATTACCTGCACAACGTGGTTTTCGTCCCCAGGCCATCTCGGAAACGCAGG
TCCCTTGGCGATGTTGGGAATGTGACGGTGGCCGTGCCCACGGTGGCAGCTTTCCCCAACACTTCCTCGACCAGC
GTGCCCACGAGTCCGGAGGAGCACAGGCCTTTTGAGAAGGTGGTGAACAAGGAGTCGCTGGTCATCTCCGGCTTG
CGACACTTCACGGGCTATCGCATCGAGCTGCAGGCTTGCAACCAGGACACCCCTGAGGAACGGTGCAGTGTGGCA
GCCTACGTCAGTGCGAGGACCATGCCTGAAGCCAAGGCTGATGACATTGTTGGCCCTGTGACGCATGAAATCTTT
GAGAACAACGTCGTCCACTTGATGTGGCAGGAGCCGAAGGAGCCCAATGGTCTGATCGTGCTGTATGAAGTGAGT
TATCGGCGATATGGTGATGAGGAGCTGCATCTCTGCGTCTCCCGCAAGCACTTCGCTCTGGAACGGGGCTGCAGG
CTGCGTGGGCTGTCACCGGGGAACTACAGCGTGCGAATCCGGGCCACCTCCCTTGCGGGCAACGGCTCTTGGACG
GAACCCACCTATTTCTACGTGACAGACTATTTAGACGTCCCGTCAAATATTGCAAAAATTATCATCGGCCCCCTC
ATCTTTGTCTTTCTCTTCAGTGTTGTGATTGGAAGTATTTATCTATTCCTGAGAAAGAGGCAGCCAGATGGGCCG
CTGGGACCGCTTTACGCTTCTTCAAACCCTGAGTATCTCAGTGCCAGTGATGTGTTTCCATGCTCTGTGTACGTG
CCGGACGAGTGGGAGGTGTCTCGAGAGAAGATCACCCTCCTTCGAGAGCTGGGGCAGGGCTCCTTCGGCATGGTG
TATGAGGGCAATGCCAGGGACATCATCAAGGGTGAGGCAGAGACCCGCGTGGCGGTGAAGACGGTCAACGAGTCA
GCCAGTCTCCGAGAGCGGATTGAGTTCCTCAATGAGGCCTCGGTCATGAAGGGCTTCACCTGCCATCACGTGGTG
CGCCTCCTGGGAGTGGTGTCCAAGGGCCAGCCCACGCTGGTGGTGATGGAGCTGATGGCTCACGGAGACCTGAAG
AGCTACCTCCGTTCTCTGCGGCCAGAGGCTGAGAATAATCCTGGCCGCCCTCCCCCTACCCTTCAAGAGATGATT
CAGATGGCGGCAGAGATTGCTGACGGGATGGCCTACCTGAACGCCAAGAAGTTTGTGCATCGGGACCTGGCAGCG
AGAAACTGCATGGTCGCCCATGATTTTACTGTCAAAATTGGAGACTTTGGAATGACCAGAGACATCTATGAAACG
GATTACTACCGGAAAGGGGGCAAGGGTCTGCTCCCTGTACGGTGGATGGCACCGGAGTCCCTGAAGGATGGGGTC
TTCACCACTTCTTCTGACATGTGGTCCTTTGGCGTGGTCCTTTGGGAAATCACCAGCTTGGCAGAACAGCCTTAC
CAAGGCCTGTCTAATGAACAGGTGTTGAAATTTGTCATGGATGGAGGGTATCTGGATCAACCCGACAACTGTCCA
GAGAGAGTCACTGACCTCATGCGCATGTGCTGGCAATTCAACCCCAACATGAGGCCAACCTTCCTGGAGATTGTC
AACCTGCTCAAGGACGACCTGCACCCCAGCTTTCCAGAGGTGTCGTTCTTCCACAGCGAGGAGAACAAGGCTCCC
GAGAGTGAGGAGCTGGAGATGGAGTTTGAGGACATGGAGAATGTGCCCCTGGACCGTTCCTCGCACTGTCAGAGG
GAGGAGGCGGGGGGCCGGGATGGAGGGTCCTCGCTGGGTTTCAAGCGGAGCTACGAGGAACACATCCCTTACACA
CACATGAACGGAGGCAAGAAAAACGGGCGGATTCTGACCTTGCCTCGGTCCAATCCTTCCT (SEQ ID NO: 197)
Protein: (X indicates non-canonical amino acid)
MGTGGRRGAAAAPLLVAVAALLLGAAGHLYPGEVCPGMDIRNNLTRLHELENCSVIEGHLQILLMFKTRPEDFRD
LSFPKLIMITDYLLLFRVYGLESLKDLFPNLTVIRGSRLFFNYALVIFEMVHLKELGLYNLMNITRGSVRIEKNN
ELCYLATIDWSRILDSVEDNYIVLNKDDNEECGDICPGTAKGKTNCPATVINGQFVERCWTHSHCQKVCPTICKS
HGCTAEGLCCHSECLGNCSQPDDPTKCVACRNFYLDGRCVETCPPPYYHFQDWRCVNFSFCQDLHHKCKNSRRQG CHQYVIHNNKCIPECPSGYTMNSSNLLCTPCLGPCPKVCHLLEGEKTIDSVTSAQELRGCTVINGSLIINIRGGN NLAAELEANLGLIEEISGYLKIRRSYALVSLSFFRKLRLIRGETLEIGNYSFYALDNQNLRQLWDWSKHNLTITQ GKLFFHYNPKLCLSEIHKMEEVSGTKGRQERNDIALKTNGDQASCENELLKFSYIRTSFDKILLRWEPYWPPDFR DLLGFMLFYKEAPYQNVTEFDGQDACGSNSWTVVDIDPPLRSNDPKSQNHPGWLMRGLKPWTQYAIFVKTLVTFS DERRTYGAKSDIIYVQTDATNPSVPLDPISVSNSSSQIILKWKPPSDPNGNITHYLVFWERQAEDSELFELDYCL XGLKLPSRTWSPPFESEDSQKHNQSEYEDSAGECCSCPKTDSQILKELEESSFRKTFEDYLHNVVFVPRPSRKRR SLGDVGNVTVAVPTVAAFPNTSSTSVPTSPEEHRPFEKVVNKESLVISGLRHFTGYRIELQACNQDTPEERCSVA AYVSARTMPEAKADDIVGPVTHEIFENNVVHLMWQEPKEPNGLIVLYEVSYRRYGDEELHLCVSRKHFALERGCR LRGLSPGNYSVRIRATSLAGNGSWTEPTYFYVTDYLDVPSNIAKIIIGPLIFVFLFSVVIGSIYLFLRKRQPDGP LGPLYASSNPEYLSASDVFPCSVYVPDEWEVSREKITLLRELGQGSFGMVYEGNARDIIKGEAETRVAVKTVNES ASLRERIEFLNEASVMKGFTCHHVVRLLGVVSKGQPTLVVMELMAHGDLKSYLRSLRPEAENNPGRPPPTLQEMI QMAAEIADGMAYLNAKKFVHRDLAARNCMVAHDFTVKIGDFGMTRDIYETDYYRKGGKGLLPVRWMAPESLKDGV FTTSSDMWSFGVVLWEITSLAEQPYQGLSNEQVLKFVMDGGYLDQPDNCPERVTDLMRMCWQFNPNMRPTFLEIV NLLKDDLHPSFPEVSFFHSEENKAPESEELEMEFEDMENVPLDRSSHCQREEAGGRDGGSSLGFKRSYEEHIPYT HMNGGKKNGRILTLPRSNPS (SEQ ID NO: 198)
Nup153-EGFP149TAG
DNA: (Amber codon underlined)
ATGGCGTCTGGTGCTGGCGGTGTTGGTGGAGGAGGTGGGGGTAAAATTCGTACTCGTCGCTGTCATCAAGGTCCG ATTAAACCGTATCAGCAGGGACGTCAGCAACATCAGGGTATTCTGAGCCGTGTGACCGAAAGCGTGAAAAACATT GTGCCGGGTTGGCTGCAACGTTATTTCAACAAAAATGAGGATGTGTGTTCGTGTTCTACCGATACCAGTGAAGTT CCTCGTTGGCCGGAAAACAAAGAAGATCACCTGGTGTATGCCGATGAAGAATCGAGCAATATCACCGATGGCCGT ATTACTCCTGAACCGGCGGTTAGTAACACTGAAGAACCGTCAACCACAAGCACAGCATCGAACTATCCAGATGTC CTGACTCGCCCTTCTCTGCACCGTTCTCACCTGAACTTTAGCATGCTGGAATCACCAGCTCTGCATTGTCAGCCG TCTACCAGTAGTGCCTTCCCGATTGGCTCTAGTGGCTTTTCGCTGGTCAAAGAGATCAAAGACTCGACCTCTCAA CATGACGATGATAACATTAGCACGACCTCGGGTTTTAGTAGCCGTGCCTCCGATAAAGACATTACCGTGAGCAAA AACACCTCTCTGCCGCCTCTGTGGAGTCCTGAAGCCGAACGCTCTCATAGTCTGTCTCAGCACACAGCCACCAGT TCCAAAAAACCAGCCTTCAACCTGAGCGCCTTTGGTACACTGTCACCGAGCCTGGGAAATTCCTCTATCCTGAAA ACATCACAGCTGGGCGATAGTCCGTTTTATCCGGGCAAAACGACGTATGGTGGTGCCGCTGCTGCTGTTCGCCAG TCTAAACTGCGTAACACTCCGTATCAAGCTCCAGTCCGTCGCCAAATGAAAGCAAAACAACTGTCGGCCCAGTCT TATGGTGTGACAAGCTCTACAGCTCGTCGTATCCTGCAAAGTCTGGAGAAAATGTCATCTCCGCTGGCAGATGCC AAACGTATTCCGTCCATTGTGAGCAGTCCGCTGAATAGCCCGCTGGACCGTAGTGGGATCGATATCACCGACTTC CAAGCCAAACGTGAGAAAGTGGATAGCCAGTATCCGCCTGTACAACGTCTGATGACCCCGAAACCGGTTTCAATT GCCACGAATCGTAGCGTGTATTTCAAACCGTCACTGACCCCTAGTGGTGAGTTTCGTAAAACAAATCAGCGTATC GACAACAAATGCTCTACCGGGTATGAAAAAAACATGACGCCGGGACAGAATCGTGAACAACGTGAATCTGGCTTC TCTTATCCGAACTTTAGTCTGCCGGCAGCAAATGGTCTGAGTAGCGGTGTAGGAGGTGGTGGGGGCAAAATGCGC CGTGAACGTCACGCCTTTGTGGCCTCTAAACCTCTGGAAGAAGAAGAGATGGAGGTTCCTGTACTGCCGAAAATC AGTCTGCCTATCACCTCTTCAAGTCTGCCGACCTTCAACTTTTCTAGTCCGGAAATCACAACCTCTAGCCCGTCA CCGATTAATAGCAGTCAAGCACTGACGAATAAAGTCCAAATGACCTCACCGAGTTCTACGGGTTCTCCGATGTTC AAATTCTCTAGTCCTATCGTGAAATCAACCGAAGCGAACGTCCTGCCTCCTTCTAGTATTGGGTTCACCTTTAGC GTCCCAGTGGCCAAAACAGCTGAACTGAGCGGTAGCAGTAGTACTCTGGAACCGATTATCAGCTCAAGCGCCCAT CATGTCACTACCGTGAATAGCACAAACTGTAAAAAAACGCCGCCTGAGGACTGTGAAGGACCGTTTCGTCCTGCC GAAATCCTGAAAGAAGGTTCCGTCCTGGACATTCTGAAATCTCCGGGATTTGCCTCTCCTAAAATCGACTCTGTT GCCGCTCAACCAACTGCCACATCACCGGTGGTTTATACTCGTCCGGCGATTAGCAGTTTTAGCAGTAGTGGCATC 
GGTTTTGGTGAATCCCTGAAAGCTGGCTCATCTTGGCAGTGTGACACCTGCCTGCTGCAAAACAAAGTGACCGAT AACAAATGTATTGCCTGTCAGGCCGCCAAACTGTCTCCTCGTGATACAGCCAAACAGACCGGCATCGAAACCCCT AATAAAAGCGGGAAAACGACCCTGTCAGCAAGTGGTACGGGATTTGGGGACAAATTCAAACCTGTGATCGGCACA TGGGACTGTGACACTTGTCTGGTACAGAACAAACCAGAAGCGATCAAATGTGTGGCCTGTGAAACGCCTAAACCT GGAACATGTGTGAAACGTGCCCTGACTCTGACTGTTGTGTCAGAAAGCGCCGAAACCATGACGGCAAGCAGCTCA TCCTGTACTGTGACTACCGGGACTCTGGGATTTGGTGACAAATTCAAACGCCCGATTGGTTCCTGGGAATGCTCC GTGTGTTGTGTGAGCAATAATGCCGAGGACAACAAATGTGTGTCCTGTATGAGCGAGAAACCTGGCAGCTCTGTT CCTGCTAGCAGCTCTAGCACAGTTCCTGTTAGTCTGCCTAGTGGTGGTTCTCTGGGTCTGGAAAAATTCAAAAAA CCTGAAGGAAGCTGGGATTGTGAGCTGTGCCTGGTACAGAATAAAGCGGATAGCACGAAATGTCTGGCCTGTGAG TCAGCCAAACCAGGGACTAAAAGCGGCTTTAAAGGCTTCGACACGTCGAGCAGTTCTAGTAACAGCGCCGCCTCA TCATCTTTCAAATTTGGGGTGAGCAGCTCCTCTAGTGGTCCTAGTCAAACACTGACCTCTACCGGAAACTTCAAA TTCGGCGATCAGGGTGGCTTCAAAATTGGTGTCTCCTCTGATTCGGGTAGCATTAACCCGATGAGTGAGGGGTTC AAATTCAGCAAACCAATTGGCGATTTCAAATTCGGTGTGTCGTCTGAATCCAAACCTGAAGAAGTCAAAAAAGAC AGCAAAAACGACAATTTCAAATTCGGCCTGTCTAGTGGTCTGTCTAATCCGGTTAGCCTGACCCCGTTTCAGTTC GGGGTGTCTAATCTGGGTCAGGAAGAGAAAAAAGAGGAGCTGCCTAAAAGTTCATCTGCCGGGTTCAGTTTTGGT
ACAGGCGTGATCAATAGCACTCCAGCACCAGCCAATACAATCGTGACGAGCGAGAACAAATCGAGCTTCAACCTG
GGGACAATCGAAACGAAAAGCGCCAGTGTAGCGCCATTCACGTGTAAAACCTCCGAGGCAAAAAAAGAAGAGATG
CCGGCCACAAAAGGTGGATTCTCATTCGGCAACGTGGAACCGGCTAGCCTGCCATCAGCAAGCGTGTTTGTACTG
GGCCGTACCGAGGAGAAACAGCAGGAACCTGTTACTAGCACCAGTCTGGTCTTTGGTAAAAAAGCCGACAACGAA
GAACCGAAATGTCAGCCAGTGTTCAGCTTCGGCAATAGCGAACAGACGAAAGACGAAAACAGCAGCAAATCGACG
TTCAGCTTCAGTATGACGAAACCGAGCGAAAAAGAAAGTGAGCAGCCAGCAAAAGCAACGTTCGCCTTTGGAGCA
CAGACATCAACCACAGCCGATCAAGGAGCAGCGAAACCAGTTTTCAGTTTTCTGAATAACAGCTCAAGCAGCAGT
TCTACACCAGCAACCTCAGCAGGTGGTGGGATCTTTGGATCAAGCACCTCATCCAGCAATCCGCCAGTGGCAACA
TTCGTGTTTGGCCAGAGCAGTAATCCGGTGTCATCTTCAGCATTTGGGAATACCGCCGAGAGTAGCACATCACAG
TCTCTGCTGTTCTCACAGGACTCTAAACTGGCAACCACCTCTTCTACTGGTACAGCGGTTACCCCGTTTGTGTTC
GGTCCGGGAGCATCATCCAATAATACCACGACGTCGGGCTTTGGGTTTGGTGCCACGACAACAAGCAGTAGCGCT
GGTAGCAGCTTTGTCTTTGGCACAGGTCCTTCAGCACCTTCTGCTTCACCAGCTTTCGGAGCCAATCAGACTCCG
ACATTCGGACAGTCACAGGGTGCCTCTCAACCAAATCCTCCGGGTTTTGGCAGTATTAGCAGTAGTACCGCCCTG
TTCCCGACCGGTAGTCAACCGGCACCGCCAACATTTGGAACGGTTAGCAGTAGTAGTCAGCCTCCGGTGTTTGGA
CAACAACCGAGCCAGAGCGCCTTCGGATCAGGAACGACCCCTAATAGTAGCAGTGCCTTCCAGTTCGGTAGCAGT
ACCACCAACTTCAACTTCACGAACAATAGCCCGTCAGGTGTGTTCACGTTTGGCGCCAATTCTTCTACCCCAGCG
GCAAGTGCTCAACCTTCAGGCTCAGGTGGATTTCCTTTCAACCAGTCACCAGCAGCGTTTACTGTTGGTTCTAAC
GGGAAAAACGTTTTCAGTAGCAGCGGCACCTCGTTTTCTGGTCGTAAAATCAAAACGGCCGTTCGTCGCCGTAAA
GCGGATCCACCGGTCGCCACGAGAGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAG
CTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTG
ACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGC
GTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTAC
GTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGAC
ACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAG
TACAACTACAACAGCCACTAGGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATC
CGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCC
GTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATCAC
ATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGTAA (SEQ ID NO: 199)
Protein: (X indicates non-canonical amino acid)
MASGAGGVGGGGGGKIRTRRCHQGPIKPYQQGRQQHQGILSRVTESVKNIVPGWLQRYFNKNEDVCSCSTDTSEV
PRWPENKEDHLVYADEESSNITDGRITPEPAVSNTEEPSTTSTASNYPDVLTRPSLHRSHLNFSMLESPALHCQP
STSSAFPIGSSGFSLVKEIKDSTSQHDDDNISTTSGFSSRASDKDITVSKNTSLPPLWSPEAERSHSLSQHTATS
SKKPAFNLSAFGTLSPSLGNSSILKTSQLGDSPFYPGKTTYGGAAAAVRQSKLRNTPYQAPVRRQMKAKQLSAQS
YGVTSSTARRILQSLEKMSSPLADAKRIPSIVSSPLNSPLDRSGIDITDFQAKREKVDSQYPPVQRLMTPKPVSI
ATNRSVYFKPSLTPSGEFRKTNQRIDNKCSTGYEKNMTPGQNREQRESGFSYPNFSLPAANGLSSGVGGGGGKMR
RERHAFVASKPLEEEEMEVPVLPKISLPITSSSLPTFNFSSPEITTSSPSPINSSQALTNKVQMTSPSSTGSPMF
KFSSPIVKSTEANVLPPSSIGFTFSVPVAKTAELSGSSSTLEPIISSSAHHVTTVNSTNCKKTPPEDCEGPFRPA
EILKEGSVLDILKSPGFASPKIDSVAAQPTATSPVVYTRPAISSFSSSGIGFGESLKAGSSWQCDTCLLQNKVTD
NKCIACQAAKLSPRDTAKQTGIETPNKSGKTTLSASGTGFGDKFKPVIGTWDCDTCLVQNKPEAIKCVACETPKP
GTCVKRALTLTVVSESAETMTASSSSCTVTTGTLGFGDKFKRPIGSWECSVCCVSNNAEDNKCVSCMSEKPGSSV
PASSSSTVPVSLPSGGSLGLEKFKKPEGSWDCELCLVQNKADSTKCLACESAKPGTKSGFKGFDTSSSSSNSAAS
SSFKFGVSSSSSGPSQTLTSTGNFKFGDQGGFKIGVSSDSGSINPMSEGFKFSKPIGDFKFGVSSESKPEEVKKD
SKNDNFKFGLSSGLSNPVSLTPFQFGVSNLGQEEKKEELPKSSSAGFSFGTGVINSTPAPANTIVTSENKSSFNL
GTIETKSASVAPFTCKTSEAKKEEMPATKGGFSFGNVEPASLPSASVFVLGRTEEKQQEPVTSTSLVFGKKADNE
EPKCQPVFSFGNSEQTKDENSSKSTFSFSMTKPSEKESEQPAKATFAFGAQTSTTADQGAAKPVFSFLNNSSSSS
STPATSAGGGIFGSSTSSSNPPVATFVFGQSSNPVSSSAFGNTAESSTSQSLLFSQDSKLATTSSTGTAVTPFVF
GPGASSNNTTTSGFGFGATTTSSSAGSSFVFGTGPSAPSASPAFGANQTPTFGQSQGASQPNPPGFGSISSSTAL
FPTGSQPAPPTFGTVSSSSQPPVFGQQPSQSAFGSGTTPNSSSAFQFGSSTTNFNFTNNSPSGVFTFGANSSTPA
ASAQPSGSGGFPFNQSPAAFTVGSNGKNVFSSSGTSFSGRKIKTAVRRRKADPPVATRVSKGEELFTGVVPILVE
LDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGY
VQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHXVYIMADKQKNGIKVNFKI
RHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYK
(SEQ ID NO: 200)
Nup153-EGFP149TAG-MS2
DNA: (MS2 stem-loops and Amber codon underlined)
ATGGCGTCTGGTGCTGGCGGTGTTGGTGGAGGAGGTGGGGGTAAAATTCGTACTCGTCGCTGTCATCAAGGTCCG ATTAAACCGTATCAGCAGGGACGTCAGCAACATCAGGGTATTCTGAGCCGTGTGACCGAAAGCGTGAAAAACATT GTGCCGGGTTGGCTGCAACGTTATTTCAACAAAAATGAGGATGTGTGTTCGTGTTCTACCGATACCAGTGAAGTT CCTCGTTGGCCGGAAAACAAAGAAGATCACCTGGTGTATGCCGATGAAGAATCGAGCAATATCACCGATGGCCGT ATTACTCCTGAACCGGCGGTTAGTAACACTGAAGAACCGTCAACCACAAGCACAGCATCGAACTATCCAGATGTC CTGACTCGCCCTTCTCTGCACCGTTCTCACCTGAACTTTAGCATGCTGGAATCACCAGCTCTGCATTGTCAGCCG TCTACCAGTAGTGCCTTCCCGATTGGCTCTAGTGGCTTTTCGCTGGTCAAAGAGATCAAAGACTCGACCTCTCAA CATGACGATGATAACATTAGCACGACCTCGGGTTTTAGTAGCCGTGCCTCCGATAAAGACATTACCGTGAGCAAA AACACCTCTCTGCCGCCTCTGTGGAGTCCTGAAGCCGAACGCTCTCATAGTCTGTCTCAGCACACAGCCACCAGT TCCAAAAAACCAGCCTTCAACCTGAGCGCCTTTGGTACACTGTCACCGAGCCTGGGAAATTCCTCTATCCTGAAA ACATCACAGCTGGGCGATAGTCCGTTTTATCCGGGCAAAACGACGTATGGTGGTGCCGCTGCTGCTGTTCGCCAG TCTAAACTGCGTAACACTCCGTATCAAGCTCCAGTCCGTCGCCAAATGAAAGCAAAACAACTGTCGGCCCAGTCT TATGGTGTGACAAGCTCTACAGCTCGTCGTATCCTGCAAAGTCTGGAGAAAATGTCATCTCCGCTGGCAGATGCC AAACGTATTCCGTCCATTGTGAGCAGTCCGCTGAATAGCCCGCTGGACCGTAGTGGGATCGATATCACCGACTTC CAAGCCAAACGTGAGAAAGTGGATAGCCAGTATCCGCCTGTACAACGTCTGATGACCCCGAAACCGGTTTCAATT GCCACGAATCGTAGCGTGTATTTCAAACCGTCACTGACCCCTAGTGGTGAGTTTCGTAAAACAAATCAGCGTATC GACAACAAATGCTCTACCGGGTATGAAAAAAACATGACGCCGGGACAGAATCGTGAACAACGTGAATCTGGCTTC TCTTATCCGAACTTTAGTCTGCCGGCAGCAAATGGTCTGAGTAGCGGTGTAGGAGGTGGTGGGGGCAAAATGCGC CGTGAACGTCACGCCTTTGTGGCCTCTAAACCTCTGGAAGAAGAAGAGATGGAGGTTCCTGTACTGCCGAAAATC AGTCTGCCTATCACCTCTTCAAGTCTGCCGACCTTCAACTTTTCTAGTCCGGAAATCACAACCTCTAGCCCGTCA CCGATTAATAGCAGTCAAGCACTGACGAATAAAGTCCAAATGACCTCACCGAGTTCTACGGGTTCTCCGATGTTC AAATTCTCTAGTCCTATCGTGAAATCAACCGAAGCGAACGTCCTGCCTCCTTCTAGTATTGGGTTCACCTTTAGC GTCCCAGTGGCCAAAACAGCTGAACTGAGCGGTAGCAGTAGTACTCTGGAACCGATTATCAGCTCAAGCGCCCAT CATGTCACTACCGTGAATAGCACAAACTGTAAAAAAACGCCGCCTGAGGACTGTGAAGGACCGTTTCGTCCTGCC
GAAATCCTGAAAGAAGGTTCCGTCCTGGACATTCTGAAATCTCCGGGATTTGCCTCTCCTAAAATCGACTCTGTT GCCGCTCAACCAACTGCCACATCACCGGTGGTTTATACTCGTCCGGCGATTAGCAGTTTTAGCAGTAGTGGCATC GGTTTTGGTGAATCCCTGAAAGCTGGCTCATCTTGGCAGTGTGACACCTGCCTGCTGCAAAACAAAGTGACCGAT AACAAATGTATTGCCTGTCAGGCCGCCAAACTGTCTCCTCGTGATACAGCCAAACAGACCGGCATCGAAACCCCT AATAAAAGCGGGAAAACGACCCTGTCAGCAAGTGGTACGGGATTTGGGGACAAATTCAAACCTGTGATCGGCACA TGGGACTGTGACACTTGTCTGGTACAGAACAAACCAGAAGCGATCAAATGTGTGGCCTGTGAAACGCCTAAACCT GGAACATGTGTGAAACGTGCCCTGACTCTGACTGTTGTGTCAGAAAGCGCCGAAACCATGACGGCAAGCAGCTCA TCCTGTACTGTGACTACCGGGACTCTGGGATTTGGTGACAAATTCAAACGCCCGATTGGTTCCTGGGAATGCTCC GTGTGTTGTGTGAGCAATAATGCCGAGGACAACAAATGTGTGTCCTGTATGAGCGAGAAACCTGGCAGCTCTGTT CCTGCTAGCAGCTCTAGCACAGTTCCTGTTAGTCTGCCTAGTGGTGGTTCTCTGGGTCTGGAAAAATTCAAAAAA CCTGAAGGAAGCTGGGATTGTGAGCTGTGCCTGGTACAGAATAAAGCGGATAGCACGAAATGTCTGGCCTGTGAG TCAGCCAAACCAGGGACTAAAAGCGGCTTTAAAGGCTTCGACACGTCGAGCAGTTCTAGTAACAGCGCCGCCTCA TCATCTTTCAAATTTGGGGTGAGCAGCTCCTCTAGTGGTCCTAGTCAAACACTGACCTCTACCGGAAACTTCAAA TTCGGCGATCAGGGTGGCTTCAAAATTGGTGTCTCCTCTGATTCGGGTAGCATTAACCCGATGAGTGAGGGGTTC AAATTCAGCAAACCAATTGGCGATTTCAAATTCGGTGTGTCGTCTGAATCCAAACCTGAAGAAGTCAAAAAAGAC AGCAAAAACGACAATTTCAAATTCGGCCTGTCTAGTGGTCTGTCTAATCCGGTTAGCCTGACCCCGTTTCAGTTC GGGGTGTCTAATCTGGGTCAGGAAGAGAAAAAAGAGGAGCTGCCTAAAAGTTCATCTGCCGGGTTCAGTTTTGGT ACAGGCGTGATCAATAGCACTCCAGCACCAGCCAATACAATCGTGACGAGCGAGAACAAATCGAGCTTCAACCTG GGGACAATCGAAACGAAAAGCGCCAGTGTAGCGCCATTCACGTGTAAAACCTCCGAGGCAAAAAAAGAAGAGATG CCGGCCACAAAAGGTGGATTCTCATTCGGCAACGTGGAACCGGCTAGCCTGCCATCAGCAAGCGTGTTTGTACTG GGCCGTACCGAGGAGAAACAGCAGGAACCTGTTACTAGCACCAGTCTGGTCTTTGGTAAAAAAGCCGACAACGAA GAACCGAAATGTCAGCCAGTGTTCAGCTTCGGCAATAGCGAACAGACGAAAGACGAAAACAGCAGCAAATCGACG TTCAGCTTCAGTATGACGAAACCGAGCGAAAAAGAAAGTGAGCAGCCAGCAAAAGCAACGTTCGCCTTTGGAGCA CAGACATCAACCACAGCCGATCAAGGAGCAGCGAAACCAGTTTTCAGTTTTCTGAATAACAGCTCAAGCAGCAGT
TCTACACCAGCAACCTCAGCAGGTGGTGGGATCTTTGGATCAAGCACCTCATCCAGCAATCCGCCAGTGGCAACA TTCGTGTTTGGCCAGAGCAGTAATCCGGTGTCATCTTCAGCATTTGGGAATACCGCCGAGAGTAGCACATCACAG TCTCTGCTGTTCTCACAGGACTCTAAACTGGCAACCACCTCTTCTACTGGTACAGCGGTTACCCCGTTTGTGTTC GGTCCGGGAGCATCATCCAATAATACCACGACGTCGGGCTTTGGGTTTGGTGCCACGACAACAAGCAGTAGCGCT GGTAGCAGCTTTGTCTTTGGCACAGGTCCTTCAGCACCTTCTGCTTCACCAGCTTTCGGAGCCAATCAGACTCCG ACATTCGGACAGTCACAGGGTGCCTCTCAACCAAATCCTCCGGGTTTTGGCAGTATTAGCAGTAGTACCGCCCTG TTCCCGACCGGTAGTCAACCGGCACCGCCAACATTTGGAACGGTTAGCAGTAGTAGTCAGCCTCCGGTGTTTGGA CAACAACCGAGCCAGAGCGCCTTCGGATCAGGAACGACCCCTAATAGTAGCAGTGCCTTCCAGTTCGGTAGCAGT ACCACCAACTTCAACTTCACGAACAATAGCCCGTCAGGTGTGTTCACGTTTGGCGCCAATTCTTCTACCCCAGCG GCAAGTGCTCAACCTTCAGGCTCAGGTGGATTTCCTTTCAACCAGTCACCAGCAGCGTTTACTGTTGGTTCTAAC GGGAAAAACGTTTTCAGTAGCAGCGGCACCTCGTTTTCTGGTCGTAAAATCAAAACGGCCGTTCGTCGCCGTAAA GCGGATCCACCGGTCGCCACGAGAGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAG CTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTG ACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGC GTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTAC GTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGAC ACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAG TACAACTACAACAGCCACTAGGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGTGAACTTCAAGATC CGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCC GTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATCAC ATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGTAAAGCGGCCGC GACTCTAGATCATAATCAGCACATGAGGATCACCCATGTCTGCAGGTCGACTCTAGAAAACATGAGGATCACCCA TGT (SEQ ID NO: 201)
Protein: (X indicates non-canonical amino acid)
MASGAGGVGGGGGGKIRTRRCHQGPIKPYQQGRQQHQGILSRVTESVKNIVPGWLQRYFNKNEDVCSCSTDTSEV
PRWPENKEDHLVYADEESSNITDGRITPEPAVSNTEEPSTTSTASNYPDVLTRPSLHRSHLNFSMLESPALHCQP
STSSAFPIGSSGFSLVKEIKDSTSQHDDDNISTTSGFSSRASDKDITVSKNTSLPPLWSPEAERSHSLSQHTATS
SKKPAFNLSAFGTLSPSLGNSSILKTSQLGDSPFYPGKTTYGGAAAAVRQSKLRNTPYQAPVRRQMKAKQLSAQS
YGVTSSTARRILQSLEKMSSPLADAKRIPSIVSSPLNSPLDRSGIDITDFQAKREKVDSQYPPVQRLMTPKPVSI
ATNRSVYFKPSLTPSGEFRKTNQRIDNKCSTGYEKNMTPGQNREQRESGFSYPNFSLPAANGLSSGVGGGGGKMR
RERHAFVASKPLEEEEMEVPVLPKISLPITSSSLPTFNFSSPEITTSSPSPINSSQALTNKVQMTSPSSTGSPMF
KFSSPIVKSTEANVLPPSSIGFTFSVPVAKTAELSGSSSTLEPIISSSAHHVTTVNSTNCKKTPPEDCEGPFRPA
EILKEGSVLDILKSPGFASPKIDSVAAQPTATSPVVYTRPAISSFSSSGIGFGESLKAGSSWQCDTCLLQNKVTD
NKCIACQAAKLSPRDTAKQTGIETPNKSGKTTLSASGTGFGDKFKPVIGTWDCDTCLVQNKPEAIKCVACETPKP
GTCVKRALTLTVVSESAETMTASSSSCTVTTGTLGFGDKFKRPIGSWECSVCCVSNNAEDNKCVSCMSEKPGSSV
PASSSSTVPVSLPSGGSLGLEKFKKPEGSWDCELCLVQNKADSTKCLACESAKPGTKSGFKGFDTSSSSSNSAAS
SSFKFGVSSSSSGPSQTLTSTGNFKFGDQGGFKIGVSSDSGSINPMSEGFKFSKPIGDFKFGVSSESKPEEVKKD
SKNDNFKFGLSSGLSNPVSLTPFQFGVSNLGQEEKKEELPKSSSAGFSFGTGVINSTPAPANTIVTSENKSSFNL
GTIETKSASVAPFTCKTSEAKKEEMPATKGGFSFGNVEPASLPSASVFVLGRTEEKQQEPVTSTSLVFGKKADNE
EPKCQPVFSFGNSEQTKDENSSKSTFSFSMTKPSEKESEQPAKATFAFGAQTSTTADQGAAKPVFSFLNNSSSSS
STPATSAGGGIFGSSTSSSNPPVATFVFGQSSNPVSSSAFGNTAESSTSQSLLFSQDSKLATTSSTGTAVTPFVF
GPGASSNNTTTSGFGFGATTTSSSAGSSFVFGTGPSAPSASPAFGANQTPTFGQSQGASQPNPPGFGSISSSTAL
FPTGSQPAPPTFGTVSSSSQPPVFGQQPSQSAFGSGTTPNSSSAFQFGSSTTNFNFTNNSPSGVFTFGANSSTPA
ASAQPSGSGGFPFNQSPAAFTVGSNGKNVFSSSGTSFSGRKIKTAVRRRKADPPVATRVSKGEELFTGVVPILVE
LDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGY
VQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHXVYIMADKQKNGIKVNFKI
RHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYK
(SEQ ID NO: 202)
Vim116TAG-mOrange
DNA: (Amber codon underlined)
ATGTCCACCAGGTCCGTGTCCTCGTCCTCCTACCGCAGGATGTTCGGCGGCCCGGGCACCGCGAGCCGGCCGAGC TCCAGCCGGAGCTACGTGACTACGTCCACCCGCACCTACAGCCTGGGCAGCGCGCTGCGCCCCAGCACCAGCCGC AGCCTCTACGCCTCGTCCCCGGGCGGCGTGTATGCCACGCGCTCCTCTGCCGTGCGCCTGCGGAGCAGCGTGCCC GGGGTGCGGCTCCTGCAGGACTCGGTGGACTTCTCGCTGGCCGACGCCATCAACACCGAGTTCAAGAACACCCGC ACCAACGAGAAGGTGGAGCTGCAGGAGCTGAATGACCGCTTCGCCTAGTACATCGACAAGGTGCGCTTCCTGGAG CAGCAGAATAAGATCCTGCTGGCCGAGCTCGAGCAGCTCAAGGGCCAAGGCAAGTCGCGCCTGGGGGACCTCTAC GAGGAGGAGATGCGGGAGCTGCGCCGGCAGGTGGACCAGCTAACCAACGACAAAGCCCGCGTCGAGGTGGAGCGC GACAACCTGGCCGAGGACATCATGCGCCTCCGGGAGAAATTGCAGGAGGAGATGCTTCAGAGAGAGGAAGCCGAA AACACCCTGCAATCTTTCAGACAGGATGTTGACAATGCGTCTCTGGCACGTCTTGACCTTGAACGCAAAGTGGAA TCTTTGCAAGAAGAGATTGCCTTTTTGAAGAAACTCCACGAAGAGGAAATCCAGGAGCTGCAGGCTCAGATTCAG GAACAGCATGTCCAAATCGATGTGGATGTTTCCAAGCCTGACCTCACGGCTGCCCTGCGTGACGTACGTCAGCAA TATGAAAGTGTGGCTGCCAAGAACCTGCAGGAGGCAGAAGAATGGTACAAATCCAAGTTTGCTGACCTCTCTGAG GCTGCCAACCGGAACAATGACGCCCTGCGCCAGGCAAAGCAGGAGTCCACTGAGTACCGGAGACAGGTGCAGTCC CTCACCTGTGAAGTGGATGCCCTTAAAGGAACCAATGAGTCCCTGGAACGCCAGATGCGTGAAATGGAAGAGAAC TTTGCCGTTGAAGCTGCTAACTACCAAGACACTATTGGCCGCCTGCAGGATGAGATTCAGAATATGAAGGAGGAA ATGGCTCGTCACCTTCGTGAATACCAAGACCTGCTCAATGTTAAGATGGCCCTTGACATTGAGATTGCCACCTAC AGGAAGCTGCTGGAAGGCGAGGAGAGCAGGATTTCTCTGCCTCTTCCAAACTTTTCCTCCCTGAACCTGAGGGAA ACTAATCTGGATTCACTCCCTCTGGTTGATACCCACTCAAAAAGGACACTTCTGATTAAGACGGTTGAAACTAGA GATGGACAGGTTATCAACGAAACTTCTCAGCATCACGATGACCTTGAAGGGGATCCACCGGTCGCCACCATGGTG AGCAAGGGCGAGGAGAATAATATGGCCATCATCAAGGAGTTCATGCGCTTCAAGGTGCGCATGGAGGGCACCGTG AACGGCCACGAGTTCGAGATCGAGGGCGAGGGCGAGGGCCGCCCCTACGAGGGCTTTCAGACCGCTAAGCTGAAG GTGACCAAGGGCGGCCCCCTGCCCTTCGCCTGGGACATCCTGTCCCCTCTCTTCACCTACGGCTCCAAGGCCTAC GTGAAGCACCCCGCCGACATCCCCGACTACTTCAAGCTGTCCTTCCCCGAGGGCTTCAAGTGGGAGCGCGTGATG AACTACGAGGACGGCGGCGTGGTGACCGTGACCCAGGACTCCTCACTGCAGGACGGCGAGTTCATCTACAAGGTG AAGATGCGCGGCACCAACTTCCCCTCCGACGGCCCCGTGATGCAGAAGAAGACCATGGGCTGGGAGGCCTCCTCC GAGCGGATGTACCCCGAGGACGGCGCCCTGAAGGGCGAGATCAGGATGAGGCTGAAGCTGAAGGACGGCGGCCAC 
TACACCTCCGAGGTCAAGACCACCTACAAGGCCAAGAAGTCCGTGCAGCTGCCCGGCGCCTACATCGTCGGCATC AAGCTGGACATCACCTCCCACAACGAGGACTACACCATCGTGGAACAGTACGAACGCGCCGAGGGCCGCCACTCC ACCGGCGGCATGGACGAGCTGTACAAGTAA (SEQ ID NO: 203)
Protein: (X indicates non-canonical amino acid)
MSTRSVSSSSYRRMFGGPGTASRPSSSRSYVTTSTRTYSLGSALRPSTSRSLYASSPGGVYATRSSAVRLRSSVP GVRLLQDSVDFSLADAINTEFKNTRTNEKVELQELNDRFAXYIDKVRFLEQQNKILLAELEQLKGQGKSRLGDLY EEEMRELRRQVDQLTNDKARVEVERDNLAEDIMRLREKLQEEMLQREEAENTLQSFRQDVDNASLARLDLERKVE SLQEEIAFLKKLHEEEIQELQAQIQEQHVQIDVDVSKPDLTAALRDVRQQYESVAAKNLQEAEEWYKSKFADLSE AANRNNDALRQAKQESTEYRRQVQSLTCEVDALKGTNESLERQMREMEENFAVEAANYQDTIGRLQDEIQNMKEE MARHLREYQDLLNVKMALDIEIATYRKLLEGEESRISLPLPNFSSLNLRETNLDSLPLVDTHSKRTLLIKTVETR DGQVINETSQHHDDLEGDPPVATMVSKGEENNMAIIKEFMRFKVRMEGTVNGHEFEIEGEGEGRPYEGFQTAKLK VTKGGPLPFAWDILSPLFTYGSKAYVKHPADIPDYFKLSFPEGFKWERVMNYEDGGVVTVTQDSSLQDGEFIYKV KMRGTNFPSDGPVMQKKTMGWEASSERMYPEDGALKGEIRMRLKLKDGGHYTSEVKTTYKAKKSVQLPGAYIVGI KLDITSHNEDYTIVEQYERAEGRHSTGGMDELYK (SEQ ID NO: 204)
Vim116TAG-mOrange-MS2
DNA: (MS2 stem-loops and Amber codon underlined)
ATGTCCACCAGGTCCGTGTCCTCGTCCTCCTACCGCAGGATGTTCGGCGGCCCGGGCACCGCGAGCCGGCCGAGC TCCAGCCGGAGCTACGTGACTACGTCCACCCGCACCTACAGCCTGGGCAGCGCGCTGCGCCCCAGCACCAGCCGC AGCCTCTACGCCTCGTCCCCGGGCGGCGTGTATGCCACGCGCTCCTCTGCCGTGCGCCTGCGGAGCAGCGTGCCC GGGGTGCGGCTCCTGCAGGACTCGGTGGACTTCTCGCTGGCCGACGCCATCAACACCGAGTTCAAGAACACCCGC ACCAACGAGAAGGTGGAGCTGCAGGAGCTGAATGACCGCTTCGCCTAGTACATCGACAAGGTGCGCTTCCTGGAG CAGCAGAATAAGATCCTGCTGGCCGAGCTCGAGCAGCTCAAGGGCCAAGGCAAGTCGCGCCTGGGGGACCTCTAC GAGGAGGAGATGCGGGAGCTGCGCCGGCAGGTGGACCAGCTAACCAACGACAAAGCCCGCGTCGAGGTGGAGCGC GACAACCTGGCCGAGGACATCATGCGCCTCCGGGAGAAATTGCAGGAGGAGATGCTTCAGAGAGAGGAAGCCGAA AACACCCTGCAATCTTTCAGACAGGATGTTGACAATGCGTCTCTGGCACGTCTTGACCTTGAACGCAAAGTGGAA TCTTTGCAAGAAGAGATTGCCTTTTTGAAGAAACTCCACGAAGAGGAAATCCAGGAGCTGCAGGCTCAGATTCAG GAACAGCATGTCCAAATCGATGTGGATGTTTCCAAGCCTGACCTCACGGCTGCCCTGCGTGACGTACGTCAGCAA TATGAAAGTGTGGCTGCCAAGAACCTGCAGGAGGCAGAAGAATGGTACAAATCCAAGTTTGCTGACCTCTCTGAG GCTGCCAACCGGAACAATGACGCCCTGCGCCAGGCAAAGCAGGAGTCCACTGAGTACCGGAGACAGGTGCAGTCC CTCACCTGTGAAGTGGATGCCCTTAAAGGAACCAATGAGTCCCTGGAACGCCAGATGCGTGAAATGGAAGAGAAC TTTGCCGTTGAAGCTGCTAACTACCAAGACACTATTGGCCGCCTGCAGGATGAGATTCAGAATATGAAGGAGGAA ATGGCTCGTCACCTTCGTGAATACCAAGACCTGCTCAATGTTAAGATGGCCCTTGACATTGAGATTGCCACCTAC AGGAAGCTGCTGGAAGGCGAGGAGAGCAGGATTTCTCTGCCTCTTCCAAACTTTTCCTCCCTGAACCTGAGGGAA ACTAATCTGGATTCACTCCCTCTGGTTGATACCCACTCAAAAAGGACACTTCTGATTAAGACGGTTGAAACTAGA GATGGACAGGTTATCAACGAAACTTCTCAGCATCACGATGACCTTGAAGGGGATCCACCGGTCGCCACCATGGTG AGCAAGGGCGAGGAGAATAATATGGCCATCATCAAGGAGTTCATGCGCTTCAAGGTGCGCATGGAGGGCACCGTG AACGGCCACGAGTTCGAGATCGAGGGCGAGGGCGAGGGCCGCCCCTACGAGGGCTTTCAGACCGCTAAGCTGAAG GTGACCAAGGGCGGCCCCCTGCCCTTCGCCTGGGACATCCTGTCCCCTCTCTTCACCTACGGCTCCAAGGCCTAC GTGAAGCACCCCGCCGACATCCCCGACTACTTCAAGCTGTCCTTCCCCGAGGGCTTCAAGTGGGAGCGCGTGATG AACTACGAGGACGGCGGCGTGGTGACCGTGACCCAGGACTCCTCACTGCAGGACGGCGAGTTCATCTACAAGGTG AAGATGCGCGGCACCAACTTCCCCTCCGACGGCCCCGTGATGCAGAAGAAGACCATGGGCTGGGAGGCCTCCTCC GAGCGGATGTACCCCGAGGACGGCGCCCTGAAGGGCGAGATCAGGATGAGGCTGAAGCTGAAGGACGGCGGCCAC 
TACACCTCCGAGGTCAAGACCACCTACAAGGCCAAGAAGTCCGTGCAGCTGCCCGGCGCCTACATCGTCGGCATC AAGCTGGACATCACCTCCCACAACGAGGACTACACCATCGTGGAACAGTACGAACGCGCCGAGGGCCGCCACTCC ACCGGCGGCATGGACGAGCTGTACAAGTAAAGCGGCCGCGACTCTAGATCATAATCAGCACATGAGGATCACCCA TGTCTGCAGGTCGACTCTAGAAAACATGAGGATCACCCATGT (SEQ ID NO: 205)
Protein: (X indicates non-canonical amino acid)
MSTRSVSSSSYRRMFGGPGTASRPSSSRSYVTTSTRTYSLGSALRPSTSRSLYASSPGGVYATRSSAVRLRSSVP GVRLLQDSVDFSLADAINTEFKNTRTNEKVELQELNDRFAXYIDKVRFLEQQNKILLAELEQLKGQGKSRLGDLY EEEMRELRRQVDQLTNDKARVEVERDNLAEDIMRLREKLQEEMLQREEAENTLQSFRQDVDNASLARLDLERKVE SLQEEIAFLKKLHEEEIQELQAQIQEQHVQIDVDVSKPDLTAALRDVRQQYESVAAKNLQEAEEWYKSKFADLSE AANRNNDALRQAKQESTEYRRQVQSLTCEVDALKGTNESLERQMREMEENFAVEAANYQDTIGRLQDEIQNMKEE MARHLREYQDLLNVKMALDIEIATYRKLLEGEESRISLPLPNFSSLNLRETNLDSLPLVDTHSKRTLLIKTVETR DGQVINETSQHHDDLEGDPPVATMVSKGEENNMAIIKEFMRFKVRMEGTVNGHEFEIEGEGEGRPYEGFQTAKLK VTKGGPLPFAWDILSPLFTYGSKAYVKHPADIPDYFKLSFPEGFKWERVMNYEDGGVVTVTQDSSLQDGEFIYKV KMRGTNFPSDGPVMQKKTMGWEASSERMYPEDGALKGEIRMRLKLKDGGHYTSEVKTTYKAKKSVQLPGAYIVGI KLDITSHNEDYTIVEQYERAEGRHSTGGMDELYK (SEQ ID NO: 206)
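The two -MS2 constructs (SEQ ID NOs: 201 and 205) carry an extra 3' tail after the TAA stop codon, and that tail contains two copies of the 19-nt motif ACATGAGGATCACCCATGT. Assuming those copies are the underlined MS2 stem-loops, the Python sketch below locates them and keeps only hits lying 3' of the first in-frame TAA or TGA. The constant `MS2_HAIRPIN` and the helper names are illustrative assumptions, not part of the source.

```python
"""Locate MS2 stem-loop copies downstream of the stop codon of an -MS2 construct."""

# Assumption: the stem-loop motif as it appears in the 3' tails of SEQ ID NOs: 201 and 205.
MS2_HAIRPIN = "ACATGAGGATCACCCATGT"
# TAG is omitted on purpose: in these constructs it is read as the ncAA, not as a stop.
STOP_CODONS = {"TAA", "TGA"}


def _clean(seq: str) -> str:
    """Drop whitespace (e.g. 75-nt block separators) and normalise case."""
    return "".join(seq.split()).upper()


def first_stop_end(cds: str) -> int:
    """0-based index just past the first in-frame TAA/TGA, or -1 if there is none."""
    seq = _clean(cds)
    for start in range(0, len(seq) - 2, 3):
        if seq[start:start + 3] in STOP_CODONS:
            return start + 3
    return -1


def find_motif(seq: str, motif: str = MS2_HAIRPIN) -> list[int]:
    """All 0-based start positions of `motif` in `seq` (overlapping hits allowed)."""
    seq = _clean(seq)
    hits = []
    pos = seq.find(motif)
    while pos != -1:
        hits.append(pos)
        pos = seq.find(motif, pos + 1)
    return hits


def ms2_loops_after_stop(construct: str) -> list[int]:
    """Start positions of stem-loop copies located 3' of the first in-frame stop."""
    seq = _clean(construct)
    end = first_stop_end(seq)
    if end == -1:
        return []
    return [p for p in find_motif(seq) if p >= end]


if __name__ == "__main__":
    toy = "ATGTAA" + "GG" + MS2_HAIRPIN + "TT" + MS2_HAIRPIN
    print(ms2_loops_after_stop(toy))  # [8, 29]
```

Restricting the search to sequence downstream of the stop codon keeps the check focused on the untranslated tail, where the stem-loops are expected to sit.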
INSR676TAG-EGFP
DNA: (Amber codon underlined)
ATGGGCACCGGGGGCCGGCGGGGGGCGGCGGCCGCGCCGCTGCTGGTGGCGGTGGCCGCGCTGCTACTGGGCGCC
GCGGGCCACCTGTACCCCGGAGAGGTGTGTCCCGGCATGGATATCCGGAACAACCTCACTAGGTTGCATGAGCTG
GAGAATTGCTCTGTCATCGAAGGACACTTGCAGATACTCTTGATGTTCAAAACGAGGCCCGAAGATTTCCGAGAC
CTCAGTTTCCCCAAACTCATCATGATCACTGATTACTTGCTGCTCTTCCGGGTCTATGGGCTCGAGAGCCTGAAG
GACCTGTTCCCCAACCTCACGGTCATCCGGGGATCACGACTGTTCTTTAACTACGCGCTGGTCATCTTCGAGATG
GTTCACCTCAAGGAACTCGGCCTCTACAACCTGATGAACATCACCCGGGGTTCTGTCCGCATCGAGAAGAACAAT
GAGCTCTGTTACTTGGCCACTATCGACTGGTCCCGTATCCTGGATTCCGTGGAGGATAATTACATCGTGTTGAAC
AAAGATGACAACGAGGAGTGTGGAGACATCTGTCCGGGTACCGCGAAGGGCAAGACCAACTGCCCCGCCACCGTC
ATCAACGGGCAGTTTGTCGAACGATGTTGGACTCATAGTCACTGCCAGAAAGTTTGCCCGACCATCTGTAAGTCA
CACGGCTGCACCGCCGAAGGCCTCTGTTGCCACAGCGAGTGCCTGGGCAACTGTTCTCAGCCCGACGACCCCACC
AAGTGCGTGGCCTGCCGCAACTTCTACCTGGATGGCAGGTGTGTGGAGACCTGCCCGCCCCCGTACTACCACTTC
CAGGACTGGCGCTGTGTGAACTTCAGCTTCTGCCAGGACCTGCACCACAAATGCAAGAACTCGCGGAGGCAGGGC
TGCCACCAGTACGTCATTCACAACAACAAGTGCATCCCTGAGTGTCCCTCCGGGTACACGATGAATTCCAGCAAC
TTGCTGTGCACCCCATGCCTGGGTCCCTGTCCCAAGGTGTGCCACCTCCTAGAAGGCGAGAAGACCATCGACTCG
GTGACGTCTGCCCAGGAGCTCCGAGGATGCACCGTCATCAACGGGAGTCTGATCATCAACATTCGAGGAGGCAAC
AATCTGGCAGCTGAGCTAGAAGCCAACCTCGGCCTCATTGAAGAAATTTCAGGGTATCTAAAAATCCGCCGATCC
TACGCTCTGGTGTCACTTTCCTTCTTCCGGAAGTTACGTCTGATTCGAGGAGAGACCTTGGAAATTGGGAACTAC
TCCTTCTATGCCTTGGACAACCAGAACCTAAGGCAGCTCTGGGACTGGAGCAAACACAACCTCACCATCACTCAG
GGGAAACTCTTCTTCCACTATAACCCCAAACTCTGCTTGTCAGAAATCCACAAGATGGAAGAAGTTTCAGGAACC
AAGGGGCGCCAGGAGAGAAACGACATTGCCCTGAAGACCAATGGGGACCAGGCATCCTGTGAAAATGAGTTACTT
AAATTTTCTTACATTCGGACATCTTTTGACAAGATCTTGCTGAGATGGGAGCCGTACTGGCCCCCCGACTTCCGA
GACCTCTTGGGGTTCATGCTGTTCTACAAAGAGGCCCCTTATCAGAATGTGACGGAGTTCGACGGGCAGGATGCA
TGTGGTTCCAACAGTTGGACGGTGGTAGACATTGACCCACCCCTGAGGTCCAACGACCCCAAATCACAGAACCAC
CCAGGGTGGCTGATGCGGGGTCTCAAGCCCTGGACCCAGTATGCCATCTTTGTGAAGACCCTGGTCACCTTTTCG
GATGAACGCCGGACCTATGGGGCCAAGAGTGACATCATTTATGTCCAGACAGATGCCACCAACCCCTCTGTGCCC
CTGGATCCAATCTCAGTGTCTAACTCATCATCCCAGATTATTCTGAAGTGGAAACCACCCTCCGACCCCAATGGC
AACATCACCCACTACCTGGTTTTCTGGGAGAGGCAGGCGGAAGACAGTGAGCTGTTCGAGCTGGATTATTGCCTC
TAGGGGCTGAAGCTGCCCTCGAGGACCTGGTCTCCACCATTCGAGTCTGAAGATTCTCAGAAGCACAACCAGAGT
GAGTATGAGGATTCGGCCGGCGAATGCTGCTCCTGTCCAAAGACAGACTCTCAGATCCTGAAGGAGCTGGAGGAG
TCCTCGTTTAGGAAGACGTTTGAGGATTACCTGCACAACGTGGTTTTCGTCCCCAGAAAAACCTCTTCAGGCACT
GGTGCCGAGGACCCTAGGCCATCTCGGAAACGCAGGTCCCTTGGCGATGTTGGGAATGTGACGGTGGCCGTGCCC
ACGGTGGCAGCTTTCCCCAACACTTCCTCGACCAGCGTGCCCACGAGTCCGGAGGAGCACAGGCCTTTTGAGAAG
GTGGTGAACAAGGAGTCGCTGGTCATCTCCGGCTTGCGACACTTCACGGGCTATCGCATCGAGCTGCAGGCTTGC
AACCAGGACACCCCTGAGGAACGGTGCAGTGTGGCAGCCTACGTCAGTGCGAGGACCATGCCTGAAGCCAAGGCT
GATGACATTGTTGGCCCTGTGACGCATGAAATCTTTGAGAACAACGTCGTCCACTTGATGTGGCAGGAGCCGAAG
GAGCCCAATGGTCTGATCGTGCTGTATGAAGTGAGTTATCGGCGATATGGTGATGAGGAGCTGCATCTCTGCGTC
TCCCGCAAGCACTTCGCTCTGGAACGGGGCTGCAGGCTGCGTGGGCTGTCACCGGGGAACTACAGCGTGCGAATC
CGGGCCACCTCCCTTGCGGGCAACGGCTCTTGGACGGAACCCACCTATTTCTACGTGACAGACTATTTAGACGTC
CCGTCAAATATTGCAAAAATTATCATCGGCCCCCTCATCTTTGTCTTTCTCTTCAGTGTTGTGATTGGAAGTATT
TATCTATTCCTGAGAAAGAGGCAGCCAGATGGGCCGCTGGGACCGCTTTACGCTTCTTCAAACCCTGAGTATCTC
AGTGCCAGTGATGTGTTTCCATGCTCTGTGTACGTGCCGGACGAGTGGGAGGTGTCTCGAGAGAAGATCACCCTC
CTTCGAGAGCTGGGGCAGGGCTCCTTCGGCATGGTGTATGAGGGCAATGCCAGGGACATCATCAAGGGTGAGGCA
GAGACCCGCGTGGCGGTGAAGACGGTCAACGAGTCAGCCAGTCTCCGAGAGCGGATTGAGTTCCTCAATGAGGCC
TCGGTCATGAAGGGCTTCACCTGCCATCACGTGGTGCGCCTCCTGGGAGTGGTGTCCAAGGGCCAGCCCACGCTG
GTGGTGATGGAGCTGATGGCTCACGGAGACCTGAAGAGCTACCTCCGTTCTCTGCGGCCAGAGGCTGAGAATAAT CCTGGCCGCCCTCCCCCTACCCTTCAAGAGATGATTCAGATGGCGGCAGAGATTGCTGACGGGATGGCCTACCTG AACGCCAAGAAGTTTGTGCATCGGGACCTGGCAGCGAGAAACTGCATGGTCGCCCATGATTTTACTGTCAAAATT GGAGACTTTGGAATGACCAGAGACATCTATGAAACGGATTACTACCGGAAAGGGGGCAAGGGTCTGCTCCCTGTA CGGTGGATGGCACCGGAGTCCCTGAAGGATGGGGTCTTCACCACTTCTTCTGACATGTGGTCCTTTGGCGTGGTC CTTTGGGAAATCACCAGCTTGGCAGAACAGCCTTACCAAGGCCTGTCTAATGAACAGGTGTTGAAATTTGTCATG GATGGAGGGTATCTGGATCAACCCGACAACTGTCCAGAGAGAGTCACTGACCTCATGCGCATGTGCTGGCAATTC AACCCCAACATGAGGCCAACCTTCCTGGAGATTGTCAACCTGCTCAAGGACGACCTGCACCCCAGCTTTCCAGAG GTGTCGTTCTTCCACAGCGAGGAGAACAAGGCTCCCGAGAGTGAGGAGCTGGAGATGGAGTTTGAGGACATGGAG AATGTGCCCCTGGACCGTTCCTCGCACTGTCAGAGGGAGGAGGCGGGGGGCCGGGATGGAGGGTCCTCGCTGGGT TTCAAGCGGAGCTACGAGGAACACATCCCTTACACACACATGAACGGAGGCAAGAAAAACGGGCGGATTCTGACC TTGCCTCGGTCCAATCCTTCCTGGGCCCGGGATCCACCGGTCGCCACCATGGTGAGCAAGGGCGAGGAGCTGTTC ACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGC GAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCC ACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTC TTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACC CGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGAC GGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAG AACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAG CAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCGCCCTGAGC AAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATG GACGAGCTGTACAAGTAA (SEQ ID NO: 207)
Protein: (X indicates non-canonical amino acid)
MGTGGRRGAAAAPLLVAVAALLLGAAGHLYPGEVCPGMDIRNNLTRLHELENCSVIEGHLQILLMFKTRPEDFRD LSFPKLIMITDYLLLFRVYGLESLKDLFPNLTVIRGSRLFFNYALVIFEMVHLKELGLYNLMNITRGSVRIEKNN ELCYLATIDWSRILDSVEDNYIVLNKDDNEECGDICPGTAKGKTNCPATVINGQFVERCWTHSHCQKVCPTICKS HGCTAEGLCCHSECLGNCSQPDDPTKCVACRNFYLDGRCVETCPPPYYHFQDWRCVNFSFCQDLHHKCKNSRRQG CHQYVIHNNKCIPECPSGYTMNSSNLLCTPCLGPCPKVCHLLEGEKTIDSVTSAQELRGCTVINGSLIINIRGGN NLAAELEANLGLIEEISGYLKIRRSYALVSLSFFRKLRLIRGETLEIGNYSFYALDNQNLRQLWDWSKHNLTITQ GKLFFHYNPKLCLSEIHKMEEVSGTKGRQERNDIALKTNGDQASCENELLKFSYIRTSFDKILLRWEPYWPPDFR DLLGFMLFYKEAPYQNVTEFDGQDACGSNSWTVVDIDPPLRSNDPKSQNHPGWLMRGLKPWTQYAIFVKTLVTFS DERRTYGAKSDIIYVQTDATNPSVPLDPISVSNSSSQIILKWKPPSDPNGNITHYLVFWERQAEDSELFELDYCL XGLKLPSRTWSPPFESEDSQKHNQSEYEDSAGECCSCPKTDSQILKELEESSFRKTFEDYLHNVVFVPRKTSSGT GAEDPRPSRKRRSLGDVGNVTVAVPTVAAFPNTSSTSVPTSPEEHRPFEKVVNKESLVISGLRHFTGYRIELQAC NQDTPEERCSVAAYVSARTMPEAKADDIVGPVTHEIFENNVVHLMWQEPKEPNGLIVLYEVSYRRYGDEELHLCV SRKHFALERGCRLRGLSPGNYSVRIRATSLAGNGSWTEPTYFYVTDYLDVPSNIAKIIIGPLIFVFLFSVVIGSI YLFLRKRQPDGPLGPLYASSNPEYLSASDVFPCSVYVPDEWEVSREKITLLRELGQGSFGMVYEGNARDIIKGEA ETRVAVKTVNESASLRERIEFLNEASVMKGFTCHHVVRLLGVVSKGQPTLVVMELMAHGDLKSYLRSLRPEAENN PGRPPPTLQEMIQMAAEIADGMAYLNAKKFVHRDLAARNCMVAHDFTVKIGDFGMTRDIYETDYYRKGGKGLLPV RWMAPESLKDGVFTTSSDMWSFGVVLWEITSLAEQPYQGLSNEQVLKFVMDGGYLDQPDNCPERVTDLMRMCWQF NPNMRPTFLEIVNLLKDDLHPSFPEVSFFHSEENKAPESEELEMEFEDMENVPLDRSSHCQREEAGGRDGGSSLG FKRSYEEHIPYTHMNGGKKNGRILTLPRSNPSWARDPPVATMVSKGEELFTGVVPILVELDGDVNGHKFSVSGEG EGDATYGKLTLKFICTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKT RAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQ QNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYK (SEQ ID NO: 208)
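In the protein entries, X marks the residue encoded by the in-frame amber (TAG) codon of the corresponding DNA entry. The following minimal Python sketch shows how a DNA entry maps onto its protein entry, assuming the standard genetic code and that only TAG is read through while TAA and TGA terminate translation. `translate_amber` and `CODON_TABLE` are hypothetical names introduced for illustration, not part of the source.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate_amber(cds: str, ncaa_symbol: str = "X") -> str:
    """Translate a coding sequence, writing the amber codon TAG as ncaa_symbol.

    Whitespace between sequence blocks is ignored. Translation stops at the
    first TAA or TGA (the stop is not included). A ValueError is raised for
    codons containing non-ACGT characters, such as OCR debris.
    """
    seq = "".join(cds.split()).upper()
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if codon not in CODON_TABLE:
            raise ValueError(f"invalid codon {codon!r} at nucleotide {i + 1}")
        if codon == "TAG":
            protein.append(ncaa_symbol)  # amber codon read through as the ncAA
            continue
        aa = CODON_TABLE[codon]
        if aa == "*":
            break  # TAA or TGA terminates translation
        protein.append(aa)
    return "".join(protein)


# Segment spanning the amber codon of SEQ ID NO: 207
print(translate_amber("GAGCTGGATTATTGCCTC TAGGGGCTGAAGCTGCCC"))  # ELDYCLXGLKLP
```

The printed segment matches the ...ELDYCL / XGLKLP... junction of SEQ ID NO: 208.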
INSR676TAG-EGFP-MS2
DNA: (MS2 stem-loops and Amber codon underlined)
ATGGGCACCGGGGGCCGGCGGGGGGCGGCGGCCGCGCCGCTGCTGGTGGCGGTGGCCGCGCTGCTACTGGGCGCC
GCGGGCCACCTGTACCCCGGAGAGGTGTGTCCCGGCATGGATATCCGGAACAACCTCACTAGGTTGCATGAGCTG
GAGAATTGCTCTGTCATCGAAGGACACTTGCAGATACTCTTGATGTTCAAAACGAGGCCCGAAGATTTCCGAGAC
CTCAGTTTCCCCAAACTCATCATGATCACTGATTACTTGCTGCTCTTCCGGGTCTATGGGCTCGAGAGCCTGAAG
GACCTGTTCCCCAACCTCACGGTCATCCGGGGATCACGACTGTTCTTTAACTACGCGCTGGTCATCTTCGAGATG
GTTCACCTCAAGGAACTCGGCCTCTACAACCTGATGAACATCACCCGGGGTTCTGTCCGCATCGAGAAGAACAAT
GAGCTCTGTTACTTGGCCACTATCGACTGGTCCCGTATCCTGGATTCCGTGGAGGATAATTACATCGTGTTGAAC
AAAGATGACAACGAGGAGTGTGGAGACATCTGTCCGGGTACCGCGAAGGGCAAGACCAACTGCCCCGCCACCGTC
ATCAACGGGCAGTTTGTCGAACGATGTTGGACTCATAGTCACTGCCAGAAAGTTTGCCCGACCATCTGTAAGTCA
CACGGCTGCACCGCCGAAGGCCTCTGTTGCCACAGCGAGTGCCTGGGCAACTGTTCTCAGCCCGACGACCCCACC
AAGTGCGTGGCCTGCCGCAACTTCTACCTGGATGGCAGGTGTGTGGAGACCTGCCCGCCCCCGTACTACCACTTC CAGGACTGGCGCTGTGTGAACTTCAGCTTCTGCCAGGACCTGCACCACAAATGCAAGAACTCGCGGAGGCAGGGC TGCCACCAGTACGTCATTCACAACAACAAGTGCATCCCTGAGTGTCCCTCCGGGTACACGATGAATTCCAGCAAC TTGCTGTGCACCCCATGCCTGGGTCCCTGTCCCAAGGTGTGCCACCTCCTAGAAGGCGAGAAGACCATCGACTCG GTGACGTCTGCCCAGGAGCTCCGAGGATGCACCGTCATCAACGGGAGTCTGATCATCAACATTCGAGGAGGCAAC AATCTGGCAGCTGAGCTAGAAGCCAACCTCGGCCTCATTGAAGAAATTTCAGGGTATCTAAAAATCCGCCGATCC TACGCTCTGGTGTCACTTTCCTTCTTCCGGAAGTTACGTCTGATTCGAGGAGAGACCTTGGAAATTGGGAACTAC TCCTTCTATGCCTTGGACAACCAGAACCTAAGGCAGCTCTGGGACTGGAGCAAACACAACCTCACCATCACTCAG GGGAAACTCTTCTTCCACTATAACCCCAAACTCTGCTTGTCAGAAATCCACAAGATGGAAGAAGTTTCAGGAACC AAGGGGCGCCAGGAGAGAAACGACATTGCCCTGAAGACCAATGGGGACCAGGCATCCTGTGAAAATGAGTTACTT AAATTTTCTTACATTCGGACATCTTTTGACAAGATCTTGCTGAGATGGGAGCCGTACTGGCCCCCCGACTTCCGA GACCTCTTGGGGTTCATGCTGTTCTACAAAGAGGCCCCTTATCAGAATGTGACGGAGTTCGACGGGCAGGATGCA TGTGGTTCCAACAGTTGGACGGTGGTAGACATTGACCCACCCCTGAGGTCCAACGACCCCAAATCACAGAACCAC CCAGGGTGGCTGATGCGGGGTCTCAAGCCCTGGACCCAGTATGCCATCTTTGTGAAGACCCTGGTCACCTTTTCG GATGAACGCCGGACCTATGGGGCCAAGAGTGACATCATTTATGTCCAGACAGATGCCACCAACCCCTCTGTGCCC CTGGATCCAATCTCAGTGTCTAACTCATCATCCCAGATTATTCTGAAGTGGAAACCACCCTCCGACCCCAATGGC AACATCACCCACTACCTGGTTTTCTGGGAGAGGCAGGCGGAAGACAGTGAGCTGTTCGAGCTGGATTATTGCCTC TAGGGGCTGAAGCTGCCCTCGAGGACCTGGTCTCCACCATTCGAGTCTGAAGATTCTCAGAAGCACAACCAGAGT GAGTATGAGGATTCGGCCGGCGAATGCTGCTCCTGTCCAAAGACAGACTCTCAGATCCTGAAGGAGCTGGAGGAG TCCTCGTTTAGGAAGACGTTTGAGGATTACCTGCACAACGTGGTTTTCGTCCCCAGAAAAACCTCTTCAGGCACT GGTGCCGAGGACCCTAGGCCATCTCGGAAACGCAGGTCCCTTGGCGATGTTGGGAATGTGACGGTGGCCGTGCCC ACGGTGGCAGCTTTCCCCAACACTTCCTCGACCAGCGTGCCCACGAGTCCGGAGGAGCACAGGCCTTTTGAGAAG GTGGTGAACAAGGAGTCGCTGGTCATCTCCGGCTTGCGACACTTCACGGGCTATCGCATCGAGCTGCAGGCTTGC AACCAGGACACCCCTGAGGAACGGTGCAGTGTGGCAGCCTACGTCAGTGCGAGGACCATGCCTGAAGCCAAGGCT GATGACATTGTTGGCCCTGTGACGCATGAAATCTTTGAGAACAACGTCGTCCACTTGATGTGGCAGGAGCCGAAG GAGCCCAATGGTCTGATCGTGCTGTATGAAGTGAGTTATCGGCGATATGGTGATGAGGAGCTGCATCTCTGCGTC 
TCCCGCAAGCACTTCGCTCTGGAACGGGGCTGCAGGCTGCGTGGGCTGTCACCGGGGAACTACAGCGTGCGAATC CGGGCCACCTCCCTTGCGGGCAACGGCTCTTGGACGGAACCCACCTATTTCTACGTGACAGACTATTTAGACGTC CCGTCAAATATTGCAAAAATTATCATCGGCCCCCTCATCTTTGTCTTTCTCTTCAGTGTTGTGATTGGAAGTATT TATCTATTCCTGAGAAAGAGGCAGCCAGATGGGCCGCTGGGACCGCTTTACGCTTCTTCAAACCCTGAGTATCTC AGTGCCAGTGATGTGTTTCCATGCTCTGTGTACGTGCCGGACGAGTGGGAGGTGTCTCGAGAGAAGATCACCCTC CTTCGAGAGCTGGGGCAGGGCTCCTTCGGCATGGTGTATGAGGGCAATGCCAGGGACATCATCAAGGGTGAGGCA GAGACCCGCGTGGCGGTGAAGACGGTCAACGAGTCAGCCAGTCTCCGAGAGCGGATTGAGTTCCTCAATGAGGCC TCGGTCATGAAGGGCTTCACCTGCCATCACGTGGTGCGCCTCCTGGGAGTGGTGTCCAAGGGCCAGCCCACGCTG GTGGTGATGGAGCTGATGGCTCACGGAGACCTGAAGAGCTACCTCCGTTCTCTGCGGCCAGAGGCTGAGAATAAT CCTGGCCGCCCTCCCCCTACCCTTCAAGAGATGATTCAGATGGCGGCAGAGATTGCTGACGGGATGGCCTACCTG AACGCCAAGAAGTTTGTGCATCGGGACCTGGCAGCGAGAAACTGCATGGTCGCCCATGATTTTACTGTCAAAATT GGAGACTTTGGAATGACCAGAGACATCTATGAAACGGATTACTACCGGAAAGGGGGCAAGGGTCTGCTCCCTGTA CGGTGGATGGCACCGGAGTCCCTGAAGGATGGGGTCTTCACCACTTCTTCTGACATGTGGTCCTTTGGCGTGGTC CTTTGGGAAATCACCAGCTTGGCAGAACAGCCTTACCAAGGCCTGTCTAATGAACAGGTGTTGAAATTTGTCATG GATGGAGGGTATCTGGATCAACCCGACAACTGTCCAGAGAGAGTCACTGACCTCATGCGCATGTGCTGGCAATTC AACCCCAACATGAGGCCAACCTTCCTGGAGATTGTCAACCTGCTCAAGGACGACCTGCACCCCAGCTTTCCAGAG GTGTCGTTCTTCCACAGCGAGGAGAACAAGGCTCCCGAGAGTGAGGAGCTGGAGATGGAGTTTGAGGACATGGAG AATGTGCCCCTGGACCGTTCCTCGCACTGTCAGAGGGAGGAGGCGGGGGGCCGGGATGGAGGGTCCTCGCTGGGT TTCAAGCGGAGCTACGAGGAACACATCCCTTACACACACATGAACGGAGGCAAGAAAAACGGGCGGATTCTGACC TTGCCTCGGTCCAATCCTTCCTGGGCCCGGGATCCACCGGTCGCCACCATGGTGAGCAAGGGCGAGGAGCTGTTC ACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGTCCGGCGAGGGC GAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCC ACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTC TTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACC CGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGAC GGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAG 
AACGGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAG CAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCGCCCTGAGC AAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATG GACGAGCTGTACAAGTAAAGCGGCCGCGCGGCCGCGACTCTAGATCATAATCAGCACATGAGGATCACCCATGTC TGCAGGTCGACTCTAGAAAACATGAGGATCACCCATGT (SEQ ID NO: 209)
Protein: (X indicates non-canonical amino acid)
MGTGGRRGAAAAPLLVAVAALLLGAAGHLYPGEVCPGMDIRNNLTRLHELENCSVIEGHLQILLMFKTRPEDFRD LSFPKLIMITDYLLLFRVYGLESLKDLFPNLTVIRGSRLFFNYALVIFEMVHLKELGLYNLMNITRGSVRIEKNN ELCYLATIDWSRILDSVEDNYIVLNKDDNEECGDICPGTAKGKTNCPATVINGQFVERCWTHSHCQKVCPTICKS HGCTAEGLCCHSECLGNCSQPDDPTKCVACRNFYLDGRCVETCPPPYYHFQDWRCVNFSFCQDLHHKCKNSRRQG CHQYVIHNNKCIPECPSGYTMNSSNLLCTPCLGPCPKVCHLLEGEKTIDSVTSAQELRGCTVINGSLIINIRGGN NLAAELEANLGLIEEISGYLKIRRSYALVSLSFFRKLRLIRGETLEIGNYSFYALDNQNLRQLWDWSKHNLTITQ GKLFFHYNPKLCLSEIHKMEEVSGTKGRQERNDIALKTNGDQASCENELLKFSYIRTSFDKILLRWEPYWPPDFR DLLGFMLFYKEAPYQNVTEFDGQDACGSNSWTVVDIDPPLRSNDPKSQNHPGWLMRGLKPWTQYAIFVKTLVTFS DERRTYGAKSDIIYVQTDATNPSVPLDPISVSNSSSQIILKWKPPSDPNGNITHYLVFWERQAEDSELFELDYCL XGLKLPSRTWSPPFESEDSQKHNQSEYEDSAGECCSCPKTDSQILKELEESSFRKTFEDYLHNVVFVPRKTSSGT GAEDPRPSRKRRSLGDVGNVTVAVPTVAAFPNTSSTSVPTSPEEHRPFEKVVNKESLVISGLRHFTGYRIELQAC NQDTPEERCSVAAYVSARTMPEAKADDIVGPVTHEIFENNVVHLMWQEPKEPNGLIVLYEVSYRRYGDEELHLCV SRKHFALERGCRLRGLSPGNYSVRIRATSLAGNGSWTEPTYFYVTDYLDVPSNIAKIIIGPLIFVFLFSVVIGSI YLFLRKRQPDGPLGPLYASSNPEYLSASDVFPCSVYVPDEWEVSREKITLLRELGQGSFGMVYEGNARDIIKGEA ETRVAVKTVNESASLRERIEFLNEASVMKGFTCHHVVRLLGVVSKGQPTLVVMELMAHGDLKSYLRSLRPEAENN PGRPPPTLQEMIQMAAEIADGMAYLNAKKFVHRDLAARNCMVAHDFTVKIGDFGMTRDIYETDYYRKGGKGLLPV RWMAPESLKDGVFTTSSDMWSFGVVLWEITSLAEQPYQGLSNEQVLKFVMDGGYLDQPDNCPERVTDLMRMCWQF NPNMRPTFLEIVNLLKDDLHPSFPEVSFFHSEENKAPESEELEMEFEDMENVPLDRSSHCQREEAGGRDGGSSLG FKRSYEEHIPYTHMNGGKKNGRILTLPRSNPSWARDPPVATMVSKGEELFTGVVPILVELDGDVNGHKFSVSGEG EGDATYGKLTLKFICTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKT RAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQ QNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYK (SEQ ID NO: 210)
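In the -MS2 constructs the stem-loops lie in the sequence that follows the TAA stop codon, so they leave the protein entry unchanged. The loop copy number can be checked by counting non-overlapping copies of the loop unit in the DNA entry. In this sketch the 19-nt motif is an assumption, taken as the segment that occurs twice in the 3′ tail of SEQ ID NO: 209, and `find_motif` is a hypothetical helper.

```python
# Assumed loop unit: the repeated segment in the 3' tail of SEQ ID NO: 209.
MS2_MOTIF = "ACATGAGGATCACCCATGT"


def find_motif(dna: str, motif: str) -> list[int]:
    """Return 1-based start positions of non-overlapping copies of motif.

    Whitespace between sequence blocks is ignored; matching is case-insensitive.
    """
    if not motif:
        raise ValueError("motif must be non-empty")
    seq = "".join(dna.split()).upper()
    motif = motif.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start + 1)
        start = seq.find(motif, start + len(motif))  # skip past this copy
    return positions


# 3' tail of SEQ ID NO: 209, from the TAA stop codon onward
tail = ("TAAAGCGGCCGCGCGGCCGCGACTCTAGATCATAATCAGCACATGAGGATCACCCATGTC "
        "TGCAGGTCGACTCTAGAAAACATGAGGATCACCCATGT")
print(len(find_motif(tail, MS2_MOTIF)))  # 2
```

The same call on the tail of SEQ ID NO: 211 or 213 can confirm that those constructs carry the same two-copy arrangement.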
INSR676TAG-mOrange-MS2
DNA: (MS2 stem-loops and Amber codon underlined)
ATGGGCACCGGGGGCCGGCGGGGGGCGGCGGCCGCGCCGCTGCTGGTGGCGGTGGCCGCGCTGCTACTGGGCGCC
GCGGGCCACCTGTACCCCGGAGAGGTGTGTCCCGGCATGGATATCCGGAACAACCTCACTAGGTTGCATGAGCTG
GAGAATTGCTCTGTCATCGAAGGACACTTGCAGATACTCTTGATGTTCAAAACGAGGCCCGAAGATTTCCGAGAC
CTCAGTTTCCCCAAACTCATCATGATCACTGATTACTTGCTGCTCTTCCGGGTCTATGGGCTCGAGAGCCTGAAG
GACCTGTTCCCCAACCTCACGGTCATCCGGGGATCACGACTGTTCTTTAACTACGCGCTGGTCATCTTCGAGATG
GTTCACCTCAAGGAACTCGGCCTCTACAACCTGATGAACATCACCCGGGGTTCTGTCCGCATCGAGAAGAACAAT
GAGCTCTGTTACTTGGCCACTATCGACTGGTCCCGTATCCTGGATTCCGTGGAGGATAATTACATCGTGTTGAAC
AAAGATGACAACGAGGAGTGTGGAGACATCTGTCCGGGTACCGCGAAGGGCAAGACCAACTGCCCCGCCACCGTC
ATCAACGGGCAGTTTGTCGAACGATGTTGGACTCATAGTCACTGCCAGAAAGTTTGCCCGACCATCTGTAAGTCA
CACGGCTGCACCGCCGAAGGCCTCTGTTGCCACAGCGAGTGCCTGGGCAACTGTTCTCAGCCCGACGACCCCACC
AAGTGCGTGGCCTGCCGCAACTTCTACCTGGATGGCAGGTGTGTGGAGACCTGCCCGCCCCCGTACTACCACTTC
CAGGACTGGCGCTGTGTGAACTTCAGCTTCTGCCAGGACCTGCACCACAAATGCAAGAACTCGCGGAGGCAGGGC
TGCCACCAGTACGTCATTCACAACAACAAGTGCATCCCTGAGTGTCCCTCCGGGTACACGATGAATTCCAGCAAC
TTGCTGTGCACCCCATGCCTGGGTCCCTGTCCCAAGGTGTGCCACCTCCTAGAAGGCGAGAAGACCATCGACTCG
GTGACGTCTGCCCAGGAGCTCCGAGGATGCACCGTCATCAACGGGAGTCTGATCATCAACATTCGAGGAGGCAAC
AATCTGGCAGCTGAGCTAGAAGCCAACCTCGGCCTCATTGAAGAAATTTCAGGGTATCTAAAAATCCGCCGATCC
TACGCTCTGGTGTCACTTTCCTTCTTCCGGAAGTTACGTCTGATTCGAGGAGAGACCTTGGAAATTGGGAACTAC
TCCTTCTATGCCTTGGACAACCAGAACCTAAGGCAGCTCTGGGACTGGAGCAAACACAACCTCACCATCACTCAG
GGGAAACTCTTCTTCCACTATAACCCCAAACTCTGCTTGTCAGAAATCCACAAGATGGAAGAAGTTTCAGGAACC
AAGGGGCGCCAGGAGAGAAACGACATTGCCCTGAAGACCAATGGGGACCAGGCATCCTGTGAAAATGAGTTACTT
AAATTTTCTTACATTCGGACATCTTTTGACAAGATCTTGCTGAGATGGGAGCCGTACTGGCCCCCCGACTTCCGA
GACCTCTTGGGGTTCATGCTGTTCTACAAAGAGGCCCCTTATCAGAATGTGACGGAGTTCGACGGGCAGGATGCA
TGTGGTTCCAACAGTTGGACGGTGGTAGACATTGACCCACCCCTGAGGTCCAACGACCCCAAATCACAGAACCAC
CCAGGGTGGCTGATGCGGGGTCTCAAGCCCTGGACCCAGTATGCCATCTTTGTGAAGACCCTGGTCACCTTTTCG
GATGAACGCCGGACCTATGGGGCCAAGAGTGACATCATTTATGTCCAGACAGATGCCACCAACCCCTCTGTGCCC
CTGGATCCAATCTCAGTGTCTAACTCATCATCCCAGATTATTCTGAAGTGGAAACCACCCTCCGACCCCAATGGC
AACATCACCCACTACCTGGTTTTCTGGGAGAGGCAGGCGGAAGACAGTGAGCTGTTCGAGCTGGATTATTGCCTC
TAGGGGCTGAAGCTGCCCTCGAGGACCTGGTCTCCACCATTCGAGTCTGAAGATTCTCAGAAGCACAACCAGAGT
GAGTATGAGGATTCGGCCGGCGAATGCTGCTCCTGTCCAAAGACAGACTCTCAGATCCTGAAGGAGCTGGAGGAG
TCCTCGTTTAGGAAGACGTTTGAGGATTACCTGCACAACGTGGTTTTCGTCCCCAGGCCATCTCGGAAACGCAGG
TCCCTTGGCGATGTTGGGAATGTGACGGTGGCCGTGCCCACGGTGGCAGCTTTCCCCAACACTTCCTCGACCAGC
GTGCCCACGAGTCCGGAGGAGCACAGGCCTTTTGAGAAGGTGGTGAACAAGGAGTCGCTGGTCATCTCCGGCTTG
CGACACTTCACGGGCTATCGCATCGAGCTGCAGGCTTGCAACCAGGACACCCCTGAGGAACGGTGCAGTGTGGCA GCCTACGTCAGTGCGAGGACCATGCCTGAAGCCAAGGCTGATGACATTGTTGGCCCTGTGACGCATGAAATCTTT
GAGAACAACGTCGTCCACTTGATGTGGCAGGAGCCGAAGGAGCCCAATGGTCTGATCGTGCTGTATGAAGTGAGT
TATCGGCGATATGGTGATGAGGAGCTGCATCTCTGCGTCTCCCGCAAGCACTTCGCTCTGGAACGGGGCTGCAGG
CTGCGTGGGCTGTCACCGGGGAACTACAGCGTGCGAATCCGGGCCACCTCCCTTGCGGGCAACGGCTCTTGGACG
GAACCCACCTATTTCTACGTGACAGACTATTTAGACGTCCCGTCAAATATTGCAAAAATTATCATCGGCCCCCTC
ATCTTTGTCTTTCTCTTCAGTGTTGTGATTGGAAGTATTTATCTATTCCTGAGAAAGAGGCAGCCAGATGGGCCG
CTGGGACCGCTTTACGCTTCTTCAAACCCTGAGTATCTCAGTGCCAGTGATGTGTTTCCATGCTCTGTGTACGTG
CCGGACGAGTGGGAGGTGTCTCGAGAGAAGATCACCCTCCTTCGAGAGCTGGGGCAGGGCTCCTTCGGCATGGTG
TATGAGGGCAATGCCAGGGACATCATCAAGGGTGAGGCAGAGACCCGCGTGGCGGTGAAGACGGTCAACGAGTCA
GCCAGTCTCCGAGAGCGGATTGAGTTCCTCAATGAGGCCTCGGTCATGAAGGGCTTCACCTGCCATCACGTGGTG
CGCCTCCTGGGAGTGGTGTCCAAGGGCCAGCCCACGCTGGTGGTGATGGAGCTGATGGCTCACGGAGACCTGAAG
AGCTACCTCCGTTCTCTGCGGCCAGAGGCTGAGAATAATCCTGGCCGCCCTCCCCCTACCCTTCAAGAGATGATT
CAGATGGCGGCAGAGATTGCTGACGGGATGGCCTACCTGAACGCCAAGAAGTTTGTGCATCGGGACCTGGCAGCG
AGAAACTGCATGGTCGCCCATGATTTTACTGTCAAAATTGGAGACTTTGGAATGACCAGAGACATCTATGAAACG
GATTACTACCGGAAAGGGGGCAAGGGTCTGCTCCCTGTACGGTGGATGGCACCGGAGTCCCTGAAGGATGGGGTC
TTCACCACTTCTTCTGACATGTGGTCCTTTGGCGTGGTCCTTTGGGAAATCACCAGCTTGGCAGAACAGCCTTAC
CAAGGCCTGTCTAATGAACAGGTGTTGAAATTTGTCATGGATGGAGGGTATCTGGATCAACCCGACAACTGTCCA
GAGAGAGTCACTGACCTCATGCGCATGTGCTGGCAATTCAACCCCAACATGAGGCCAACCTTCCTGGAGATTGTC
AACCTGCTCAAGGACGACCTGCACCCCAGCTTTCCAGAGGTGTCGTTCTTCCACAGCGAGGAGAACAAGGCTCCC
GAGAGTGAGGAGCTGGAGATGGAGTTTGAGGACATGGAGAATGTGCCCCTGGACCGTTCCTCGCACTGTCAGAGG
GAGGAGGCGGGGGGCCGGGATGGAGGGTCCTCGCTGGGTTTCAAGCGGAGCTACGAGGAACACATCCCTTACACA
CACATGAACGGAGGCAAGAAAAACGGGCGGATTCTGACCTTGCCTCGGTCCAATCCTTCCTGGGCCCGGGATCCA
CCGGTCGCCACCGTGAGCAAGGGCGAGGAGAATAATATGGCCATCATCAAGGAGTTCATGCGCTTCAAGGTGCGC
ATGGAGGGCACCGTGAACGGCCACGAGTTCGAGATCGAGGGCGAGGGCGAGGGCCGCCCCTACGAGGGCTTTCAG
ACCGCTAAGCTGAAGGTGACCAAGGGCGGCCCCCTGCCCTTCGCCTGGGACATCCTGTCCCCTCTCTTCACCTAC
GGCTCCAAGGCCTACGTGAAGCACCCCGCCGACATCCCCGACTACTTCAAGCTGTCCTTCCCCGAGGGCTTCAAG
TGGGAGCGCGTGATGAACTACGAGGACGGCGGCGTGGTGACCGTGACCCAGGACTCCTCACTGCAGGACGGCGAG
TTCATCTACAAGGTGAAGATGCGCGGCACCAACTTCCCCTCCGACGGCCCCGTGATGCAGAAGAAGACCATGGGC
TGGGAGGCCTCCTCCGAGCGGATGTACCCCGAGGACGGCGCCCTGAAGGGCGAGATCAGGATGAGGCTGAAGCTG
AAGGACGGCGGCCACTACACCTCCGAGGTCAAGACCACCTACAAGGCCAAGAAGTCCGTGCAGCTGCCCGGCGCC
TACATCGTCGGCATCAAGCTGGACATCACCTCCCACAACGAGGACTACACCATCGTGGAACAGTACGAACGCGCC
GAGGGCCGCCACTCCACCGGCGGCATGGACGAGCTGTACAAGTAAAGCGGCCGCGACTCTAGATCATAATCAGCA
CATGAGGATCACCCATGTCTGCAGGTCGACTCTAGAAAACATGAGGATCACCCATGT (SEQ ID NO: 211)
Protein: (X indicates non-canonical amino acid)
MGTGGRRGAAAAPLLVAVAALLLGAAGHLYPGEVCPGMDIRNNLTRLHELENCSVIEGHLQILLMFKTRPEDFRD LSFPKLIMITDYLLLFRVYGLESLKDLFPNLTVIRGSRLFFNYALVIFEMVHLKELGLYNLMNITRGSVRIEKNN ELCYLATIDWSRILDSVEDNYIVLNKDDNEECGDICPGTAKGKTNCPATVINGQFVERCWTHSHCQKVCPTICKS HGCTAEGLCCHSECLGNCSQPDDPTKCVACRNFYLDGRCVETCPPPYYHFQDWRCVNFSFCQDLHHKCKNSRRQG CHQYVIHNNKCIPECPSGYTMNSSNLLCTPCLGPCPKVCHLLEGEKTIDSVTSAQELRGCTVINGSLIINIRGGN NLAAELEANLGLIEEISGYLKIRRSYALVSLSFFRKLRLIRGETLEIGNYSFYALDNQNLRQLWDWSKHNLTITQ GKLFFHYNPKLCLSEIHKMEEVSGTKGRQERNDIALKTNGDQASCENELLKFSYIRTSFDKILLRWEPYWPPDFR DLLGFMLFYKEAPYQNVTEFDGQDACGSNSWTVVDIDPPLRSNDPKSQNHPGWLMRGLKPWTQYAIFVKTLVTFS DERRTYGAKSDIIYVQTDATNPSVPLDPISVSNSSSQIILKWKPPSDPNGNITHYLVFWERQAEDSELFELDYCL XGLKLPSRTWSPPFESEDSQKHNQSEYEDSAGECCSCPKTDSQILKELEESSFRKTFEDYLHNVVFVPRPSRKRR SLGDVGNVTVAVPTVAAFPNTSSTSVPTSPEEHRPFEKVVNKESLVISGLRHFTGYRIELQACNQDTPEERCSVA AYVSARTMPEAKADDIVGPVTHEIFENNVVHLMWQEPKEPNGLIVLYEVSYRRYGDEELHLCVSRKHFALERGCR LRGLSPGNYSVRIRATSLAGNGSWTEPTYFYVTDYLDVPSNIAKIIIGPLIFVFLFSVVIGSIYLFLRKRQPDGP LGPLYASSNPEYLSASDVFPCSVYVPDEWEVSREKITLLRELGQGSFGMVYEGNARDIIKGEAETRVAVKTVNES ASLRERIEFLNEASVMKGFTCHHVVRLLGVVSKGQPTLVVMELMAHGDLKSYLRSLRPEAENNPGRPPPTLQEMI QMAAEIADGMAYLNAKKFVHRDLAARNCMVAHDFTVKIGDFGMTRDIYETDYYRKGGKGLLPVRWMAPESLKDGV FTTSSDMWSFGVVLWEITSLAEQPYQGLSNEQVLKFVMDGGYLDQPDNCPERVTDLMRMCWQFNPNMRPTFLEIV NLLKDDLHPSFPEVSFFHSEENKAPESEELEMEFEDMENVPLDRSSHCQREEAGGRDGGSSLGFKRSYEEHIPYT HMNGGKKNGRILTLPRSNPSWARDPPVATVSKGEENNMAIIKEFMRFKVRMEGTVNGHEFEIEGEGEGRPYEGFQ TAKLKVTKGGPLPFAWDILSPLFTYGSKAYVKHPADIPDYFKLSFPEGFKWERVMNYEDGGVVTVTQDSSLQDGE FIYKVKMRGTNFPSDGPVMQKKTMGWEASSERMYPEDGALKGEIRMRLKLKDGGHYTSEVKTTYKAKKSVQLPGA YIVGIKLDITSHNEDYTIVEQYERAEGRHSTGGMDELYK (SEQ ID NO: 212)
INSR676TAG-iRFP-MS2
DNA: (MS2 stem-loops and Amber codon underlined)
ATGGGCACCGGGGGCCGGCGGGGGGCGGCGGCCGCGCCGCTGCTGGTGGCGGTGGCCGCGCTGCTACTGGGCGCC GCGGGCCACCTGTACCCCGGAGAGGTGTGTCCCGGCATGGATATCCGGAACAACCTCACTAGGTTGCATGAGCTG GAGAATTGCTCTGTCATCGAAGGACACTTGCAGATACTCTTGATGTTCAAAACGAGGCCCGAAGATTTCCGAGAC CTCAGTTTCCCCAAACTCATCATGATCACTGATTACTTGCTGCTCTTCCGGGTCTATGGGCTCGAGAGCCTGAAG GACCTGTTCCCCAACCTCACGGTCATCCGGGGATCACGACTGTTCTTTAACTACGCGCTGGTCATCTTCGAGATG GTTCACCTCAAGGAACTCGGCCTCTACAACCTGATGAACATCACCCGGGGTTCTGTCCGCATCGAGAAGAACAAT GAGCTCTGTTACTTGGCCACTATCGACTGGTCCCGTATCCTGGATTCCGTGGAGGATAATTACATCGTGTTGAAC AAAGATGACAACGAGGAGTGTGGAGACATCTGTCCGGGTACCGCGAAGGGCAAGACCAACTGCCCCGCCACCGTC ATCAACGGGCAGTTTGTCGAACGATGTTGGACTCATAGTCACTGCCAGAAAGTTTGCCCGACCATCTGTAAGTCA CACGGCTGCACCGCCGAAGGCCTCTGTTGCCACAGCGAGTGCCTGGGCAACTGTTCTCAGCCCGACGACCCCACC AAGTGCGTGGCCTGCCGCAACTTCTACCTGGATGGCAGGTGTGTGGAGACCTGCCCGCCCCCGTACTACCACTTC CAGGACTGGCGCTGTGTGAACTTCAGCTTCTGCCAGGACCTGCACCACAAATGCAAGAACTCGCGGAGGCAGGGC TGCCACCAGTACGTCATTCACAACAACAAGTGCATCCCTGAGTGTCCCTCCGGGTACACGATGAATTCCAGCAAC TTGCTGTGCACCCCATGCCTGGGTCCCTGTCCCAAGGTGTGCCACCTCCTAGAAGGCGAGAAGACCATCGACTCG GTGACGTCTGCCCAGGAGCTCCGAGGATGCACCGTCATCAACGGGAGTCTGATCATCAACATTCGAGGAGGCAAC AATCTGGCAGCTGAGCTAGAAGCCAACCTCGGCCTCATTGAAGAAATTTCAGGGTATCTAAAAATCCGCCGATCC TACGCTCTGGTGTCACTTTCCTTCTTCCGGAAGTTACGTCTGATTCGAGGAGAGACCTTGGAAATTGGGAACTAC TCCTTCTATGCCTTGGACAACCAGAACCTAAGGCAGCTCTGGGACTGGAGCAAACACAACCTCACCATCACTCAG GGGAAACTCTTCTTCCACTATAACCCCAAACTCTGCTTGTCAGAAATCCACAAGATGGAAGAAGTTTCAGGAACC AAGGGGCGCCAGGAGAGAAACGACATTGCCCTGAAGACCAATGGGGACCAGGCATCCTGTGAAAATGAGTTACTT AAATTTTCTTACATTCGGACATCTTTTGACAAGATCTTGCTGAGATGGGAGCCGTACTGGCCCCCCGACTTCCGA GACCTCTTGGGGTTCATGCTGTTCTACAAAGAGGCCCCTTATCAGAATGTGACGGAGTTCGACGGGCAGGATGCA TGTGGTTCCAACAGTTGGACGGTGGTAGACATTGACCCACCCCTGAGGTCCAACGACCCCAAATCACAGAACCAC CCAGGGTGGCTGATGCGGGGTCTCAAGCCCTGGACCCAGTATGCCATCTTTGTGAAGACCCTGGTCACCTTTTCG GATGAACGCCGGACCTATGGGGCCAAGAGTGACATCATTTATGTCCAGACAGATGCCACCAACCCCTCTGTGCCC CTGGATCCAATCTCAGTGTCTAACTCATCATCCCAGATTATTCTGAAGTGGAAACCACCCTCCGACCCCAATGGC 
AACATCACCCACTACCTGGTTTTCTGGGAGAGGCAGGCGGAAGACAGTGAGCTGTTCGAGCTGGATTATTGCCTC TAGGGGCTGAAGCTGCCCTCGAGGACCTGGTCTCCACCATTCGAGTCTGAAGATTCTCAGAAGCACAACCAGAGT GAGTATGAGGATTCGGCCGGCGAATGCTGCTCCTGTCCAAAGACAGACTCTCAGATCCTGAAGGAGCTGGAGGAG TCCTCGTTTAGGAAGACGTTTGAGGATTACCTGCACAACGTGGTTTTCGTCCCCAGGCCATCTCGGAAACGCAGG TCCCTTGGCGATGTTGGGAATGTGACGGTGGCCGTGCCCACGGTGGCAGCTTTCCCCAACACTTCCTCGACCAGC GTGCCCACGAGTCCGGAGGAGCACAGGCCTTTTGAGAAGGTGGTGAACAAGGAGTCGCTGGTCATCTCCGGCTTG CGACACTTCACGGGCTATCGCATCGAGCTGCAGGCTTGCAACCAGGACACCCCTGAGGAACGGTGCAGTGTGGCA GCCTACGTCAGTGCGAGGACCATGCCTGAAGCCAAGGCTGATGACATTGTTGGCCCTGTGACGCATGAAATCTTT GAGAACAACGTCGTCCACTTGATGTGGCAGGAGCCGAAGGAGCCCAATGGTCTGATCGTGCTGTATGAAGTGAGT TATCGGCGATATGGTGATGAGGAGCTGCATCTCTGCGTCTCCCGCAAGCACTTCGCTCTGGAACGGGGCTGCAGG CTGCGTGGGCTGTCACCGGGGAACTACAGCGTGCGAATCCGGGCCACCTCCCTTGCGGGCAACGGCTCTTGGACG GAACCCACCTATTTCTACGTGACAGACTATTTAGACGTCCCGTCAAATATTGCAAAAATTATCATCGGCCCCCTC ATCTTTGTCTTTCTCTTCAGTGTTGTGATTGGAAGTATTTATCTATTCCTGAGAAAGAGGCAGCCAGATGGGCCG CTGGGACCGCTTTACGCTTCTTCAAACCCTGAGTATCTCAGTGCCAGTGATGTGTTTCCATGCTCTGTGTACGTG CCGGACGAGTGGGAGGTGTCTCGAGAGAAGATCACCCTCCTTCGAGAGCTGGGGCAGGGCTCCTTCGGCATGGTG TATGAGGGCAATGCCAGGGACATCATCAAGGGTGAGGCAGAGACCCGCGTGGCGGTGAAGACGGTCAACGAGTCA GCCAGTCTCCGAGAGCGGATTGAGTTCCTCAATGAGGCCTCGGTCATGAAGGGCTTCACCTGCCATCACGTGGTG CGCCTCCTGGGAGTGGTGTCCAAGGGCCAGCCCACGCTGGTGGTGATGGAGCTGATGGCTCACGGAGACCTGAAG AGCTACCTCCGTTCTCTGCGGCCAGAGGCTGAGAATAATCCTGGCCGCCCTCCCCCTACCCTTCAAGAGATGATT CAGATGGCGGCAGAGATTGCTGACGGGATGGCCTACCTGAACGCCAAGAAGTTTGTGCATCGGGACCTGGCAGCG AGAAACTGCATGGTCGCCCATGATTTTACTGTCAAAATTGGAGACTTTGGAATGACCAGAGACATCTATGAAACG GATTACTACCGGAAAGGGGGCAAGGGTCTGCTCCCTGTACGGTGGATGGCACCGGAGTCCCTGAAGGATGGGGTC TTCACCACTTCTTCTGACATGTGGTCCTTTGGCGTGGTCCTTTGGGAAATCACCAGCTTGGCAGAACAGCCTTAC CAAGGCCTGTCTAATGAACAGGTGTTGAAATTTGTCATGGATGGAGGGTATCTGGATCAACCCGACAACTGTCCA GAGAGAGTCACTGACCTCATGCGCATGTGCTGGCAATTCAACCCCAACATGAGGCCAACCTTCCTGGAGATTGTC AACCTGCTCAAGGACGACCTGCACCCCAGCTTTCCAGAGGTGTCGTTCTTCCACAGCGAGGAGAACAAGGCTCCC 
GAGAGTGAGGAGCTGGAGATGGAGTTTGAGGACATGGAGAATGTGCCCCTGGACCGTTCCTCGCACTGTCAGAGG GAGGAGGCGGGGGGCCGGGATGGAGGGTCCTCGCTGGGTTTCAAGCGGAGCTACGAGGAACACATCCCTTACACA CACATGAACGGAGGCAAGAAAAACGGGCGGATTCTGACCTTGCCTCGGTCCAATCCTTCCTGGGCCCGGGATCCA CCGGTCGCCACCGCGGAAGGATCCGTCGCCAGGCAGCCTGACCTCTTGACCTGCGACGATGAGCCGATCCATATC CCCGGTGCCATCCAACCGCATGGACTGCTGCTCGCCCTCGCCGCCGACATGACGATCGTTGCCGGCAGCGACAAC CTTCCCGAACTCACCGGACTGGCGATCGGCGCCCTGATCGGCCGCTCTGCGGCCGATGTCTTCGACTCGGAGACG CACAACCGTCTGACGATCGCCTTGGCCGAGCCCGGGGCGGCCGTCGGAGCACCGATCACTGTCGGCTTCACGATG CGAAAGGACGCAGGCTTCATCGGCTCCTGGCATCGCCATGATCAGCTCATCTTCCTCGAGCTCGAGCCTCCCCAG CGGGACGTCGCCGAGCCGCAGGCGTTCTTCCGCCGCACCAACAGCGCCATCCGCCGCCTGCAGGCCGCCGAAACC TTGGAAAGCGCCTGCGCCGCCGCGGCGCAAGAGGTGCGGAAGATTACCGGCTTCGATCGGGTGATGATCTATCGC TTCGCCTCCGACTTCAGCGGCGAAGTGATCGCAGAGGATCGGTGCGCCGAGGTCGAGTCAAAACTAGGCCTGCAC TATCCTGCCTCAACCGTGCCGGCGCAGGCCCGTCGGCTCTATACCATCAACCCGGTACGGATCATTCCCGATATC AATTATCGGCCGGTGCCGGTCACCCCAGACCTCAATCCGGTCACCGGGCGGCCGATTGATCTTAGCTTCGCCATC CTGCGCAGCGTCTCGCCCGTCCATCTGGAATTCATGCGCAACATAGGCATGCACGGCACGATGTCGATCTCGATT TTGCGCGGCGAGCGACTGTGGGGATTGATCGTTTGCCATCACCGAACGCCGTACTACGTCGATCTCGATGGCCGC CAAGCCTGCGAGCTAGTCGCCCAGGTTCTGGCCTGGCAGATCGGCGTGATGGAAGAGTAAGCGGCCGCGACTCTA GATCATAATCAGCACATGAGGATCACCCATGTCTGCAGGTCGACTCTAGAAAACATGAGGATCACCCATGT
(SEQ ID NO: 213)
Protein: (X indicates non-canonical amino acid)
MGTGGRRGAAAAPLLVAVAALLLGAAGHLYPGEVCPGMDIRNNLTRLHELENCSVIEGHLQILLMFKTRPEDFRD LSFPKLIMITDYLLLFRVYGLESLKDLFPNLTVIRGSRLFFNYALVIFEMVHLKELGLYNLMNITRGSVRIEKNN ELCYLATIDWSRILDSVEDNYIVLNKDDNEECGDICPGTAKGKTNCPATVINGQFVERCWTHSHCQKVCPTICKS HGCTAEGLCCHSECLGNCSQPDDPTKCVACRNFYLDGRCVETCPPPYYHFQDWRCVNFSFCQDLHHKCKNSRRQG CHQYVIHNNKCIPECPSGYTMNSSNLLCTPCLGPCPKVCHLLEGEKTIDSVTSAQELRGCTVINGSLIINIRGGN NLAAELEANLGLIEEISGYLKIRRSYALVSLSFFRKLRLIRGETLEIGNYSFYALDNQNLRQLWDWSKHNLTITQ GKLFFHYNPKLCLSEIHKMEEVSGTKGRQERNDIALKTNGDQASCENELLKFSYIRTSFDKILLRWEPYWPPDFR DLLGFMLFYKEAPYQNVTEFDGQDACGSNSWTVVDIDPPLRSNDPKSQNHPGWLMRGLKPWTQYAIFVKTLVTFS DERRTYGAKSDIIYVQTDATNPSVPLDPISVSNSSSQIILKWKPPSDPNGNITHYLVFWERQAEDSELFELDYCL XGLKLPSRTWSPPFESEDSQKHNQSEYEDSAGECCSCPKTDSQILKELEESSFRKTFEDYLHNVVFVPRPSRKRR SLGDVGNVTVAVPTVAAFPNTSSTSVPTSPEEHRPFEKVVNKESLVISGLRHFTGYRIELQACNQDTPEERCSVA AYVSARTMPEAKADDIVGPVTHEIFENNVVHLMWQEPKEPNGLIVLYEVSYRRYGDEELHLCVSRKHFALERGCR LRGLSPGNYSVRIRATSLAGNGSWTEPTYFYVTDYLDVPSNIAKIIIGPLIFVFLFSVVIGSIYLFLRKRQPDGP LGPLYASSNPEYLSASDVFPCSVYVPDEWEVSREKITLLRELGQGSFGMVYEGNARDIIKGEAETRVAVKTVNES ASLRERIEFLNEASVMKGFTCHHVVRLLGVVSKGQPTLVVMELMAHGDLKSYLRSLRPEAENNPGRPPPTLQEMI QMAAEIADGMAYLNAKKFVHRDLAARNCMVAHDFTVKIGDFGMTRDIYETDYYRKGGKGLLPVRWMAPESLKDGV FTTSSDMWSFGVVLWEITSLAEQPYQGLSNEQVLKFVMDGGYLDQPDNCPERVTDLMRMCWQFNPNMRPTFLEIV NLLKDDLHPSFPEVSFFHSEENKAPESEELEMEFEDMENVPLDRSSHCQREEAGGRDGGSSLGFKRSYEEHIPYT HMNGGKKNGRILTLPRSNPSWARDPPVATAEGSVARQPDLLTCDDEPIHIPGAIQPHGLLLALAADMTIVAGSDN LPELTGLAIGALIGRSAADVFDSETHNRLTIALAEPGAAVGAPITVGFTMRKDAGFIGSWHRHDQLIFLELEPPQ RDVAEPQAFFRRTNSAIRRLQAAETLESACAAAAQEVRKITGFDRVMIYRFASDFSGEVIAEDRCAEVESKLGLH YPASTVPAQARRLYTINPVRIIPDINYRPVPVTPDLNPVTGRPIDLSFAILRSVSPVHLEFMRNIGMHGTMSISI LRGERLWGLIVCHHRTPYYVDLDGRQACELVAQVLAWQIGVMEE (SEQ ID NO: 214)
Sequences - Set 2
1. Further Components:
mCherry190TAG-2xPP7 mCherry with amber site and 2 PP7 loops (TAG codon, PP7 loops)
DNA:
ATGGGCCGCCTGGAAAGCACCCCGCCGAAAAAAAAACGCAAAGTGGAAGATAGCGCGAGCGTGAGCAAGGGCGAG GAGGATAACATGGCCATCATCAAGGAGTTCATGCGCTTCAAGGTGCACATGGAGGGCTCCGTGAACGGCCACGAG TTCGAGATCGAGGGCGAGGGCGAGGGCCGCCCCTACGAGGGCACCCAGACCGCCAAGCTGAAGGTGACCAAGGGT GGCCCCCTGCCCTTCGCCTGGGACATCCTGTCCCCTCAGTTCATGTACGGCTCCAAGGCCTACGTGAAGCACCCC GCCGACATCCCCGACTACTTGAAGCTGTCCTTCCCCGAGGGCTTCAAGTGGGAGCGCGTGATGAACTTCGAGGAC GGCGGCGTGGTGACCGTGACCCAGGACTCCTCCCTGCAGGACGGCGAGTTCATCTACAAGGTGAAGCTGCGCGGC ACCAACTTCCCCTCCGACGGCCCCGTAATGCAGAAGAAGACGATGGGCTGGGAGGCCTCCTCCGAGCGGATGTAC CCCGAGGACGGCGCCCTGAAGGGCGAGATCAAGCAGAGGCTGAAGCTGAAGGACGGCGGCCACTACGACGCTGAG GTCAAGACCACCTACAAGGCCAAGTAGCCCGTGCAGCTGCCCGGCGCCTACAACGTCAACATCAAGTTGGACATC ACCTCCCACAACGAGGACTACACCATCGTGGAACAGTACGAACGCGCCGAGGGCCGCCACTCCACCGGCGGCATG GACGAGCTGTACAAGCATCATCATCATCATCATTAAAGATCCTAAGGTACAATTGCCTAGAAAGGAGCAGACGAT ATGGCGTCGCTCCCTGCAGGTCGACTCTAGAAACCAGCAGAGCATATGGGCTCGCTGG (SEQ ID NO: 215)
Protein:
MGRLESTPPKKKRKVEDSASVSKGEEDNMAIIKEFMRFKVHMEGSVNGHEFEIEGEGEGRPYEGTQTAKLKVTKG GPLPFAWDILSPQFMYGSKAYVKHPADIPDYLKLSFPEGFKWERVMNFEDGGVVTVTQDSSLQDGEFIYKVKLRG TNFPSDGPVMQKKTMGWEASSERMYPEDGALKGEIKQRLKLKDGGHYDAEVKTTYKAKXPVQLPGAYNVNIKLDI TSHNEDYTIVEQYERAEGRHSTGGMDELYKHHHHHH* (SEQ ID NO: 216)
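The "190TAG" in these construct names refers to the numbering of the fused reporter, not of the full open reading frame. In SEQ ID NO: 216 the X is residue 209 of the construct. Assuming standard mCherry numbering, with the initiator Met as residue 1 so that the construct's V21 in ...SVSKGEEDN corresponds to mCherry V2, the offset is 19 and 209 − 19 = 190, consistent with the construct name. A minimal sketch of both steps follows; `in_frame_amber_codons`, `to_reference_numbering` and the offset value are hypothetical and illustrative only.

```python
STOP_CODONS = {"TAA", "TGA"}


def in_frame_amber_codons(cds: str) -> list[int]:
    """Return 1-based codon numbers of in-frame TAG codons before the first TAA/TGA.

    The reading frame is assumed to start at the first nucleotide (the ATG).
    """
    seq = "".join(cds.split()).upper()
    hits = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            break  # stop scanning at the genuine stop codon
        if codon == "TAG":
            hits.append(i // 3 + 1)
    return hits


def to_reference_numbering(construct_position: int, offset: int) -> int:
    """Convert a construct residue number to the numbering of the fused protein."""
    return construct_position - offset


print(in_frame_amber_codons("ATGGCC TAG AAA TAA"))  # [3]
print(in_frame_amber_codons("ATGCTAGCCTAA"))        # [] (TAG lies out of frame)
print(to_reference_numbering(209, offset=19))       # 190
```

Stopping at TAA/TGA keeps TAG triplets in the 3′ tail, such as those near the PP7 or MS2 loops, from being reported.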
mCherry190TAG-4xPP7 mCherry with amber site and 4 PP7 loops (TAG codon, PP7 loops)
DNA:
ATGGGCCGCCTGGAAAGCACCCCGCCGAAAAAAAAACGCAAAGTGGAAGATAGCGCGAGCGTGAGCAAGGGCGAG GAGGATAACATGGCCATCATCAAGGAGTTCATGCGCTTCAAGGTGCACATGGAGGGCTCCGTGAACGGCCACGAG TTCGAGATCGAGGGCGAGGGCGAGGGCCGCCCCTACGAGGGCACCCAGACCGCCAAGCTGAAGGTGACCAAGGGT GGCCCCCTGCCCTTCGCCTGGGACATCCTGTCCCCTCAGTTCATGTACGGCTCCAAGGCCTACGTGAAGCACCCC GCCGACATCCCCGACTACTTGAAGCTGTCCTTCCCCGAGGGCTTCAAGTGGGAGCGCGTGATGAACTTCGAGGAC GGCGGCGTGGTGACCGTGACCCAGGACTCCTCCCTGCAGGACGGCGAGTTCATCTACAAGGTGAAGCTGCGCGGC ACCAACTTCCCCTCCGACGGCCCCGTAATGCAGAAGAAGACGATGGGCTGGGAGGCCTCCTCCGAGCGGATGTAC CCCGAGGACGGCGCCCTGAAGGGCGAGATCAAGCAGAGGCTGAAGCTGAAGGACGGCGGCCACTACGACGCTGAG GTCAAGACCACCTACAAGGCCAAGTAGCCCGTGCAGCTGCCCGGCGCCTACAACGTCAACATCAAGTTGGACATC ACCTCCCACAACGAGGACTACACCATCGTGGAACAGTACGAACGCGCCGAGGGCCGCCACTCCACCGGCGGCATG GACGAGCTGTACAAGCATCATCATCATCATCATTAAAGATCCTAAGGTACAATTGCCTAGAAAGGAGCAGACGAT ATGGCGTCGCTCCCTGCAGGTCGACTCTAGAAACCAGCAGAGCATATGGGCTCGCTGGCTGGGTACAATTGCCTA GAAAGGAGCAGACGATATGGCGTCGCTCCCTGCAGGTCGACTCTAGAAACCAGCAGAGCATATGGGCTCGCTGG
(SEQ ID NO: 217)
Protein:
MGRLESTPPKKKRKVEDSASVSKGEEDNMAIIKEFMRFKVHMEGSVNGHEFEIEGEGEGRPYEGTQTAKLKVTKG GPLPFAWDILSPQFMYGSKAYVKHPADIPDYLKLSFPEGFKWERVMNFEDGGVVTVTQDSSLQDGEFIYKVKLRG TNFPSDGPVMQKKTMGWEASSERMYPEDGALKGEIKQRLKLKDGGHYDAEVKTTYKAKXPVQLPGAYNVNIKLDI TSHNEDYTIVEQYERAEGRHSTGGMDELYKHHHHHH* (SEQ ID NO: 218)
mCherry190TAG-6xPP7 mCherry with amber site and 6 PP7 loops (TAG codon, PP7 loops)
DNA:
ATGGGCCGCCTGGAAAGCACCCCGCCGAAAAAAAAACGCAAAGTGGAAGATAGCGCGAGCGTGAGCAAGGGCGAG
GAGGATAACATGGCCATCATCAAGGAGTTCATGCGCTTCAAGGTGCACATGGAGGGCTCCGTGAACGGCCACGAG
TTCGAGATCGAGGGCGAGGGCGAGGGCCGCCCCTACGAGGGCACCCAGACCGCCAAGCTGAAGGTGACCAAGGGT GGCCCCCTGCCCTTCGCCTGGGACATCCTGTCCCCTCAGTTCATGTACGGCTCCAAGGCCTACGTGAAGCACCCC GCCGACATCCCCGACTACTTGAAGCTGTCCTTCCCCGAGGGCTTCAAGTGGGAGCGCGTGATGAACTTCGAGGAC GGCGGCGTGGTGACCGTGACCCAGGACTCCTCCCTGCAGGACGGCGAGTTCATCTACAAGGTGAAGCTGCGCGGC ACCAACTTCCCCTCCGACGGCCCCGTAATGCAGAAGAAGACGATGGGCTGGGAGGCCTCCTCCGAGCGGATGTAC CCCGAGGACGGCGCCCTGAAGGGCGAGATCAAGCAGAGGCTGAAGCTGAAGGACGGCGGCCACTACGACGCTGAG GTCAAGACCACCTACAAGGCCAAGTAGCCCGTGCAGCTGCCCGGCGCCTACAACGTCAACATCAAGTTGGACATC ACCTCCCACAACGAGGACTACACCATCGTGGAACAGTACGAACGCGCCGAGGGCCGCCACTCCACCGGCGGCATG GACGAGCTGTACAAGCATCATCATCATCATCATTAAAGATCCTAAGGTACAATTGCCTAGAAAGGAGCAGACGAT ATGGCGTCGCTCCCTGCAGGTCGACTCTAGAAACCAGCAGAGCATATGGGCTCGCTGGCTGGGTACAATTGCCTA GAAAGGAGCAGACGATATGGCGTCGCTCCCTGCAGGTCGACTCTAGAAACCAGCAGAGCATATGGGCTCGCTGGC TGGGTACAATTGCCTAGAAAGGAGCAGACGATATGGCGTCGCTCCCTGCAGGTCGACTCTAGAAACCAGCAGAGC
ATATGGGCTCGCTGG (SEQ ID NO: 219)
Protein:
MGRLESTPPKKKRKVEDSASVSKGEEDNMAIIKEFMRFKVHMEGSVNGHEFEIEGEGEGRPYEGTQTAKLKVTKG GPLPFAWDILSPQFMYGSKAYVKHPADIPDYLKLSFPEGFKWERVMNFEDGGVVTVTQDSSLQDGEFIYKVKLRG TNFPSDGPVMQKKTMGWEASSERMYPEDGALKGEIKQRLKLKDGGHYDAEVKTTYKAKXPVQLPGAYNVNIKLDI TSHNEDYTIVEQYERAEGRHSTGGMDELYKHHHHHH* (SEQ ID NO: 220)
H2B-mCherry190TAG-2xMS2 Human Histone H2B type 1-J (UniProt: P06899) fused to mCherry with amber site and 2x MS2 loops
DNA:
ATGCCAGAGCCAGCGAAGTCTGCTCCCGCCCCGAAAAAGGGCTCCAAGAAGGCGGTGACTAAGGCGCAGAAGAAA GGCGGCAAGAAGCGCAAGCGCAGCCGCAAGGAGAGCTATTCCATCTATGTGTACAAGGTTCTGAAGCAGGTCCAC CCTGACACCGGCATTTCGTCCAAGGCCATGGGCATCATGAATTCGTTTGTGAACGACATTTTCGAGCGCATCGCA GGTGAGGCTTCCCGCCTGGCGCATTACAACAAGCGCTCGACCATCACCTCCAGGGAGATCCAGACGGCCGTGCGC CTGCTGCTGCCTGGGGAGTTGGCCAAGCACGCCGTGTCCGAGGGTACTAAGGCCATCACCAAGTACACCAGCGCT AAGGATCCACCGGTCGCCACCATGGTGAGCAAGGGCGAGGAGGATAACATGGCCATCATCAAGGAGTTCATGCGC TTCAAGGTGCACATGGAGGGCTCCGTGAACGGCCACGAGTTCGAGATCGAGGGCGAGGGCGAGGGCCGCCCCTAC GAGGGCACCCAGACCGCCAAGCTGAAGGTGACCAAGGGTGGCCCCCTGCCCTTCGCCTGGGACATCCTGTCCCCT CAGTTCATGTACGGCTCCAAGGCCTACGTGAAGCACCCCGCCGACATCCCCGACTACTTGAAGCTGTCCTTCCCC GAGGGCTTCAAGTGGGAGCGCGTGATGAACTTCGAGGACGGCGGCGTGGTGACCGTGACCCAGGACTCCTCCCTG CAGGACGGCGAGTTCATCTACAAGGTGAAGCTGCGCGGCACCAACTTCCCCTCCGACGGCCCCGTAATGCAGAAG AAGACGATGGGCTGGGAGGCCTCCTCCGAGCGGATGTACCCCGAGGACGGCGCCCTGAAGGGCGAGATCAAGCAG AGGCTGAAGCTGAAGGACGGCGGCCACTACGACGCTGAGGTCAAGACCACCTACAAGGCCAAGTAGCCCGTGCAG CTGCCCGGCGCCTACAACGTCAACATCAAGTTGGACATCACCTCCCACAACGAGGACTACACCATCGTGGAACAG TACGAACGCGCCGAGGGCCGCCACTCCACCGGCGGCATGGACGAGCTGTACAAGCATCATCATCATCATCATTAA CTGCACCTCGACAC
Protein:
MPEPAKSAPAPKKGSKKAVTKAQKKGGKKRKRSRKESYSIYVYKVLKQVHPDTGISSKAMGIMNSFVNDIFERIA GEASRLAHYNKRSTITSREIQTAVRLLLPGELAKHAVSEGTKAITKYTSAKDPPVATMVSKGEEDNMAIIKEFMR FKVHMEGSVNGHEFEIEGEGEGRPYEGTQTAKLKVTKGGPLPFAWDILSPQFMYGSKAYVKHPADIPDYLKLSFP EGFKWERVMNFEDGGVVTVTQDSSLQDGEFIYKVKLRGTNFPSDGPVMQKKTMGWEASSERMYPEDGALKGEIKQ RLKLKDGGHYDAEVKTTYKAKXPVQLPGAYNVNIKLDITSHNEDYTIVEQYERAEGRHSTGGMDELYKHHHHHH* (SEQ ID NO: 222)
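Each protein entry should be the translation of the DNA entry above it, so the two can be cross-checked residue by residue. A discrepancy such as a single W where the DNA encodes two valines (GTGGTG) points to a transcription error in one of the entries. The sketch below reports the first position at which a translated DNA entry and a listed protein differ, assuming the standard genetic code, TAG read as X, and an optional trailing `*` on the protein; `translate` and `first_mismatch` are hypothetical names.

```python
from itertools import product
from typing import Optional

# Standard genetic code, codons in TCAG order.
_CODE = dict(zip(("".join(c) for c in product("TCAG", repeat=3)),
                 "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"))


def translate(cds: str) -> str:
    """Translate from the first nucleotide; TAG -> X, TAA/TGA end translation."""
    seq = "".join(cds.split()).upper()
    out = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon == "TAG":
            out.append("X")  # amber codon carrying the ncAA
        elif _CODE.get(codon, "?") == "*":
            break
        else:
            out.append(_CODE.get(codon, "?"))  # '?' flags non-ACGT debris
    return "".join(out)


def first_mismatch(dna_entry: str, protein_entry: str) -> Optional[int]:
    """Return the 1-based position of the first disagreement, or None if identical."""
    translated = translate(dna_entry)
    listed = "".join(protein_entry.split()).rstrip("*")
    for pos, (a, b) in enumerate(zip(translated, listed), start=1):
        if a != b:
            return pos
    if len(translated) != len(listed):
        return min(len(translated), len(listed)) + 1  # one is a prefix of the other
    return None


print(first_mismatch("GGCGGCGTGGTGACCGTGACCCAG", "GGWTVTQ"))   # 3
print(first_mismatch("GGCGGCGTGGTGACCGTGACCCAG", "GGVVTVTQ"))  # None
```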
IFRS1 Methanosarcina mazei PylRS (L305M, Y306L, L309S, N346S, C348M)
DNA:
GACAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATC AAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGC CGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGAT GAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCT ACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAG GCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGT GTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATT ACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTG CTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGT CGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACC CGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATG GGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTT GCACCAAATATGCTGAACTATAGCCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGC CCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGTCTTTTATGCAAATGGGT TCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATT GTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTT GTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGC CTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGGATCTCT ACGAACCTGTAA (SEQ ID NO: 223)
Protein:
DKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSD EDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPAS VSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSR RKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPML APNMLNYSRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLSFMQMGSGCTRENLESIITDFLNHLGIDFKI VGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGIS TNL* (SEQ ID NO: 224)
CbzRS Methanosarcina mazei PylRS (Y306M, L309G, C348T)
DNA:
GACAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATC AAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGC CGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGAT GAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCT ACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAG GCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGT GTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATT ACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTG CTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGT CGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACC CGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATG GGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTT GCACCAAATCTGATGAACTATGGACGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGC CCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTACACAAATGGGT TCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATT GTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTT GTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGC CTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGGATCTCT ACGAACCTGTAA (SEQ ID NO: 225)
Protein:
DKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSD EDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPAS VSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSR RKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPML APNLMNYGRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFTQMGSGCTRENLESIITDFLNHLGIDFKI VGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGIS TNL* (SEQ ID NO: 226)
CpkRS Methanosarcina mazei PylRS (A302S)
DNA:
GATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATC
AAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGC CGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGAT GAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCT ACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAG GCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGT GTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATT ACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTG CTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGT CGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACC CGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATG GGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTT TCACCAAATCTGTATAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGC CCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGT TCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATT GTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTT GTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGC CTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGGATCTCT ACGAACCTGTAA (SEQ ID NO: 227)
Protein:
DKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSD EDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPAS VSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSR RKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPML SPNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKI VGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGIS TNL* (SEQ ID NO: 228)
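The synthetase headers above name each variant by its substitutions relative to wild-type Methanosarcina mazei PylRS. For example, A302S means alanine 302 is replaced by serine. The following Python function is a minimal sketch that assumes two ungapped protein strings of equal length. It lists the substitutions between a reference and a variant in the same notation, so a header can be checked against its sequence. The function and its `first_residue_number` parameter are illustrative and not part of the source. The listed PylRS variant sequences begin with Asp rather than Met, so the correct numbering offset is an assumption to confirm against the wild-type reference.

```python
def find_substitutions(reference: str, variant: str,
                       first_residue_number: int = 1) -> list[str]:
    """List point substitutions of variant relative to reference, e.g. "A302S".

    Both sequences must be ungapped and of equal length (no insertions or
    deletions). first_residue_number is the residue number assigned to
    reference[0]; it is an assumed parameter, not taken from the source.
    """
    if len(reference) != len(variant):
        raise ValueError("reference and variant must have the same length")
    substitutions = []
    for index, (old, new) in enumerate(zip(reference.upper(), variant.upper())):
        if old != new:
            # Report as <wild-type residue><number><new residue>.
            substitutions.append(f"{old}{index + first_residue_number}{new}")
    return substitutions


if __name__ == "__main__":
    # Toy sequences for illustration only, not real PylRS data.
    print(find_substitutions("MKALY", "MKSLF"))
```

To check a header, compare the sorted output for the reference and variant with the sorted list of substitutions the header names.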
tRNAPyl CGA Pyrrolysyl tRNA (for Serine codon, anticodon in bold)
(Methanosarcina mazei)
DNA: GGAAACCTGATCATGTAGATCGAATGGACTCGAAATCCGTTCAGCCGGGTTAGATTCCCGGGGTTTCCG (SEQ ID NO: 229)
tRNAPyl CGG Pyrrolysyl tRNA (for Proline codon, anticodon in bold)
(Methanosarcina mazei)
DNA: GGAAACCTGATCATGTAGATCGAATGGACTCGGAATCCGTTCAGCCGGGTTAGATTCCCGGGGTTTCCG (SEQ ID NO: 230)
tRNAPyl UAA Pyrrolysyl tRNA (for Leucine codon, anticodon in bold)
(Methanosarcina mazei)
DNA: GGAAACCTGATCATGTAGATCGAATGGACTTAAAATCCGTTCAGCCGGGTTAGATTCCCGGGGTTTCCG (SEQ ID NO: 231)
tRNAPyl UAG Pyrrolysyl tRNA (for Leucine codon, anticodon in bold)
(Methanosarcina mazei)
DNA: GGAAACCTGATCATGTAGATCGAATGGACTTAGAATCCGTTCAGCCGGGTTAGATTCCCGGGGTTTCCG (SEQ ID NO: 232)
tRNAPyl CCG Pyrrolysyl tRNA (for Arginine codon, anticodon in bold)
(Methanosarcina mazei)
DNA: GGAAACCTGATCATGTAGATCGAATGGACTCCGAATCCGTTCAGCCGGGTTAGATTCCCGGGGTTTCCG (SEQ ID NO: 233)
tRNAPyl AUA Pyrrolysyl tRNA (for Isoleucine codon, anticodon in purple)
(Methanosarcina mazei)
DNA: GGAAACCTGATCATGTAGATCGAATGGACTATAAATCCGTTCAGCCGGGTTAGATTCCCGGGGTTTCCG (SEQ ID NO: 234)
OMeRS Pyrrolysyl tRNA Synthetase mutant: A302T, Y384F, N346V, C348W, V401L (Methanosarcina mazei)
DNA:
ATGGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACC
CTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGT
TCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGT
GCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTG
ACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATG
CCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAA
TTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGT
ATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTT
CAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATC
AGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAA
ATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGC
TTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAA
CTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAACACCAAATCTGTATAACTAT
CTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCC
GACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGTCTTTTGGCAAATGGGTTCAGGTTGTACTCGTGAGAAC
CTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTG
TTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCCTTGTGGGCCCAATCCCGCTGGAT
CGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGAC
TTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 235)
Protein:
MACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTAR ALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSK FSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEI SLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTE LSKQIFRVDKNFCLRPMLTPNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLVFWQMGSGCTREN LESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSALVGPIPLDREWGIDKPWIGAGFGLERLLKVKHD FKNIKRAARSESYYNGISTNL (SEQ ID NO: 236)
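Each tRNAPyl variant listed above is named by its anticodon. The codon it reads follows from antiparallel base pairing: reverse the anticodon and take the complement of each base. For example, anticodon CGA reads the serine codon UCG, written TCG in the DNA entries. The following Python code is a minimal sketch of that rule. It includes the standard genetic code to show which canonical amino acid the target codon normally specifies. The sketch assumes Watson-Crick pairing at all three positions and ignores wobble pairing and tRNA modifications. The names are illustrative only.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {"".join(codon): aa
                 for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Watson-Crick partners; RNA letters (U) are accepted, codons are returned in DNA letters.
PARTNER = {"A": "T", "U": "A", "T": "A", "G": "C", "C": "G"}


def codon_read_by(anticodon: str) -> str:
    """Return the codon (5'->3', DNA letters) paired by a 5'->3' anticodon.

    Assumes strict Watson-Crick pairing at all three positions; wobble pairing
    and anticodon modifications are ignored.
    """
    return "".join(PARTNER[base] for base in reversed(anticodon.upper()))


if __name__ == "__main__":
    for anticodon in ("CUA", "CGA", "CGG", "UAA", "UAG", "CCG"):
        codon = codon_read_by(anticodon)
        print(anticodon, codon, STANDARD_CODE[codon])
```

An asterisk in the output marks a stop codon in the standard code, which is the case for the amber codon TAG read by anticodon CUA.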
GFP66TAG GFP with Amber site
DNA: (MS2-stem loops, Amber codon)
ATGGGCCGCCTGGAGAGCACCCCCCCCAAGAAGAAGCGCAAGGTGGAGGACAGCGCCAGCGACTACAAGGACGAC GACGACAAGGTGAGCAAGGGCGAGGAGCTGTTCACCGGCGTGGTGCCCATCCTGGTGGAGCTGGACGGCGACGTG AACGGCCACAAGTTCAGCGTGAGCGGCGAGGGCGAGGGCGACGCCACCTATGGCAAGCTGACCCTGAAGTTCATC TGTACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTGGTGACCACCCTGACCTAGGGCGTGCAGTGTTTCAGC CGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGAGCGCCATGCCCGAGGGCTACGTGCAGGAGCGCACC ATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGC ATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGCCACAAGCTGGAGTACAACTACAACAGC CACAACGTGTACATCATGGCCGACAAGCAGAAGAACGGCATCAAGGCCAACTTCAAGATCCGCCACAACATCGAG GACGGCAGCGTGCAGCTGGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGAC AACCACTACCTGAGCACCCAGAGCGCCCTGAGCAAGGACCCCAACGAGAAGCGCGACCACATGGTGCTGCTGGAG TTCGTGACCGCCGCCGGCATCACCCTGGGCATGGACGAGCTGTACAAGCATCACCATCACCATCACTAAGCGGCC GCATCGATTATAAGCTTTGTACACCTAGAAAACATGAGGATCACCCATGTCTGCACCTCGACACTAGAAAACATG AGGATCACCCATGT (SEQ ID NO: 237)
Protein: X indicates non-canonical amino acid
MGRLESTPPKKKRKVEDSASDYKDDDDKVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFI CTTGKLPVPWPTLVTTLTXGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNR IELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPD NHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKHHHHHH (SEQ ID NO: 238)
GFP66TCG GFP with Serine site
DNA: (MS2-stem loops, Serine codon)
ATGGGCCGCCTGGAGAGCACCCCCCCCAAGAAGAAGCGCAAGGTGGAGGACAGCGCCAGCGACTACAAGGACGAC GACGACAAGGTGAGCAAGGGCGAGGAGCTGTTCACCGGCGTGGTGCCCATCCTGGTGGAGCTGGACGGCGACGTG AACGGCCACAAGTTCAGCGTGAGCGGCGAGGGCGAGGGCGACGCCACCTATGGCAAGCTGACCCTGAAGTTCATC TGTACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTGGTGACCACCCTGACCTCGGGCGTGCAGTGTTTCAGC CGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGAGCGCCATGCCCGAGGGCTACGTGCAGGAGCGCACC ATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGC ATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGCCACAAGCTGGAGTACAACTACAACAGC CACAACGTGTACATCATGGCCGACAAGCAGAAGAACGGCATCAAGGCCAACTTCAAGATCCGCCACAACATCGAG GACGGCAGCGTGCAGCTGGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGAC AACCACTACCTGAGCACCCAGAGCGCCCTGAGCAAGGACCCCAACGAGAAGCGCGACCACATGGTGCTGCTGGAG TTCGTGACCGCCGCCGGCATCACCCTGGGCATGGACGAGCTGTACAAGCATCACCATCACCATCACTAAGCGGCC GCATCGATTATAAGCTTTGTACACCTAGAAAACATGAGGATCACCCATGTCTGCACCTCGACACTAGAAAACATG AGGATCACCCATGT (SEQ ID NO: 239)
Protein: X indicates non-canonical amino acid
MGRLESTPPKKKRKVEDSASDYKDDDDKVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFI CTTGKLPVPWPTLVTTLTXGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNR IELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPD NHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKHHHHHH (SEQ ID NO: 240)
GFP66CCG GFP with Proline site
DNA: (MS2-stem loops, Proline codon)
ATGGGCCGCCTGGAGAGCACCCCCCCCAAGAAGAAGCGCAAGGTGGAGGACAGCGCCAGCGACTACAAGGACGAC GACGACAAGGTGAGCAAGGGCGAGGAGCTGTTCACCGGCGTGGTGCCCATCCTGGTGGAGCTGGACGGCGACGTG AACGGCCACAAGTTCAGCGTGAGCGGCGAGGGCGAGGGCGACGCCACCTATGGCAAGCTGACCCTGAAGTTCATC TGTACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTGGTGACCACCCTGACCCCGGGCGTGCAGTGTTTCAGC CGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGAGCGCCATGCCCGAGGGCTACGTGCAGGAGCGCACC ATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGC ATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGCCACAAGCTGGAGTACAACTACAACAGC CACAACGTGTACATCATGGCCGACAAGCAGAAGAACGGCATCAAGGCCAACTTCAAGATCCGCCACAACATCGAG GACGGCAGCGTGCAGCTGGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGAC AACCACTACCTGAGCACCCAGAGCGCCCTGAGCAAGGACCCCAACGAGAAGCGCGACCACATGGTGCTGCTGGAG TTCGTGACCGCCGCCGGCATCACCCTGGGCATGGACGAGCTGTACAAGCATCACCATCACCATCACTAAGCGGCC GCATCGATTATAAGCTTTGTACACCTAGAAAACATGAGGATCACCCATGTCTGCACCTCGACACTAGAAAACATG AGGATCACCCATGT (SEQ ID NO: 241)
Protein: X indicates non-canonical amino acid
MGRLESTPPKKKRKVEDSASDYKDDDDKVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFI CTTGKLPVPWPTLVTTLTXGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNR IELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPD NHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKHHHHHH (SEQ ID NO: 242)
GFP66CTA GFP with Leucine site
DNA: (MS2-stem loops, Leucine codon)
ATGGGCCGCCTGGAGAGCACCCCCCCCAAGAAGAAGCGCAAGGTGGAGGACAGCGCCAGCGACTACAAGGACGAC GACGACAAGGTGAGCAAGGGCGAGGAGCTGTTCACCGGCGTGGTGCCCATCCTGGTGGAGCTGGACGGCGACGTG AACGGCCACAAGTTCAGCGTGAGCGGCGAGGGCGAGGGCGACGCCACCTATGGCAAGCTGACCCTGAAGTTCATC TGTACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTGGTGACCACCCTGACCCTAGGCGTGCAGTGTTTCAGC CGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGAGCGCCATGCCCGAGGGCTACGTGCAGGAGCGCACC ATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGC ATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGCCACAAGCTGGAGTACAACTACAACAGC CACAACGTGTACATCATGGCCGACAAGCAGAAGAACGGCATCAAGGCCAACTTCAAGATCCGCCACAACATCGAG GACGGCAGCGTGCAGCTGGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGAC AACCACTACCTGAGCACCCAGAGCGCCCTGAGCAAGGACCCCAACGAGAAGCGCGACCACATGGTGCTGCTGGAG TTCGTGACCGCCGCCGGCATCACCCTGGGCATGGACGAGCTGTACAAGCATCACCATCACCATCACTAAGCGGCC GCATCGATTATAAGCTTTGTACACCTAGAAAACATGAGGATCACCCATGTCTGCACCTCGACACTAGAAAACATG AGGATCACCCATGT (SEQ ID NO: 243)
Protein: X indicates non-canonical amino acid
MGRLESTPPKKKRKVEDSASDYKDDDDKVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFI CTTGKLPVPWPTLVTTLTXGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNR IELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPD NHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKHHHHHH (SEQ ID NO: 244)
GFP66TTA GFP with Leucine site
DNA: (MS2-stem loops, Leucine codon)
ATGGGCCGCCTGGAGAGCACCCCCCCCAAGAAGAAGCGCAAGGTGGAGGACAGCGCCAGCGACTACAAGGACGAC GACGACAAGGTGAGCAAGGGCGAGGAGCTGTTCACCGGCGTGGTGCCCATCCTGGTGGAGCTGGACGGCGACGTG AACGGCCACAAGTTCAGCGTGAGCGGCGAGGGCGAGGGCGACGCCACCTATGGCAAGCTGACCCTGAAGTTCATC TGTACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTGGTGACCACCCTGACCTTAGGCGTGCAGTGTTTCAGC CGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGAGCGCCATGCCCGAGGGCTACGTGCAGGAGCGCACC ATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGC ATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGCCACAAGCTGGAGTACAACTACAACAGC CACAACGTGTACATCATGGCCGACAAGCAGAAGAACGGCATCAAGGCCAACTTCAAGATCCGCCACAACATCGAG GACGGCAGCGTGCAGCTGGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGAC AACCACTACCTGAGCACCCAGAGCGCCCTGAGCAAGGACCCCAACGAGAAGCGCGACCACATGGTGCTGCTGGAG TTCGTGACCGCCGCCGGCATCACCCTGGGCATGGACGAGCTGTACAAGCATCACCATCACCATCACTAAGCGGCC GCATCGATTATAAGCTTTGTACACCTAGAAAACATGAGGATCACCCATGTCTGCACCTCGACACTAGAAAACATG AGGATCACCCATGT (SEQ ID NO: 245)
Protein: X indicates non-canonical amino acid
MGRLESTPPKKKRKVEDSASDYKDDDDKVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFI CTTGKLPVPWPTLVTTLTXGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNR IELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPD NHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKHHHHHH (SEQ ID NO: 246)
GFP66ATA GFP with Isoleucine site
DNA: (MS2-stem loops, Isoleucine codon)
ATGGGCCGCCTGGAAAGCACCCCGCCGAAAAAAAAACGCAAAGTGGAAGATAGCGCGAGCGATTACAAAGATGAT GATGATAAAGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTA AACGGCCACAAGTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTATGGCAAGCTGACCCTGAAGTTCATC TGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCATAGGCGTGCAGTGCTTCAGC CGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACC ATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGC ATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGC CACAACGTCTATATCATGGCCGACAAGCAGAAGAACGGCATCAAGGCCAACTTCAAGATCCGCCACAACATCGAG GACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGAC AACCACTACCTGAGCACCCAGTCCGCCCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAG TTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGCATCACCATCACCATCACTAAGCGGCC GCATCGATTATAAGCTTTGTACACCTAGAAAACATGAGGATCACCCATGTCTGCACCTCGACACTAGAAAACATG AGGATCACCCATGT (SEQ ID NO: 247)
Protein: X indicates non-canonical amino acid
MGRLESTPPKKKRKVEDSASDYKDDDDKVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFI CTTGKLPVPWPTLVTTLTXGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNR IELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPD NHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKHHHHHH (SEQ ID NO: 248)
GFP66CGG GFP with Arginine site
DNA: (MS2-stem loops, Arginine codon)
ATGGGCCGCCTGGAGAGCACCCCCCCCAAGAAGAAGCGCAAGGTGGAGGACAGCGCCAGCGACTACAAGGACGAC
GACGACAAGGTGAGCAAGGGCGAGGAGCTGTTCACCGGCGTGGTGCCCATCCTGGTGGAGCTGGACGGCGACGTG
AACGGCCACAAGTTCAGCGTGAGCGGCGAGGGCGAGGGCGACGCCACCTATGGCAAGCTGACCCTGAAGTTCATC
TGTACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTGGTGACCACCCTGACCCGGGGCGTGCAGTGTTTCAGC
CGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGAGCGCCATGCCCGAGGGCTACGTGCAGGAGCGCACC
ATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGC ATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGCCACAAGCTGGAGTACAACTACAACAGC CACAACGTGTACATCATGGCCGACAAGCAGAAGAACGGCATCAAGGCCAACTTCAAGATCCGCCACAACATCGAG GACGGCAGCGTGCAGCTGGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGAC AACCACTACCTGAGCACCCAGAGCGCCCTGAGCAAGGACCCCAACGAGAAGCGCGACCACATGGTGCTGCTGGAG TTCGTGACCGCCGCCGGCATCACCCTGGGCATGGACGAGCTGTACAAGCACCACCACCACCACCACTAAAAGCTT TGTACACCTAGAAAACATGAGGATCACCCATGTCTGCACCTCGACACTAGAAAACATGAGGATCACCCATGT
(SEQ ID NO: 249)
Protein: X indicates non-canonical amino acid
MGRLESTPPKKKRKVEDSASDYKDDDDKVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFI CTTGKLPVPWPTLVTTLTXGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNR IELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPD NHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKHHHHHH (SEQ ID NO: 250)
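In the protein entries, X marks the residue encoded by the reassigned codon. That codon is either amber TAG or a sense codon such as TCG, CCG, CTA, TTA, ATA or CGG. The following Python code is a minimal sketch. It assumes an in-frame coding sequence that begins at its first base and uses the standard genetic code. It translates such a DNA entry and writes X at chosen codon positions. A second helper lists every in-frame occurrence of a codon. This helper is relevant when the reassigned codon is a sense codon that may also appear elsewhere in the same gene. The function names are illustrative and not part of the source.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {"".join(codon): aa
                 for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_with_ncaa(cds: str, ncaa_codons: set[int]) -> str:
    """Translate an in-frame CDS, writing X at the given 1-based codon positions.

    Whitespace (line wrapping in the listing) is removed first. Translation
    stops at the first stop codon that is not a designated ncAA position, so a
    reassigned amber (TAG) codon does not terminate the protein.
    """
    seq = "".join(cds.split()).upper()
    protein = []
    for start in range(0, len(seq) - 2, 3):  # complete codons only
        position = start // 3 + 1
        if position in ncaa_codons:
            protein.append("X")
            continue
        residue = STANDARD_CODE.get(seq[start:start + 3], "?")  # "?" for non-ACGT codons
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def in_frame_positions(cds: str, codon: str) -> list[int]:
    """Return the 1-based positions of every in-frame occurrence of codon."""
    seq = "".join(cds.split()).upper()
    return [start // 3 + 1 for start in range(0, len(seq) - 2, 3)
            if seq[start:start + 3] == codon.upper()]
```

The entries carry N-terminal extensions ahead of the fluorescent protein. For that reason, locate the codon index to mark in the sequence itself, for example with `in_frame_positions`, rather than assuming it from the construct name.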
GFP39TCG GFP with Serine site
DNA: (MS2-stem loops, Serine codon)
ATGGGCCGCCTGGAGAGCACCCCCCCCAAGAAGAAGCGCAAGGTGGAGGACAGCGCCAGCGACTACAAGGACGAC GACGACAAGGTGAGCAAGGGCGAGGAGCTGTTCACCGGCGTGGTGCCCATCCTGGTGGAGCTGGACGGCGACGTG AACGGCCACAAGTTCAGCGTGAGCGGCGAGGGCGAGGGCGACGCCACCTCGGGCAAGCTGACCCTGAAGTTCATC TGTACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTGGTGACCACCCTGACCTACGGCGTGCAGTGTTTCAGC CGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGAGCGCCATGCCCGAGGGCTACGTGCAGGAGCGCACC ATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGC ATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGCCACAAGCTGGAGTACAACTACAACAGC CACAACGTGTACATCATGGCCGACAAGCAGAAGAACGGCATCAAGGCCAACTTCAAGATCCGCCACAACATCGAG GACGGCAGCGTGCAGCTGGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGAC AACCACTACCTGAGCACCCAGAGCGCCCTGAGCAAGGACCCCAACGAGAAGCGCGACCACATGGTGCTGCTGGAG TTCGTGACCGCCGCCGGCATCACCCTGGGCATGGACGAGCTGTACAAGCACCACCACCACCACCATAAGGATCCT AAGGATCCTAATTGCCTAGAAAACATGAGGATCACCCATGTCTGCAGGTCGACTCTAGAAAACATGAGGATCACC CATGT (SEQ ID NO: 251)
Protein: X indicates non-canonical amino acid
MGRLESTPPKKKRKVEDSASDYKDDDDKVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATXGKLTLKFI CTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNR IELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPD NHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKHHHHHHK (SEQ ID NO: 252)
GFP39CCG GFP with Proline site
DNA: (MS2-stem loops, Proline codon)
ATGGGCCGCCTGGAGAGCACCCCCCCCAAGAAGAAGCGCAAGGTGGAGGACAGCGCCAGCGACTACAAGGACGAC GACGACAAGGTGAGCAAGGGCGAGGAGCTGTTCACCGGCGTGGTGCCCATCCTGGTGGAGCTGGACGGCGACGTG AACGGCCACAAGTTCAGCGTGAGCGGCGAGGGCGAGGGCGACGCCACCCCGGGCAAGCTGACCCTGAAGTTCATC TGTACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTGGTGACCACCCTGACCTACGGCGTGCAGTGTTTCAGC CGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGAGCGCCATGCCCGAGGGCTACGTGCAGGAGCGCACC ATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGC ATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGCCACAAGCTGGAGTACAACTACAACAGC CACAACGTGTACATCATGGCCGACAAGCAGAAGAACGGCATCAAGGCCAACTTCAAGATCCGCCACAACATCGAG GACGGCAGCGTGCAGCTGGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGAC AACCACTACCTGAGCACCCAGAGCGCCCTGAGCAAGGACCCCAACGAGAAGCGCGACCACATGGTGCTGCTGGAG TTCGTGACCGCCGCCGGCATCACCCTGGGCATGGACGAGCTGTACAAGCACCACCACCACCACCACTAAAGATCC TAAGGATCCTAATTGCCTAGAAAACATGAGGATCACCCATGTCTGCAGGTCGACTCTAGAAAACATGAGGATCAC CCATGT (SEQ ID NO: 253)
Protein: X indicates non-canonical amino acid
MGRLESTPPKKKRKVEDSASDYKDDDDKVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATXGKLTLKFI CTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNR IELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPD NHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKHHHHHHK (SEQ ID NO: 254)
GFP39CTA GFP with Leucine site
DNA: (MS2-stem loops, Leucine codon)
ATGGGCCGCCTGGAGAGCACCCCCCCCAAGAAGAAGCGCAAGGTGGAGGACAGCGCCAGCGACTACAAGGACGAC GACGACAAGGTGAGCAAGGGCGAGGAGCTGTTCACCGGCGTGGTGCCCATCCTGGTGGAGCTGGACGGCGACGTG AACGGCCACAAGTTCAGCGTGAGCGGCGAGGGCGAGGGCGACGCCACCCTAGGCAAGCTGACCCTGAAGTTCATC TGTACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTGGTGACCACCCTGACCTACGGCGTGCAGTGTTTCAGC CGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGAGCGCCATGCCCGAGGGCTACGTGCAGGAGCGCACC ATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGC ATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGCCACAAGCTGGAGTACAACTACAACAGC CACAACGTGTACATCATGGCCGACAAGCAGAAGAACGGCATCAAGGCCAACTTCAAGATCCGCCACAACATCGAG GACGGCAGCGTGCAGCTGGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGAC AACCACTACCTGAGCACCCAGAGCGCCCTGAGCAAGGACCCCAACGAGAAGCGCGACCACATGGTGCTGCTGGAG TTCGTGACCGCCGCCGGCATCACCCTGGGCATGGACGAGCTGTACAAGCACCACCACCACCACCACTAAAGATCC TAAGGATCCTAATTGCCTAGAAAACATGAGGATCACCCATGTCTGCAGGTCGACTCTAGAAAACATGAGGATCAC CCATGT (SEQ ID NO: 255)
Protein: X indicates non-canonical amino acid
MGRLESTPPKKKRKVEDSASDYKDDDDKVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATXGKLTLKFI CTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNR IELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPD NHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKHHHHHHK (SEQ ID NO: 256)
GFP39CGG GFP with Arginine site
DNA: (MS2-stem loops, Arginine codon)
ATGGGCCGCCTGGAGAGCACCCCCCCCAAGAAGAAGCGCAAGGTGGAGGACAGCGCCAGCGACTACAAGGACGAC GACGACAAGGTGAGCAAGGGCGAGGAGCTGTTCACCGGCGTGGTGCCCATCCTGGTGGAGCTGGACGGCGACGTG AACGGCCACAAGTTCAGCGTGAGCGGCGAGGGCGAGGGCGACGCCACCCGGGGCAAGCTGACCCTGAAGTTCATC TGTACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTGGTGACCACCCTGACCTACGGCGTGCAGTGTTTCAGC CGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGAGCGCCATGCCCGAGGGCTACGTGCAGGAGCGCACC ATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGC ATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGCCACAAGCTGGAGTACAACTACAACAGC CACAACGTGTACATCATGGCCGACAAGCAGAAGAACGGCATCAAGGCCAACTTCAAGATCCGCCACAACATCGAG GACGGCAGCGTGCAGCTGGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGAC AACCACTACCTGAGCACCCAGAGCGCCCTGAGCAAGGACCCCAACGAGAAGCGCGACCACATGGTGCTGCTGGAG TTCGTGACCGCCGCCGGCATCACCCTGGGCATGGACGAGCTGTACAAGCACCACCACCACCACCACTAAAGATCC TAAGGATCCTAATTGCCTAGAAAACATGAGGATCACCCATGTCTGCAGGTCGACTCTAGAAAACATGAGGATCAC CCATGTC (SEQ ID NO: 257)
Protein: X indicates non-canonical amino acid
MGRLESTPPKKKRKVEDSASDYKDDDDKVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATXGKLTLKFI CTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNR IELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPD NHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKHHHHHHK (SEQ ID NO: 258)
mCherry72TAG mCherry with Amber site
DNA: (MS2-stem loops, Amber codon)
ATGGGCCGCCTGGAAAGCACCCCGCCGAAAAAAAAACGCAAAGTGGAAGATAGCGCGAGCGATTACAAAGATGAT GATGATAAAGTGGCGTACGCGGTGAGCAAGGGCGAGGAGGATAACATGGCCATCATCAAGGAGTTCATGCGCTTC AAGGTGCACATGGAGGGCTCCGTGAACGGCCACGAGTTCGAGATCGAGGGCGAGGGCGAGGGCCGCCCCTACGAG GGCACCCAGACCGCCAAGCTGAAGGTGACCAAGGGTGGCCCCCTGCCCTTCGCCTGGGACATCCTGTCCCCTCAG TTCATGTAGGGCTCCAAGGCCTACGTGAAGCACCCCGCCGACATCCCCGACTACTTGAAGCTGTCCTTCCCCGAG GGCTTCAAGTGGGAGCGCGTGATGAACTTCGAGGACGGCGGCGTGGTGACCGTGACCCAGGACTCCTCCCTGCAG GACGGCGAGTTCATCTACAAGGTGAAGCTGCGCGGCACCAACTTCCCCTCCGACGGCCCCGTAATGCAGAAGAAG ACGATGGGCTGGGAGGCCTCCTCCGAGCGGATGTACCCCGAGGACGGCGCCCTGAAGGGCGAGATCAAGCAGAGG CTGAAGCTGAAGGACGGCGGCCACTACGACGCTGAGGTCAAGACCACCTACAAGGCCAAGAAGCCCGTGCAGCTG CCCGGCGCCTACAACGTCAACATCAAGTTGGACATCACCTCCCACAACGAGGACTACACCATCGTGGAACAGTAC GAACGCGCCGAGGGCCGCCACTCCACCGGCGGCATGGACGAGCTGTACAAGCATCACCATCACCATCACTAAGCG GCCGCATCGATTATAAGCTTTGTACACCTAGAAAACATGAGGATCACCCATGTCTGCACCTCGACACTAGAAAAC ATGAGGATCACCCATGT (SEQ ID NO: 259)
Protein: X indicates non-canonical amino acid
MGRLESTPPKKKRKVEDSASDYKDDDDKVAYAVSKGEEDNMAIIKEFMRFKVHMEGSVNGHEFEIEGEGEGRPYE GTQTAKLKVTKGGPLPFAWDILSPQFMXGSKAYVKHPADIPDYLKLSFPEGFKWERVMNFEDGGVVTVTQDSSLQ DGEFIYKVKLRGTNFPSDGPVMQKKTMGWEASSERMYPEDGALKGEIKQRLKLKDGGHYDAEVKTTYKAKKPVQL PGAYNVNIKLDITSHNEDYTIVEQYERAEGRHSTGGMDELYKHHHHHH (SEQ ID NO: 260)
mCherry72TCG mCherry with Serine site
DNA: (MS2-stem loops, Serine codon)
ATGGGCCGCCTGGAAAGCACCCCGCCGAAAAAAAAACGCAAAGTGGAAGATAGCGCGAGCGATTACAAAGATGAT GATGATAAAGTGGCGTACGCGGTGAGCAAGGGCGAGGAGGATAACATGGCCATCATCAAGGAGTTCATGCGCTTC AAGGTGCACATGGAGGGCTCCGTGAACGGCCACGAGTTCGAGATCGAGGGCGAGGGCGAGGGCCGCCCCTACGAG GGCACCCAGACCGCCAAGCTGAAGGTGACCAAGGGTGGCCCCCTGCCCTTCGCCTGGGACATCCTGTCCCCTCAG TTCATGTCGGGCTCCAAGGCCTACGTGAAGCACCCCGCCGACATCCCCGACTACTTGAAGCTGTCCTTCCCCGAG GGCTTCAAGTGGGAGCGCGTGATGAACTTCGAGGACGGCGGCGTGGTGACCGTGACCCAGGACTCCTCCCTGCAG GACGGCGAGTTCATCTACAAGGTGAAGCTGCGCGGCACCAACTTCCCCTCCGACGGCCCCGTAATGCAGAAGAAG ACGATGGGCTGGGAGGCCTCCTCCGAGCGGATGTACCCCGAGGACGGCGCCCTGAAGGGCGAGATCAAGCAGAGG CTGAAGCTGAAGGACGGCGGCCACTACGACGCTGAGGTCAAGACCACCTACAAGGCCAAGAAGCCCGTGCAGCTG CCCGGCGCCTACAACGTCAACATCAAGTTGGACATCACCTCCCACAACGAGGACTACACCATCGTGGAACAGTAC GAACGCGCCGAGGGCCGCCACTCCACCGGCGGCATGGACGAGCTGTACAAGCATCACCATCACCATCACTAAGCG GCCGCATCGATTATAAGCTTTGTACACCTAGAAAACATGAGGATCACCCATGTCTGCACCTCGACACTAGAAAAC ATGAGGATCACCCATGT (SEQ ID NO: 261)
Protein: X indicates non-canonical amino acid
MGRLESTPPKKKRKVEDSASDYKDDDDKVAYAVSKGEEDNMAIIKEFMRFKVHMEGSVNGHEFEIEGEGEGRPYE GTQTAKLKVTKGGPLPFAWDILSPQFMXGSKAYVKHPADIPDYLKLSFPEGFKWERVMNFEDGGVVTVTQDSSLQ DGEFIYKVKLRGTNFPSDGPVMQKKTMGWEASSERMYPEDGALKGEIKQRLKLKDGGHYDAEVKTTYKAKKPVQL PGAYNVNIKLDITSHNEDYTIVEQYERAEGRHSTGGMDELYKHHHHHH (SEQ ID NO: 262)
mCherry72CCG mCherry with Proline site
DNA: (MS2-stem loops, Proline codon)
ATGGGCCGCCTGGAAAGCACCCCGCCGAAAAAAAAACGCAAAGTGGAAGATAGCGCGAGCGATTACAAAGATGAT GATGATAAAGTGGCGTACGCGGTGAGCAAGGGCGAGGAGGATAACATGGCCATCATCAAGGAGTTCATGCGCTTC AAGGTGCACATGGAGGGCTCCGTGAACGGCCACGAGTTCGAGATCGAGGGCGAGGGCGAGGGCCGCCCCTACGAG GGCACCCAGACCGCCAAGCTGAAGGTGACCAAGGGTGGCCCCCTGCCCTTCGCCTGGGACATCCTGTCCCCTCAG TTCATGCCGGGCTCCAAGGCCTACGTGAAGCACCCCGCCGACATCCCCGACTACTTGAAGCTGTCCTTCCCCGAG GGCTTCAAGTGGGAGCGCGTGATGAACTTCGAGGACGGCGGCGTGGTGACCGTGACCCAGGACTCCTCCCTGCAG GACGGCGAGTTCATCTACAAGGTGAAGCTGCGCGGCACCAACTTCCCCTCCGACGGCCCCGTAATGCAGAAGAAG ACGATGGGCTGGGAGGCCTCCTCCGAGCGGATGTACCCCGAGGACGGCGCCCTGAAGGGCGAGATCAAGCAGAGG CTGAAGCTGAAGGACGGCGGCCACTACGACGCTGAGGTCAAGACCACCTACAAGGCCAAGAAGCCCGTGCAGCTG CCCGGCGCCTACAACGTCAACATCAAGTTGGACATCACCTCCCACAACGAGGACTACACCATCGTGGAACAGTAC GAACGCGCCGAGGGCCGCCACTCCACCGGCGGCATGGACGAGCTGTACAAGCATCACCATCACCATCACTAAGCG GCCGCATCGATTATAAGCTTTGTACACCTAGAAAACATGAGGATCACCCATGTCTGCACCTCGACACTAGAAAAC ATGAGGATCACCCATGT (SEQ ID NO: 263)
Protein: X indicates non-canonical amino acid
MGRLESTPPKKKRKVEDSASDYKDDDDKVAYAVSKGEEDNMAIIKEFMRFKVHMEGSVNGHEFEIEGEGEGRPYE GTQTAKLKVTKGGPLPFAWDILSPQFMXGSKAYVKHPADIPDYLKLSFPEGFKWERVMNFEDGGVVTVTQDSSLQ DGEFIYKVKLRGTNFPSDGPVMQKKTMGWEASSERMYPEDGALKGEIKQRLKLKDGGHYDAEVKTTYKAKKPVQL PGAYNVNIKLDITSHNEDYTIVEQYERAEGRHSTGGMDELYKHHHHHH (SEQ ID NO: 264)
mCherry72CTA mCherry with Leucine site
DNA: (MS2-stem loops, Leucine codon)
ATGGGCCGCCTGGAAAGCACCCCGCCGAAAAAAAAACGCAAAGTGGAAGATAGCGCGAGCGATTACAAAGATGAT
GATGATAAAGTGGCGTACGCGGTGAGCAAGGGCGAGGAGGATAACATGGCCATCATCAAGGAGTTCATGCGCTTC
AAGGTGCACATGGAGGGCTCCGTGAACGGCCACGAGTTCGAGATCGAGGGCGAGGGCGAGGGCCGCCCCTACGAG GGCACCCAGACCGCCAAGCTGAAGGTGACCAAGGGTGGCCCCCTGCCCTTCGCCTGGGACATCCTGTCCCCTCAG TTCATGCTAGGCTCCAAGGCCTACGTGAAGCACCCCGCCGACATCCCCGACTACTTGAAGCTGTCCTTCCCCGAG GGCTTCAAGTGGGAGCGCGTGATGAACTTCGAGGACGGCGGCGTGGTGACCGTGACCCAGGACTCCTCCCTGCAG GACGGCGAGTTCATCTACAAGGTGAAGCTGCGCGGCACCAACTTCCCCTCCGACGGCCCCGTAATGCAGAAGAAG ACGATGGGCTGGGAGGCCTCCTCCGAGCGGATGTACCCCGAGGACGGCGCCCTGAAGGGCGAGATCAAGCAGAGG CTGAAGCTGAAGGACGGCGGCCACTACGACGCTGAGGTCAAGACCACCTACAAGGCCAAGAAGCCCGTGCAGCTG CCCGGCGCCTACAACGTCAACATCAAGTTGGACATCACCTCCCACAACGAGGACTACACCATCGTGGAACAGTAC GAACGCGCCGAGGGCCGCCACTCCACCGGCGGCATGGACGAGCTGTACAAGCATCACCATCACCATCACTAAGCG GCCGCATCGATTATAAGCTTTGTACACCTAGAAAACATGAGGATCACCCATGTCTGCACCTCGACACTAGAAAAC ATGAGGATCACCCATGT (SEQ ID NO: 265)
Protein: X indicates non-canonical amino acid
MGRLESTPPKKKRKVEDSASDYKDDDDKVAYAVSKGEEDNMAIIKEFMRFKVHMEGSVNGHEFEIEGEGEGRPYE GTQTAKLKVTKGGPLPFAWDILSPQFMXGSKAYVKHPADIPDYLKLSFPEGFKWERVMNFEDGGVVTVTQDSSLQ DGEFIYKVKLRGTNFPSDGPVMQKKTMGWEASSERMYPEDGALKGEIKQRLKLKDGGHYDAEVKTTYKAKKPVQL PGAYNVNIKLDITSHNEDYTIVEQYERAEGRHSTGGMDELYKHHHHHH (SEQ ID NO: 266)
mCherry72TTA mCherry with Leucine site
DNA: (MS2-stem loops, Leucine codon)
ATGGGCCGCCTGGAAAGCACCCCGCCGAAAAAAAAACGCAAAGTGGAAGATAGCGCGAGCGATTACAAAGATGAT GATGATAAAGTGGCGTACGCGGTGAGCAAGGGCGAGGAGGATAACATGGCCATCATCAAGGAGTTCATGCGCTTC AAGGTGCACATGGAGGGCTCCGTGAACGGCCACGAGTTCGAGATCGAGGGCGAGGGCGAGGGCCGCCCCTACGAG GGCACCCAGACCGCCAAGCTGAAGGTGACCAAGGGTGGCCCCCTGCCCTTCGCCTGGGACATCCTGTCCCCTCAG TTCATGTTAGGCTCCAAGGCCTACGTGAAGCACCCCGCCGACATCCCCGACTACTTGAAGCTGTCCTTCCCCGAG GGCTTCAAGTGGGAGCGCGTGATGAACTTCGAGGACGGCGGCGTGGTGACCGTGACCCAGGACTCCTCCCTGCAG GACGGCGAGTTCATCTACAAGGTGAAGCTGCGCGGCACCAACTTCCCCTCCGACGGCCCCGTAATGCAGAAGAAG ACGATGGGCTGGGAGGCCTCCTCCGAGCGGATGTACCCCGAGGACGGCGCCCTGAAGGGCGAGATCAAGCAGAGG CTGAAGCTGAAGGACGGCGGCCACTACGACGCTGAGGTCAAGACCACCTACAAGGCCAAGAAGCCCGTGCAGCTG CCCGGCGCCTACAACGTCAACATCAAGTTGGACATCACCTCCCACAACGAGGACTACACCATCGTGGAACAGTAC GAACGCGCCGAGGGCCGCCACTCCACCGGCGGCATGGACGAGCTGTACAAGCATCACCATCACCATCACTAAGCG GCCGCATCGATTATAAGCTTTGTACACCTAGAAAACATGAGGATCACCCATGTCTGCACCTCGACACTAGAAAAC ATGAGGATCACCCATGT (SEQ ID NO: 267)
Protein: X indicates non-canonical amino acid
MGRLESTPPKKKRKVEDSASDYKDDDDKVAYAVSKGEEDNMAIIKEFMRFKVHMEGSVNGHEFEIEGEGEGRPYE GTQTAKLKVTKGGPLPFAWDILSPQFMXGSKAYVKHPADIPDYLKLSFPEGFKWERVMNFEDGGVVTVTQDSSLQ DGEFIYKVKLRGTNFPSDGPVMQKKTMGWEASSERMYPEDGALKGEIKQRLKLKDGGHYDAEVKTTYKAKKPVQL PGAYNVNIKLDITSHNEDYTIVEQYERAEGRHSTGGMDELYKHHHHHH (SEQ ID NO: 268)
mCherry72ATA mCherry with Isoleucine site
DNA: (MS2-stem loops, Isoleucine codon)
ATGGGCCGCCTGGAAAGCACCCCGCCGAAAAAAAAACGCAAAGTGGAAGATAGCGCGAGCGATTACAAAGATGAT GATGATAAAGTGGCGTACGCGGTGAGCAAGGGCGAGGAGGATAACATGGCCATCATCAAGGAGTTCATGCGCTTC AAGGTGCACATGGAGGGCTCCGTGAACGGCCACGAGTTCGAGATCGAGGGCGAGGGCGAGGGCCGCCCCTACGAG GGCACCCAGACCGCCAAGCTGAAGGTGACCAAGGGTGGCCCCCTGCCCTTCGCCTGGGACATCCTGTCCCCTCAG TTCATGATAGGCTCCAAGGCCTACGTGAAGCACCCCGCCGACATCCCCGACTACTTGAAGCTGTCCTTCCCCGAG GGCTTCAAGTGGGAGCGCGTGATGAACTTCGAGGACGGCGGCGTGGTGACCGTGACCCAGGACTCCTCCCTGCAG GACGGCGAGTTCATCTACAAGGTGAAGCTGCGCGGCACCAACTTCCCCTCCGACGGCCCCGTAATGCAGAAGAAG ACGATGGGCTGGGAGGCCTCCTCCGAGCGGATGTACCCCGAGGACGGCGCCCTGAAGGGCGAGATCAAGCAGAGG CTGAAGCTGAAGGACGGCGGCCACTACGACGCTGAGGTCAAGACCACCTACAAGGCCAAGAAGCCCGTGCAGCTG CCCGGCGCCTACAACGTCAACATCAAGTTGGACATCACCTCCCACAACGAGGACTACACCATCGTGGAACAGTAC GAACGCGCCGAGGGCCGCCACTCCACCGGCGGCATGGACGAGCTGTACAAGCATCACCATCACCATCACTAAGCG GCCGCATCGATTATAAGCTTTGTACACCTAGAAAACATGAGGATCACCCATGTCTGCACCTCGACACTAGAAAAC ATGAGGATCACCCATGT (SEQ ID NO: 269)
Protein: X indicates non-canonical amino acid
MGRLESTPPKKKRKVEDSASDYKDDDDKVAYAVSKGEEDNMAIIKEFMRFKVHMEGSVNGHEFEIEGEGEGRPYE
GTQTAKLKVTKGGPLPFAWDILSPQFMXGSKAYVKHPADIPDYLKLSFPEGFKWERVMNFEDGGVVTVTQDSSLQ DGEFIYKVKLRGTNFPSDGPVMQKKTMGWEASSERMYPEDGALKGEIKQRLKLKDGGHYDAEVKTTYKAKKPVQL PGAYNVNIKLDITSHNEDYTIVEQYERAEGRHSTGGMDELYKHHHHHH (SEQ ID NO: 270)
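Most DNA entries above are annotated as carrying MS2 stem loops. These appear as a repeated hairpin motif after the stop codon. The following small Python sketch counts copies of that motif in an entry. It assumes that the 19-nt motif ACATGAGGATCACCCATGT, seen repeatedly in these 3' regions, is the stem-loop unit. The constant and function names are hypothetical.

```python
# Hairpin motif repeated in the 3' regions of the entries above; treating it as
# the MS2 stem-loop unit is an assumption for illustration.
MS2_MOTIF = "ACATGAGGATCACCCATGT"


def count_motif(sequence: str, motif: str = MS2_MOTIF) -> int:
    """Count non-overlapping occurrences of motif, ignoring case and whitespace.

    Whitespace is stripped because the listing wraps sequences across lines,
    which can split a motif (e.g. "ACATG AGGATC...").
    """
    cleaned = "".join(sequence.split()).upper()
    return cleaned.count(motif.upper())
```

Stripping whitespace before counting matters here. Several entries break a copy of the motif across a wrapped line, and a plain substring search would miss that copy.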
mCherry185TCG mCherry with Serine site
DNA: (Serine codon)
ATGGGCCGCCTGGAGAGCACCCCCCCCAAGAAGAAGCGCAAGGTGGAGGACAGCGCCAGCGTGAGCAAGGGCGAG GAGGACAACATGGCCATCATCAAGGAGTTCATGCGCTTCAAGGTGCACATGGAGGGCAGCGTGAACGGCCACGAG TTCGAGATCGAGGGCGAGGGCGAGGGCCGCCCCTACGAGGGCACCCAGACCGCCAAGCTGAAGGTGACCAAGGGC GGCCCCCTGCCCTTCGCCTGGGACATCCTGAGCCCCCAGTTCATGTACGGCAGCAAGGCCTACGTGAAGCACCCC GCCGACATCCCCGACTACCTGAAGCTGAGCTTCCCCGAGGGCTTCAAGTGGGAGCGCGTGATGAACTTCGAGGAC GGCGGCGTGGTGACCGTGACCCAGGACAGCAGCCTGCAGGACGGCGAGTTCATCTACAAGGTGAAGCTGCGCGGC ACCAACTTCCCCAGCGACGGCCCCGTGATGCAGAAGAAGACCATGGGCTGGGAGGCCAGCAGCGAGCGCATGTAC CCCGAGGACGGCGCCCTGAAGGGCGAGATCAAGCAGCGCCTGAAGCTGAAGGACGGCGGCCACTACGACGCCGAG GTGAAGACCACCTACAAGGCCAAGTCGCCCGTGCAGCTGCCCGGCGCCTACAACGTGAACATCAAGCTGGACATC ACCAGCCACAACGAGGACTACACCATCGTGGAGCAGTACGAGCGCGCCGAGGGCCGCCACAGCACCGGCGGCATG GACGAGCTGTACAAGCACCACCACCACCACCACTAA (SEQ ID NO: 271)
Protein: X indicates non-canonical amino acid
MGRLESTPPKKKRKVEDSASVSKGEEDNMAIIKEFMRFKVHMEGSVNGHEFEIEGEGEGRPYEGTQTAKLKVTKG GPLPFAWDILSPQFMYGSKAYVKHPADIPDYLKLSFPEGFKWERVMNFEDGGVVTVTQDSSLQDGEFIYKVKLRG TNFPSDGPVMQKKTMGWEASSERMYPEDGALKGEIKQRLKLKDGGHYDAEVKTTYKAKXPVQLPGAYNVNIKLDI TSHNEDYTIVEQYERAEGRHSTGGMDELYKHHHHHH (SEQ ID NO: 272)
mCherry185CCG mCherry with Proline site
DNA: (Proline codon)
ATGGGCCGCCTGGAGAGCACCCCCCCCAAGAAGAAGCGCAAGGTGGAGGACAGCGCCAGCGTGAGCAAGGGCGAG GAGGACAACATGGCCATCATCAAGGAGTTCATGCGCTTCAAGGTGCACATGGAGGGCAGCGTGAACGGCCACGAG TTCGAGATCGAGGGCGAGGGCGAGGGCCGCCCCTACGAGGGCACCCAGACCGCCAAGCTGAAGGTGACCAAGGGC GGCCCCCTGCCCTTCGCCTGGGACATCCTGAGCCCCCAGTTCATGTACGGCAGCAAGGCCTACGTGAAGCACCCC GCCGACATCCCCGACTACCTGAAGCTGAGCTTCCCCGAGGGCTTCAAGTGGGAGCGCGTGATGAACTTCGAGGAC GGCGGCGTGGTGACCGTGACCCAGGACAGCAGCCTGCAGGACGGCGAGTTCATCTACAAGGTGAAGCTGCGCGGC ACCAACTTCCCCAGCGACGGCCCCGTGATGCAGAAGAAGACCATGGGCTGGGAGGCCAGCAGCGAGCGCATGTAC CCCGAGGACGGCGCCCTGAAGGGCGAGATCAAGCAGCGCCTGAAGCTGAAGGACGGCGGCCACTACGACGCCGAG GTGAAGACCACCTACAAGGCCAAGCCGCCCGTGCAGCTGCCCGGCGCCTACAACGTGAACATCAAGCTGGACATC ACCAGCCACAACGAGGACTACACCATCGTGGAGCAGTACGAGCGCGCCGAGGGCCGCCACAGCACCGGCGGCATG GACGAGCTGTACAAGCACCACCACCACCACCACTAA (SEQ ID NO: 273)
Protein: X indicates non-canonical amino acid
MGRLESTPPKKKRKVEDSASVSKGEEDNMAIIKEFMRFKVHMEGSVNGHEFEIEGEGEGRPYEGTQTAKLKVTKG GPLPFAWDILSPQFMYGSKAYVKHPADIPDYLKLSFPEGFKWERVMNFEDGGVVTVTQDSSLQDGEFIYKVKLRG TNFPSDGPVMQKKTMGWEASSERMYPEDGALKGEIKQRLKLKDGGHYDAEVKTTYKAKXPVQLPGAYNVNIKLDI TSHNEDYTIVEQYERAEGRHSTGGMDELYKHHHHHH (SEQ ID NO: 274)
mCherry185CTA mCherry with Leucine site
DNA: (Leucine codon)
ATGGGCCGCCTGGAGAGCACCCCCCCCAAGAAGAAGCGCAAGGTGGAGGACAGCGCCAGCGTGAGCAAGGGCGAG GAGGACAACATGGCCATCATCAAGGAGTTCATGCGCTTCAAGGTGCACATGGAGGGCAGCGTGAACGGCCACGAG TTCGAGATCGAGGGCGAGGGCGAGGGCCGCCCCTACGAGGGCACCCAGACCGCCAAGCTGAAGGTGACCAAGGGC GGCCCCCTGCCCTTCGCCTGGGACATCCTGAGCCCCCAGTTCATGTACGGCAGCAAGGCCTACGTGAAGCACCCC GCCGACATCCCCGACTACCTGAAGCTGAGCTTCCCCGAGGGCTTCAAGTGGGAGCGCGTGATGAACTTCGAGGAC GGCGGCGTGGTGACCGTGACCCAGGACAGCAGCCTGCAGGACGGCGAGTTCATCTACAAGGTGAAGCTGCGCGGC ACCAACTTCCCCAGCGACGGCCCCGTGATGCAGAAGAAGACCATGGGCTGGGAGGCCAGCAGCGAGCGCATGTAC CCCGAGGACGGCGCCCTGAAGGGCGAGATCAAGCAGCGCCTGAAGCTGAAGGACGGCGGCCACTACGACGCCGAG GTGAAGACCACCTACAAGGCCAAGCTACCCGTGCAGCTGCCCGGCGCCTACAACGTGAACATCAAGCTGGACATC ACCAGCCACAACGAGGACTACACCATCGTGGAGCAGTACGAGCGCGCCGAGGGCCGCCACAGCACCGGCGGCATG GACGAGCTGTACAAGCACCACCACCACCACCACTAA (SEQ ID NO: 275)
Protein: X indicates non-canonical amino acid
MGRLESTPPKKKRKVEDSASVSKGEEDNMAIIKEFMRFKVHMEGSVNGHEFEIEGEGEGRPYEGTQTAKLKVTKG GPLPFAWDILSPQFMYGSKAYVKHPADIPDYLKLSFPEGFKWERVMNFEDGGVVTVTQDSSLQDGEFIYKVKLRG TNFPSDGPVMQKKTMGWEASSERMYPEDGALKGEIKQRLKLKDGGHYDAEVKTTYKAKXPVQLPGAYNVNIKLDI TSHNEDYTIVEQYERAEGRHSTGGMDELYKHHHHHH (SEQ ID NO: 276)
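The three mCherry185 entries above differ only in the codon at the engineered site: TCG (Serine), CCG (Proline) or CTA (Leucine). Each listed protein writes that residue as X. The Python sketch below is a minimal illustration of this convention and rests on several assumptions. It uses the standard genetic code. The helper names `translate` and `mark_site` are hypothetical. The 15-nt windows are excerpts around the site in SEQ ID NOs: 271, 273 and 275, and the site is the third codon of each window (0-based index 2).

```python
# Standard genetic code; codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate in-frame DNA; whitespace from line wrapping is ignored."""
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3  # drop any incomplete trailing codon
    return "".join(CODON_TABLE[seq[n:n + 3]] for n in range(0, usable, 3))


def mark_site(dna: str, codon_index: int) -> str:
    """Translate, then report the residue at the 0-based codon_index as X."""
    protein = list(translate(dna))
    protein[codon_index] = "X"
    return "".join(protein)


# Windows around the site in SEQ ID NOs: 271, 273 and 275 (hypothetical excerpts).
windows = {
    "TCG": "GCCAAGTCGCCCGTG",
    "CCG": "GCCAAGCCGCCCGTG",
    "CTA": "GCCAAGCTACCCGTG",
}

for codon, window in windows.items():
    print(codon, translate(window), mark_site(window, 2))
```

The site is written as X whatever codon occupies it, so all three windows yield the same marked string. This matches SEQ ID NOs: 272, 274 and 276, which list one and the same protein sequence.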
GFP39TCG LCK-GFP with Serine site
DNA: (MS2-stem loops, Serine codon, LCK)
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGTGAGCAAGGGCGAGGAGCTGTTCACCGGCGTG GTGCCCATCCTGGTGGAGCTGGACGGCGACGTGAACGGCCACAAGTTCAGCGTGAGCGGCGAGGGCGAGGGCGAC GCCACCTCGGGCAAGCTGACCCTGAAGTTCATCTGTACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTGGTG ACCACCCTGACCTACGGCGTGCAGTGTTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGAGC GCCATGCCCGAGGGCTACGTGCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAG GTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATC CTGGGCCACAAGCTGGAGTACAACTACAACAGCCACAACGTGTACATCATGGCCGACAAGCAGAAGAACGGCATC AAGGCCAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTGGCCGACCACTACCAGCAGAACACC CCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGAGCGCCCTGAGCAAGGACCCC AACGAGAAGCGCGACCACATGGTGCTGCTGGAGTTCGTGACCGCCGCCGGCATCACCCTGGGCATGGACGAGCTG TACAAGCACCACCACCACCACCACTAAAAGCTTTGTACACCTAGAAAACATGAGGATCACCCATGTCTGCACCTC GACACTAGAAAACATGAGGATCACCCATGT (SEQ ID NO: 277)
Protein: X indicates non-canonical amino acid
MGCVCSSNPEGTELVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATXGKLTLKFICTTGKLPVPWPTLV TTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNI LGHKLEYNYNSHNVYIMADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDP NEKRDHMVLLEFVTAAGITLGMDELYKHHHHHH (SEQ ID NO: 278)
GFP39CCG LCK-GFP with Proline site
DNA: (Proline codon, LCK)
GTGAGCAAGGGCGAGGAGCTGTTCACCGGCGTG GTGCCCATCCTGGTGGAGCTGGACGGCGACGTGAACGGCCACAAGTTCAGCGTGAGCGGCGAGGGCGAGGGCGAC GCCACCCCGGGCAAGCTGACCCTGAAGTTCATCTGTACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTGGTG ACCACCCTGACCTACGGCGTGCAGTGTTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGAGC GCCATGCCCGAGGGCTACGTGCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAG GTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATC CTGGGCCACAAGCTGGAGTACAACTACAACAGCCACAACGTGTACATCATGGCCGACAAGCAGAAGAACGGCATC AAGGCCAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTGGCCGACCACTACCAGCAGAACACC CCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGAGCGCCCTGAGCAAGGACCCC AACGAGAAGCGCGACCACATGGTGCTGCTGGAGTTCGTGACCGCCGCCGGCATCACCCTGGGCATGGACGAGCTG TACAAGCACCA GACACTAGAAA
Protein: X indicates non-canonical amino acid
MGCVCSSNPEGTELVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATXGKLTLKFICTTGKLPVPWPTLV TTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNI LGHKLEYNYNSHNVYIMADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDP NEKRDHMVLLEFVTAAGITLGMDELYKHHHHHH (SEQ ID NO: 280)
GFP39CTA LCK-GFP with Leucine site
DNA: (Leucine codon, LCK)
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGTGAGCAAGGGCGAGGAGCTGTTCACCGGCGTG GTGCCCATCCTGGTGGAGCTGGACGGCGACGTGAACGGCCACAAGTTCAGCGTGAGCGGCGAGGGCGAGGGCGAC GCCACCCTAGGCAAGCTGACCCTGAAGTTCATCTGTACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTGGTG ACCACCCTGACCTACGGCGTGCAGTGTTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGAGC GCCATGCCCGAGGGCTACGTGCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAG GTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATC CTGGGCCACAAGCTGGAGTACAACTACAACAGCCACAACGTGTACATCATGGCCGACAAGCAGAAGAACGGCATC AAGGCCAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTGGCCGACCACTACCAGCAGAACACC CCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGAGCGCCCTGAGCAAGGACCCC AACGAGAAGCGCGACCACATGGTGCTGCTGGAGTTCGTGACCGCCGCCGGCATCACCCTGGGCATGGACGAGCTG TACAAGCACCACCACCACCACCACTAAAAGCTTTGTACACCTAGAAAACATGAGGATCACCCATGTCTGCACCTC GACACTAGAAAACATGAGGATCACCCATGT (SEQ ID NO: 281)
Protein: X indicates non-canonical amino acid
MGCVCSSNPEGTELVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATXGKLTLKFICTTGKLPVPWPTLV TTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNI LGHKLEYNYNSHNVYIMADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDP NEKRDHMVLLEFVTAAGITLGMDELYKHHHHHH (SEQ ID NO: 282)
Extended GFP39TCG GFP with Serine site at position 39 genetically fused to GFP66CCG
DNA: (MS2-stem loops, Serine codon)
ATGGGCCGCCTGGAGAGCACCCCCCCCAAGAAGAAGCGCAAGGTGGAGGACAGCGCCAGCGACTACAAGGACGAC GACGACAAGGTGAGCAAGGGCGAGGAGCTGTTCACCGGCGTGGTGCCCATCCTGGTGGAGCTGGACGGCGACGTG AACGGCCACAAGTTCAGCGTGAGCGGCGAGGGCGAGGGCGACGCCACCTCGGGCAAGCTGACCCTGAAGTTCATC TGTACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTGGTGACCACCCTGACCTACGGCGTGCAGTGTTTCAGC CGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGAGCGCCATGCCCGAGGGCTACGTGCAGGAGCGCACC ATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGC ATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGCCACAAGCTGGAGTACAACTACAACAGC CACAACGTGTACATCATGGCCGACAAGCAGAAGAACGGCATCAAGGCCAACTTCAAGATCCGCCACAACATCGAG GACGGCAGCGTGCAGCTGGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGAC AACCACTACCTGAGCACCCAGAGCGCCCTGAGCAAGGACCCCAACGAGAAGCGCGACCACATGGTGCTGCTGGAG TTCGTGACCGCCGCCGGCATCACCCTGGGCATGGACGAGCTGTACAAGGGAGCACCAGGAAGTGCTGGTTCTGCT GCTGGTAGTGGAGTGAGCAAGGGCGAGGAGCTGTTCACCGGCGTGGTGCCCATCCTGGTGGAGCTGGACGGCGAC GTGAACGGCCACAAGTTCAGCGTGAGCGGCGAGGGCGAGGGCGACGCCACCTATGGCAAGCTGACCCTGAAGTTC ATCTGTACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTGGTGACCACCCTGACCCCGGGCGTGCAGTGTTTC AGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGAGCGCCATGCCCGAGGGCTACGTGCAGGAGCGC ACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAAC CGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGCCACAAGCTGGAGTACAACTACAAC AGCCACAACGTGTACATCATGGCCGACAAGCAGAAGAACGGCATCAAGGCCAACTTCAAGATCCGCCACAACATC GAGGACGGCAGCGTGCAGCTGGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCC GACAACCACTACCTGAGCACCCAGAGCGCCCTGAGCAAGGACCCCAACGAGAAGCGCGACCACATGGTGCTGCTG GAGTTCGTGACCGCCGCCGGCATCACCCTGGGCATGGACGAGCTGTACAAGCACCACCACCACCACCACTAAAAG CTTTGTACACCTAGAAAACATGAGGATCACCCATGTCTGCACCTCGACACTAGAAAACATGAGGATCACCCATGT
(SEQ ID NO: 283)
Protein: X indicates non-canonical amino acid
MGRLESTPPKKKRKVEDSASDYKDDDDKVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATXGKLTLKFI
CTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNR
IELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPD
NHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKGAPGSAGSAAGSGVSKGEELFTGVVPILVELDGD
VNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLTPGVQCFSRYPDHMKQHDFFKSAMPEGYVQER
TIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKANFKIRHNI
EDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKHHHHHH
(SEQ ID NO: 284)
Extended GFP39CCG GFP with Proline site at position 39 genetically fused to GFP66TCG
DNA:
ATGGGCCGCCTGGAGAGCACCCCCCCCAAGAAGAAGCGCAAGGTGGAGGACAGCGCCAGCGACTACAAGGACGAC GACGACAAGGTGAGCAAGGGCGAGGAGCTGTTCACCGGCGTGGTGCCCATCCTGGTGGAGCTGGACGGCGACGTG AACGGCCACAAGTTCAGCGTGAGCGGCGAGGGCGAGGGCGACGCCACCCCGGGCAAGCTGACCCTGAAGTTCATC TGTACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTGGTGACCACCCTGACCTACGGCGTGCAGTGTTTCAGC CGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGAGCGCCATGCCCGAGGGCTACGTGCAGGAGCGCACC ATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGC ATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGCCACAAGCTGGAGTACAACTACAACAGC CACAACGTGTACATCATGGCCGACAAGCAGAAGAACGGCATCAAGGCCAACTTCAAGATCCGCCACAACATCGAG GACGGCAGCGTGCAGCTGGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGAC AACCACTACCTGAGCACCCAGAGCGCCCTGAGCAAGGACCCCAACGAGAAGCGCGACCACATGGTGCTGCTGGAG TTCGTGACCGCCGCCGGCATCACCCTGGGCATGGACGAGCTGTACAAGGGAGCACCAGGAAGTGCTGGTTCTGCT GCTGGTAGTGGAGTGAGCAAGGGCGAGGAGCTGTTCACCGGCGTGGTGCCCATCCTGGTGGAGCTGGACGGCGAC GTGAACGGCCACAAGTTCAGCGTGAGCGGCGAGGGCGAGGGCGACGCCACCTATGGCAAGCTGACCCTGAAGTTC ATCTGTACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTGGTGACCACCCTGACCTCGGGCGTGCAGTGTTTC AGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGAGCGCCATGCCCGAGGGCTACGTGCAGGAGCGC ACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAAC CGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGCCACAAGCTGGAGTACAACTACAAC AGCCACAACGTGTACATCATGGCCGACAAGCAGAAGAACGGCATCAAGGCCAACTTCAAGATCCGCCACAACATC GAGGACGGCAGCGTGCAGCTGGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCC GACAACCACTACCTGAGCACCCAGAGCGCCCTGAGCAAGGACCCCAACGAGAAGCGCGACCACATGGTGCTGCTG GAGTTCGTGACCGCCGCCGGCATCACCCTGGGCATGGACGAGCTGTACAAGCATCACCATCACCATCACTAAGCG GCCGCATCGATTATAAGCTTTGTACACCTAGAAAACATGAGGATCACCCATGTCTGCACCTCGACACTAGAAAAC ATGAGGATCACCCATGT (SEQ ID NO: 285)
Protein: X indicates non-canonical amino acid
MGRLESTPPKKKRKVEDSASDYKDDDDKVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATXGKLTLKFI
CTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNR
IELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPD
NHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKGAPGSAGSAAGSGVSKGEELFTGVVPILVELDGD
VNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLTSGVQCFSRYPDHMKQHDFFKSAMPEGYVQER
TIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKANFKIRHNI
EDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKHHHHHH
(SEQ ID NO: 286)
Extended GFP39CTA GFP with Leucine site at position 39 genetically fused to GFP66TCG
DNA: (MS2-stem loops, Leucine codon)
ATGGGCCGCCTGGAGAGCACCCCCCCCAAGAAGAAGCGCAAGGTGGAGGACAGCGCCAGCGACTACAAGGACGAC GACGACAAGGTGAGCAAGGGCGAGGAGCTGTTCACCGGCGTGGTGCCCATCCTGGTGGAGCTGGACGGCGACGTG AACGGCCACAAGTTCAGCGTGAGCGGCGAGGGCGAGGGCGACGCCACCCTAGGCAAGCTGACCCTGAAGTTCATC TGTACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTGGTGACCACCCTGACCTACGGCGTGCAGTGTTTCAGC CGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGAGCGCCATGCCCGAGGGCTACGTGCAGGAGCGCACC ATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGC ATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGCCACAAGCTGGAGTACAACTACAACAGC CACAACGTGTACATCATGGCCGACAAGCAGAAGAACGGCATCAAGGCCAACTTCAAGATCCGCCACAACATCGAG GACGGCAGCGTGCAGCTGGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGAC AACCACTACCTGAGCACCCAGAGCGCCCTGAGCAAGGACCCCAACGAGAAGCGCGACCACATGGTGCTGCTGGAG TTCGTGACCGCCGCCGGCATCACCCTGGGCATGGACGAGCTGTACAAGGGAGCACCAGGAAGTGCTGGTTCTGCT GCTGGTAGTGGAGTGAGCAAGGGCGAGGAGCTGTTCACCGGCGTGGTGCCCATCCTGGTGGAGCTGGACGGCGAC GTGAACGGCCACAAGTTCAGCGTGAGCGGCGAGGGCGAGGGCGACGCCACCTATGGCAAGCTGACCCTGAAGTTC ATCTGTACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTGGTGACCACCCTGACCTCGGGCGTGCAGTGTTTC AGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGAGCGCCATGCCCGAGGGCTACGTGCAGGAGCGC ACCATCTTCTTCAAGGACGACGGCAACTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAAC CGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGCCACAAGCTGGAGTACAACTACAAC AGCCACAACGTGTACATCATGGCCGACAAGCAGAAGAACGGCATCAAGGCCAACTTCAAGATCCGCCACAACATC GAGGACGGCAGCGTGCAGCTGGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCC GACAACCACTACCTGAGCACCCAGAGCGCCCTGAGCAAGGACCCCAACGAGAAGCGCGACCACATGGTGCTGCTG GAGTTCGTGACCGCCGCCGGCATCACCCTGGGCATGGACGAGCTGTACAAGCATCACCATCACCATCACTAAGCG GCCGCATCGATTATAAGCTTTGTACACCTAGAAAACATGAGGATCACCCATGTCTGCACCTCGACACTAGAAAAC ATGAGGATCACCCATGT (SEQ ID NO: 287)
Protein: X indicates non-canonical amino acid
MGRLESTPPKKKRKVEDSASDYKDDDDKVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATXGKLTLKFI
CTTGKLPVPWPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNR
IELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKANFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPD
NHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKGAPGSAGSAAGSGVSKGEELFTGVVPILVELDGD
VNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTLTSGVQCFSRYPDHMKQHDFFKSAMPEGYVQER
TIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKANFKIRHNI
EDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITLGMDELYKHHHHHH
(SEQ ID NO: 288)
pp7 Bacteriophage PP7 RNA stem-loops: (version 1)
DNA:
GGAGCAGACGATATGGCGTCGCTCC (SEQ ID NO: 289)
pp7 Bacteriophage PP7 RNA stem-loops: (version 2)
DNA:
CCAGCAGAGCATATGGGCTCGCTGG (SEQ ID NO: 290)
EBAG9: Receptor-binding cancer antigen expressed on SiSo cells (Homo sapiens, Uniprot: O00559)
DNA:
ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA ATATGCAGATCTGGCAGAGGACGGAAATTAAGTGGAGACCAAATAACTTTGCCAACTACAGTTGATTATTCATCA GTTCCTAAGCAGACAGATGTTGAAGAGTGGACTTCCTGGGATGAAGATGCACCCACCAGTGTAAAGATCGAAGGA GGGAATGGGAATGTGGCAACACAACAAAATTCTTTGGAACAACTGGAACCTGACTATTTTAAGGACATGACACCA ACTATTAGGAAAACTCAGAAAATTGTTATTAAGAAGAGAGAACCATTGAATTTTGGCATCCCAGATGGGAGCACA GGTTTCTCTAGTAGATTAGCAGCTACACAAGATCTGCCTTTTATTCATCAGTCTTCTGAATTAGGTGACTTAGAT ACCTGGCAGGAAAATACCAATGCATGGGAAGAAGAAGAAGATGCAGCCTGGCAAGCAGAAGAAGTTCTGAGACAG CAGAAACTAGCAGACAGAGAAAAGAGAGCAGCCGAACAACAAAGGAAGAAAATGGAAAAGGAAGCACAACGGCTA ATGAAGAAGGAACAAAACAAAATTGGTGTGAAACTTTCA (SEQ ID NO: 291)
Protein :
MAITQFRLFKFCTCLATVFSFLKRLICRSGRGRKLSGDQITLPTTVDYSSVPKQTDVEEWTSWDEDAPTSVKIEG GNGNVATQQNSLEQLEPDYFKDMTPTIRKTQKIVIKKREPLNFGIPDGSTGFSSRLAATQDLPFIHQSSELGDLD TWQENTNAWEEEEDAAWQAEEVLRQQKLADREKRAAEQQRKKMEKEAQRLMKKEQNKIGVKLS (SEQ ID NO: 292)
EBAG9(1-29): Receptor-binding cancer antigen expressed on SiSo cells (Homo sapiens, Uniprot: O00559)
DNA:
ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA ATATGCAGATCT (SEQ ID NO: 293)
Protein :
MAITQFRLFKFCTCLATVFSFLKRLICRS (SEQ ID NO: 294)
CMP-SaTr/SLC35A1: CMP-sialic acid transporter (Homo sapiens, Uniprot: P78382)
DNA:
ATGGCTGCCCCGAGAGACAATGTCACTTTATTATTCAAGTTATACTGCTTGGCAGTGATGACCCTGATGGCTGCA GTCTATACCATAGCTTTAAGATACACAAGGACATCAGACAAAGAACTCTACTTTTCAACGACAGCCGTGTGTATC ACAGAAGTTATAAAGTTATTGCTAAGTGTGGGAATTTTAGCTAAAGAAACTGGTAGTCTGGGTAGATTCAAAGCA TCTTTAAGAGAAAATGTCTTGGGGAGCCCCAAGGAACTGTTGAAGTTAAGTGTGCCATCGTTAGTGTATGCTGTT CAGAACAACATGGCTTTCCTAGCTCTTAGCAATCTGGATGCAGCAGTGTACCAGGTGACCTACCAGTTGAAGATT CCGTGTACTGCTTTATGCACTGTTTTAATGTTAAATCGGACACTCAGCAAATTACAGTGGGTTTCAGTTTTTATG CTGTGTGCTGGAGTTACGCTTGTACAGTGGAAACCAGCCCAAGCTACAAAAGTGGTGGTGGAACAAAATCCATTA TTAGGGTTTGGCGCTATAGCTATTGCTGTATTGTGCTCAGGATTTGCAGGAGTATATTTTGAAAAAGTTTTAAAG AGTTCAGATACTTCTCTTTGGGTGAGAAACATTCAAATGTATCTATCAGGGATTATTGTGACATTAGCTGGCGTC TACTTGTCAGATGGAGCTGAAATTAAAGAAAAAGGATTTTTCTATGGTTACACATATTATGTCTGGTTTGTCATC TTTCTTGCAAGTGTTGGTGGCCTCTACACTTCTGTTGTGGTTAAGTACACAGACAACATCATGAAAGGCTTTTCT GCAGCAGCGGCCATTGTCCTTTCCACCATTGCTTCAGTAATGCTGTTTGGATTACAGATAACACTCACCTTTGCC CTGGGTACTCTTCTTGTATGTGTTTCCATATATCTCTATGGATTACCCAGACAAGACACTACATCCATCCAACAA GGAGAAACAGCTTCAAAGGAGAGAGTTATTGGTGTG (SEQ ID NO: 295)
Protein :
MAAPRDNVTLLFKLYCLAVMTLMAAVYTIALRYTRTSDKELYFSTTAVCITEVIKLLLSVGILAKETGSLGRFKA SLRENVLGSPKELLKLSVPSLVYAVQNNMAFLALSNLDAAVYQVTYQLKIPCTALCTVLMLNRTLSKLQWVSVFM LCAGVTLVQWKPAQATKVVVEQNPLLGFGAIAIAVLCSGFAGVYFEKVLKSSDTSLWVRNIQMYLSGIIVTLAGV YLSDGAEIKEKGFFYGYTYYVWFVIFLASVGGLYTSVVVKYTDNIMKGFSAAAAIVLSTIASVMLFGLQITLTFA LGTLLVCVSIYLYGLPRQDTTSIQQGETASKERVIGV (SEQ ID NO: 296)
P450 2C1(1-27): Cytochrome P450 2C1 (Oryctolagus cuniculus, Uniprot: P00180)
DNA:
ATGGACCCCGTGGTCGTGCTGGGCCTGTGCCTGTCATGCCTGCTGCTGCTGAGCCTGTGGAAGCAGAGCTACGGC GGAGGC (SEQ ID NO: 297)
Protein :
MDPVVVLGLCLSCLLLLSLWKQSYGGG (SEQ ID NO: 298)
P450 2C1(1-29): Cytochrome P450 2C1 (Oryctolagus cuniculus, Uniprot: P00180)
DNA:
ATGGACCCCGTGGTCGTGCTGGGCCTGTGCCTGTCATGCCTGCTGCTGCTGAGCCTGTGGAAGCAGAGCTACGGC GGAGGCAAGCTG (SEQ ID NO: 299)
Protein :
MDPVVVLGLCLSCLLLLSLWKQSYGGGKL (SEQ ID NO: 300)
EB1: Microtubule-associated protein RP/EB family member 1 (Homo sapiens, Uniprot: Q8WQ86)
DNA:
ATGGCAGTGAACGTATACTCAACGTCAGTGACCAGTGATAACCTAAGTCGACATGACATGCTGGCCTGGATCAAT GAGTCTCTGCAGTTGAATCTGACAAAGATCGAACAGTTGTGCTCAGGGGCTGCGTATTGTCAGTTTATGGACATG CTGTTCCCTGGCTCCATTGCCTTGAAGAAAGTGAAATTCCAAGCTAAGCTAGAACACGAGTACATCCAGAACTTC AAAATACTACAAGCAGGTTTTAAGAGAATGGGTGTTGACAAAATAATTCCTGTGGACAAATTAGTAAAAGGAAAG TTTCAGGACAATTTTGAATTCGTTCAGTGGTTCAAGAAGTTTTTCGATGCAAACTATGATGGAAAAGACTATGAC CCTGTGGCTGCCAGACAAGGTCAAGAAACTGCAGTGGCTCCTTCCCTTGTTGCTCCAGCTCTGAATAAACCGAAG AAACCTCTCACTTCTAGCAGTGCAGCTCCCCAGAGGCCCATCTCAACACAGAGAACCGCTGCGGCTCCTAAGGCT GGCCCTGGTGTGGTGCGAAAGAACCCTGGTGTGGGCAACGGAGATGACGAGGCAGCTGAGTTGATGCAGCAGGTC AACGTATTGAAACTTACTGTTGAAGACTTGGAGAAAGAGAGGGATTTCTACTTCGGAAAGCTACGGAACATTGAA TTGATTTGCCAGGAGAACGAGGGGGAAAACGACCCTGTATTGCAGAGGATTGTAGACATTCTGTATGCCACAGAT GAAGGCTTTGTGATACCTGATGAAGGGGGCCCACAGGAGGAGCAAGAAGAGTAT (SEQ ID NO: 301)
Protein :
MAVNVYSTSVTSDNLSRHDMLAWINESLQLNLTKIEQLCSGAAYCQFMDMLFPGSIALKKVKFQAKLEHEYIQNF KILQAGFKRMGVDKIIPVDKLVKGKFQDNFEFVQWFKKFFDANYDGKDYDPVAARQGQETAVAPSLVAPALNKPK KPLTSSSAAPQRPISTQRTAAAPKAGPGVVRKNPGVGNGDDEAAELMQQVNVLKLTVEDLEKERDFYFGKLRNIE LICQENEGENDPVLQRIVDILYATDEGFVIPDEGGPQEEQEEY (SEQ ID NO: 302)
CG1: Nucleoporin NUP42 (Homo sapiens, Uniprot: O15504)
DNA: ATGGCCATTTGTCAATTCTTCCTTCAAGGCCGGTGCCGCTTTGGAGATCGGTGCTGGAACGAACATCCCGGTGCT AGGGGTGCAGGAGGAGGACGGCAGCAACCGCAGCAGCAGCCTTCAGGTAATAATAGACGTGGATGGAATACAACT AGCCAGAGATATTCCAATGTCATCCAGCCATCCAGTTTCTCCAAATCCACACCATGGGGGGGCAGCAGAGATCAA GAAAAGCCATATTTCAGTTCTTTTGATTCTGGAGCTTCAACTAACAGGAAGGAAGGCTTTGGATTGTCTGAGAAC CCATTTGCTTCACTTAGTCCTGATGAGCAGAAAGATGAAAAGAAACTTCTGGAAGGAATTGTAAAAGATATGGAG GTTTGGGAATCATCAGGGCAGTGGATGTTTTCTGTTTATTCACCAGTGAAAAAGAAACCTAATATTTCAGGTTTT ACAGACATTTCACCAGAGGAATTGAGGCTTGAATACCATAACTTCTTAACCAGCAATAACTTACAGAGTTATCTA AATTCTGTCCAACGTTTAATAAATCAATGGAGGAACAGGGTAAATGAACTGAAAAGTCTAAATATATCAACTAAA GTAGCTTTGCTCTCTGATGTAAAGGATGGAGTAAATCAAGCAGCACCTGCATTTGGATTTGGCAGCAGTCAAGCA GCAACATTTATGTCGCCAGGCTTTCCAGTCAATAACAGCAGCAGTGATAATGCTCAGAACTTTAGTTTTAAAACA AACTCTGGATTTGCTGCTGCCTCTTCTGGAAGCCCTGCTGGTTTTGGGAGTTCCCCAGCATTTGGAGCTGCAGCC TCTACCAGTTCAGGTATCTCTACTTCTGCTCCAGCTTTTGGATTTGGGAAGCCTGAAGTCACATCGGCTGCATCA TTTTCATTCAAAAGCCCTGCAGCTTCCAGTTTTGGATCACCTGGATTTTCAGGACTTCCAGCTTCCTTGGCAACA GGTCCTGTCAGAGCTCCAGTGGCCCCAGCCTTTGGAGGTGGCAGTTCTGTGGCTGGTTTTGGTAGTCCGGGCTCA CATTCTCACACTGCTTTTTCTAAGCCATCCAGTGACACTTTTGGAAATAGCAGCATATCCACTTCTCTGTCAGCC TCAAGCAGCATCATTGCAACAGATAATGTGTTATTCACACCCAGAGATAAACTAACAGTAGAAGAACTGGAACAA TTTCAATCCAAGAAATTTACTCTGGGAAAAATTCCATTAAAGCCTCCACCTCTGGAACTTCTAAATGTT ( SEQ ID NO: 303)
Protein :
MAICQFFLQGRCRFGDRCWNEHPGARGAGGGRQQPQQQPSGNNRRGWNTTSQRYSNVIQPSSFSKSTPWGGSRDQ EKPYFSSFDSGASTNRKEGFGLSENPFASLSPDEQKDEKKLLEGIVKDMEVWESSGQWMFSVYSPVKKKPNISGF TDISPEELRLEYHNFLTSNNLQSYLNSVQRLINQWRNRVNELKSLNISTKVALLSDVKDGVNQAAPAFGFGSSQA ATFMSPGFPVNNSSSDNAQNFSFKTNSGFAAASSGSPAGFGSSPAFGAAASTSSGISTSAPAFGFGKPEVTSAAS FSFKSPAASSFGSPGFSGLPASLATGPVRAPVAPAFGGGSSVAGFGSPGSHSHTAFSKPSSDTFGNSSISTSLSA SSSIIATDNVLFTPRDKLTVEELEQFQSKKFTLGKIPLKPPPLELLNV (SEQ ID NO: 304)
PCP: Cytochrome P450 2C1 (Oryctolagus cuniculus, Uniprot: P00180)
DNA:
TCCAAAACCATCGTTCTTTCGGTCGGCGAGGCTACTCGCACTCTGACTGAGATCCAGTCCACCGCAGACCGTCAG ATCTTCGAAGAGAAGGTCGGGCCTCTGGTGGGTCGGCTGCGCCTCACGGCTTCGCTCCGTCAAAACGGAGCCAAG ACCGCGTATCGCGTCAACCTAAAACTGGATCAGGCGGACGTCGTTGATTCCGGACTTCCGAAAGTGCGCTACACT CAGGTATGGTCGCACGACGTGACAATCGTTGCGAATAGCACCGAGGCCTCGCGCAAATCGTTGTACGATTTGACC AAGTCCCTCGTCGCGACCTCGCAGGTCGAAGATCTTGTCGTCAACCTTGTGCCGCTGGGCCGT (SEQ ID NO: 305)
Protein :
SKTIVLSVGEATRTLTEIQSTADRQIFEEKVGPLVGRLRLTASLRQNGAKTAYRVNLKLDQADVVDSGLPKVRYT QVWSHDVTIVANSTEASRKSLYDLTKSLVATSQVEDLVVNLVPLGR (SEQ ID NO: 306)
LAF-1: ATP-dependent RNA helicase laf-1 (RGG domain, 1-168) (Caenorhabditis elegans, Uniprot: D0PV95)
DNA:
ATGGAAAGCAACCAGAGCAACAACGGCGGCTCTGGCAACGCCGCTCTGAACAGAGGCGGCAGATACGTGCCCCCC CACCTGAGAGGAGGCGACGGCGGCGCCGCCGCCGCTGCATCTGCCGGCGGAGATGACAGAAGAGGCGGAGCCGGA GGCGGCGGCTATAGACGGGGAGGCGGAAACAGCGGCGGCGGAGGCGGAGGCGGCTACGACAGAGGCTACAACGAC AACCGGGACGACCGGGACAACAGAGGCGGCAGCGGCGGATACGGCAGAGATCGAAACTACGAGGACAGAGGCTAC AATGGCGGAGGCGGAGGCGGCGGCAACCGGGGCTACAACAACAACAGAGGAGGCGGCGGCGGCGGCTACAACCGC CAGGACAGAGGCGATGGCGGATCTAGCAATTTCAGCAGAGGCGGCTACAACAACCGGGACGAGGGCAGCGACAAC AGAGGCAGCGGAAGAAGCTACAACAATGACCGGAGAGATAATGGCGGAGATGGC (SEQ ID NO: 307)
Protein :
MESNQSNNGGSGNAALNRGGRYVPPHLRGGDGGAAAAASAGGDDRRGGAGGGGYRRGGGNSGGGGGGGYDRGYND NRDDRDNRGGSGGYGRDRNYEDRGYNGGGGGGGNRGYNNNRGGGGGGYNRQDRGDGGSSNFSRGGYNNRDEGSDN RGSGRSYNNDRRDNGGDG (SEQ ID NO: 308)
SLP3: Stomatin-like protein 3, aa 1-59 (Homo sapiens, Uniprot: Q8TAV4)
DNA:
ATGGATTCTAGGGTGTCTTCACCTGAGAAGCAAGATAAAGAGAATTTCGTGGGTGTCAACAATAAACGGCTTGGT GTATGTGGCTGGATCCTGTTTTCCCTCTCTTTCCTGTTGGTGATCATTACCTTCCCCATCTCCATATGGATGTGC TTGAAGATCATTAAGGAGTATGAACGT (SEQ ID NO: 309)
Protein :
MDSRVSSPEKQDKENFVGVNNKRLGVCGWILFSLSFLLVIITFPISIWMCLKIIKEYER (SEQ ID NO: 310)
SYNZIP1: Synthetic coiled-coil peptide 1 (Synthetic)
DNA:
AATCTGGTGGCCCAGCTGGAAAACGAGGTGGCCAGCCTGGAAAACGAGAACGAAACCCTGAAGAAAAAGAACCTG CACAAGAAGGACCTGATCGCCTACCTGGAAAAGGAAATCGCCAACCTGAGAAAGAAGATCGAGGAA (SEQ ID NO: 311)
Protein :
NLVAQLENEVASLENENETLKKKNLHKKDLIAYLEKEIANLRKKIEE (SEQ ID NO: 312)
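Each entry pairs a DNA sequence with the protein it encodes, so an entry can be cross-checked by translation. The Python sketch below does this for SYNZIP1 (SEQ ID NOs: 311 and 312). It assumes the standard genetic code, an open reading frame that starts at the first base, and that the spaces between 75-character blocks are line wrapping rather than sequence. The names `translate` and `check_pair` are illustrative only.

```python
# Standard genetic code; codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate in-frame DNA; whitespace from line wrapping is ignored."""
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3  # drop any incomplete trailing codon
    return "".join(CODON_TABLE[seq[n:n + 3]] for n in range(0, usable, 3))


def check_pair(dna: str, protein: str) -> bool:
    """True if the DNA translates to exactly the listed protein."""
    return translate(dna) == "".join(protein.split())


# SEQ ID NO: 311 (DNA, two wrapped blocks) and SEQ ID NO: 312 (protein).
SYNZIP1_DNA = (
    "AATCTGGTGGCCCAGCTGGAAAACGAGGTGGCCAGCCTGGAAAACGAGAACGAAACCCTGAAGAAAAAGAACCTG "
    "CACAAGAAGGACCTGATCGCCTACCTGGAAAAGGAAATCGCCAACCTGAGAAAGAAGATCGAGGAA"
)
SYNZIP1_PROTEIN = "NLVAQLENEVASLENENETLKKKNLHKKDLIAYLEKEIANLRKKIEE"

print(check_pair(SYNZIP1_DNA, SYNZIP1_PROTEIN))
```

Some entries end in a stop codon, such as the final TAA of SEQ ID NO: 271. Those translate with a trailing "*". Before comparing, strip the "*" or keep it, depending on whether the listed protein shows it (SEQ ID NO: 320 does).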
SYNZIP2: Synthetic coiled-coil peptide 2 (Synthetic)
DNA:
GCTAGAAACGCCTACCTGAGAAAGAAAATCGCCAGACTGAAGAAGGACAACCTGCAGCTGGAAAGAGACGAGCAG
AACCTGGAAAAGATCATCGCCAACCTCAGAGATGAGATCGCCAGACTGGAAAACGAGGTGGCCAGCCACGAGCAG
(SEQ ID NO: 313)
Protein :
ARNAYLRKKIARLKKDNLQLERDEQNLEKIIANLRDEIARLENEVASHEQ (SEQ ID NO: 314)
SYNZIP3: Synthetic coiled-coil peptide 3 (Synthetic)
DNA:
AATGAGGTGACCACCCTGGAAAACGACGCCGCCTTCATCGAGAACGAGAACGCCTACCTGGAAAAAGAGATCGCC AGACTGAGAAAGGAAAAGGCCGCTCTGCGGAACAGACTGGCCCACAAGAAG (SEQ ID NO: 315)
Protein :
NEVTTLENDAAFIENENAYLEKEIARLRKEKAALRNRLAHKK (SEQ ID NO: 316)
SYNZIP4: Synthetic coiled-coil peptide 4 (Synthetic)
DNA:
CAAAAGGTGGCTGAACTGAAAAATAGAGTGGCCGTGAAGCTGAACCGGAACGAGCAGCTGAAGAACAAGGTGGAA GAGCTGAAGAACAGAAACGCCTACCTGAAGAATGAGCTGGCCACCCTGGAAAACGAGGTGGCCAGACTGGAAAAC GACGTGGCCGAG (SEQ ID NO: 317)
Protein :
QKVAELKNRVAVKLNRNEQLKNKVEELKNRNAYLKNELATLENEVARLENDVAE (SEQ ID NO: 318)
2. Further fusion proteins:
EBAG9(1-29)::FUS::PylRS (AF)
DNA:
ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA
ATATGCAGATCTGGAGCACCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATGGCCTCAAACGATTATACCCAA
CAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTAC GGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTAT GGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGT AGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGC ACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAG CCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAG AACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCC ATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTAT GGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAAC CGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGAC CGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGAC AACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATT GGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTG AAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAA TTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGT GGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGA GGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGT GAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGA CCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGTGGTGCGATCGCAGGCGCC GATTACAAGGACGATGATGACAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTG CCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACT GGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATT GAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCAC AAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAG GACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCT CGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATT 
CCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCC ACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCA GCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGC AAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAA CGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAA TCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATT TTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGAC CGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACAT CTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATC ACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTG GATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATC GACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAA CGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 319)
Protein :
MAITQFRLFKFCTCLATVFSFLKRLICRSGAPGSAGSAAGSGMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPY GQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSS TSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSS MSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSD RGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKL KGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRG GFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGAIAGA DYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYI EMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVA RAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAP ALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIK SPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEH LEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGI DKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 320)
EBAG9::PylRS (AF)
DNA: ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA ATATGCAGATCTGGCAGAGGACGGAAATTAAGTGGAGACCAAATAACTTTGCCAACTACAGTTGATTATTCATCA GTTCCTAAGCAGACAGATGTTGAAGAGTGGACTTCCTGGGATGAAGATGCACCCACCAGTGTAAAGATCGAAGGA GGGAATGGGAATGTGGCAACACAACAAAATTCTTTGGAACAACTGGAACCTGACTATTTTAAGGACATGACACCA ACTATTAGGAAAACTCAGAAAATTGTTATTAAGAAGAGAGAACCATTGAATTTTGGCATCCCAGATGGGAGCACA GGTTTCTCTAGTAGATTAGCAGCTACACAAGATCTGCCTTTTATTCATCAGTCTTCTGAATTAGGTGACTTAGAT ACCTGGCAGGAAAATACCAATGCATGGGAAGAAGAAGAAGATGCAGCCTGGCAAGCAGAAGAAGTTCTGAGACAG CAGAAACTAGCAGACAGAGAAAAGAGAGCAGCCGAACAACAAAGGAAGAAAATGGAAAAGGAAGCACAACGGCTA ATGAAGAAGGAACAAAACAAAATTGGTGTGAAACTTTCAGGCGCCGATTACAAGGACGATGATGACAAGGGAGCA CCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTG ACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATT CATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTG AACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGT GTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTT AGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAA GCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTT CCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACC AATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTG GAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTG CTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGT GAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATC GAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGC CCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTC GAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGC CAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGAC TTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCT 
AGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGT CTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAAT GGTATTTCTACTAACCTGTAA (SEQ ID NO: 321)
Protein :
MAITQFRLFKFCTCLATVFSFLKRLICRSGRGRKLSGDQITLPTTVDYSSVPKQTDVEEWTSWDEDAPTSVKIEG GNGNVATQQNSLEQLEPDYFKDMTPTIRKTQKIVIKKREPLNFGIPDGSTGFSSRLAATQDLPFIHQSSELGDLD TWQENTNAWEEEEDAAWQAEEVLRQQKLADREKRAAEQQRKKMEKEAQRLMKKEQNKIGVKLSGADYKDDDDKGA PGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVV NNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTE AAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRL EVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYI ERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFC QMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFG LERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 322)
EBAG9::FUS::PylRS (AF)
DNA:
ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA ATATGCAGATCTGGCAGAGGACGGAAATTAAGTGGAGACCAAATAACTTTGCCAACTACAGTTGATTATTCATCA GTTCCTAAGCAGACAGATGTTGAAGAGTGGACTTCCTGGGATGAAGATGCACCCACCAGTGTAAAGATCGAAGGA GGGAATGGGAATGTGGCAACACAACAAAATTCTTTGGAACAACTGGAACCTGACTATTTTAAGGACATGACACCA ACTATTAGGAAAACTCAGAAAATTGTTATTAAGAAGAGAGAACCATTGAATTTTGGCATCCCAGATGGGAGCACA GGTTTCTCTAGTAGATTAGCAGCTACACAAGATCTGCCTTTTATTCATCAGTCTTCTGAATTAGGTGACTTAGAT ACCTGGCAGGAAAATACCAATGCATGGGAAGAAGAAGAAGATGCAGCCTGGCAAGCAGAAGAAGTTCTGAGACAG CAGAAACTAGCAGACAGAGAAAAGAGAGCAGCCGAACAACAAAGGAAGAAAATGGAAAAGGAAGCACAACGGCTA ATGAAGAAGGAACAAAACAAAATTGGTGTGAAACTTTCAGGAGCACCCGGCTCCGCCGGCTCCGCCGCCGGCTCC GGCATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGC TATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGA TATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGA TATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGC TATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAG CCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTAT AATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGA GGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAG AGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGC GGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGA GGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGT CATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAG TCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTG TACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCA GCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGAC TTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGT GGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGT 
GACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCC CCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGT GGCAGAGGTGGTGCGATCGCAGGCGCCGATTACAAGGACGATGATGACAAGGGAGCACCAGGAAGTGCTGGTTCT GCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAA CCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCAC GAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCT CGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTG AACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACT AAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCG TCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACC AGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATG TCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCG AAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAA GACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTC GTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGAC AATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAAT CTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTAT CGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGT ACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGAC AGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCA ATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAA GTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTG TAA (SEQ ID NO: 323)
Protein:
MAITQFRLFKFCTCLATVFSFLKRLICRSGRGRKLSGDQITLPTTVDYSSVPKQTDVEEWTSWDEDAPTSVKIEG GNGNVATQQNSLEQLEPDYFKDMTPTIRKTQKIVIKKREPLNFGIPDGSTGFSSRLAATQDLPFIHQSSELGDLD TWQENTNAWEEEEDAAWQAEEVLRQQKLADREKRAAEQQRKKMEKEAQRLMKKEQNKIGVKLSGAPGSAGSAAGS GMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQG YGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSY NPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGG GGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIE SVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRAD FNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKA PKPDGPGGGPGGSHMGGNYGDDRRGGRGGAIAGADYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKK PLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDL NKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVST SISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKK DLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPN LANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGD SCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 324)
EBAG9::MCP
DNA: ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA ATATGCAGATCTGGCAGAGGACGGAAATTAAGTGGAGACCAAATAACTTTGCCAACTACAGTTGATTATTCATCA GTTCCTAAGCAGACAGATGTTGAAGAGTGGACTTCCTGGGATGAAGATGCACCCACCAGTGTAAAGATCGAAGGA GGGAATGGGAATGTGGCAACACAACAAAATTCTTTGGAACAACTGGAACCTGACTATTTTAAGGACATGACACCA ACTATTAGGAAAACTCAGAAAATTGTTATTAAGAAGAGAGAACCATTGAATTTTGGCATCCCAGATGGGAGCACA GGTTTCTCTAGTAGATTAGCAGCTACACAAGATCTGCCTTTTATTCATCAGTCTTCTGAATTAGGTGACTTAGAT ACCTGGCAGGAAAATACCAATGCATGGGAAGAAGAAGAAGATGCAGCCTGGCAAGCAGAAGAAGTTCTGAGACAG CAGAAACTAGCAGACAGAGAAAAGAGAGCAGCCGAACAACAAAGGAAGAAAATGGAAAAGGAAGCACAACGGCTA ATGAAGAAGGAACAAAACAAAATTGGTGTGAAACTTTCAGCGATCGCATATCCCTATGATGTGCCGGATTATGCT GGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAAT GGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCG CGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTC GAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGAC TGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAAC TCCGGCATCTACTAA (SEQ ID NO: 325)
Protein:
MAITQFRLFKFCTCLATVFSFLKRLICRSGRGRKLSGDQITLPTTVDYSSVPKQTDVEEWTSWDEDAPTSVKIEG GNGNVATQQNSLEQLEPDYFKDMTPTIRKTQKIVIKKREPLNFGIPDGSTGFSSRLAATQDLPFIHQSSELGDLD TWQENTNAWEEEEDAAWQAEEVLRQQKLADREKRAAEQQRKKMEKEAQRLMKKEQNKIGVKLSAIAYPYDVPDYA GAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKV EVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIY* (SEQ ID NO: 326)
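Each protein entry in this listing is the translation of its paired DNA entry, so the two can be cross-checked mechanically. The following is a minimal Python sketch, assuming the DNA is read in frame from the initiating ATG and decoded with the standard genetic code (NCBI translation table 1); the helper name `translate` is hypothetical and not part of the source. It shows, for example, that the first 87 nt of the EBAG9 constructs encode EBAG9 residues 1-29, and that the PylRS codons GTT GTG AAC AAT encode VVNN.

```python
from itertools import product

# Standard genetic code (NCBI table 1). Codons are enumerated in T, C, A, G order
# for the first, second and third positions; "*" denotes a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence into one-letter amino acid codes."""
    seq = "".join(dna.split()).upper()  # drop the block spacing used in the listing
    if len(seq) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


# EBAG9 residues 1-29 (first 87 nt of the EBAG9 constructs)
ebag9_1_29 = ("ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA "
              "ATATGCAGATCT")
print(translate(ebag9_1_29))  # MAITQFRLFKFCTCLATVFSFLKRLICRS

# PylRS stretch around the VVNN motif
print(translate("GATCATCTGGTTGTGAACAATAGCCGCTCTTCT"))  # DHLVVNNSRSS
```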
EBAG9::EWSR1::MCP
DNA:
ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA ATATGCAGATCTGGCAGAGGACGGAAATTAAGTGGAGACCAAATAACTTTGCCAACTACAGTTGATTATTCATCA GTTCCTAAGCAGACAGATGTTGAAGAGTGGACTTCCTGGGATGAAGATGCACCCACCAGTGTAAAGATCGAAGGA GGGAATGGGAATGTGGCAACACAACAAAATTCTTTGGAACAACTGGAACCTGACTATTTTAAGGACATGACACCA ACTATTAGGAAAACTCAGAAAATTGTTATTAAGAAGAGAGAACCATTGAATTTTGGCATCCCAGATGGGAGCACA GGTTTCTCTAGTAGATTAGCAGCTACACAAGATCTGCCTTTTATTCATCAGTCTTCTGAATTAGGTGACTTAGAT ACCTGGCAGGAAAATACCAATGCATGGGAAGAAGAAGAAGATGCAGCCTGGCAAGCAGAAGAAGTTCTGAGACAG CAGAAACTAGCAGACAGAGAAAAGAGAGCAGCCGAACAACAAAGGAAGAAAATGGAAAAGGAAGCACAACGGCTA ATGAAGAAGGAACAAAACAAAATTGGTGTGAAACTTTCAATGGCGTCCACGGATTACAGTACCTATAGCCAAGCT GCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACCACCCAGGCATATGGG CAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGG CAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTGCCCCCCAGGCATACAGC CAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTAT GCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACA AGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGC CTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCT CCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAAC ACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCC ACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGG CAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGA CCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATG AGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGT GGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGATGAAGACTCTGACAACAGTGCA ATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTT AAGATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGAT GCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGG 
AGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAG GGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATG GGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGA GGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGA ACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGAT CGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATG TTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGA GGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGCGATCGCATATCCCTATGATGTGCCG GATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCTAACTTTACTCAGTTCGTTCTC GTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGC TCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACC ATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCCACG AATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATC GCAGCAAACTCCGGCATCTACTAA (SEQ ID NO: 327)
Protein:
MAITQFRLFKFCTCLATVFSFLKRLICRSGRGRKLSGDQITLPTTVDYSSVPKQTDVEEWTSWDEDAPTSVKIEG GNGNVATQQNSLEQLEPDYFKDMTPTIRKTQKIVIKKREPLNFGIPDGSTGFSSRLAATQDLPFIHQSSELGDLD TWQENTNAWEEEEDAAWQAEEVLRQQKLADREKRAAEQQRKKMEKEAQRLMKKEQNKIGVKLSMASTDYSTYSQA AAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTPTAPQAYS QPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSSTGGYNQPS LGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQSSYGQQPP TSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRGGFDRGGM SRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADFFKQCGVV KMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMRGGLPPRE GRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCGNQNFAWR TECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGMDRGGFGG GRRGGPGGPPGPLMEQAIAYPYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWIS SNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAI AANSGIY* (SEQ ID NO: 328)
EBAG9(1-29)::PylRS (AF)
DNA:
ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA
ATATGCAGATCTGGCGCCGATTACAAGGACGATGATGACAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGT
AGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAAT
ACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGC
CGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCA
CGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTC
CTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCA
ATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGC
AAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGC
AGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCG
GTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAA
ATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAA
CAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGT
GGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACC
GAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAAC
TATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAG
TCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAG
AACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATG
GTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTG
GATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACAC
GACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 329)
Protein:
MAITQFRLFKFCTCLATVFSFLKRLICRSGADYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLN TLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKF LTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSIS SISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQ QIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLAN YLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCM VFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 330)
EBAG9(1-29)::FUS::PylRS (AF)
DNA:
ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA ATATGCAGATCTGGAGCACCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATGGCCTCAAACGATTATACCCAA CAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTAC GGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTAT GGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGT AGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGC ACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAG CCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAG AACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCC ATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTAT GGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAAC CGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGAC CGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGAC AACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATT GGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTG AAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAA TTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGT GGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGA GGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGT GAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGA CCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGTGGTGCGATCGCAGGCGCC GATTACAAGGACGATGATGACAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTG CCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACT GGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATT GAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCAC AAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAG 
GACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCT CGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATT CCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCC ACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCA GCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGC AAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAA CGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAA TCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATT TTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGAC CGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACAT CTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATC ACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTG GATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATC GACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAA CGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 331)
Protein:
MAITQFRLFKFCTCLATVFSFLKRLICRSGAPGSAGSAAGSGMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPY GQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSS TSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSS MSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSD RGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKL KGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRG GFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGAIAGA DYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYI EMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVA RAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAP ALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIK SPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEH LEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGI DKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 332)
EBAG9(1-29)::MCP
DNA:
ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA ATATGCAGATCTGCGATCGCATATCCCTATGATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCT GCTGGTAGTGGAGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCC CCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGT AGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCG TACTTAAATATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAA GGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACTAA (SEQ ID NO: 333)
Protein:
MAITQFRLFKFCTCLATVFSFLKRLICRSAIAYPYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVA PSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQ GLLKDGNPIPSAIAANSGIY* (SEQ ID NO: 334)
EBAG9(1-29)::EWSR1::MCP
DNA:
ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA
ATATGCAGATCTATGGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTAC
ACCGCCCAGCCCACTCAAGGATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAG
CCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGA
CAGCCTCCCACTGGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGT
GCTTATGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAG
CCTGCTTATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACT
GAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGT
TATCCCCAGGTACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCC
TCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGA
CAGCAGAGTAGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCC
TACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCAC
CCCAGTAGCATGGGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGC
CCTGATAACCGGGGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGC
GGTGGAATGGGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTT
GATCTAGGCCCACCTGTAGATCCAGATGAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGT
GTGACTCTAGATGATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCC
ATGATCCACATCTACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCT
ACTGCCAAGGCTGCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGG
AAGAAGCCTCCAATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGT
GGAGGTCCAGGAGGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGC
TTCCCTCCAAGAGGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGAC
TGGCAGTGTCCCAATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCA
AAGCCTGAAGGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATG
CGGGGAGGAAGAGGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGA
GGTGGCTTCCGTGGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCC
CCTGGACCTTTGATGGAACAGGCGATCGCATATCCCTATGATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCT
GGTTCTGCTGCTGGTAGTGGAGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTG
ACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAA
GTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCC
TGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAG
GCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACTAA
(SEQ ID NO: 335)
Protein:
MAITQFRLFKFCTCLATVFSFLKRLICRSMASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQ PTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQ PAYPAYGQQPAATAPTRPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYS STQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDH PSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDL DLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPP TAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGG FPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGM RGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPPGPLMEQAIAYPYDVPDYAGAPGSA GSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGA WRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIY* (SEQ ID NO: 336)
EBAG9::EWSR1::4clN22
DNA:
ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA ATATGCAGATCTGGCAGAGGACGGAAATTAAGTGGAGACCAAATAACTTTGCCAACTACAGTTGATTATTCATCA GTTCCTAAGCAGACAGATGTTGAAGAGTGGACTTCCTGGGATGAAGATGCACCCACCAGTGTAAAGATCGAAGGA GGGAATGGGAATGTGGCAACACAACAAAATTCTTTGGAACAACTGGAACCTGACTATTTTAAGGACATGACACCA ACTATTAGGAAAACTCAGAAAATTGTTATTAAGAAGAGAGAACCATTGAATTTTGGCATCCCAGATGGGAGCACA GGTTTCTCTAGTAGATTAGCAGCTACACAAGATCTGCCTTTTATTCATCAGTCTTCTGAATTAGGTGACTTAGAT ACCTGGCAGGAAAATACCAATGCATGGGAAGAAGAAGAAGATGCAGCCTGGCAAGCAGAAGAAGTTCTGAGACAG CAGAAACTAGCAGACAGAGAAAAGAGAGCAGCCGAACAACAAAGGAAGAAAATGGAAAAGGAAGCACAACGGCTA ATGAAGAAGGAACAAAACAAAATTGGTGTGAAACTTTCAATGGCGTCCACGGATTACAGTACCTATAGCCAAGCT GCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACCACCCAGGCATATGGG CAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGG CAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTGCCCCCCAGGCATACAGC CAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTAT GCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACA AGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGC CTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCT CCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAAC ACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCC ACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGG CAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGA CCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATG AGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGT GGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGATGAAGACTCTGACAACAGTGCA ATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTT AAGATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGAT GCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGG 
AGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAG GGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATG GGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGA GGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGA ACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGAT CGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATG TTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGA GGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGCGATCGCAGGAGCACCAGGAAGTGCT GGTTCTGCTGCTGGTAGTGGAGAGCAGAAGCTGATCTCAGAGGAGGACCTGCTAGCCACCATGGACGCACAAACA CGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCT GGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCT GAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGA GCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAA GCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATG GACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGAG TCTAGAGGGCCCGTTTAA (SEQ ID NO: 337)
Protein:
MAITQFRLFKFCTCLATVFSFLKRLICRSGRGRKLSGDQITLPTTVDYSSVPKQTDVEEWTSWDEDAPTSVKIEG GNGNVATQQNSLEQLEPDYFKDMTPTIRKTQKIVIKKREPLNFGIPDGSTGFSSRLAATQDLPFIHQSSELGDLD TWQENTNAWEEEEDAAWQAEEVLRQQKLADREKRAAEQQRKKMEKEAQRLMKKEQNKIGVKLSMASTDYSTYSQA AAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTPTAPQAYS QPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSSTGGYNQPS LGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQSSYGQQPP TSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRGGFDRGGM SRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADFFKQCGVV KMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMRGGLPPRE GRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCGNQNFAWR TECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGMDRGGFGG GRRGGPGGPPGPLMEQAIAGAPGSAGSAAGSGEQKLISEEDLLATMDAQTRRRERRAEKQAQWKAANPPLDGAGA GAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWK AANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLESRGPV* (SEQ ID NO: 338)
EBAG9(1-29)::EWSR1::4clN22
DNA:
ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA
ATATGCAGATCTATGGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTAC
ACCGCCCAGCCCACTCAAGGATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAG
CCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGA
CAGCCTCCCACTGGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGT
GCTTATGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAG
CCTGCTTATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACT
GAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGT
TATCCCCAGGTACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCC
TCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGA
CAGCAGAGTAGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCC
TACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCAC
CCCAGTAGCATGGGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGC
CCTGATAACCGGGGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGC
GGTGGAATGGGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTT
GATCTAGGCCCACCTGTAGATCCAGATGAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGT
GTGACTCTAGATGATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCC
ATGATCCACATCTACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCT
ACTGCCAAGGCTGCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGG
AAGAAGCCTCCAATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGT
GGAGGTCCAGGAGGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGC
TTCCCTCCAAGAGGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGAC
TGGCAGTGTCCCAATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCA
AAGCCTGAAGGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATG
CGGGGAGGAAGAGGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGA
GGTGGCTTCCGTGGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCC
CCTGGACCTTTGATGGAACAGGCGATCGCAGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGAGCAG
AAGCTGATCTCAGAGGAGGACCTGCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAA
CAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGC
GGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCA
AACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCA
CAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCC
GGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGT
CGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGAGTCTAGAGGGCCCGTTTAA (SEQ ID NO: 339)
Protein:
MAITQFRLFKFCTCLATVFSFLKRLICRSMASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQ PTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQ PAYPAYGQQPAATAPTRPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYS STQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDH PSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDL DLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPP TAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGG FPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGM RGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPPGPLMEQAIAGAPGSAGSAAGSGEQ KLISEEDLLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAA NPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRER RAEKQAQWKAANPPLESRGPV* (SEQ ID NO: 340)
EBAG9::PylRS (AA)
DNA:
ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA ATATGCAGATCTGGCAGAGGACGGAAATTAAGTGGAGACCAAATAACTTTGCCAACTACAGTTGATTATTCATCA GTTCCTAAGCAGACAGATGTTGAAGAGTGGACTTCCTGGGATGAAGATGCACCCACCAGTGTAAAGATCGAAGGA GGGAATGGGAATGTGGCAACACAACAAAATTCTTTGGAACAACTGGAACCTGACTATTTTAAGGACATGACACCA ACTATTAGGAAAACTCAGAAAATTGTTATTAAGAAGAGAGAACCATTGAATTTTGGCATCCCAGATGGGAGCACA GGTTTCTCTAGTAGATTAGCAGCTACACAAGATCTGCCTTTTATTCATCAGTCTTCTGAATTAGGTGACTTAGAT ACCTGGCAGGAAAATACCAATGCATGGGAAGAAGAAGAAGATGCAGCCTGGCAAGCAGAAGAAGTTCTGAGACAG CAGAAACTAGCAGACAGAGAAAAGAGAGCAGCCGAACAACAAAGGAAGAAAATGGAAAAGGAAGCACAACGGCTA ATGAAGAAGGAACAAAACAAAATTGGTGTGAAACTTTCAGGCGCCGATTACAAGGACGATGATGACAAGGGAGCA CCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTG ACCCTGGATGACAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATT CATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTG AACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGT GTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTT AGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAA GCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTT CCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACC AATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTG GAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTG CTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGT GAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATC GAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGC CCTATGCTGGCACCAAATCTGTATAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTC GAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCC CAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGAC TTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCT 
AGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGT CTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAAC GGGATCTCTACGAACCTGTAA (SEQ ID NO: 341)
Protein:
MAITQFRLFKFCTCLATVFSFLKRLICRSGRGRKLSGDQITLPTTVDYSSVPKQTDVEEWTSWDEDAPTSVKIEG GNGNVATQQNSLEQLEPDYFKDMTPTIRKTQKIVIKKREPLNFGIPDGSTGFSSRLAATQDLPFIHQSSELGDLD TWQENTNAWEEEEDAAWQAEEVLRQQKLADREKRAAEQQRKKMEKEAQRLMKKEQNKIGVKLSGADYKDDDDKGA PGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVV NNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTE AAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRL EVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYI ERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFA QMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFG LERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 342)
EBAG9::PylRS (AAAF)
DNA:
ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA ATATGCAGATCTGGCAGAGGACGGAAATTAAGTGGAGACCAAATAACTTTGCCAACTACAGTTGATTATTCATCA
GTTCCTAAGCAGACAGATGTTGAAGAGTGGACTTCCTGGGATGAAGATGCACCCACCAGTGTAAAGATCGAAGGA GGGAATGGGAATGTGGCAACACAACAAAATTCTTTGGAACAACTGGAACCTGACTATTTTAAGGACATGACACCA ACTATTAGGAAAACTCAGAAAATTGTTATTAAGAAGAGAGAACCATTGAATTTTGGCATCCCAGATGGGAGCACA GGTTTCTCTAGTAGATTAGCAGCTACACAAGATCTGCCTTTTATTCATCAGTCTTCTGAATTAGGTGACTTAGAT ACCTGGCAGGAAAATACCAATGCATGGGAAGAAGAAGAAGATGCAGCCTGGCAAGCAGAAGAAGTTCTGAGACAG
CAGAAACTAGCAGACAGAGAAAAGAGAGCAGCCGAACAACAAAGGAAGAAAATGGAAAAGGAAGCACAACGGCTA ATGAAGAAGGAACAAAACAAAATTGGTGTGAAACTTTCAGGCGCCGATTACAAGGACGATGATGACAAGGGAGCA CCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTG ACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATT CATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTG AACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGT GTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTT AGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAA GCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTT CCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACC AATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTG GAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTG CTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGT GAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATC GAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGC CCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTC GAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCC CAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGAC TTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCT AGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGT CTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAAT GGTATTTCTACTAACCTGTAA (SEQ ID NO: 343)
Protein:
MAITQFRLFKFCTCLATVFSFLKRLICRSGRGRKLSGDQITLPTTVDYSSVPKQTDVEEWTSWDEDAPTSVKIEG GNGNVATQQNSLEQLEPDYFKDMTPTIRKTQKIVIKKREPLNFGIPDGSTGFSSRLAATQDLPFIHQSSELGDLD TWQENTNAWEEEEDAAWQAEEVLRQQKLADREKRAAEQQRKKMEKEAQRLMKKEQNKIGVKLSGADYKDDDDKGA PGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVV NNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTE AAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRL EVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYI ERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFA QMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFG LERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 344)
EBAG9::FUS::PylRS (AA)
DNA:
ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA ATATGCAGATCTGGCAGAGGACGGAAATTAAGTGGAGACCAAATAACTTTGCCAACTACAGTTGATTATTCATCA
GTTCCTAAGCAGACAGATGTTGAAGAGTGGACTTCCTGGGATGAAGATGCACCCACCAGTGTAAAGATCGAAGGA GGGAATGGGAATGTGGCAACACAACAAAATTCTTTGGAACAACTGGAACCTGACTATTTTAAGGACATGACACCA ACTATTAGGAAAACTCAGAAAATTGTTATTAAGAAGAGAGAACCATTGAATTTTGGCATCCCAGATGGGAGCACA GGTTTCTCTAGTAGATTAGCAGCTACACAAGATCTGCCTTTTATTCATCAGTCTTCTGAATTAGGTGACTTAGAT ACCTGGCAGGAAAATACCAATGCATGGGAAGAAGAAGAAGATGCAGCCTGGCAAGCAGAAGAAGTTCTGAGACAG CAGAAACTAGCAGACAGAGAAAAGAGAGCAGCCGAACAACAAAGGAAGAAAATGGAAAAGGAAGCACAACGGCTA ATGAAGAAGGAACAAAACAAAATTGGTGTGAAACTTTCAGGAGCACCCGGCTCCGCCGGCTCCGCCGCCGGCTCC GGCATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGC TATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGA TATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGA TATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGC TATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAG CCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTAT AATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGA GGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAG AGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGC GGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGA GGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGT CATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAG TCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTG TACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCA GCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGAC TTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGT GGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGT GACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCC CCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGT 
GGCAGAGGTGGTGCGATCGCAGGCGCCGATTACAAGGACGATGATGACAAGGGAGCACCAGGAAGTGCTGGTTCT GCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGACAAAAAA CCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCAC GAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCT CGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTG AACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACT AAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCG TCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACC AGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATG TCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCG AAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAA GACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTC GTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGAC AATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTGGCACCAAAT CTGTATAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTAT CGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGT ACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGAC AGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCA ATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAA GTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTG TAA (SEQ ID NO: 345)
Protein:
MAITQFRLFKFCTCLATVFSFLKRLICRSGRGRKLSGDQITLPTTVDYSSVPKQTDVEEWTSWDEDAPTSVKIEG GNGNVATQQNSLEQLEPDYFKDMTPTIRKTQKIVIKKREPLNFGIPDGSTGFSSRLAATQDLPFIHQSSELGDLD TWQENTNAWEEEEDAAWQAEEVLRQQKLADREKRAAEQQRKKMEKEAQRLMKKEQNKIGVKLSGAPGSAGSAAGS GMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQG YGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSY NPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGG GGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIE SVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRAD FNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKA PKPDGPGGGPGGSHMGGNYGDDRRGGRGGAIAGADYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKK PLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDL NKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVST SISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKK DLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPN LYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGD SCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL
* (SEQ ID NO: 346)
EBAG9::FUS::PylRS (AAAF)
DNA:
ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA ATATGCAGATCTGGCAGAGGACGGAAATTAAGTGGAGACCAAATAACTTTGCCAACTACAGTTGATTATTCATCA GTTCCTAAGCAGACAGATGTTGAAGAGTGGACTTCCTGGGATGAAGATGCACCCACCAGTGTAAAGATCGAAGGA GGGAATGGGAATGTGGCAACACAACAAAATTCTTTGGAACAACTGGAACCTGACTATTTTAAGGACATGACACCA ACTATTAGGAAAACTCAGAAAATTGTTATTAAGAAGAGAGAACCATTGAATTTTGGCATCCCAGATGGGAGCACA GGTTTCTCTAGTAGATTAGCAGCTACACAAGATCTGCCTTTTATTCATCAGTCTTCTGAATTAGGTGACTTAGAT ACCTGGCAGGAAAATACCAATGCATGGGAAGAAGAAGAAGATGCAGCCTGGCAAGCAGAAGAAGTTCTGAGACAG CAGAAACTAGCAGACAGAGAAAAGAGAGCAGCCGAACAACAAAGGAAGAAAATGGAAAAGGAAGCACAACGGCTA ATGAAGAAGGAACAAAACAAAATTGGTGTGAAACTTTCAGGAGCACCCGGCTCCGCCGGCTCCGCCGCCGGCTCC GGCATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGC TATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGA TATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGA TATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGC TATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAG CCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTAT AATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGA GGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAG AGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGC GGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGA GGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGT CATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAG TCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTG TACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCA GCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGAC TTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGT GGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGT 
GACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCC CCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGT GGCAGAGGTGGTGCGATCGCAGGCGCCGATTACAAGGACGATGATGACAAGGGAGCACCAGGAAGTGCTGGTTCT GCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAA CCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCAC GAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCT CGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTG AACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACT AAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCG TCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACC AGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATG TCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCG AAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAA GACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTC GTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGAC AATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAAT CTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTAT CGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGT ACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGAC AGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCA ATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAA GTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTG TAA (SEQ ID NO: 347)
Protein:
MAITQFRLFKFCTCLATVFSFLKRLICRSGRGRKLSGDQITLPTTVDYSSVPKQTDVEEWTSWDEDAPTSVKIEG GNGNVATQQNSLEQLEPDYFKDMTPTIRKTQKIVIKKREPLNFGIPDGSTGFSSRLAATQDLPFIHQSSELGDLD TWQENTNAWEEEEDAAWQAEEVLRQQKLADREKRAAEQQRKKMEKEAQRLMKKEQNKIGVKLSGAPGSAGSAAGS GMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQG YGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSY NPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGG GGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIE SVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRAD FNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKA PKPDGPGGGPGGSHMGGNYGDDRRGGRGGAIAGADYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKK PLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDL NKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVST SISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKK DLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPN LANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGD SCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL
* (SEQ ID NO: 348)
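The (AA) and (AAAF) PylRS variants in this listing differ at only a few codons. One way to find those differences is to translate matching in-frame stretches with the standard genetic code and compare the results. The Python sketch below is an illustration only and is not part of the original disclosure. The fragments are copied from the PylRS portion of SEQ ID NO: 349 (AA) and SEQ ID NO: 351 (AAAF); identical stretches also occur in SEQ ID NO: 345 and 347. The helper names `translate` and `residue_differences` are hypothetical. Positions are counted within each fragment, not in the full-length protein.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, ..., GGG); '*' marks stops.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA fragment into one-letter amino acid codes."""
    dna = "".join(dna.split()).upper()  # drop the block spacing used in the listing
    if len(dna) % 3:
        raise ValueError("fragment length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def residue_differences(a: str, b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, residue in a, residue in b) for each mismatch."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Matching stretches from SEQ ID NO: 349 (AA) and SEQ ID NO: 351 (AAAF).
FRAGMENTS = {
    "region 1": ("CTGGCACCAAATCTGTATAACTATCTG", "CTAGCACCAAATCTGGCTAACTATCTG"),
    "region 2": ("AGCTGTATGGTGTATGGCGAC", "AGCTGTATGGTGTTTGGCGAC"),
}

if __name__ == "__main__":
    for name, (aa_dna, aaaf_dna) in FRAGMENTS.items():
        aa_prot, aaaf_prot = translate(aa_dna), translate(aaaf_dna)
        print(name, aa_prot, aaaf_prot, residue_differences(aa_prot, aaaf_prot))
```

Run as a script, the sketch reports one substitution per region: Tyr to Ala in region 1 and Tyr to Phe in region 2. The CTG/CTA difference at the first codon of region 1 is synonymous (both encode Leu), so it does not appear in the output.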
EBAG9(1-29)::FUS::PylRS (AA)
DNA:
ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA ATATGCAGATCTGGAGCACCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATGGCCTCAAACGATTATACCCAA CAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTAC GGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTAT GGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGT AGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGC ACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAG CCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAG AACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCC ATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTAT GGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAAC CGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGAC CGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGAC AACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATT GGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTG AAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAA TTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGT GGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGA GGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGT GAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGA CCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGTGGTGCGATCGCAGGCGCC GATTACAAGGACGATGATGACAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTG CCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGACAAAAAACCGCTGAATACCCTGATCTCTGCTACT GGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATT GAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCAC AAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAG 
GACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCT CGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATT CCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCC ACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCA GCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGC AAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAA CGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAA TCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATT TTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTGGCACCAAATCTGTATAACTATCTGCGCAAACTGGAC CGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACAT CTGGAGGAGTTTACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATC ACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTG GATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATC GACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAA CGTGCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 349)
Protein:
MAITQFRLFKFCTCLATVFSFLKRLICRSGAPGSAGSAAGSGMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPY GQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSS TSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSS MSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSD RGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKL KGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRG GFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGAIAGA DYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYI EMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVA RAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAP ALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIK SPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEH LEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGI DKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 350)
EBAG9(1-29)::FUS::PylRS (AAAF)
DNA:
ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA ATATGCAGATCTGGAGCACCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATGGCCTCAAACGATTATACCCAA CAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTAC GGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTAT GGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGT AGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGC ACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAG CCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAG AACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCC ATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTAT GGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAAC CGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGAC CGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGAC AACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATT GGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTG AAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAA TTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGT GGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGA GGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGT GAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGA CCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGTGGTGCGATCGCAGGCGCC GATTACAAGGACGATGATGACAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTG CCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACT GGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATT GAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCAC AAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAG 
GACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCT CGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATT CCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCC ACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCA GCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGC AAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAA CGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAA TCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATT TTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGAC CGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACAT CTGGAGGAGTTTACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATC ACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTG GATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATC GACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAA CGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 351)
Protein:
MAITQFRLFKFCTCLATVFSFLKRLICRSGAPGSAGSAAGSGMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPY GQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSS TSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSS MSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSD RGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKL KGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRG GFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGAIAGA DYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYI EMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVA RAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAP ALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIK SPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEH LEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGI DKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 352)
EBAG9(1-29)::FUS::MCP::PylRS (AF)
DNA:
ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA ATATGCAGATCTGGAGCACCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATGGCCTCAAACGATTATACCCAA CAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTAC GGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTAT GGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGT AGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGC ACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAG CCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAG AACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCC ATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTAT GGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAAC CGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGAC CGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGAC AACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATT GGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTG AAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAA TTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGT GGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGA GGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGT GAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGA CCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGTGGTGCGATCGCATATCCC TATGATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCTAACTTTACT CAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCT GAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAAT CGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCA ATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATT 
CCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGCGCCGATTACAAGGACGATGATGACAAGGGAGCACCAGGA AGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTG GATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAA ATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAAT AGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCC GATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCT CCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCA CAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCA AGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCG ATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTT CTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCA CGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATC ACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGT ATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATG CTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATC GGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATG GGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAA ATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCC GTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAG CGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATT TCTACTAACCTGTAA (SEQ ID NO: 353)
Protein:
MAITQFRLFKFCTCLATVFSFLKRLICRSGAPGSAGSAAGSGMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPY GQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSS TSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSS MSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSD RGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKL KGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRG GFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGAIAYP YDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQN RKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIYGADYKDDDDKGAPG SAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNN SRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAA QAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEV LLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIER MGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQM GSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLE RLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 354)
CGI::PylRS (AF)
DNA:
ATGGCCATTTGTCAATTCTTCCTTCAAGGCCGGTGCCGCTTTGGAGATCGGTGCTGGAACGAACATCCCGGTGCT AGGGGTGCAGGAGGAGGACGGCAGCAACCGCAGCAGCAGCCTTCAGGTAATAATAGACGTGGATGGAATACAACT AGCCAGAGATATTCCAATGTCATCCAGCCATCCAGTTTCTCCAAATCCACACCATGGGGGGGCAGCAGAGATCAA GAAAAGCCATATTTCAGTTCTTTTGATTCTGGAGCTTCAACTAACAGGAAGGAAGGCTTTGGATTGTCTGAGAAC CCATTTGCTTCACTTAGTCCTGATGAGCAGAAAGATGAAAAGAAACTTCTGGAAGGAATTGTAAAAGATATGGAG GTTTGGGAATCATCAGGGCAGTGGATGTTTTCTGTTTATTCACCAGTGAAAAAGAAACCTAATATTTCAGGTTTT ACAGACATTTCACCAGAGGAATTGAGGCTTGAATACCATAACTTCTTAACCAGCAATAACTTACAGAGTTATCTA AATTCTGTCCAACGTTTAATAAATCAATGGAGGAACAGGGTAAATGAACTGAAAAGTCTAAATATATCAACTAAA GTAGCTTTGCTCTCTGATGTAAAGGATGGAGTAAATCAAGCAGCACCTGCATTTGGATTTGGCAGCAGTCAAGCA GCAACATTTATGTCGCCAGGCTTTCCAGTCAATAACAGCAGCAGTGATAATGCTCAGAACTTTAGTTTTAAAACA AACTCTGGATTTGCTGCTGCCTCTTCTGGAAGCCCTGCTGGTTTTGGGAGTTCCCCAGCATTTGGAGCTGCAGCC TCTACCAGTTCAGGTATCTCTACTTCTGCTCCAGCTTTTGGATTTGGGAAGCCTGAAGTCACATCGGCTGCATCA TTTTCATTCAAAAGCCCTGCAGCTTCCAGTTTTGGATCACCTGGATTTTCAGGACTTCCAGCTTCCTTGGCAACA GGTCCTGTCAGAGCTCCAGTGGCCCCAGCCTTTGGAGGTGGCAGTTCTGTGGCTGGTTTTGGTAGTCCGGGCTCA CATTCTCACACTGCTTTTTCTAAGCCATCCAGTGACACTTTTGGAAATAGCAGCATATCCACTTCTCTGTCAGCC TCAAGCAGCATCATTGCAACAGATAATGTGTTATTCACACCCAGAGATAAACTAACAGTAGAAGAACTGGAACAA TTTCAATCCAAGAAATTTACTCTGGGAAAAATTCCATTAAAGCCTCCACCTCTGGAACTTCTAAATGTTGGCGCC GATTACAAGGACGATGATGACAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTG CCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACT GGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATT GAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCAC AAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAG GACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCT CGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATT CCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCC ACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCA 
GCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGC AAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAA CGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAA TCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATT TTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGAC CGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACAT CTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATC ACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTG GATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATC GACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAA CGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 355)
Protein:
MAICQFFLQGRCRFGDRCWNEHPGARGAGGGRQQPQQQPSGNNRRGWNTTSQRYSNVIQPSSFSKSTPWGGSRDQ EKPYFSSFDSGASTNRKEGFGLSENPFASLSPDEQKDEKKLLEGIVKDMEVWESSGQWMFSVYSPVKKKPNISGF TDISPEELRLEYHNFLTSNNLQSYLNSVQRLINQWRNRVNELKSLNISTKVALLSDVKDGVNQAAPAFGFGSSQA ATFMSPGFPVNNSSSDNAQNFSFKTNSGFAAASSGSPAGFGSSPAFGAAASTSSGISTSAPAFGFGKPEVTSAAS FSFKSPAASSFGSPGFSGLPASLATGPVRAPVAPAFGGGSSVAGFGSPGSHSHTAFSKPSSDTFGNSSISTSLSA SSSIIATDNVLFTPRDKLTVEELEQFQSKKFTLGKIPLKPPPLELLNVGADYKDDDDKGAPGSAGSAAGSGACPV PLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHH KYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAI PVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSG KPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQI FRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESII TDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIK RAARSESYYNGISTNL* (SEQ ID NO: 356)
CGI::PylRS (AA)
DNA:
ATGGCCATTTGTCAATTCTTCCTTCAAGGCCGGTGCCGCTTTGGAGATCGGTGCTGGAACGAACATCCCGGTGCT AGGGGTGCAGGAGGAGGACGGCAGCAACCGCAGCAGCAGCCTTCAGGTAATAATAGACGTGGATGGAATACAACT AGCCAGAGATATTCCAATGTCATCCAGCCATCCAGTTTCTCCAAATCCACACCATGGGGGGGCAGCAGAGATCAA GAAAAGCCATATTTCAGTTCTTTTGATTCTGGAGCTTCAACTAACAGGAAGGAAGGCTTTGGATTGTCTGAGAAC CCATTTGCTTCACTTAGTCCTGATGAGCAGAAAGATGAAAAGAAACTTCTGGAAGGAATTGTAAAAGATATGGAG GTTTGGGAATCATCAGGGCAGTGGATGTTTTCTGTTTATTCACCAGTGAAAAAGAAACCTAATATTTCAGGTTTT ACAGACATTTCACCAGAGGAATTGAGGCTTGAATACCATAACTTCTTAACCAGCAATAACTTACAGAGTTATCTA AATTCTGTCCAACGTTTAATAAATCAATGGAGGAACAGGGTAAATGAACTGAAAAGTCTAAATATATCAACTAAA GTAGCTTTGCTCTCTGATGTAAAGGATGGAGTAAATCAAGCAGCACCTGCATTTGGATTTGGCAGCAGTCAAGCA GCAACATTTATGTCGCCAGGCTTTCCAGTCAATAACAGCAGCAGTGATAATGCTCAGAACTTTAGTTTTAAAACA AACTCTGGATTTGCTGCTGCCTCTTCTGGAAGCCCTGCTGGTTTTGGGAGTTCCCCAGCATTTGGAGCTGCAGCC TCTACCAGTTCAGGTATCTCTACTTCTGCTCCAGCTTTTGGATTTGGGAAGCCTGAAGTCACATCGGCTGCATCA TTTTCATTCAAAAGCCCTGCAGCTTCCAGTTTTGGATCACCTGGATTTTCAGGACTTCCAGCTTCCTTGGCAACA GGTCCTGTCAGAGCTCCAGTGGCCCCAGCCTTTGGAGGTGGCAGTTCTGTGGCTGGTTTTGGTAGTCCGGGCTCA CATTCTCACACTGCTTTTTCTAAGCCATCCAGTGACACTTTTGGAAATAGCAGCATATCCACTTCTCTGTCAGCC TCAAGCAGCATCATTGCAACAGATAATGTGTTATTCACACCCAGAGATAAACTAACAGTAGAAGAACTGGAACAA TTTCAATCCAAGAAATTTACTCTGGGAAAAATTCCATTAAAGCCTCCACCTCTGGAACTTCTAAATGTTGGCGCC GATTACAAGGACGATGATGACAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTG CCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGACAAAAAACCGCTGAATACCCTGATCTCTGCTACT GGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATT GAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCAC AAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAG GACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCT CGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATT CCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCC ACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCA 
GCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGC AAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAA CGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAA TCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATT TTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTGGCACCAAATCTGTATAACTATCTGCGCAAACTGGAC CGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACAT CTGGAGGAGTTTACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATC ACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTG GATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATC GACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAA CGTGCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 357)
Protein:
MAICQFFLQGRCRFGDRCWNEHPGARGAGGGRQQPQQQPSGNNRRGWNTTSQRYSNVIQPSSFSKSTPWGGSRDQ EKPYFSSFDSGASTNRKEGFGLSENPFASLSPDEQKDEKKLLEGIVKDMEVWESSGQWMFSVYSPVKKKPNISGF TDISPEELRLEYHNFLTSNNLQSYLNSVQRLINQWRNRVNELKSLNISTKVALLSDVKDGVNQAAPAFGFGSSQA ATFMSPGFPVNNSSSDNAQNFSFKTNSGFAAASSGSPAGFGSSPAFGAAASTSSGISTSAPAFGFGKPEVTSAAS FSFKSPAASSFGSPGFSGLPASLATGPVRAPVAPAFGGGSSVAGFGSPGSHSHTAFSKPSSDTFGNSSISTSLSA SSSIIATDNVLFTPRDKLTVEELEQFQSKKFTLGKIPLKPPPLELLNVGADYKDDDDKGAPGSAGSAAGSGACPV PLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHH KYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAI PVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSG KPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQI FRVDKNFCLRPMLAPNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESII TDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIK RAARSESYYNGISTNL* (SEQ ID NO: 358)
CGI::PylRS (AAAF)
DNA:
ATGGCCATTTGTCAATTCTTCCTTCAAGGCCGGTGCCGCTTTGGAGATCGGTGCTGGAACGAACATCCCGGTGCT AGGGGTGCAGGAGGAGGACGGCAGCAACCGCAGCAGCAGCCTTCAGGTAATAATAGACGTGGATGGAATACAACT AGCCAGAGATATTCCAATGTCATCCAGCCATCCAGTTTCTCCAAATCCACACCATGGGGGGGCAGCAGAGATCAA GAAAAGCCATATTTCAGTTCTTTTGATTCTGGAGCTTCAACTAACAGGAAGGAAGGCTTTGGATTGTCTGAGAAC CCATTTGCTTCACTTAGTCCTGATGAGCAGAAAGATGAAAAGAAACTTCTGGAAGGAATTGTAAAAGATATGGAG GTTTGGGAATCATCAGGGCAGTGGATGTTTTCTGTTTATTCACCAGTGAAAAAGAAACCTAATATTTCAGGTTTT ACAGACATTTCACCAGAGGAATTGAGGCTTGAATACCATAACTTCTTAACCAGCAATAACTTACAGAGTTATCTA AATTCTGTCCAACGTTTAATAAATCAATGGAGGAACAGGGTAAATGAACTGAAAAGTCTAAATATATCAACTAAA GTAGCTTTGCTCTCTGATGTAAAGGATGGAGTAAATCAAGCAGCACCTGCATTTGGATTTGGCAGCAGTCAAGCA GCAACATTTATGTCGCCAGGCTTTCCAGTCAATAACAGCAGCAGTGATAATGCTCAGAACTTTAGTTTTAAAACA AACTCTGGATTTGCTGCTGCCTCTTCTGGAAGCCCTGCTGGTTTTGGGAGTTCCCCAGCATTTGGAGCTGCAGCC TCTACCAGTTCAGGTATCTCTACTTCTGCTCCAGCTTTTGGATTTGGGAAGCCTGAAGTCACATCGGCTGCATCA TTTTCATTCAAAAGCCCTGCAGCTTCCAGTTTTGGATCACCTGGATTTTCAGGACTTCCAGCTTCCTTGGCAACA GGTCCTGTCAGAGCTCCAGTGGCCCCAGCCTTTGGAGGTGGCAGTTCTGTGGCTGGTTTTGGTAGTCCGGGCTCA CATTCTCACACTGCTTTTTCTAAGCCATCCAGTGACACTTTTGGAAATAGCAGCATATCCACTTCTCTGTCAGCC TCAAGCAGCATCATTGCAACAGATAATGTGTTATTCACACCCAGAGATAAACTAACAGTAGAAGAACTGGAACAA TTTCAATCCAAGAAATTTACTCTGGGAAAAATTCCATTAAAGCCTCCACCTCTGGAACTTCTAAATGTTGGCGCC GATTACAAGGACGATGATGACAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTG CCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACT GGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATT GAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCAC AAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAG GACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCT CGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATT CCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCC ACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCA 
GCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGC AAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAA CGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAA TCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATT TTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGAC CGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACAT CTGGAGGAGTTTACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATC ACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTG GATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATC GACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAA CGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 359)
Protein :
MAICQFFLQGRCRFGDRCWNEHPGARGAGGGRQQPQQQPSGNNRRGWNTTSQRYSNVIQPSSFSKSTPWGGSRDQ EKPYFSSFDSGASTNRKEGFGLSENPFASLSPDEQKDEKKLLEGIVKDMEVWESSGQWMFSVYSPVKKKPNISGF TDISPEELRLEYHNFLTSNNLQSYLNSVQRLINQWRNRWELKSLNISTKVALLSDVKDGWQAAPAFGFGSSQA ATFMSPGFPWNSSSDNAQNFSFKTNSGFAAASSGSPAGFGSSPAFGAAASTSSGISTSAPAFGFGKPEVTSAAS FSFKSPAASSFGSPGFSGLPASLATGPVRAPVAPAFGGGSSVAGFGSPGSHSHTAFSKPSSDTFGNSSISTSLSA SSSIIATDNVLFTPRDKLTVEELEQFQSKKFTLGKIPLKPPPLELLNVGADYKDDDDKGAPGSAGSAAGSGACPV PLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVWNSRSSRTARALRHH KYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKWSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAI PVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSG KPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQI FRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESII TDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAWGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIK RAARSESYYNGISTNL* (SEQ ID NO: 360)
CGI::FUS::PylRS (AA)
DNA:
ATGGCCATTTGTCAATTCTTCCTTCAAGGCCGGTGCCGCTTTGGAGATCGGTGCTGGAACGAACATCCCGGTGCT AGGGGTGCAGGAGGAGGACGGCAGCAACCGCAGCAGCAGCCTTCAGGTAATAATAGACGTGGATGGAATACAACT AGCCAGAGATATTCCAATGTCATCCAGCCATCCAGTTTCTCCAAATCCACACCATGGGGGGGCAGCAGAGATCAA GAAAAGCCATATTTCAGTTCTTTTGATTCTGGAGCTTCAACTAACAGGAAGGAAGGCTTTGGATTGTCTGAGAAC CCATTTGCTTCACTTAGTCCTGATGAGCAGAAAGATGAAAAGAAACTTCTGGAAGGAATTGTAAAAGATATGGAG GTTTGGGAATCATCAGGGCAGTGGATGTTTTCTGTTTATTCACCAGTGAAAAAGAAACCTAATATTTCAGGTTTT ACAGACATTTCACCAGAGGAATTGAGGCTTGAATACCATAACTTCTTAACCAGCAATAACTTACAGAGTTATCTA AATTCTGTCCAACGTTTAATAAATCAATGGAGGAACAGGGTAAATGAACTGAAAAGTCTAAATATATCAACTAAA GTAGCTTTGCTCTCTGATGTAAAGGATGGAGTAAATCAAGCAGCACCTGCATTTGGATTTGGCAGCAGTCAAGCA GCAACATTTATGTCGCCAGGCTTTCCAGTCAATAACAGCAGCAGTGATAATGCTCAGAACTTTAGTTTTAAAACA AACTCTGGATTTGCTGCTGCCTCTTCTGGAAGCCCTGCTGGTTTTGGGAGTTCCCCAGCATTTGGAGCTGCAGCC TCTACCAGTTCAGGTATCTCTACTTCTGCTCCAGCTTTTGGATTTGGGAAGCCTGAAGTCACATCGGCTGCATCA TTTTCATTCAAAAGCCCTGCAGCTTCCAGTTTTGGATCACCTGGATTTTCAGGACTTCCAGCTTCCTTGGCAACA GGTCCTGTCAGAGCTCCAGTGGCCCCAGCCTTTGGAGGTGGCAGTTCTGTGGCTGGTTTTGGTAGTCCGGGCTCA CATTCTCACACTGCTTTTTCTAAGCCATCCAGTGACACTTTTGGAAATAGCAGCATATCCACTTCTCTGTCAGCC TCAAGCAGCATCATTGCAACAGATAATGTGTTATTCACACCCAGAGATAAACTAACAGTAGAAGAACTGGAACAA TTTCAATCCAAGAAATTTACTCTGGGAAAAATTCCATTAAAGCCTCCACCTCTGGAACTTCTAAATGTTGGAGCA CCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTAT GGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGT GGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACA GGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCG TCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGT AGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAG CAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGC AGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGC AGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGA 
GGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTAT GAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAA TTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTG CAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAAC AAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTC TCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATC AAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGA GGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGT GGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCT TGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATG GGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGTGGTGCGATCGCAGGCGCCGATTACAAGGACGATGAT GACAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCG CTGGAACGCCTGACCCTGGATGACAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGT ACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGAT CATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGT AAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAA GTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTG GAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAG TCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTT AAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAA ACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTG GAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGG AAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCT CTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAAC TTCTGTCTGCGCCCTATGCTGGCACCAAATCTGTATAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCT ATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATG 
CTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCAC CTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGAC CTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGA GCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAA TCGTATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 361)
Protein:
MAICQFFLQGRCRFGDRCWNEHPGARGAGGGRQQPQQQPSGNNRRGWNTTSQRYSNVIQPSSFSKSTPWGGSRDQ EKPYFSSFDSGASTNRKEGFGLSENPFASLSPDEQKDEKKLLEGIVKDMEVWESSGQWMFSVYSPVKKKPNISGF TDISPEELRLEYHNFLTSNNLQSYLNSVQRLINQWRNRVNELKSLNISTKVALLSDVKDGVNQAAPAFGFGSSQA ATFMSPGFPVNNSSSDNAQNFSFKTNSGFAAASSGSPAGFGSSPAFGAAASTSSGISTSAPAFGFGKPEVTSAAS FSFKSPAASSFGSPGFSGLPASLATGPVRAPVAPAFGGGSSVAGFGSPGSHSHTAFSKPSSDTFGNSSISTSLSA SSSIIATDNVLFTPRDKLTVEELEQFQSKKFTLGKIPLKPPPLELLNVGAPGSAGSAAGSGMASNDYTQQATQSY GAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQS SYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSS SGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGY EPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTN KKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRG GPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHM GGNYGDDRRGGRGGAIAGADYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSR TGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVK VKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALV KGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLG KLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLYNYLRKLDRALPDP IKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGD LELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 362)
CGI::FUS::PylRS (AAAF)
DNA:
ATGGCCATTTGTCAATTCTTCCTTCAAGGCCGGTGCCGCTTTGGAGATCGGTGCTGGAACGAACATCCCGGTGCT AGGGGTGCAGGAGGAGGACGGCAGCAACCGCAGCAGCAGCCTTCAGGTAATAATAGACGTGGATGGAATACAACT AGCCAGAGATATTCCAATGTCATCCAGCCATCCAGTTTCTCCAAATCCACACCATGGGGGGGCAGCAGAGATCAA GAAAAGCCATATTTCAGTTCTTTTGATTCTGGAGCTTCAACTAACAGGAAGGAAGGCTTTGGATTGTCTGAGAAC CCATTTGCTTCACTTAGTCCTGATGAGCAGAAAGATGAAAAGAAACTTCTGGAAGGAATTGTAAAAGATATGGAG GTTTGGGAATCATCAGGGCAGTGGATGTTTTCTGTTTATTCACCAGTGAAAAAGAAACCTAATATTTCAGGTTTT ACAGACATTTCACCAGAGGAATTGAGGCTTGAATACCATAACTTCTTAACCAGCAATAACTTACAGAGTTATCTA AATTCTGTCCAACGTTTAATAAATCAATGGAGGAACAGGGTAAATGAACTGAAAAGTCTAAATATATCAACTAAA GTAGCTTTGCTCTCTGATGTAAAGGATGGAGTAAATCAAGCAGCACCTGCATTTGGATTTGGCAGCAGTCAAGCA GCAACATTTATGTCGCCAGGCTTTCCAGTCAATAACAGCAGCAGTGATAATGCTCAGAACTTTAGTTTTAAAACA AACTCTGGATTTGCTGCTGCCTCTTCTGGAAGCCCTGCTGGTTTTGGGAGTTCCCCAGCATTTGGAGCTGCAGCC TCTACCAGTTCAGGTATCTCTACTTCTGCTCCAGCTTTTGGATTTGGGAAGCCTGAAGTCACATCGGCTGCATCA TTTTCATTCAAAAGCCCTGCAGCTTCCAGTTTTGGATCACCTGGATTTTCAGGACTTCCAGCTTCCTTGGCAACA GGTCCTGTCAGAGCTCCAGTGGCCCCAGCCTTTGGAGGTGGCAGTTCTGTGGCTGGTTTTGGTAGTCCGGGCTCA CATTCTCACACTGCTTTTTCTAAGCCATCCAGTGACACTTTTGGAAATAGCAGCATATCCACTTCTCTGTCAGCC TCAAGCAGCATCATTGCAACAGATAATGTGTTATTCACACCCAGAGATAAACTAACAGTAGAAGAACTGGAACAA TTTCAATCCAAGAAATTTACTCTGGGAAAAATTCCATTAAAGCCTCCACCTCTGGAACTTCTAAATGTTGGAGCA CCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTAT GGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGT GGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACA GGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCG TCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGT AGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAG CAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGC AGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGC AGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGA 
GGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTAT GAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAA TTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTG CAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAAC AAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTC TCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATC AAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGA GGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGT GGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCT TGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATG GGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGTGGTGCGATCGCAGGCGCCGATTACAAGGACGATGAT GACAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCG CTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGT ACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGAT CATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGT AAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAA GTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTG GAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAG TCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTT AAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAA ACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTG GAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGG AAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCT CTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAAC TTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCT ATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATG 
CTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCAC CTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGAC CTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGT GCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAG TCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 363)
Protein:
MAICQFFLQGRCRFGDRCWNEHPGARGAGGGRQQPQQQPSGNNRRGWNTTSQRYSNVIQPSSFSKSTPWGGSRDQ
EKPYFSSFDSGASTNRKEGFGLSENPFASLSPDEQKDEKKLLEGIVKDMEVWESSGQWMFSVYSPVKKKPNISGF TDISPEELRLEYHNFLTSNNLQSYLNSVQRLINQWRNRVNELKSLNISTKVALLSDVKDGVNQAAPAFGFGSSQA ATFMSPGFPVNNSSSDNAQNFSFKTNSGFAAASSGSPAGFGSSPAFGAAASTSSGISTSAPAFGFGKPEVTSAAS FSFKSPAASSFGSPGFSGLPASLATGPVRAPVAPAFGGGSSVAGFGSPGSHSHTAFSKPSSDTFGNSSISTSLSA SSSIIATDNVLFTPRDKLTVEELEQFQSKKFTLGKIPLKPPPLELLNVGAPGSAGSAAGSGMASNDYTQQATQSY GAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQS SYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSS SGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGY EPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTN KKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRG GPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHM GGNYGDDRRGGRGGAIAGADYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSR TGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVK VKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALV KGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLG KLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDP IKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGD LELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 364)
CGI::MCP
DNA:
ATGGCCATTTGTCAATTCTTCCTTCAAGGCCGGTGCCGCTTTGGAGATCGGTGCTGGAACGAACATCCCGGTGCT AGGGGTGCAGGAGGAGGACGGCAGCAACCGCAGCAGCAGCCTTCAGGTAATAATAGACGTGGATGGAATACAACT AGCCAGAGATATTCCAATGTCATCCAGCCATCCAGTTTCTCCAAATCCACACCATGGGGGGGCAGCAGAGATCAA GAAAAGCCATATTTCAGTTCTTTTGATTCTGGAGCTTCAACTAACAGGAAGGAAGGCTTTGGATTGTCTGAGAAC CCATTTGCTTCACTTAGTCCTGATGAGCAGAAAGATGAAAAGAAACTTCTGGAAGGAATTGTAAAAGATATGGAG GTTTGGGAATCATCAGGGCAGTGGATGTTTTCTGTTTATTCACCAGTGAAAAAGAAACCTAATATTTCAGGTTTT ACAGACATTTCACCAGAGGAATTGAGGCTTGAATACCATAACTTCTTAACCAGCAATAACTTACAGAGTTATCTA AATTCTGTCCAACGTTTAATAAATCAATGGAGGAACAGGGTAAATGAACTGAAAAGTCTAAATATATCAACTAAA GTAGCTTTGCTCTCTGATGTAAAGGATGGAGTAAATCAAGCAGCACCTGCATTTGGATTTGGCAGCAGTCAAGCA GCAACATTTATGTCGCCAGGCTTTCCAGTCAATAACAGCAGCAGTGATAATGCTCAGAACTTTAGTTTTAAAACA AACTCTGGATTTGCTGCTGCCTCTTCTGGAAGCCCTGCTGGTTTTGGGAGTTCCCCAGCATTTGGAGCTGCAGCC TCTACCAGTTCAGGTATCTCTACTTCTGCTCCAGCTTTTGGATTTGGGAAGCCTGAAGTCACATCGGCTGCATCA TTTTCATTCAAAAGCCCTGCAGCTTCCAGTTTTGGATCACCTGGATTTTCAGGACTTCCAGCTTCCTTGGCAACA GGTCCTGTCAGAGCTCCAGTGGCCCCAGCCTTTGGAGGTGGCAGTTCTGTGGCTGGTTTTGGTAGTCCGGGCTCA CATTCTCACACTGCTTTTTCTAAGCCATCCAGTGACACTTTTGGAAATAGCAGCATATCCACTTCTCTGTCAGCC TCAAGCAGCATCATTGCAACAGATAATGTGTTATTCACACCCAGAGATAAACTAACAGTAGAAGAACTGGAACAA TTTCAATCCAAGAAATTTACTCTGGGAAAAATTCCATTAAAGCCTCCACCTCTGGAACTTCTAAATGTTGCGATC GCATATCCCTATGATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCT AACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAAC GGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCT GCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTA ACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGA AACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACTAA (SEQ ID NO: 365)
Protein:
MAICQFFLQGRCRFGDRCWNEHPGARGAGGGRQQPQQQPSGNNRRGWNTTSQRYSNVIQPSSFSKSTPWGGSRDQ EKPYFSSFDSGASTNRKEGFGLSENPFASLSPDEQKDEKKLLEGIVKDMEVWESSGQWMFSVYSPVKKKPNISGF TDISPEELRLEYHNFLTSNNLQSYLNSVQRLINQWRNRVNELKSLNISTKVALLSDVKDGVNQAAPAFGFGSSQA ATFMSPGFPVNNSSSDNAQNFSFKTNSGFAAASSGSPAGFGSSPAFGAAASTSSGISTSAPAFGFGKPEVTSAAS FSFKSPAASSFGSPGFSGLPASLATGPVRAPVAPAFGGGSSVAGFGSPGSHSHTAFSKPSSDTFGNSSISTSLSA SSSIIATDNVLFTPRDKLTVEELEQFQSKKFTLGKIPLKPPPLELLNVAIAYPYDVPDYAGAPGSAGSAAGSGAS NFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMEL TIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIY* (SEQ ID NO: 366)
CGI::EWSR1::MCP
DNA:
ATGGCCATTTGTCAATTCTTCCTTCAAGGCCGGTGCCGCTTTGGAGATCGGTGCTGGAACGAACATCCCGGTGCT AGGGGTGCAGGAGGAGGACGGCAGCAACCGCAGCAGCAGCCTTCAGGTAATAATAGACGTGGATGGAATACAACT AGCCAGAGATATTCCAATGTCATCCAGCCATCCAGTTTCTCCAAATCCACACCATGGGGGGGCAGCAGAGATCAA GAAAAGCCATATTTCAGTTCTTTTGATTCTGGAGCTTCAACTAACAGGAAGGAAGGCTTTGGATTGTCTGAGAAC CCATTTGCTTCACTTAGTCCTGATGAGCAGAAAGATGAAAAGAAACTTCTGGAAGGAATTGTAAAAGATATGGAG GTTTGGGAATCATCAGGGCAGTGGATGTTTTCTGTTTATTCACCAGTGAAAAAGAAACCTAATATTTCAGGTTTT ACAGACATTTCACCAGAGGAATTGAGGCTTGAATACCATAACTTCTTAACCAGCAATAACTTACAGAGTTATCTA AATTCTGTCCAACGTTTAATAAATCAATGGAGGAACAGGGTAAATGAACTGAAAAGTCTAAATATATCAACTAAA GTAGCTTTGCTCTCTGATGTAAAGGATGGAGTAAATCAAGCAGCACCTGCATTTGGATTTGGCAGCAGTCAAGCA GCAACATTTATGTCGCCAGGCTTTCCAGTCAATAACAGCAGCAGTGATAATGCTCAGAACTTTAGTTTTAAAACA AACTCTGGATTTGCTGCTGCCTCTTCTGGAAGCCCTGCTGGTTTTGGGAGTTCCCCAGCATTTGGAGCTGCAGCC TCTACCAGTTCAGGTATCTCTACTTCTGCTCCAGCTTTTGGATTTGGGAAGCCTGAAGTCACATCGGCTGCATCA TTTTCATTCAAAAGCCCTGCAGCTTCCAGTTTTGGATCACCTGGATTTTCAGGACTTCCAGCTTCCTTGGCAACA GGTCCTGTCAGAGCTCCAGTGGCCCCAGCCTTTGGAGGTGGCAGTTCTGTGGCTGGTTTTGGTAGTCCGGGCTCA CATTCTCACACTGCTTTTTCTAAGCCATCCAGTGACACTTTTGGAAATAGCAGCATATCCACTTCTCTGTCAGCC TCAAGCAGCATCATTGCAACAGATAATGTGTTATTCACACCCAGAGATAAACTAACAGTAGAAGAACTGGAACAA TTTCAATCCAAGAAATTTACTCTGGGAAAAATTCCATTAAAGCCTCCACCTCTGGAACTTCTAAATGTTATGGCG TCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAA GGATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTAT ACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTAT ACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACT GCTACAGTCACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTAT GGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAA TCTAGCACAGGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGG AGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGT TATGATCAGAGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGT 
CAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGT CAATATAGCCAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTT TATGGGCAGGAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGG GGAAGAGGGGGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCT GGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTA GATCCAGATGAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTG GCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTG GACAAGGAAACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTG GAATGGTTTGATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAAC AGTATGCGGGGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCA GGAGGTCCTGGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCC CGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCG GGTTGTGGAAACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTC CCGCCACCCTTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGC CTCATGGATCGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGC CGGGGCATGGACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAA CAGGCGATCGCATATCCCTATGATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGT GGAGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAAC TTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGT CAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAAT ATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTA AAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACTAA (SEQ ID NO: 367)
Protein:
MAICQFFLQGRCRFGDRCWNEHPGARGAGGGRQQPQQQPSGNNRRGWNTTSQRYSNVIQPSSFSKSTPWGGSRDQ
EKPYFSSFDSGASTNRKEGFGLSENPFASLSPDEQKDEKKLLEGIVKDMEVWESSGQWMFSVYSPVKKKPNISGF
TDISPEELRLEYHNFLTSNNLQSYLNSVQRLINQWRNRVNELKSLNISTKVALLSDVKDGVNQAAPAFGFGSSQA
ATFMSPGFPVNNSSSDNAQNFSFKTNSGFAAASSGSPAGFGSSPAFGAAASTSSGISTSAPAFGFGKPEVTSAAS
FSFKSPAASSFGSPGFSGLPASLATGPVRAPVAPAFGGGSSVAGFGSPGSHSHTAFSKPSSDTFGNSSISTSLSA
SSSIIATDNVLFTPRDKLTVEELEQFQSKKFTLGKIPLKPPPLELLNVMASTDYSTYSQAAAQQGYSAYTAQPTQ
GYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGAYDTTT ATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSYPQVPG
SYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSYSQAPS
QYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRGGMGSA
GERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADFFKQCGVVKMNKRTGQPMIHIYL
DKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRGGPGGP
GGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPKPEGFL
PPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPPGPLME
QAIAYPYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVR
QSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIY* (SEQ ID NO: 368)
CGI::FUS::PylRS (AF)
DNA:
ATGGCCATTTGTCAATTCTTCCTTCAAGGCCGGTGCCGCTTTGGAGATCGGTGCTGGAACGAACATCCCGGTGCT AGGGGTGCAGGAGGAGGACGGCAGCAACCGCAGCAGCAGCCTTCAGGTAATAATAGACGTGGATGGAATACAACT AGCCAGAGATATTCCAATGTCATCCAGCCATCCAGTTTCTCCAAATCCACACCATGGGGGGGCAGCAGAGATCAA GAAAAGCCATATTTCAGTTCTTTTGATTCTGGAGCTTCAACTAACAGGAAGGAAGGCTTTGGATTGTCTGAGAAC CCATTTGCTTCACTTAGTCCTGATGAGCAGAAAGATGAAAAGAAACTTCTGGAAGGAATTGTAAAAGATATGGAG GTTTGGGAATCATCAGGGCAGTGGATGTTTTCTGTTTATTCACCAGTGAAAAAGAAACCTAATATTTCAGGTTTT ACAGACATTTCACCAGAGGAATTGAGGCTTGAATACCATAACTTCTTAACCAGCAATAACTTACAGAGTTATCTA AATTCTGTCCAACGTTTAATAAATCAATGGAGGAACAGGGTAAATGAACTGAAAAGTCTAAATATATCAACTAAA GTAGCTTTGCTCTCTGATGTAAAGGATGGAGTAAATCAAGCAGCACCTGCATTTGGATTTGGCAGCAGTCAAGCA GCAACATTTATGTCGCCAGGCTTTCCAGTCAATAACAGCAGCAGTGATAATGCTCAGAACTTTAGTTTTAAAACA AACTCTGGATTTGCTGCTGCCTCTTCTGGAAGCCCTGCTGGTTTTGGGAGTTCCCCAGCATTTGGAGCTGCAGCC TCTACCAGTTCAGGTATCTCTACTTCTGCTCCAGCTTTTGGATTTGGGAAGCCTGAAGTCACATCGGCTGCATCA TTTTCATTCAAAAGCCCTGCAGCTTCCAGTTTTGGATCACCTGGATTTTCAGGACTTCCAGCTTCCTTGGCAACA GGTCCTGTCAGAGCTCCAGTGGCCCCAGCCTTTGGAGGTGGCAGTTCTGTGGCTGGTTTTGGTAGTCCGGGCTCA CATTCTCACACTGCTTTTTCTAAGCCATCCAGTGACACTTTTGGAAATAGCAGCATATCCACTTCTCTGTCAGCC TCAAGCAGCATCATTGCAACAGATAATGTGTTATTCACACCCAGAGATAAACTAACAGTAGAAGAACTGGAACAA TTTCAATCCAAGAAATTTACTCTGGGAAAAATTCCATTAAAGCCTCCACCTCTGGAACTTCTAAATGTTGGAGCA CCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTAT GGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGT GGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACA GGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCG TCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGT AGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAG CAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGC AGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGC AGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGA 
GGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTAT GAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAA TTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTG CAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAAC AAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTC TCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATC AAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGA GGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGT GGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCT TGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATG GGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGTGGTGCGATCGCAGGCGCCGATTACAAGGACGATGAT GACAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCG CTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGT ACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGAT CATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGT AAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAA GTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTG GAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAG TCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTT AAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAA ACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTG GAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGG AAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCT CTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAAC TTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCT ATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATG 
CTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCAC CTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGAC CTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGT GCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAG TCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 369)
Protein:
MAICQFFLQGRCRFGDRCWNEHPGARGAGGGRQQPQQQPSGNNRRGWNTTSQRYSNVIQPSSFSKSTPWGGSRDQ EKPYFSSFDSGASTNRKEGFGLSENPFASLSPDEQKDEKKLLEGIVKDMEVWESSGQWMFSVYSPVKKKPNISGF TDISPEELRLEYHNFLTSNNLQSYLNSVQRLINQWRNRVNELKSLNISTKVALLSDVKDGVNQAAPAFGFGSSQA ATFMSPGFPVNNSSSDNAQNFSFKTNSGFAAASSGSPAGFGSSPAFGAAASTSSGISTSAPAFGFGKPEVTSAAS FSFKSPAASSFGSPGFSGLPASLATGPVRAPVAPAFGGGSSVAGFGSPGSHSHTAFSKPSSDTFGNSSISTSLSA SSSIIATDNVLFTPRDKLTVEELEQFQSKKFTLGKIPLKPPPLELLNVGAPGSAGSAAGSGMASNDYTQQATQSY GAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQS SYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSS SGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGY EPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTN KKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRG GPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHM GGNYGDDRRGGRGGAIAGADYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSR TGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVK VKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALV KGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLG KLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDP IKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGD LELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 370)
CGI::FUS::MCP::PylRS (AF)
DNA:
ATGGCCATTTGTCAATTCTTCCTTCAAGGCCGGTGCCGCTTTGGAGATCGGTGCTGGAACGAACATCCCGGTGCT AGGGGTGCAGGAGGAGGACGGCAGCAACCGCAGCAGCAGCCTTCAGGTAATAATAGACGTGGATGGAATACAACT AGCCAGAGATATTCCAATGTCATCCAGCCATCCAGTTTCTCCAAATCCACACCATGGGGGGGCAGCAGAGATCAA GAAAAGCCATATTTCAGTTCTTTTGATTCTGGAGCTTCAACTAACAGGAAGGAAGGCTTTGGATTGTCTGAGAAC CCATTTGCTTCACTTAGTCCTGATGAGCAGAAAGATGAAAAGAAACTTCTGGAAGGAATTGTAAAAGATATGGAG GTTTGGGAATCATCAGGGCAGTGGATGTTTTCTGTTTATTCACCAGTGAAAAAGAAACCTAATATTTCAGGTTTT ACAGACATTTCACCAGAGGAATTGAGGCTTGAATACCATAACTTCTTAACCAGCAATAACTTACAGAGTTATCTA AATTCTGTCCAACGTTTAATAAATCAATGGAGGAACAGGGTAAATGAACTGAAAAGTCTAAATATATCAACTAAA GTAGCTTTGCTCTCTGATGTAAAGGATGGAGTAAATCAAGCAGCACCTGCATTTGGATTTGGCAGCAGTCAAGCA GCAACATTTATGTCGCCAGGCTTTCCAGTCAATAACAGCAGCAGTGATAATGCTCAGAACTTTAGTTTTAAAACA AACTCTGGATTTGCTGCTGCCTCTTCTGGAAGCCCTGCTGGTTTTGGGAGTTCCCCAGCATTTGGAGCTGCAGCC TCTACCAGTTCAGGTATCTCTACTTCTGCTCCAGCTTTTGGATTTGGGAAGCCTGAAGTCACATCGGCTGCATCA TTTTCATTCAAAAGCCCTGCAGCTTCCAGTTTTGGATCACCTGGATTTTCAGGACTTCCAGCTTCCTTGGCAACA GGTCCTGTCAGAGCTCCAGTGGCCCCAGCCTTTGGAGGTGGCAGTTCTGTGGCTGGTTTTGGTAGTCCGGGCTCA CATTCTCACACTGCTTTTTCTAAGCCATCCAGTGACACTTTTGGAAATAGCAGCATATCCACTTCTCTGTCAGCC TCAAGCAGCATCATTGCAACAGATAATGTGTTATTCACACCCAGAGATAAACTAACAGTAGAAGAACTGGAACAA TTTCAATCCAAGAAATTTACTCTGGGAAAAATTCCATTAAAGCCTCCACCTCTGGAACTTCTAAATGTTGGAGCA CCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTAT GGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGT GGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACA
GGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCG
TCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGT
AGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAG
CAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGC
AGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGC
AGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGA
GGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTAT
GAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAA
TTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTG
CAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAAC
AAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTC
TCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATC
AAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGA
GGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGT
GGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCT
TGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATG
GGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGTGGTGCGATCGCATATCCCTATGATGTGCCGGATTAT
GCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCTAACTTTACTCAGTTCGTTCTCGTCGAC
AATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAAC
TCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAA
GTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCCACGAATTCC
GACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCA
AACTCCGGCATCTACGGCGCCGATTACAAGGACGATGATGACAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCT
GGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTG
AATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTT
AGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACA
GCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAA
TTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAA
GCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGA
AGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATT
AGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCC
CCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGAC
GAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTG
CAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGAT
CGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGAT
ACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCT
AACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAA
GAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGT
GAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGT
ATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCG
CTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAA
CACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA
(SEQ ID NO: 371)
Protein:
MAICQFFLQGRCRFGDRCWNEHPGARGAGGGRQQPQQQPSGNNRRGWNTTSQRYSNVIQPSSFSKSTPWGGSRDQ
EKPYFSSFDSGASTNRKEGFGLSENPFASLSPDEQKDEKKLLEGIVKDMEVWESSGQWMFSVYSPVKKKPNISGF
TDISPEELRLEYHNFLTSNNLQSYLNSVQRLINQWRNRVNELKSLNISTKVALLSDVKDGVNQAAPAFGFGSSQA
ATFMSPGFPVNNSSSDNAQNFSFKTNSGFAAASSGSPAGFGSSPAFGAAASTSSGISTSAPAFGFGKPEVTSAAS
FSFKSPAASSFGSPGFSGLPASLATGPVRAPVAPAFGGGSSVAGFGSPGSHSHTAFSKPSSDTFGNSSISTSLSA
SSSIIATDNVLFTPRDKLTVEELEQFQSKKFTLGKIPLKPPPLELLNVGAPGSAGSAAGSGMASNDYTQQATQSY
GAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQS
SYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSS
SGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGY
EPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTN
KKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRG
GPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHM GGNYGDDRRGGRGGAIAYPYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSN SRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAA NSGIYGADYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEV SRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKK AMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSA PVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVD RGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRK ESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIP LDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 372)
CMP-SaTr::PylRS (AF)
DNA:
ATGGCTGCCCCGAGAGACAATGTCACTTTATTATTCAAGTTATACTGCTTGGCAGTGATGACCCTGATGGCTGCA GTCTATACCATAGCTTTAAGATACACAAGGACATCAGACAAAGAACTCTACTTTTCAACCACAGCCGTGTGTATC ACAGAAGTTATAAAGTTATTGCTAAGTGTGGGAATTTTAGCTAAAGAAACTGGTAGTCTGGGTAGATTCAAAGCA TCTTTAAGAGAAAATGTCTTGGGGAGCCCCAAGGAACTGTTGAAGTTAAGTGTGCCATCGTTAGTGTATGCTGTT CAGAACAACATGGCTTTCCTAGCTCTTAGCAATCTGGATGCAGCAGTGTACCAGGTGACCTACCAGTTGAAGATT CCGTGTACTGCTTTATGCACTGTTTTAATGTTAAACCGGACACTCAGCAAATTACAGTGGGTTTCAGTTTTTATG CTGTGTGCTGGAGTTACGCTTGTACAGTGGAAACCAGCCCAAGCTACAAAAGTGGTGGTGGAACAAAATCCATTA TTAGGGTTTGGCGCTATAGCTATTGCTGTATTGTGCTCAGGATTTGCAGGAGTATATTTTGAAAAAGTTTTAAAG AGTTCAGATACTTCTCTTTGGGTGAGAAACATTCAAATGTATCTATCAGGGATTATTGTGACATTAGCTGGCGTC TACTTGTCAGATGGAGCTGAAATTAAAGAAAAAGGATTTTTCTATGGTTACACATATTATGTCTGGTTTGTCATC TTTCTTGCAAGTGTTGGTGGCCTCTACACTTCTGTTGTGGTTAAGTACACAGACAATATCATGAAAGGCTTTTCT GCAGCAGCGGCCATTGTCCTTTCCACCATTGCTTCAGTAATGCTGTTTGGATTACAGATAACACTTACCTTTGCC CTGGGTACTCTTCTTGTATGTGTTTCCATATATCTCTATGGATTACCCAGACAAGACACTACATCCATCCAACAA GGAGAAACAGCTTCAAAGGAGAGAGTTATTGGTGTGGGCGCCGATTACAAGGACGATGATGACAAGGGAGCACCA GGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACC CTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCAT AAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAAC AATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTG TCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGC GCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCA GCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCA GCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAAT CCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAG GTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTG TCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAA ATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAG 
CGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCT ATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAG ATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAA ATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTC AAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGT GCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTG GAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGT ATTTCTACTAACCTGTAA (SEQ ID NO: 373)
Protein:
MAAPRDNVTLLFKLYCLAVMTLMAAVYTIALRYTRTSDKELYFSTTAVCITEVIKLLLSVGILAKETGSLGRFKA SLRENVLGSPKELLKLSVPSLVYAVQNNMAFLALSNLDAAVYQVTYQLKIPCTALCTVLMLNRTLSKLQWVSVFM LCAGVTLVQWKPAQATKVWEQNPLLGFGAIAIAVLCSGFAGVYFEKVLKSSDTSLWVRNIQMYLSGIIVTLAGV YLSDGAEIKEKGFFYGYTYYVWFVIFLASVGGLYTSVWKYTDNIMKGFSAAAAIVLSTIASVMLFGLQITLTFA LGTLLVCVSIYLYGLPRQDTTSIQQGETASKERVIGVGADYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLT LDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLWNNSRSSRTARALRHHKYRKTCKRCRV SDEDLNKFLTKANEDQTSVKVKWSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAI PVSTQESVSVP ASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELL SRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRP MLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDF KIVGDSCMVFGDTLDVMHGDLELSSAWGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNG ISTNL* (SEQ ID NO: 374)
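Each DNA entry in this listing appears to be the coding sequence for the protein entry that follows it, wrapped into space-separated blocks. A minimal Python sketch for checking such a pair strips the whitespace and translates codon by codon using the standard genetic code. The names `translate` and `CODON_TABLE` are illustrative assumptions, not part of the original disclosure.

```python
# Standard genetic code; bases ordered T, C, A, G for the first, second and third codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a coding sequence copied from the listing ('*' marks a stop codon).

    Whitespace from the block layout is ignored; a length that is not a
    multiple of 3 raises ValueError, and non-ACGT characters raise KeyError.
    """
    seq = "".join(dna.split()).upper()
    if len(seq) % 3:
        raise ValueError(f"length {len(seq)} is not a multiple of 3")
    return "".join(CODON_TABLE[seq[n:n + 3]] for n in range(0, len(seq), 3))


print(translate("ATGGCTGCCCCGAGAGACAATGTCACT"))  # first 27 nt of SEQ ID NO: 373
```

Applied to the first 27 nucleotides of SEQ ID NO: 373, the sketch returns `MAAPRDNVT`, which matches the N-terminus of SEQ ID NO: 374.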
CMP-SaTr::PylRS (AA)
DNA:
ATGGCTGCCCCGAGAGACAATGTCACTTTATTATTCAAGTTATACTGCTTGGCAGTGATGACCCTGATGGCTGCA GTCTATACCATAGCTTTAAGATACACAAGGACATCAGACAAAGAACTCTACTTTTCAACCACAGCCGTGTGTATC ACAGAAGTTATAAAGTTATTGCTAAGTGTGGGAATTTTAGCTAAAGAAACTGGTAGTCTGGGTAGATTCAAAGCA TCTTTAAGAGAAAATGTCTTGGGGAGCCCCAAGGAACTGTTGAAGTTAAGTGTGCCATCGTTAGTGTATGCTGTT CAGAACAACATGGCTTTCCTAGCTCTTAGCAATCTGGATGCAGCAGTGTACCAGGTGACCTACCAGTTGAAGATT CCGTGTACTGCTTTATGCACTGTTTTAATGTTAAACCGGACACTCAGCAAATTACAGTGGGTTTCAGTTTTTATG CTGTGTGCTGGAGTTACGCTTGTACAGTGGAAACCAGCCCAAGCTACAAAAGTGGTGGTGGAACAAAATCCATTA TTAGGGTTTGGCGCTATAGCTATTGCTGTATTGTGCTCAGGATTTGCAGGAGTATATTTTGAAAAAGTTTTAAAG AGTTCAGATACTTCTCTTTGGGTGAGAAACATTCAAATGTATCTATCAGGGATTATTGTGACATTAGCTGGCGTC TACTTGTCAGATGGAGCTGAAATTAAAGAAAAAGGATTTTTCTATGGTTACACATATTATGTCTGGTTTGTCATC TTTCTTGCAAGTGTTGGTGGCCTCTACACTTCTGTTGTGGTTAAGTACACAGACAATATCATGAAAGGCTTTTCT GCAGCAGCGGCCATTGTCCTTTCCACCATTGCTTCAGTAATGCTGTTTGGATTACAGATAACACTTACCTTTGCC CTGGGTACTCTTCTTGTATGTGTTTCCATATATCTCTATGGATTACCCAGACAAGACACTACATCCATCCAACAA GGAGAAACAGCTTCAAAGGAGAGAGTTATTGGTGTGGGCGCCGATTACAAGGACGATGATGACAAGGGAGCACCA GGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACC CTGGATGACAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCAT AAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAAC AATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTG TCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAAC AGCGTGAAAGTGAAAGTCGTTAGC GCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCA GCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCA GCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAAT CCGATTAGAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCGAGCACTGACAAAATCCCAAACCGATCGTCTGGAG GTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTG TCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAA ATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAG 
CGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCT ATGCTGGCACCAAATCTGTATAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAG ATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCCCAA ATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTC AAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGT GCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTG GAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGG ATCTCTACGAACCTGTAA (SEQ ID NO: 375)
Protein:
MAAPRDNVTLLFKLYCLAVMTLMAAVYTIALRYTRTSDKELYFSTTAVCITEVIKLLLSVGILAKETGSLGRFKA SLRENVLGSPKELLKLSVPSLVYAVQNNMAFLALSNLDAAVYQVTYQLKIPCTALCTVLMLNRTLSKLQWVSVFM LCAGVTLVQWKPAQATKVWEQNPLLGFGAIAIAVLCSGFAGVYFEKVLKSSDTSLWVRNIQMYLSGIIVTLAGV YLSDGAEIKEKGFFYGYTYYVWFVIFLASVGGLYTSVWKYTDNIMKGFSAAAAIVLSTIASVMLFGLQITLTFA LGTLLVCVSIYLYGLPRQDTTSIQQGETASKERVIGVGADYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLT LDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLWNNSRSSRTARALRHHKYRKTCKRCRV SDEDLNKFLTKANEDQTSVKVKWSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAI PVSTQESVSVP ASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELL SRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRP MLAPNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDF KIVGDSCMVYGDTLDVMHGDLELSSAWGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNG ISTNL* (SEQ ID NO: 376)
CMP-SaTr::PylRS (AAAF)
DNA: ATGGCTGCCCCGAGAGACAATGTCACTTTATTATTCAAGTTATACTGCTTGGCAGTGATGACCCTGATGGCTGCA GTCTATACCATAGCTTTAAGATACACAAGGACATCAGACAAAGAACTCTACTTTTCAACCACAGCCGTGTGTATC ACAGAAGTTATAAAGTTATTGCTAAGTGTGGGAATTTTAGCTAAAGAAACTGGTAGTCTGGGTAGATTCAAAGCA TCTTTAAGAGAAAATGTCTTGGGGAGCCCCAAGGAACTGTTGAAGTTAAGTGTGCCATCGTTAGTGTATGCTGTT CAGAACAACATGGCTTTCCTAGCTCTTAGCAATCTGGATGCAGCAGTGTACCAGGTGACCTACCAGTTGAAGATT CCGTGTACTGCTTTATGCACTGTTTTAATGTTAAACCGGACACTCAGCAAATTACAGTGGGTTTCAGTTTTTATG CTGTGTGCTGGAGTTACGCTTGTACAGTGGAAACCAGCCCAAGCTACAAAAGTGGTGGTGGAACAAAATCCATTA TTAGGGTTTGGCGCTATAGCTATTGCTGTATTGTGCTCAGGATTTGCAGGAGTATATTTTGAAAAAGTTTTAAAG AGTTCAGATACTTCTCTTTGGGTGAGAAACATTCAAATGTATCTATCAGGGATTATTGTGACATTAGCTGGCGTC TACTTGTCAGATGGAGCTGAAATTAAAGAAAAAGGATTTTTCTATGGTTACACATATTATGTCTGGTTTGTCATC TTTCTTGCAAGTGTTGGTGGCCTCTACACTTCTGTTGTGGTTAAGTACACAGACAATATCATGAAAGGCTTTTCT GCAGCAGCGGCCATTGTCCTTTCCACCATTGCTTCAGTAATGCTGTTTGGATTACAGATAACACTTACCTTTGCC CTGGGTACTCTTCTTGTATGTGTTTCCATATATCTCTATGGATTACCCAGACAAGACACTACATCCATCCAACAA GGAGAAACAGCTTCAAAGGAGAGAGTTATTGGTGTGGGCGCCGATTACAAGGACGATGATGACAAGGGAGCACCA GGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACC CTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCAT AAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAAC AATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTG TCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGC GCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCA GCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCA GCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAAT CCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAG GTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTG TCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAA ATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAG 
CGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCT ATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAG ATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCCCAA ATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTC AAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGT GCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTG GAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGT ATTTCTACTAACCTGTAA (SEQ ID NO: 377)
Protein:
MAAPRDNVTLLFKLYCLAVMTLMAAVYTIALRYTRTSDKELYFSTTAVCITEVIKLLLSVGILAKETGSLGRFKA SLRENVLGSPKELLKLSVPSLVYAVQNNMAFLALSNLDAAVYQVTYQLKIPCTALCTVLMLNRTLSKLQWVSVFM LCAGVTLVQWKPAQATKVWEQNPLLGFGAIAIAVLCSGFAGVYFEKVLKSSDTSLWVRNIQMYLSGIIVTLAGV YLSDGAEIKEKGFFYGYTYYVWFVIFLASVGGLYTSVWKYTDNIMKGFSAAAAIVLSTIASVMLFGLQITLTFA LGTLLVCVSIYLYGLPRQDTTSIQQGETASKERVIGVGADYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLT LDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVWNSRSSRTARALRHHKYRKTCKRCRV SDEDLNKFLTKANEDQTSVKVKWSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAI PVSTQESVSVP ASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELL SRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRP MLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDF KIVGDSCMVFGDTLDVMHGDLELSSAWGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNG ISTNL* (SEQ ID NO: 378)
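The PylRS (AF), (AA) and (AAAF) fusions above appear to differ from one another at only a handful of residues in the synthetase domain (for example `NLANY`/`NLYNY`, `MLNFCQ`/`MLAFAQ` and `MVFGD`/`MVYGD`). A minimal Python sketch for listing such point substitutions between two gap-free, equal-length sequences follows. The function name `substitutions` is hypothetical, and positions are 1-based relative to the input strings, not to native PylRS numbering.

```python
def substitutions(reference: str, variant: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference residue, variant residue) for each mismatch.

    Whitespace is ignored; the sketch assumes no insertions or deletions,
    so sequences of different length raise ValueError.
    """
    ref = "".join(reference.split())
    var = "".join(variant.split())
    if len(ref) != len(var):
        raise ValueError("sequences must have equal length (no indels)")
    return [(i + 1, r, v) for i, (r, v) in enumerate(zip(ref, var)) if r != v]


# Matched excerpts from the AF (SEQ ID NO: 374) and AA (SEQ ID NO: 376) proteins.
print(substitutions("NLANY MLNFCQ MVFGD", "NLYNY MLAFAQ MVYGD"))
```

The excerpts here are concatenated fragments, so the reported positions (3, 8, 10 and 14) only index into the joined excerpt string.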
CMP-SaTr::FUS::PylRS (AA)
DNA:
ATGGCTGCCCCGAGAGACAATGTCACTTTATTATTCAAGTTATACTGCTTGGCAGTGATGACCCTGATGGCTGCA GTCTATACCATAGCTTTAAGATACACAAGGACATCAGACAAAGAACTCTACTTTTCAACCACAGCCGTGTGTATC ACAGAAGTTATAAAGTTATTGCTAAGTGTGGGAATTTTAGCTAAAGAAACTGGTAGTCTGGGTAGATTCAAAGCA TCTTTAAGAGAAAATGTCTTGGGGAGCCCCAAGGAACTGTTGAAGTTAAGTGTGCCATCGTTAGTGTATGCTGTT CAGAACAACATGGCTTTCCTAGCTCTTAGCAATCTGGATGCAGCAGTGTACCAGGTGACCTACCAGTTGAAGATT CCGTGTACTGCTTTATGCACTGTTTTAATGTTAAACCGGACACTCAGCAAATTACAGTGGGTTTCAGTTTTTATG CTGTGTGCTGGAGTTACGCTTGTACAGTGGAAACCAGCCCAAGCTACAAAAGTGGTGGTGGAACAAAATCCATTA TTAGGGTTTGGCGCTATAGCTATTGCTGTATTGTGCTCAGGATTTGCAGGAGTATATTTTGAAAAAGTTTTAAAG AGTTCAGATACTTCTCTTTGGGTGAGAAACATTCAAATGTATCTATCAGGGATTATTGTGACATTAGCTGGCGTC TACTTGTCAGATGGAGCTGAAATTAAAGAAAAAGGATTTTTCTATGGTTACACATATTATGTCTGGTTTGTCATC TTTCTTGCAAGTGTTGGTGGCCTCTACACTTCTGTTGTGGTTAAGTACACAGACAATATCATGAAAGGCTTTTCT GCAGCAGCGGCCATTGTCCTTTCCACCATTGCTTCAGTAATGCTGTTTGGATTACAGATAACACTTACCTTTGCC CTGGGTACTCTTCTTGTATGTGTTTCCATATATCTCTATGGATTACCCAGACAAGACACTACATCCATCCAACAA GGAGAAACAGCTTCAAAGGAGAGAGTTATTGGTGTGGGAGCACCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGC ATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTAT TCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATAT GGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATAT GGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTAT GGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCC CAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAAT CCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGT AACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGT GGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGC GGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGC AGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCAT GACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCT 
GTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTAC ACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCT ATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTT AATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGT GGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGAC TGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCT AAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGC AGAGGTGGTGCGATCGCAGGCGCCGATTACAAGGACGATGATGACAAGGGAGCACCAGGAAGTGCTGGTTCTGCT GCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGACAAAAAACCG CTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAG GTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGT ACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAAC AAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAA AAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCT GGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGC ATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCT GCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAA GACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGAC CTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTG GATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAAT GATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTGGCACCAAATCTG TATAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGT AAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGTACT CGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGC TGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATT CCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTG 
AAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTGTAA
(SEQ ID NO: 379)
Protein:
MAAPRDNVTLLFKLYCLAVMTLMAAVYTIALRYTRTSDKELYFSTTAVCITEVIKLLLSVGILAKETGSLGRFKA SLRENVLGSPKELLKLSVPSLVYAVQNNMAFLALSNLDAAVYQVTYQLKIPCTALCTVLMLNRTLSKLQWVSVFM LCAGVTLVQWKPAQATKVWEQNPLLGFGAIAIAVLCSGFAGVYFEKVLKSSDTSLWVRNIQMYLSGIIVTLAGV YLSDGAEIKEKGFFYGYTYYVWFVIFLASVGGLYTSVWKYTDNIMKGFSAAAAIVLSTIASVMLFGLQITLTFA LGTLLVCVSIYLYGLPRQDTTSIQQGETASKERVIGVGAPGSAGSAAGSGMASNDYTQQATQSYGAYPTQPGQGY SQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGY GQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGG NYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGG RGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLY TDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGG GSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGG RGGAIAGADYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHE VSRSKIYIEMACGDHLWNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKWSAPTRTK KAMPKSVARAPKPLENTEAAQAQPSGSKFSPAI PVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMS APVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFV DRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLYNYLRKLDRALPDPIKIFEIGPCYR KESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAWGPI PLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 380)
CMP-SaTr::FUS::PylRS (AAAF)
DNA:
ATGGCTGCCCCGAGAGACAATGTCACTTTATTATTCAAGTTATACTGCTTGGCAGTGATGACCCTGATGGCTGCA GTCTATACCATAGCTTTAAGATACACAAGGACATCAGACAAAGAACTCTACTTTTCAACCACAGCCGTGTGTATC ACAGAAGTTATAAAGTTATTGCTAAGTGTGGGAATTTTAGCTAAAGAAACTGGTAGTCTGGGTAGATTCAAAGCA TCTTTAAGAGAAAATGTCTTGGGGAGCCCCAAGGAACTGTTGAAGTTAAGTGTGCCATCGTTAGTGTATGCTGTT CAGAACAACATGGCTTTCCTAGCTCTTAGCAATCTGGATGCAGCAGTGTACCAGGTGACCTACCAGTTGAAGATT CCGTGTACTGCTTTATGCACTGTTTTAATGTTAAACCGGACACTCAGCAAATTACAGTGGGTTTCAGTTTTTATG CTGTGTGCTGGAGTTACGCTTGTACAGTGGAAACCAGCCCAAGCTACAAAAGTGGTGGTGGAACAAAATCCATTA TTAGGGTTTGGCGCTATAGCTATTGCTGTATTGTGCTCAGGATTTGCAGGAGTATATTTTGAAAAAGTTTTAAAG AGTTCAGATACTTCTCTTTGGGTGAGAAACATTCAAATGTATCTATCAGGGATTATTGTGACATTAGCTGGCGTC TACTTGTCAGATGGAGCTGAAATTAAAGAAAAAGGATTTTTCTATGGTTACACATATTATGTCTGGTTTGTCATC TTTCTTGCAAGTGTTGGTGGCCTCTACACTTCTGTTGTGGTTAAGTACACAGACAATATCATGAAAGGCTTTTCT GCAGCAGCGGCCATTGTCCTTTCCACCATTGCTTCAGTAATGCTGTTTGGATTACAGATAACACTTACCTTTGCC CTGGGTACTCTTCTTGTATGTGTTTCCATATATCTCTATGGATTACCCAGACAAGACACTACATCCATCCAACAA GGAGAAACAGCTTCAAAGGAGAGAGTTATTGGTGTGGGAGCACCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGC ATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTAT TCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATAT GGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATAT GGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTAT GGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCC CAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAAT CCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGT AACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGT GGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGC GGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGC AGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCAT GACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCT 
GTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTAC ACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCT ATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTT AATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGT GGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGAC TGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCT AAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGC AGAGGTGGTGCGATCGCAGGCGCCGATTACAAGGACGATGATGACAAGGGAGCACCAGGAAGTGCTGGTTCTGCT GCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCG CTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAG GTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGT ACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAAC AAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAA AAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCT GGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGC ATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCT GCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAA GACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGAC CTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTG GATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAAT GATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTG GCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGT AAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGTACT CGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGC TGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATC CCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTA 
AAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA
(SEQ ID NO: 381)
Protein:
MAAPRDNVTLLFKLYCLAVMTLMAAVYTIALRYTRTSDKELYFSTTAVCITEVIKLLLSVGILAKETGSLGRFKA SLRENVLGSPKELLKLSVPSLVYAVQNNMAFLALSNLDAAVYQVTYQLKIPCTALCTVLMLNRTLSKLQWVSVFM LCAGVTLVQWKPAQATKVWEQNPLLGFGAIAIAVLCSGFAGVYFEKVLKSSDTSLWVRNIQMYLSGIIVTLAGV YLSDGAEIKEKGFFYGYTYYVWFVIFLASVGGLYTSVWKYTDNIMKGFSAAAAIVLSTIASVMLFGLQITLTFA LGTLLVCVSIYLYGLPRQDTTSIQQGETASKERVIGVGAPGSAGSAAGSGMASNDYTQQATQSYGAYPTQPGQGY SQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGY GQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGG NYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGG RGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLY TDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGG GSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGG RGGAIAGADYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHE VSRSKIYIEMACGDHLWNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKWSAPTRTK KAMPKSVARAPKPLENTEAAQAQPSGSKFSPAI PVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMS APVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFV DRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYR KESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAWGPI PLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 382)
CMP-SaTr::MCP
DNA:
ATGGCTGCCCCGAGAGACAATGTCACTTTATTATTCAAGTTATACTGCTTGGCAGTGATGACCCTGATGGCTGCA GTCTATACCATAGCTTTAAGATACACAAGGACATCAGACAAAGAACTCTACTTTTCAACCACAGCCGTGTGTATC ACAGAAGTTATAAAGTTATTGCTAAGTGTGGGAATTTTAGCTAAAGAAACTGGTAGTCTGGGTAGATTCAAAGCA TCTTTAAGAGAAAATGTCTTGGGGAGCCCCAAGGAACTGTTGAAGTTAAGTGTGCCATCGTTAGTGTATGCTGTT CAGAACAACATGGCTTTCCTAGCTCTTAGCAATCTGGATGCAGCAGTGTACCAGGTGACCTACCAGTTGAAGATT CCGTGTACTGCTTTATGCACTGTTTTAATGTTAAACCGGACACTCAGCAAATTACAGTGGGTTTCAGTTTTTATG CTGTGTGCTGGAGTTACGCTTGTACAGTGGAAACCAGCCCAAGCTACAAAAGTGGTGGTGGAACAAAATCCATTA TTAGGGTTTGGCGCTATAGCTATTGCTGTATTGTGCTCAGGATTTGCAGGAGTATATTTTGAAAAAGTTTTAAAG AGTTCAGATACTTCTCTTTGGGTGAGAAACATTCAAATGTATCTATCAGGGATTATTGTGACATTAGCTGGCGTC TACTTGTCAGATGGAGCTGAAATTAAAGAAAAAGGATTTTTCTATGGTTACACATATTATGTCTGGTTTGTCATC TTTCTTGCAAGTGTTGGTGGCCTCTACACTTCTGTTGTGGTTAAGTACACAGACAATATCATGAAAGGCTTTTCT GCAGCAGCGGCCATTGTCCTTTCCACCATTGCTTCAGTAATGCTGTTTGGATTACAGATAACACTTACCTTTGCC CTGGGTACTCTTCTTGTATGTGTTTCCATATATCTCTATGGATTACCCAGACAAGACACTACATCCATCCAACAA GGAGAAACAGCTTCAAAGGAGAGAGTTATTGGTGTGGCGATCGCATATCCCTATGATGTGCCGGATTATGCTGGA GCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGC GGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGT TCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAG GTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGC GAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCC GGCATCTACTAA (SEQ ID NO: 383)
Protein:
MAAPRDNVTLLFKLYCLAVMTLMAAVYTIALRYTRTSDKELYFSTTAVCITEVIKLLLSVGILAKETGSLGRFKA
SLRENVLGSPKELLKLSVPSLVYAVQNNMAFLALSNLDAAVYQVTYQLKIPCTALCTVLMLNRTLSKLQWVSVFM LCAGVTLVQWKPAQATKVWEQNPLLGFGAIAIAVLCSGFAGVYFEKVLKSSDTSLWVRNIQMYLSGIIVTLAGV YLSDGAEIKEKGFFYGYTYYVWFVIFLASVGGLYTSVWKYTDNIMKGFSAAAAIVLSTIASVMLFGLQITLTFA LGTLLVCVSIYLYGLPRQDTTSIQQGETASKERVIGVAIAYPYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNG GTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDC ELIVKAMQGLLKDGNPIPSAIAANSGIY* (SEQ ID NO: 384)
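The fusion proteins in this listing carry standard epitope tags between their domains: the FLAG peptide `DYKDDDDK` in the PylRS fusions (for example `...GVGADYKDDDDKGAPG...` in SEQ ID NO: 374) and the HA peptide `YPYDVPDYA` in the MCP fusions (`...GVAIAYPYDVPDYAGAPG...` in SEQ ID NO: 384). A minimal Python sketch for locating such tags in a wrapped sequence is shown below. The tag dictionary lists the commonly used peptide sequences, and the name `find_tags` is illustrative only.

```python
import re

# Commonly used epitope-tag peptides (standard sequences).
EPITOPE_TAGS = {
    "FLAG": "DYKDDDDK",
    "HA": "YPYDVPDYA",
    "myc": "EQKLISEEDL",
}


def find_tags(protein: str) -> dict[str, list[int]]:
    """Return the 1-based start positions of each known tag in a protein sequence.

    Whitespace from the listing's block layout is removed first; tags that
    do not occur are omitted from the result.
    """
    seq = "".join(protein.split()).upper()
    hits: dict[str, list[int]] = {}
    for name, motif in EPITOPE_TAGS.items():
        starts = [m.start() + 1 for m in re.finditer(re.escape(motif), seq)]
        if starts:
            hits[name] = starts
    return hits


print(find_tags("AIAYPY DVPDYAGAP"))  # excerpt from SEQ ID NO: 384
```

Positions are relative to whatever string is passed in, so running it on a full joined entry gives coordinates within that construct.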
CMP-SaTr::EWSR1::MCP
DNA:
ATGGCTGCCCCGAGAGACAATGTCACTTTATTATTCAAGTTATACTGCTTGGCAGTGATGACCCTGATGGCTGCA GTCTATACCATAGCTTTAAGATACACAAGGACATCAGACAAAGAACTCTACTTTTCAACCACAGCCGTGTGTATC ACAGAAGTTATAAAGTTATTGCTAAGTGTGGGAATTTTAGCTAAAGAAACTGGTAGTCTGGGTAGATTCAAAGCA TCTTTAAGAGAAAATGTCTTGGGGAGCCCCAAGGAACTGTTGAAGTTAAGTGTGCCATCGTTAGTGTATGCTGTT CAGAACAACATGGCTTTCCTAGCTCTTAGCAATCTGGATGCAGCAGTGTACCAGGTGACCTACCAGTTGAAGATT CCGTGTACTGCTTTATGCACTGTTTTAATGTTAAACCGGACACTCAGCAAATTACAGTGGGTTTCAGTTTTTATG CTGTGTGCTGGAGTTACGCTTGTACAGTGGAAACCAGCCCAAGCTACAAAAGTGGTGGTGGAACAAAATCCATTA TTAGGGTTTGGCGCTATAGCTATTGCTGTATTGTGCTCAGGATTTGCAGGAGTATATTTTGAAAAAGTTTTAAAG AGTTCAGATACTTCTCTTTGGGTGAGAAACATTCAAATGTATCTATCAGGGATTATTGTGACATTAGCTGGCGTC TACTTGTCAGATGGAGCTGAAATTAAAGAAAAAGGATTTTTCTATGGTTACACATATTATGTCTGGTTTGTCATC TTTCTTGCAAGTGTTGGTGGCCTCTACACTTCTGTTGTGGTTAAGTACACAGACAATATCATGAAAGGCTTTTCT GCAGCAGCGGCCATTGTCCTTTCCACCATTGCTTCAGTAATGCTGTTTGGATTACAGATAACACTTACCTTTGCC CTGGGTACTCTTCTTGTATGTGTTTCCATATATCTCTATGGATTACCCAGACAAGACACTACATCCATCCAACAA GGAGAAACAGCTTCAAAGGAGAGAGTTATTGGTGTGATGGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCA GCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACCACCCAGGCATATGGGCAA CAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAG ACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCAG CCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCA GCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGA CCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTA GGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCA TCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACACC TATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACT AGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGCAG CAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGACCA GGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATGAGC 
AGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGA CCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGATGAAGACTCTGACAACAGTGCAATT TATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAG ATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGATGCC ACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGGAGC AAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAGGGC AGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATGGGA GGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGA AACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGAACA GAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGATCGT GGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATGTTC AGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGAGGA AGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGCGATCGCATATCCCTATGATGTGCCGGAT TATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCTAACTTTACTCAGTTCGTTCTCGTC GACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCT AACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATC AAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCCACGAAT TCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCA GCAAACTCCGGCATCTACTAA (SEQ ID NO: 385)
Protein:
MAAPRDNVTLLFKLYCLAVMTLMAAVYTIALRYTRTSDKELYFSTTAVCITEVIKLLLSVGILAKETGSLGRFKA
SLRENVLGSPKELLKLSVPSLVYAVQNNMAFLALSNLDAAVYQVTYQLKIPCTALCTVLMLNRTLSKLQWVSVFM LCAGVTLVQWKPAQATKVWEQNPLLGFGAIAIAVLCSGFAGVYFEKVLKSSDTSLWVRNIQMYLSGIIVTLAGV YLSDGAEIKEKGFFYGYTYYVWFVIFLASVGGLYTSVWKYTDNIMKGFSAAAAIVLSTIASVMLFGLQITLTFA LGTLLVCVSIYLYGLPRQDTTSIQQGETASKERVIGVMASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQ QSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYA AQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPP SYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQ QSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGG PMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADFFKQCGWKMNKRTGQPMIHIYLDKETGKPKGDA TVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMG GRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDR GRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPPGPLMEQAIAYPYDVPD YAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTI KVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIY* (SEQ ID NO: 386)
CMP-SaTr::PylRS (AF) EWSR1::4clN22
DNA:
ATGGCTGCCCCGAGAGACAATGTCACTTTATTATTCAAGTTATACTGCTTGGCAGTGATGACCCTGATGGCTGCA GTCTATACCATAGCTTTAAGATACACAAGGACATCAGACAAAGAACTCTACTTTTCAACCACAGCCGTGTGTATC ACAGAAGTTATAAAGTTATTGCTAAGTGTGGGAATTTTAGCTAAAGAAACTGGTAGTCTGGGTAGATTCAAAGCA TCTTTAAGAGAAAATGTCTTGGGGAGCCCCAAGGAACTGTTGAAGTTAAGTGTGCCATCGTTAGTGTATGCTGTT CAGAACAACATGGCTTTCCTAGCTCTTAGCAATCTGGATGCAGCAGTGTACCAGGTGACCTACCAGTTGAAGATT CCGTGTACTGCTTTATGCACTGTTTTAATGTTAAACCGGACACTCAGCAAATTACAGTGGGTTTCAGTTTTTATG CTGTGTGCTGGAGTTACGCTTGTACAGTGGAAACCAGCCCAAGCTACAAAAGTGGTGGTGGAACAAAATCCATTA TTAGGGTTTGGCGCTATAGCTATTGCTGTATTGTGCTCAGGATTTGCAGGAGTATATTTTGAAAAAGTTTTAAAG AGTTCAGATACTTCTCTTTGGGTGAGAAACATTCAAATGTATCTATCAGGGATTATTGTGACATTAGCTGGCGTC TACTTGTCAGATGGAGCTGAAATTAAAGAAAAAGGATTTTTCTATGGTTACACATATTATGTCTGGTTTGTCATC TTTCTTGCAAGTGTTGGTGGCCTCTACACTTCTGTTGTGGTTAAGTACACAGACAATATCATGAAAGGCTTTTCT GCAGCAGCGGCCATTGTCCTTTCCACCATTGCTTCAGTAATGCTGTTTGGATTACAGATAACACTTACCTTTGCC CTGGGTACTCTTCTTGTATGTGTTTCCATATATCTCTATGGATTACCCAGACAAGACACTACATCCATCCAACAA GGAGAAACAGCTTCAAAGGAGAGAGTTATTGGTGTGATGGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCA GCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACCACCCAGGCATATGGGCAA CAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAG ACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCAG CCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCA GCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGA CCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTA GGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCA TCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACACC TATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACT AGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGCAG CAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGACCA GGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATGAGC 
AGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGA CCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGATGAAGACTCTGACAACAGTGCAATT TATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAG ATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGATGCC ACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGGAGC AAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAGGGC AGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATGGGA GGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGA AACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGAACA GAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGATCGT GGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATGTTC AGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGAGGA AGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGCGATCGCAGGAGCACCAGGAAGTGCTGGT TCTGCTGCTGGTAGTGGAGAGCAGAAGCTGATCTCAGAGGAGGACCTGCTAGCCACCATGGACGCACAAACACGA CGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGC GCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAG AAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCT GGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCT GCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGAC GCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGAGTCT AGAGGGCCCGTTTAA (SEQ ID NO: 387)
Protein:
MAAPRDNVTLLFKLYCLAVMTLMAAVYTIALRYTRTSDKELYFSTTAVCITEVIKLLLSVGILAKETGSLGRFKA SLRENVLGSPKELLKLSVPSLVYAVQNNMAFLALSNLDAAVYQVTYQLKIPCTALCTVLMLNRTLSKLQWVSVFM LCAGVTLVQWKPAQATKVWEQNPLLGFGAIAIAVLCSGFAGVYFEKVLKSSDTSLWVRNIQMYLSGIIVTLAGV YLSDGAEIKEKGFFYGYTYYVWFVIFLASVGGLYTSVWKYTDNIMKGFSAAAAIVLSTIASVMLFGLQITLTFA LGTLLVCVSIYLYGLPRQDTTSIQQGETASKERVIGVMASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQ QSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYA AQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPP SYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQ QSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGG PMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADFFKQCGWKMNKRTGQPMIHIYLDKETGKPKGDA TVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMG GRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDR GRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPPGPLMEQAIAGAPGSAG SAAGSGEQKLISEEDLLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAE KQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMD AQTRRRERRAEKQAQWKAANPPLESRGPV* (SEQ ID NO: 388)
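The C-terminal region of SEQ ID NO: 388 consists of tandem copies of the peptide `MDAQTRRRERRAEKQAQWKAAN` (residues 1-22 of the bacteriophage λ N protein, commonly called λN22), separated by `GA`-rich linkers. Because the listing wraps the sequence into space-separated blocks, a copy can be split across a break (for example `...RRAE KQAQ...`). A minimal Python sketch for counting copies after removing that whitespace follows. The name `count_motif` is hypothetical.

```python
LAMBDA_N22 = "MDAQTRRRERRAEKQAQWKAAN"  # lambda N protein, residues 1-22


def count_motif(protein: str, motif: str) -> int:
    """Count non-overlapping occurrences of motif, ignoring whitespace from line wrapping."""
    seq = "".join(protein.split()).upper()
    return seq.count(motif.upper())


# Two copies, the second split by a block break as in the listing.
excerpt = "LATMDAQTRRRERRAEKQAQWKAANPPLDGAG LATMDAQTRRRERRAE KQAQWKAANPPL"
print(count_motif(excerpt, LAMBDA_N22))
```

On the whole of SEQ ID NO: 388, joined into one string, this approach would be expected to find four copies.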
CMP-SaTr::FUS::PylRS (AF)
DNA:
ATGGCTGCCCCGAGAGACAATGTCACTTTATTATTCAAGTTATACTGCTTGGCAGTGATGACCCTGATGGCTGCA GTCTATACCATAGCTTTAAGATACACAAGGACATCAGACAAAGAACTCTACTTTTCAACCACAGCCGTGTGTATC ACAGAAGTTATAAAGTTATTGCTAAGTGTGGGAATTTTAGCTAAAGAAACTGGTAGTCTGGGTAGATTCAAAGCA TCTTTAAGAGAAAATGTCTTGGGGAGCCCCAAGGAACTGTTGAAGTTAAGTGTGCCATCGTTAGTGTATGCTGTT CAGAACAACATGGCTTTCCTAGCTCTTAGCAATCTGGATGCAGCAGTGTACCAGGTGACCTACCAGTTGAAGATT CCGTGTACTGCTTTATGCACTGTTTTAATGTTAAACCGGACACTCAGCAAATTACAGTGGGTTTCAGTTTTTATG CTGTGTGCTGGAGTTACGCTTGTACAGTGGAAACCAGCCCAAGCTACAAAAGTGGTGGTGGAACAAAATCCATTA TTAGGGTTTGGCGCTATAGCTATTGCTGTATTGTGCTCAGGATTTGCAGGAGTATATTTTGAAAAAGTTTTAAAG AGTTCAGATACTTCTCTTTGGGTGAGAAACATTCAAATGTATCTATCAGGGATTATTGTGACATTAGCTGGCGTC TACTTGTCAGATGGAGCTGAAATTAAAGAAAAAGGATTTTTCTATGGTTACACATATTATGTCTGGTTTGTCATC TTTCTTGCAAGTGTTGGTGGCCTCTACACTTCTGTTGTGGTTAAGTACACAGACAATATCATGAAAGGCTTTTCT GCAGCAGCGGCCATTGTCCTTTCCACCATTGCTTCAGTAATGCTGTTTGGATTACAGATAACACTTACCTTTGCC CTGGGTACTCTTCTTGTATGTGTTTCCATATATCTCTATGGATTACCCAGACAAGACACTACATCCATCCAACAA GGAGAAACAGCTTCAAAGGAGAGAGTTATTGGTGTGGGAGCACCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGC ATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTAT TCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATAT GGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATAT GGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTAT GGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCC CAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAAT CCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGT AACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGT GGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGC GGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGC AGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCAT GACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCT 
GTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTAC ACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCT ATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTT AATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGT GGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGAC TGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCT AAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGC AGAGGTGGTGCGATCGCAGGCGCCGATTACAAGGACGATGATGACAAGGGAGCACCAGGAAGTGCTGGTTCTGCT GCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCG CTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAG GTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGT ACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAAC AAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAA AAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCT GGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGC ATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCT GCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAA GACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGAC CTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTG GATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAAT GATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTG GCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGT AAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACT CGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGC TGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATC CCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTA 
AAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA
(SEQ ID NO: 389)
Protein:
MAAPRDNVTLLFKLYCLAVMTLMAAVYTIALRYTRTSDKELYFSTTAVCITEVIKLLLSVGILAKETGSLGRFKA SLRENVLGSPKELLKLSVPSLVYAVQNNMAFLALSNLDAAVYQVTYQLKIPCTALCTVLMLNRTLSKLQWVSVFM LCAGVTLVQWKPAQATKVVVEQNPLLGFGAIAIAVLCSGFAGVYFEKVLKSSDTSLWVRNIQMYLSGIIVTLAGV YLSDGAEIKEKGFFYGYTYYVWFVIFLASVGGLYTSVVVKYTDNIMKGFSAAAAIVLSTIASVMLFGLQITLTFA LGTLLVCVSIYLYGLPRQDTTSIQQGETASKERVIGVGAPGSAGSAAGSGMASNDYTQQATQSYGAYPTQPGQGY SQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGY GQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGG NYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGG RGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLY TDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGG GSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGG RGGAIAGADYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHE VSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTK KAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMS APVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFV DRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYR KESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPI PLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 390)
P450 2C11-27::PylRS (AF)
DNA:
ATGGACCCCGTGGTCGTGCTGGGCCTGTGCCTGTCATGCCTGCTGCTGCTGAGCCTGTGGAAGCAGAGCTACGGC GGAGGCGCGATCGCAGGCAAGCCTATTCCCAACCCCCTGCTGGGCCTGGATAGCACCGGAGCACCAGGAAGTGCT GGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGAT AAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAA CACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGC TCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAG GATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACC CGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCA CAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTG AGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACA AGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTG AATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGT AAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGC TTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGC ATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCA CCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCG TGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCA GGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTG GGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTG GGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTG CTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACT AACCTGTAA (SEQ ID NO: 391)
Protein:
MDPVVVLGLCLSCLLLLSLWKQSYGGGAIAGKPIPNPLLGLDSTGAPGSAGSAAGSGACPVPLQLPPLERLTLDD KKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDE DLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASV STSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRR KKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLA PNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIV GDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGIST NL* (SEQ ID NO: 392)
P450 2C11-27::MCP
DNA:
ATGGACCCCGTGGTCGTGCTGGGCCTGTGCCTGTCATGCCTGCTGCTGCTGAGCCTGTGGAAGCAGAGCTACGGC GGAGGCGCGATCGCATATCCCTATGATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGT AGTGGAGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGC AACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTT CGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTA AATATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTC CTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACTAA (SEQ ID NO: 393)
Protein:
MDPVVVLGLCLSCLLLLSLWKQSYGGGAIAYPYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPS NFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGL LKDGNPIPSAIAANSGIY* (SEQ ID NO: 394)
P450 2C11-27::FUS::PylRS (AF)
DNA:
ATGGACCCCGTGGTCGTGCTGGGCCTGTGCCTGTCATGCCTGCTGCTGCTGAGCCTGTGGAAGCAGAGCTACGGC GGAGGCATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAG GGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCA GGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAG GGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCT GGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGG CAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGC TATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGT GGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGAC CAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGC GGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGT GGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCA CGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATT GAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAAT TTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAA GCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCA GACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGA GGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCT GGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAG GCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGT GGTGGCAGAGGAGGCGCGATCGCAGGCAAGCCTATTCCCAACCCCCTGCTGGGCCTGGATAGCACCGGAGCACCA GGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACC CTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCAT AAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAAC AATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTG TCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGC 
GCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCA GCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCA GCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAAT CCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAG GTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTG TCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAA ATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAG CGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCT ATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAG ATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAA ATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTC AAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGT GCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTG GAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGT ATTTCTACTAACCTGTAA (SEQ ID NO: 395)
Protein:
MDPVVVLGLCLSCLLLLSLWKQSYGGGMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTS GYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYG QPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQD QSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGS RHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAK AAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRA GDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGAIAGKPIPNPLLGLDSTGAP GSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVN NSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEA AQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLE VLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIE RMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQ MGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGL ERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 396)
P450 2C11-27::EWSR1::MCP
DNA:
ATGGACCCCGTGGTCGTGCTGGGCCTGTGCCTGTCATGCCTGCTGCTGCTGAGCCTGTGGAAGCAGAGCTACGGC GGAGGCATGGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCC CAGCCCACTCAAGGATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACT GATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCT CCCACTGGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTAT GATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCT TATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACT AGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCC CAGGTACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACA CAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAG AGTAGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGC CAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGT AGCATGGGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGAT AACCGGGGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGA ATGGGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTA
GGCCCACCTGTAGATCCAGATGAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACT
CTAGATGATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATC
CACATCTACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCTACTGCC
AAGGCTGCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAG
CCTCCAATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGT
CCAGGAGGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCT
CCAAGAGGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAG
TGTCCCAATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCT
GAAGGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGA
GGAAGAGGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGC
TTCCGTGGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGA
CCTTTGATGGAACAGGCGATCGCATATCCCTATGATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCT
GCTGCTGGTAGTGGAGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTC
GCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACC
TGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGT
TCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATG
CAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACTAA (SEQ ID NO: 397)
Protein:
MDPVVVLGLCLSCLLLLSLWKQSYGGGMASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPT DVSYTQAQTTATYGQTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPA YPAYGQQPAATAPTRPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSST QPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPS SMGVYGQESGGFSGPGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDL GPPVDPDEDSDNSAIYVQGLNDSVTLDDLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTA KAAVEWFDGKDFQGSKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFP PRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRG GRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPPGPLMEQAIAYPYDVPDYAGAPGSAGS AAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWR SYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIY* (SEQ ID NO: 398)
P450 2C11-27::FUS::MCP::PylRS (AF)
DNA:
ATGGACCCCGTGGTCGTGCTGGGCCTGTGCCTGTCATGCCTGCTGCTGCTGAGCCTGTGGAAGCAGAGCTACGGC GGAGGCATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAG GGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCA GGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAG GGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCT GGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGG CAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGC TATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGT GGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGAC CAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGC GGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGT GGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCA CGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATT GAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAAT TTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAA GCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCA GACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGA GGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCT GGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAG GCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGT GGTGGCAGAGGAGGCGCGATCGCATATCCCTATGATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCT GCTGCTGGTAGTGGAGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTC GCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACC TGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGT TCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATG CAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGCGCCGATTAC 
AAGGACGATGATGACAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTG CAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTG TGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATG GCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATAT CGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAA ACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCC CCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTT TCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCT AGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTG ACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCG TTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAG AACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCG ATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGT GTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCC CTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAG GAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGAT TTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTC ATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAA CCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCT GCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 399)
Protein:
MDPVVVLGLCLSCLLLLSLWKQSYGGGMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTS GYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYG QPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQD QSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGS RHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAK AAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRA GDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGAIAYPYDVPDYAGAPGSAGS AAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWR SYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIYGADYKDDDDKGAPGSAGSAAGSGACPVPL QLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKY RKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPV STQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKP FRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFR VDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITD FLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRA ARSESYYNGISTNL* (SEQ ID NO: 400)
P450 2C11-29::FUS::MCP::PylRS (AF)
DNA:
ATGGACCCCGTGGTCGTGCTGGGCCTGTGCCTGTCATGCCTGCTGCTGCTGAGCCTGTGGAAGCAGAGCTACGGC GGAGGCAAGCTGATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCC GGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGAC ACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACT CCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCC TACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGC TATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAG CAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGT GGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAAT CAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGT GGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGT GGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAA GGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTT ACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATG ATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCA GCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGC CGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGC TATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAG CGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAG TGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGAT CGTCGTGGTGGCAGAGGAGGCGCGATCGCATATCCCTATGATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCT GGTTCTGCTGCTGGTAGTGGAGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTG ACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAA GTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCC TGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAG GCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGCGCC 
GATTACAAGGACGATGATGACAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTG CCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACT GGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATT GAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCAC AAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAG GACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCT CGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATT CCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCC ACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCA GCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGC AAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAA CGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAA TCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATT TTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGAC CGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACAT CTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATC ACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTG GATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATC GACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAA CGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 401)
Protein:
MDPVVVLGLCLSCLLLLSLWKQSYGGGKLMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTD TSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSS YGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGN QDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQ GSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPS AKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQ RAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGAIAYPYDVPDYAGAPGSA GSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGA WRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIYGADYKDDDDKGAPGSAGSAAGSGACPV PLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHH KYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAI PVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSG KPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQI FRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESII TDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIK RAARSESYYNGISTNL* (SEQ ID NO: 402)
EB1::PylRS (AF)
DNA:
ATGGCAGTGAACGTATACTCAACGTCAGTGACCAGTGATAACCTAAGTCGACATGACATGCTGGCCTGGATCAAT
GAGTCTCTGCAGTTGAATCTGACAAAGATCGAACAGTTGTGCTCAGGGGCTGCGTATTGTCAGTTTATGGACATG CTGTTCCCTGGCTCCATTGCCTTGAAGAAAGTGAAATTCCAAGCTAAGCTAGAACACGAGTACATCCAGAACTTC AAAATACTACAAGCAGGTTTTAAGAGAATGGGTGTTGACAAAATAATTCCTGTGGACAAATTAGTAAAAGGAAAG TTTCAGGACAATTTTGAATTCGTTCAGTGGTTCAAGAAGTTTTTCGATGCAAACTATGATGGAAAAGACTATGAC CCTGTGGCTGCCAGACAAGGTCAAGAAACTGCAGTGGCTCCTTCCCTTGTTGCTCCAGCTCTGAATAAACCGAAG AAACCTCTCACTTCTAGCAGTGCAGCTCCCCAGAGGCCCATCTCAACACAGAGAACCGCTGCGGCTCCTAAGGCT GGCCCTGGTGTGGTGCGAAAGAACCCTGGTGTGGGCAACGGAGATGACGAGGCAGCTGAGTTGATGCAGCAGGTC AACGTATTGAAACTTACTGTTGAAGACTTGGAGAAAGAGAGGGATTTCTACTTCGGAAAGCTACGGAACATTGAA TTGATTTGCCAGGAGAACGAGGGGGAAAACGACCCTGTATTGCAGAGGATTGTAGACATTCTGTATGCCACAGAT GAAGGCTTTGTGATACCTGATGAAGGGGGCCCACAGGAGGAGCAAGAAGAGTATGGCGCCGATTACAAGGACGAT GATGACAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCG CCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGT CGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGC GATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACC TGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTG AAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCA CTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAG GAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTG GTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCC CAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAA CTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTG GGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATT CCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAA AACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGAT CCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACC ATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAAC 
CACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGC GACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATC GGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCC GAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 403)
Protein:
MAVNVYSTSVTSDNLSRHDMLAWINESLQLNLTKIEQLCSGAAYCQFMDMLFPGSIALKKVKFQAKLEHEYIQNF KILQAGFKRMGVDKIIPVDKLVKGKFQDNFEFVQWFKKFFDANYDGKDYDPVAARQGQETAVAPSLVAPALNKPK KPLTSSSAAPQRPISTQRTAAAPKAGPGVVRKNPGVGNGDDEAAELMQQVNVLKLTVEDLEKERDFYFGKLRNIE LICQENEGENDPVLQRIVDILYATDEGFVIPDEGGPQEEQEEYGADYKDDDDKGAPGSAGSAAGSGACPVPLQLP PLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKT CKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQ ESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRE LESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDK NFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLN HLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARS ESYYNGISTNL* (SEQ ID NO: 404)
EB1::PylRS (AA)
DNA:
ATGGCAGTGAACGTATACTCAACGTCAGTGACCAGTGATAACCTAAGTCGACATGACATGCTGGCCTGGATCAAT GAGTCTCTGCAGTTGAATCTGACAAAGATCGAACAGTTGTGCTCAGGGGCTGCGTATTGTCAGTTTATGGACATG CTGTTCCCTGGCTCCATTGCCTTGAAGAAAGTGAAATTCCAAGCTAAGCTAGAACACGAGTACATCCAGAACTTC AAAATACTACAAGCAGGTTTTAAGAGAATGGGTGTTGACAAAATAATTCCTGTGGACAAATTAGTAAAAGGAAAG TTTCAGGACAATTTTGAATTCGTTCAGTGGTTCAAGAAGTTTTTCGATGCAAACTATGATGGAAAAGACTATGAC CCTGTGGCTGCCAGACAAGGTCAAGAAACTGCAGTGGCTCCTTCCCTTGTTGCTCCAGCTCTGAATAAACCGAAG AAACCTCTCACTTCTAGCAGTGCAGCTCCCCAGAGGCCCATCTCAACACAGAGAACCGCTGCGGCTCCTAAGGCT GGCCCTGGTGTGGTGCGAAAGAACCCTGGTGTGGGCAACGGAGATGACGAGGCAGCTGAGTTGATGCAGCAGGTC AACGTATTGAAACTTACTGTTGAAGACTTGGAGAAAGAGAGGGATTTCTACTTCGGAAAGCTACGGAACATTGAA TTGATTTGCCAGGAGAACGAGGGGGAAAACGACCCTGTATTGCAGAGGATTGTAGACATTCTGTATGCCACAGAT GAAGGCTTTGTGATACCTGATGAAGGGGGCCCACAGGAGGAGCAAGAAGAGTATGGCGCCGATTACAAGGACGAT GATGACAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCG CCGCTGGAACGCCTGACCCTGGATGACAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGT CGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGC GATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACC TGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTG AAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCA CTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAG GAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTG GTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCC CAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAA CTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTG GGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATT CCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAA AACTTCTGTCTGCGCCCTATGCTGGCACCAAATCTGTATAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGAT CCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACC 
ATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAAC CACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGC GACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATC GGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCT GAATCGTATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 405)
Protein:
MAVNVYSTSVTSDNLSRHDMLAWINESLQLNLTKIEQLCSGAAYCQFMDMLFPGSIALKKVKFQAKLEHEYIQNF KILQAGFKRMGVDKIIPVDKLVKGKFQDNFEFVQWFKKFFDANYDGKDYDPVAARQGQETAVAPSLVAPALNKPK KPLTSSSAAPQRPISTQRTAAAPKAGPGVVRKNPGVGNGDDEAAELMQQVNVLKLTVEDLEKERDFYFGKLRNIE LICQENEGENDPVLQRIVDILYATDEGFVIPDEGGPQEEQEEYGADYKDDDDKGAPGSAGSAAGSGACPVPLQLP PLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKT CKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQ ESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRE LESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDK NFCLRPMLAPNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLN HLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARS ESYYNGISTNL* (SEQ ID NO: 406)
EB1::PylRS (AAAF)
DNA:
ATGGCAGTGAACGTATACTCAACGTCAGTGACCAGTGATAACCTAAGTCGACATGACATGCTGGCCTGGATCAAT GAGTCTCTGCAGTTGAATCTGACAAAGATCGAACAGTTGTGCTCAGGGGCTGCGTATTGTCAGTTTATGGACATG CTGTTCCCTGGCTCCATTGCCTTGAAGAAAGTGAAATTCCAAGCTAAGCTAGAACACGAGTACATCCAGAACTTC AAAATACTACAAGCAGGTTTTAAGAGAATGGGTGTTGACAAAATAATTCCTGTGGACAAATTAGTAAAAGGAAAG TTTCAGGACAATTTTGAATTCGTTCAGTGGTTCAAGAAGTTTTTCGATGCAAACTATGATGGAAAAGACTATGAC CCTGTGGCTGCCAGACAAGGTCAAGAAACTGCAGTGGCTCCTTCCCTTGTTGCTCCAGCTCTGAATAAACCGAAG AAACCTCTCACTTCTAGCAGTGCAGCTCCCCAGAGGCCCATCTCAACACAGAGAACCGCTGCGGCTCCTAAGGCT GGCCCTGGTGTGGTGCGAAAGAACCCTGGTGTGGGCAACGGAGATGACGAGGCAGCTGAGTTGATGCAGCAGGTC AACGTATTGAAACTTACTGTTGAAGACTTGGAGAAAGAGAGGGATTTCTACTTCGGAAAGCTACGGAACATTGAA TTGATTTGCCAGGAGAACGAGGGGGAAAACGACCCTGTATTGCAGAGGATTGTAGACATTCTGTATGCCACAGAT GAAGGCTTTGTGATACCTGATGAAGGGGGCCCACAGGAGGAGCAAGAAGAGTATGGCGCCGATTACAAGGACGAT GATGACAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCG CCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGT CGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGC GATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACC TGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTG AAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCA CTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAG GAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTG GTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCC CAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAA CTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTG GGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATT CCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAA AACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGAT CCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACC 
ATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAAC CACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGC GACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATC GGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCC GAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 407)
Protein:
MAVNVYSTSVTSDNLSRHDMLAWINESLQLNLTKIEQLCSGAAYCQFMDMLFPGSIALKKVKFQAKLEHEYIQNF KILQAGFKRMGVDKIIPVDKLVKGKFQDNFEFVQWFKKFFDANYDGKDYDPVAARQGQETAVAPSLVAPALNKPK KPLTSSSAAPQRPISTQRTAAAPKAGPGVVRKNPGVGNGDDEAAELMQQVNVLKLTVEDLEKERDFYFGKLRNIE LICQENEGENDPVLQRIVDILYATDEGFVIPDEGGPQEEQEEYGADYKDDDDKGAPGSAGSAAGSGACPVPLQLP PLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKT CKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQ ESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRE LESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDK NFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLN HLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARS ESYYNGISTNL* (SEQ ID NO: 408)
EB1::FUS::PylRS (AA)
DNA:
ATGGCAGTGAACGTATACTCAACGTCAGTGACCAGTGATAACCTAAGTCGACATGACATGCTGGCCTGGATCAAT GAGTCTCTGCAGTTGAATCTGACAAAGATCGAACAGTTGTGCTCAGGGGCTGCGTATTGTCAGTTTATGGACATG CTGTTCCCTGGCTCCATTGCCTTGAAGAAAGTGAAATTCCAAGCTAAGCTAGAACACGAGTACATCCAGAACTTC AAAATACTACAAGCAGGTTTTAAGAGAATGGGTGTTGACAAAATAATTCCTGTGGACAAATTAGTAAAAGGAAAG TTTCAGGACAATTTTGAATTCGTTCAGTGGTTCAAGAAGTTTTTCGATGCAAACTATGATGGAAAAGACTATGAC CCTGTGGCTGCCAGACAAGGTCAAGAAACTGCAGTGGCTCCTTCCCTTGTTGCTCCAGCTCTGAATAAACCGAAG AAACCTCTCACTTCTAGCAGTGCAGCTCCCCAGAGGCCCATCTCAACACAGAGAACCGCTGCGGCTCCTAAGGCT GGCCCTGGTGTGGTGCGAAAGAACCCTGGTGTGGGCAACGGAGATGACGAGGCAGCTGAGTTGATGCAGCAGGTC AACGTATTGAAACTTACTGTTGAAGACTTGGAGAAAGAGAGGGATTTCTACTTCGGAAAGCTACGGAACATTGAA TTGATTTGCCAGGAGAACGAGGGGGAAAACGACCCTGTATTGCAGAGGATTGTAGACATTCTGTATGCCACAGAT GAAGGCTTTGTGATACCTGATGAAGGGGGCCCACAGGAGGAGCAAGAAGAGTATGGAGCACCCGGCTCCGCCGGC TCCGCCGCCGGCTCCGGCATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACC CAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCC ACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAG TCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAG TCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGC AGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGA CAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGT GGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTAT GGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGT GGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGT GGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGG GACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAG AATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAG CCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCA CCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCT 
ACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGT GGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGA CAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGC AACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGG GATGATCGTCGTGGTGGCAGAGGTGGTGCGATCGCAGGCGCCGATTACAAGGACGATGATGACAAGGGAGCACCA GGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACC CTGGATGACAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCAT AAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAAC AATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTG TCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGC GCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCA GCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCA GCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAAT CCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAG GTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTG TCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAA ATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAG CGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCT ATGCTGGCACCAAATCTGTATAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAG ATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCCCAA ATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTC AAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGT GCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTG GAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGG ATCTCTACGAACCTGTAA (SEQ ID NO: 409)
Protein:
MAVNVYSTSVTSDNLSRHDMLAWINESLQLNLTKIEQLCSGAAYCQFMDMLFPGSIALKKVKFQAKLEHEYIQNF KILQAGFKRMGVDKIIPVDKLVKGKFQDNFEFVQWFKKFFDANYDGKDYDPVAARQGQETAVAPSLVAPALNKPK KPLTSSSAAPQRPISTQRTAAAPKAGPGVVRKNPGVGNGDDEAAELMQQVNVLKLTVEDLEKERDFYFGKLRNIE LICQENEGENDPVLQRIVDILYATDEGFVIPDEGGPQEEQEEYGAPGSAGSAAGSGMASNDYTQQATQSYGAYPT QPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQ SSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGG GGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGR GGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQ PMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGR GGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYG DDRRGGRGGAIAGADYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIH KIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVS APTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTN PITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLERE ITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLYNYLRKLDRALPDPIKIFE IGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSS AVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 410)
EB1::FUS::PylRS (AAAF)
DNA:
ATGGCAGTGAACGTATACTCAACGTCAGTGACCAGTGATAACCTAAGTCGACATGACATGCTGGCCTGGATCAAT GAGTCTCTGCAGTTGAATCTGACAAAGATCGAACAGTTGTGCTCAGGGGCTGCGTATTGTCAGTTTATGGACATG CTGTTCCCTGGCTCCATTGCCTTGAAGAAAGTGAAATTCCAAGCTAAGCTAGAACACGAGTACATCCAGAACTTC AAAATACTACAAGCAGGTTTTAAGAGAATGGGTGTTGACAAAATAATTCCTGTGGACAAATTAGTAAAAGGAAAG TTTCAGGACAATTTTGAATTCGTTCAGTGGTTCAAGAAGTTTTTCGATGCAAACTATGATGGAAAAGACTATGAC CCTGTGGCTGCCAGACAAGGTCAAGAAACTGCAGTGGCTCCTTCCCTTGTTGCTCCAGCTCTGAATAAACCGAAG AAACCTCTCACTTCTAGCAGTGCAGCTCCCCAGAGGCCCATCTCAACACAGAGAACCGCTGCGGCTCCTAAGGCT GGCCCTGGTGTGGTGCGAAAGAACCCTGGTGTGGGCAACGGAGATGACGAGGCAGCTGAGTTGATGCAGCAGGTC AACGTATTGAAACTTACTGTTGAAGACTTGGAGAAAGAGAGGGATTTCTACTTCGGAAAGCTACGGAACATTGAA TTGATTTGCCAGGAGAACGAGGGGGAAAACGACCCTGTATTGCAGAGGATTGTAGACATTCTGTATGCCACAGAT GAAGGCTTTGTGATACCTGATGAAGGGGGCCCACAGGAGGAGCAAGAAGAGTATGGAGCACCCGGCTCCGCCGGC TCCGCCGCCGGCTCCGGCATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACC CAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCC ACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAG TCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAG TCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGC AGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGA CAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGT GGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTAT GGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGT GGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGT GGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGG GACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAG AATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAG CCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCA CCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCT 
ACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGT GGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGA CAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGC AACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGG GATGATCGTCGTGGTGGCAGAGGTGGTGCGATCGCAGGCGCCGATTACAAGGACGATGATGACAAGGGAGCACCA GGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACC CTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCAT AAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAAC AATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTG TCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGC GCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCA GCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCA GCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAAT CCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAG GTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTG TCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAA ATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAG CGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCT ATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAG ATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCCCAA ATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTC AAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGT GCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTG GAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGT ATTTCTACTAACCTGTAA (SEQ ID NO: 411)
Protein:
MAVNVYSTSVTSDNLSRHDMLAWINESLQLNLTKIEQLCSGAAYCQFMDMLFPGSIALKKVKFQAKLEHEYIQNF KILQAGFKRMGVDKIIPVDKLVKGKFQDNFEFVQWFKKFFDANYDGKDYDPVAARQGQETAVAPSLVAPALNKPK KPLTSSSAAPQRPISTQRTAAAPKAGPGVVRKNPGVGNGDDEAAELMQQVNVLKLTVEDLEKERDFYFGKLRNIE LICQENEGENDPVLQRIVDILYATDEGFVIPDEGGPQEEQEEYGAPGSAGSAAGSGMASNDYTQQATQSYGAYPT QPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQ SSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGG GGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGR GGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQ PMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGR GGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYG DDRRGGRGGAIAGADYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIH KIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVS APTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTN PITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLERE ITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFE IGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSS AVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 412)
EB1::MCP
DNA:
ATGGCAGTGAACGTATACTCAACGTCAGTGACCAGTGATAACCTAAGTCGACATGACATGCTGGCCTGGATCAAT
GAGTCTCTGCAGTTGAATCTGACAAAGATCGAACAGTTGTGCTCAGGGGCTGCGTATTGTCAGTTTATGGACATG
CTGTTCCCTGGCTCCATTGCCTTGAAGAAAGTGAAATTCCAAGCTAAGCTAGAACACGAGTACATCCAGAACTTC
AAAATACTACAAGCAGGTTTTAAGAGAATGGGTGTTGACAAAATAATTCCTGTGGACAAATTAGTAAAAGGAAAG TTTCAGGACAATTTTGAATTCGTTCAGTGGTTCAAGAAGTTTTTCGATGCAAACTATGATGGAAAAGACTATGAC CCTGTGGCTGCCAGACAAGGTCAAGAAACTGCAGTGGCTCCTTCCCTTGTTGCTCCAGCTCTGAATAAACCGAAG AAACCTCTCACTTCTAGCAGTGCAGCTCCCCAGAGGCCCATCTCAACACAGAGAACCGCTGCGGCTCCTAAGGCT GGCCCTGGTGTGGTGCGAAAGAACCCTGGTGTGGGCAACGGAGATGACGAGGCAGCTGAGTTGATGCAGCAGGTC AACGTATTGAAACTTACTGTTGAAGACTTGGAGAAAGAGAGGGATTTCTACTTCGGAAAGCTACGGAACATTGAA TTGATTTGCCAGGAGAACGAGGGGGAAAACGACCCTGTATTGCAGAGGATTGTAGACATTCTGTATGCCACAGAT GAAGGCTTTGTGATACCTGATGAAGGGGGCCCACAGGAGGAGCAAGAAGAGTATGCGATCGCATATCCCTATGAT GTGCCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCTAACTTTACTCAGTTC GTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGG ATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAA TACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTC GCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCA GCAATCGCAGCAAACTCCGGCATCTACTAA (SEQ ID NO: 413)
Protein:
MAVNVYSTSVTSDNLSRHDMLAWINESLQLNLTKIEQLCSGAAYCQFMDMLFPGSIALKKVKFQAKLEHEYIQNF KILQAGFKRMGVDKIIPVDKLVKGKFQDNFEFVQWFKKFFDANYDGKDYDPVAARQGQETAVAPSLVAPALNKPK KPLTSSSAAPQRPISTQRTAAAPKAGPGVVRKNPGVGNGDDEAAELMQQVNVLKLTVEDLEKERDFYFGKLRNIE LICQENEGENDPVLQRIVDILYATDEGFVIPDEGGPQEEQEEYAIAYPYDVPDYAGAPGSAGSAAGSGASNFTQF VLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIF ATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIY* (SEQ ID NO: 414)
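Each DNA entry in this listing is paired with the protein it encodes, so an entry can be spot-checked by translating it with the standard genetic code. The short Python sketch below is illustrative only and is not part of the application; the helper name translate and the way the codon table is built are assumptions. It builds NCBI translation table 1 and translates a coding sequence in reading frame 1, ignoring the spaces used to block the sequences.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# TCAG order with the third base varying fastest; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence in reading frame 1 (hypothetical helper).

    Whitespace is removed first, so block-formatted sequences can be passed
    directly. A KeyError is raised for any codon containing a non-ACGT base.
    """
    seq = "".join(dna.split()).upper()
    if len(seq) % 3 != 0:
        raise ValueError(f"length {len(seq)} is not a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


if __name__ == "__main__":
    # 5' and 3' ends of SEQ ID NO: 413 (EB1::MCP).
    print(translate("ATGGCAGTGAACGTATAC"))
    print(translate("GCAATCGCAGCAAACTCCGGCATCTACTAA"))
```

For example, the first 18 nt of SEQ ID NO: 413 translate to MAVNVY and its last 30 nt translate to AIAANSGIY*, matching the N- and C-termini of SEQ ID NO: 414. A full-length check would pass the whole DNA entry and compare the result with the listed protein.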
EB1::EWSR1::MCP
DNA:
ATGGCAGTGAACGTATACTCAACGTCAGTGACCAGTGATAACCTAAGTCGACATGACATGCTGGCCTGGATCAAT GAGTCTCTGCAGTTGAATCTGACAAAGATCGAACAGTTGTGCTCAGGGGCTGCGTATTGTCAGTTTATGGACATG CTGTTCCCTGGCTCCATTGCCTTGAAGAAAGTGAAATTCCAAGCTAAGCTAGAACACGAGTACATCCAGAACTTC AAAATACTACAAGCAGGTTTTAAGAGAATGGGTGTTGACAAAATAATTCCTGTGGACAAATTAGTAAAAGGAAAG TTTCAGGACAATTTTGAATTCGTTCAGTGGTTCAAGAAGTTTTTCGATGCAAACTATGATGGAAAAGACTATGAC CCTGTGGCTGCCAGACAAGGTCAAGAAACTGCAGTGGCTCCTTCCCTTGTTGCTCCAGCTCTGAATAAACCGAAG AAACCTCTCACTTCTAGCAGTGCAGCTCCCCAGAGGCCCATCTCAACACAGAGAACCGCTGCGGCTCCTAAGGCT GGCCCTGGTGTGGTGCGAAAGAACCCTGGTGTGGGCAACGGAGATGACGAGGCAGCTGAGTTGATGCAGCAGGTC AACGTATTGAAACTTACTGTTGAAGACTTGGAGAAAGAGAGGGATTTCTACTTCGGAAAGCTACGGAACATTGAA TTGATTTGCCAGGAGAACGAGGGGGAAAACGACCCTGTATTGCAGAGGATTGTAGACATTCTGTATGCCACAGAT GAAGGCTTTGTGATACCTGATGAAGGGGGCCCACAGGAGGAGCAAGAAGAGTATATGGCGTCCACGGATTACAGT ACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACC ACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACC ACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTGCC CCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACC ACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGCCAGCA GCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGT TACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCCATGCAG CCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGT TACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGCAGCTAT GGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAG AGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAGGAGTCT GGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGGGGATTT GATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGAGGTGGC TTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGATGAAGAC TCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTCTTTAAG 
CAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAAACAGGA AAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTTGATGGG AAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGGGGTGGT CTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCTGGGGGA CCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCCCGAGGG AACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGAAACCAG AACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCCTTTCCG CCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGATCGTGGT GGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATGGACCGA GGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGCGATCGCATAT CCCTATGATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCTAACTTT ACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATC GCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAG AATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATT CCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCG ATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACTAA (SEQ ID NO: 415)
Protein:
MAVNVYSTSVTSDNLSRHDMLAWINESLQLNLTKIEQLCSGAAYCQFMDMLFPGSIALKKVKFQAKLEHEYIQNF KILQAGFKRMGVDKIIPVDKLVKGKFQDNFEFVQWFKKFFDANYDGKDYDPVAARQGQETAVAPSLVAPALNKPK KPLTSSSAAPQRPISTQRTAAAPKAGPGVVRKNPGVGNGDDEAAELMQQVNVLKLTVEDLEKERDFYFGKLRNIE LICQENEGENDPVLQRIVDILYATDEGFVIPDEGGPQEEQEEYMASTDYSTYSQAAAQQGYSAYTAQPTQGYAQT TQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTT TQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQ PVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQ SSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGG FNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADFFKQCGVVKMNKRTGQPMIHIYLDKETG KPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGG PMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFP PPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPPGPLMEQAIAY PYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQ NRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIY* (SEQ ID NO: 416)
EB1::EWSR1::4clN22
DNA:
ATGGCAGTGAACGTATACTCAACGTCAGTGACCAGTGATAACCTAAGTCGACATGACATGCTGGCCTGGATCAAT GAGTCTCTGCAGTTGAATCTGACAAAGATCGAACAGTTGTGCTCAGGGGCTGCGTATTGTCAGTTTATGGACATG CTGTTCCCTGGCTCCATTGCCTTGAAGAAAGTGAAATTCCAAGCTAAGCTAGAACACGAGTACATCCAGAACTTC AAAATACTACAAGCAGGTTTTAAGAGAATGGGTGTTGACAAAATAATTCCTGTGGACAAATTAGTAAAAGGAAAG TTTCAGGACAATTTTGAATTCGTTCAGTGGTTCAAGAAGTTTTTCGATGCAAACTATGATGGAAAAGACTATGAC CCTGTGGCTGCCAGACAAGGTCAAGAAACTGCAGTGGCTCCTTCCCTTGTTGCTCCAGCTCTGAATAAACCGAAG AAACCTCTCACTTCTAGCAGTGCAGCTCCCCAGAGGCCCATCTCAACACAGAGAACCGCTGCGGCTCCTAAGGCT GGCCCTGGTGTGGTGCGAAAGAACCCTGGTGTGGGCAACGGAGATGACGAGGCAGCTGAGTTGATGCAGCAGGTC AACGTATTGAAACTTACTGTTGAAGACTTGGAGAAAGAGAGGGATTTCTACTTCGGAAAGCTACGGAACATTGAA TTGATTTGCCAGGAGAACGAGGGGGAAAACGACCCTGTATTGCAGAGGATTGTAGACATTCTGTATGCCACAGAT GAAGGCTTTGTGATACCTGATGAAGGGGGCCCACAGGAGGAGCAAGAAGAGTATATGGCGTCCACGGATTACAGT ACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACC ACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACC ACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTGCC CCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACC ACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGCCAGCA GCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGT TACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCCATGCAG CCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGT TACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGCAGCTAT GGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAG AGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAGGAGTCT GGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGGGGATTT GATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGAGGTGGC TTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGATGAAGAC TCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTCTTTAAG 
CAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAAACAGGA AAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTTGATGGG AAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGGGGTGGT CTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCTGGGGGA CCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCCCGAGGG AACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGAAACCAG AACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCCTTTCCG CCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGATCGTGGT GGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATGGACCGA GGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGCGATCGCAGGA GCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGAGCAGAAGCTGATCTCAGAGGAGGACCTGCTAGCCACC ATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTC GACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGA CGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCT GGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAA CAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGC GGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCA AACCCACCGCTCGAGTCTAGAGGGCCCGTTTAA (SEQ ID NO: 417)
Protein:
MAVNVYSTSVTSDNLSRHDMLAWINESLQLNLTKIEQLCSGAAYCQFMDMLFPGSIALKKVKFQAKLEHEYIQNF KILQAGFKRMGVDKIIPVDKLVKGKFQDNFEFVQWFKKFFDANYDGKDYDPVAARQGQETAVAPSLVAPALNKPK KPLTSSSAAPQRPISTQRTAAAPKAGPGVVRKNPGVGNGDDEAAELMQQVNVLKLTVEDLEKERDFYFGKLRNIE LICQENEGENDPVLQRIVDILYATDEGFVIPDEGGPQEEQEEYMASTDYSTYSQAAAQQGYSAYTAQPTQGYAQT TQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTT TQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQ PVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQ SSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGG FNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADFFKQCGVVKMNKRTGQPMIHIYLDKETG KPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGG PMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFP PPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPPGPLMEQAIAG APGSAGSAAGSGEQKLISEEDLLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRR RERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAG GLATMDAQTRRRERRAEKQAQWKAANPPLESRGPV* (SEQ ID NO: 418)
EB1::FUS::PylRS (AF)
DNA:
ATGGCAGTGAACGTATACTCAACGTCAGTGACCAGTGATAACCTAAGTCGACATGACATGCTGGCCTGGATCAAT GAGTCTCTGCAGTTGAATCTGACAAAGATCGAACAGTTGTGCTCAGGGGCTGCGTATTGTCAGTTTATGGACATG CTGTTCCCTGGCTCCATTGCCTTGAAGAAAGTGAAATTCCAAGCTAAGCTAGAACACGAGTACATCCAGAACTTC AAAATACTACAAGCAGGTTTTAAGAGAATGGGTGTTGACAAAATAATTCCTGTGGACAAATTAGTAAAAGGAAAG TTTCAGGACAATTTTGAATTCGTTCAGTGGTTCAAGAAGTTTTTCGATGCAAACTATGATGGAAAAGACTATGAC CCTGTGGCTGCCAGACAAGGTCAAGAAACTGCAGTGGCTCCTTCCCTTGTTGCTCCAGCTCTGAATAAACCGAAG AAACCTCTCACTTCTAGCAGTGCAGCTCCCCAGAGGCCCATCTCAACACAGAGAACCGCTGCGGCTCCTAAGGCT GGCCCTGGTGTGGTGCGAAAGAACCCTGGTGTGGGCAACGGAGATGACGAGGCAGCTGAGTTGATGCAGCAGGTC AACGTATTGAAACTTACTGTTGAAGACTTGGAGAAAGAGAGGGATTTCTACTTCGGAAAGCTACGGAACATTGAA TTGATTTGCCAGGAGAACGAGGGGGAAAACGACCCTGTATTGCAGAGGATTGTAGACATTCTGTATGCCACAGAT GAAGGCTTTGTGATACCTGATGAAGGGGGCCCACAGGAGGAGCAAGAAGAGTATGGAGCACCCGGCTCCGCCGGC TCCGCCGCCGGCTCCGGCATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACC CAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCC ACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAG TCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAG TCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGC AGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGA CAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGT GGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTAT GGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGT GGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGT GGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGG GACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAG AATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAG CCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCA CCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCT 
ACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGT GGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGA CAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGC AACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGG GATGATCGTCGTGGTGGCAGAGGTGGTGCGATCGCAGGCGCCGATTACAAGGACGATGATGACAAGGGAGCACCA GGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACC CTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCAT AAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAAC AATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTG TCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGC
GCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCA GCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCA GCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAAT CCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAG GTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTG TCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAA ATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAG CGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCT ATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAG ATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAA ATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTC AAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGT GCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTG GAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGT ATTTCTACTAACCTGTAA (SEQ ID NO: 419)
Protein:
MAVNVYSTSVTSDNLSRHDMLAWINESLQLNLTKIEQLCSGAAYCQFMDMLFPGSIALKKVKFQAKLEHEYIQNF KILQAGFKRMGVDKIIPVDKLVKGKFQDNFEFVQWFKKFFDANYDGKDYDPVAARQGQETAVAPSLVAPALNKPK KPLTSSSAAPQRPISTQRTAAAPKAGPGVVRKNPGVGNGDDEAAELMQQVNVLKLTVEDLEKERDFYFGKLRNIE LICQENEGENDPVLQRIVDILYATDEGFVIPDEGGPQEEQEEYGAPGSAGSAAGSGMASNDYTQQATQSYGAYPT QPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQ SSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGG GGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGR GGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQ PMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGR GGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYG DDRRGGRGGAIAGADYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIH KIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVS APTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTN
PITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLERE ITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFE IGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSS AVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 420)
EB1::FUS::MCP::PylRS (AF)
DNA:
ATGGCAGTGAACGTATACTCAACGTCAGTGACCAGTGATAACCTAAGTCGACATGACATGCTGGCCTGGATCAAT GAGTCTCTGCAGTTGAATCTGACAAAGATCGAACAGTTGTGCTCAGGGGCTGCGTATTGTCAGTTTATGGACATG CTGTTCCCTGGCTCCATTGCCTTGAAGAAAGTGAAATTCCAAGCTAAGCTAGAACACGAGTACATCCAGAACTTC AAAATACTACAAGCAGGTTTTAAGAGAATGGGTGTTGACAAAATAATTCCTGTGGACAAATTAGTAAAAGGAAAG TTTCAGGACAATTTTGAATTCGTTCAGTGGTTCAAGAAGTTTTTCGATGCAAACTATGATGGAAAAGACTATGAC CCTGTGGCTGCCAGACAAGGTCAAGAAACTGCAGTGGCTCCTTCCCTTGTTGCTCCAGCTCTGAATAAACCGAAG AAACCTCTCACTTCTAGCAGTGCAGCTCCCCAGAGGCCCATCTCAACACAGAGAACCGCTGCGGCTCCTAAGGCT GGCCCTGGTGTGGTGCGAAAGAACCCTGGTGTGGGCAACGGAGATGACGAGGCAGCTGAGTTGATGCAGCAGGTC AACGTATTGAAACTTACTGTTGAAGACTTGGAGAAAGAGAGGGATTTCTACTTCGGAAAGCTACGGAACATTGAA TTGATTTGCCAGGAGAACGAGGGGGAAAACGACCCTGTATTGCAGAGGATTGTAGACATTCTGTATGCCACAGAT GAAGGCTTTGTGATACCTGATGAAGGGGGCCCACAGGAGGAGCAAGAAGAGTATGGAGCACCCGGCTCCGCCGGC TCCGCCGCCGGCTCCGGCATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACC CAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCC ACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAG TCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAG TCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGC AGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGA CAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGT GGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTAT GGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGT GGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGT GGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGG GACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAG AATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAG CCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCA CCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCT 
ACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGT GGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGA CAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGC AACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGG GATGATCGTCGTGGTGGCAGAGGTGGTGCGATCGCATATCCCTATGATGTGCCGGATTATGCTGGAGCACCAGGA AGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGC GACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCT TACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAA GGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATT GTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTAC GGCGCCGATTACAAGGACGATGATGACAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGC CCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCT GCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATC TATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGT CACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCC AATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCC GTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCG GCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACC GGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCA GCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAAT TCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCC GAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAG ATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAA CAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAA CTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAA GAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGC 
ATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGAC ACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGG GGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAAC ATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 421)
Protein:
MAVNVYSTSVTSDNLSRHDMLAWINESLQLNLTKIEQLCSGAAYCQFMDMLFPGSIALKKVKFQAKLEHEYIQNF
KILQAGFKRMGVDKIIPVDKLVKGKFQDNFEFVQWFKKFFDANYDGKDYDPVAARQGQETAVAPSLVAPALNKPK
KPLTSSSAAPQRPISTQRTAAAPKAGPGVVRKNPGVGNGDDEAAELMQQVNVLKLTVEDLEKERDFYFGKLRNIE
LICQENEGENDPVLQRIVDILYATDEGFVIPDEGGPQEEQEEYGAPGSAGSAAGSGMASNDYTQQATQSYGAYPT
QPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQ
SSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGG GGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGR GGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQ PMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGR GGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYG DDRRGGRGGAIAYPYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQA YKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIY GADYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKI YIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKS VARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQAS APALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLE IKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGK EHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREW GIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 422)
TOM20::FUS::PCP::PylRS (AF)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAGGCAAGCCTATTCCCAACCCCCTGCTGGGCCTGGATAGCACCGGAGCACCAGGAAGTGCTGGTTCTGCTGCT GGTAGTGGAGCATCGATAGAGCAGAAGCTGATCTCAGAGGAGGACCTGATCGAAGGCCGCCATATGCTAGCCTCC AAAACCATCGTTCTTTCGGTCGGCGAGGCTACTCGCACTCTGACTGAGATCCAGTCCACCGCAGACCGTCAGATC TTCGAAGAGAAGGTCGGGCCTCTGGTGGGTCGGCTGCGCCTCACGGCTTCGCTCCGTCAAAACGGAGCCAAGACC 
GCGTATCGCGTCAACCTAAAACTGGATCAGGCGGACGTCGTTGATTCCGGACTTCCGAAAGTGCGCTACACTCAG GTATGGTCGCACGACGTGACAATCGTTGCGAATAGCACCGAGGCCTCGCGCAAATCGTTGTACGATTTGACCAAG TCCCTCGTCGCGACCTCGCAGGTCGAAGATCTTGTCGTCAACCTTGTGCCGCTGGGCCGTGCGGATCCGCTAGCC GGTGCTCCTGGTTCAGCAGGAAGCGCAGCAGGATCAGGTGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAA CGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGA ACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTG GTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGT TGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAA GTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAAC ACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTT TCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGC AATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGAT CGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGC GAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTG GAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAG TATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGT CTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAA ATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAAC TTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGC ATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAA CTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGT TTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTAT TACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 423)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGAIAGKPIPNPLLGLDSTGAPGSAGSAAGSGASIEQKLISEEDLIEGRHMLAS KTIVLSVGEATRTLTEIQSTADRQIFEEKVGPLVGRLRLTASLRQNGAKTAYRVNLKLDQADVVDSGLPKVRYTQ VWSHDVTIVANSTEASRKSLYDLTKSLVATSQVEDLVVNLVPLGRADPLAGAPGSAGSAAGSGACPVPLQLPPLE RLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKR CRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESV SVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELES ELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFC LRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLG IDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESY YNGISTNL* (SEQ ID NO: 424)
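Each DNA entry in this listing encodes the protein entry that follows it under the standard genetic code. A minimal Python sketch for spot-checking that correspondence is shown below; the helper name `translate` is hypothetical and not part of the source. It drops the spaces that split the sequences into blocks and translates codon by codon, here on three in-frame stretches of SEQ ID NO: 423 (the start of the PCP coding region, a later PCP stretch, and the 3' end of the PylRS (AF) coding region).

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string; '*' marks a stop codon.

    Whitespace (such as the spaces separating blocks in this listing)
    is ignored.
    """
    seq = "".join(dna.split()).upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


# Stretches copied from SEQ ID NO: 423 (the first keeps the listing's block break).
print(translate("ATGCTAGCCTCC AAAACCATCGTTCTTTCGGTCGGCGAGGCT"))  # MLASKTIVLSVGEA
print(translate("GCGTATCGCGTCAACCTAAAACTGGATCAGGCGGACGTCGTTGAT"))  # AYRVNLKLDQADVVD
print(translate("AATGGTATTTCTACTAACCTGTAA"))  # NGISTNL*
```

The same function can be applied to any other DNA/protein pair in the listing, provided the excerpt starts on a codon boundary.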
TOM20::FUS::2xPCP::PylRS (AF)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAGGCAAGCCTATTCCCAACCCCCTGCTGGGCCTGGATAGCACCGGAGCACCAGGAAGTGCTGGTTCTGCTGCT GGTAGTGGAGCATCGATAGAGCAGAAGCTGATCTCAGAGGAGGACCTGATCGAAGGCCGCCATATGCTAGCCTCC AAAACCATCGTTCTTTCGGTCGGCGAGGCTACTCGCACTCTGACTGAGATCCAGTCCACCGCAGACCGTCAGATC TTCGAAGAGAAGGTCGGGCCTCTGGTGGGTCGGCTGCGCCTCACGGCTTCGCTCCGTCAAAACGGAGCCAAGACC 
GCGTATCGCGTCAACCTAAAACTGGATCAGGCGGACGTCGTTGATTCCGGACTTCCGAAAGTGCGCTACACTCAG GTATGGTCGCACGACGTGACAATCGTTGCGAATAGCACCGAGGCCTCGCGCAAATCGTTGTACGATTTGACCAAG TCCCTCGTCGCGACCTCGCAGGTCGAAGATCTTGTCGTCAACCTTGTGCCGCTGGGCCGTGCGGATCCGCTAGCC TCCAAAACCATCGTTCTTTCGGTCGGCGAGGCTACTCGCACTCTGACTGAGATCCAGTCCACCGCAGACCGTCAG ATCTTCGAAGAGAAGGTCGGGCCTCTGGTGGGTCGGCTGCGCCTCACGGCTTCGCTCCGTCAAAACGGAGCCAAG ACCGCGTATCGCGTCAACCTAAAACTGGATCAGGCGGACGTCGTTGATTCCGGACTTCCGAAAGTGCGCTACACT CAGGTATGGTCGCACGACGTGACAATCGTTGCGAATAGCACCGAGGCCTCGCGCAAATCGTTGTACGATTTGACC AAGTCCCTCGTCGCGACCTCGCAGGTCGAAGATCTTGTCGTCAACCTTGTGCCGCTGGGCCGTCCACCGGTCGCC ACCGGTGCTCCTGGTTCAGCAGGAAGCGCAGCAGGATCAGGTGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTG GAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACC GGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCAT CTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAA CGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTG AAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAA AACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCC GTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAA GGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACC GATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAG AGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAA CTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTG GAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTC TGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATC AAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTG AACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTG GGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTG GAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCG 
GGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCC TATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 425)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGAIAGKPIPNPLLGLDSTGAPGSAGSAAGSGASIEQKLISEEDLIEGRHMLAS KTIVLSVGEATRTLTEIQSTADRQIFEEKVGPLVGRLRLTASLRQNGAKTAYRVNLKLDQADVVDSGLPKVRYTQ VWSHDVTIVANSTEASRKSLYDLTKSLVATSQVEDLVVNLVPLGRADPLASKTIVLSVGEATRTLTEIQSTADRQ IFEEKVGPLVGRLRLTASLRQNGAKTAYRVNLKLDQADVVDSGLPKVRYTQVWSHDVTIVANSTEASRKSLYDLT KSLVATSQVEDLVVNLVPLGRPPVATGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRT GTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKV KVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVK GNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGK LEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPI KIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDL ELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 426)
TOM20::FUS::4clN22::PylRS (AF)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAGGCAAGCCTATTCCCAACCCCCTGCTGGGCCTGGATAGCACCGGAGCACCAGGAAGTGCTGGTTCTGCTGCT GGTAGTGGAGCATCGATAGAGCAGAAGCTGATCTCAGAGGAGGACCTGCTAGCCACCATGGACGCACAAACACGA CGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGC GCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAG 
AAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCT GGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCT GCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGAC GCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGAGTCT AGAGGGCCCGTTGGTGCTCCTGGTTCAGCAGGAAGCGCAGCAGGATCAGGTGCGTGCCCGGTGCCGCTGCAGCTG CCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATG AGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGT GGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAA ACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGC GTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAA CCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACC CAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCC CTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAA TCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGT GAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTAT CTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTG ATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGAT AAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCT GATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTT ACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTG AACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCAC GGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGG ATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGT TCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 427)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGAIAGKPIPNPLLGLDSTGAPGSAGSAAGSGASIEQKLISEEDLLATMDAQTR RRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGA GGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLES RGPVGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMAC GDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPK PLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTK SQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPIL IPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEF TMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPW IGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 428)
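The 4clN22 module in SEQ ID NO: 428 contains a repeated 22-residue stretch, MDAQTRRRERRAEKQAQWKAAN, which matches the N-terminal RNA-binding peptide of the bacteriophage λ N protein (λN22). Reading the "4" in the construct name as "four tandem copies" is an assumption, not something the listing states. A short Python sketch (the helper `count_motif` is hypothetical) that removes the block spacing and counts non-overlapping copies of the peptide in the relevant stretch of SEQ ID NO: 428:

```python
import re


def count_motif(listing: str, motif: str) -> int:
    """Count non-overlapping occurrences of `motif` in a sequence listing
    whose residues may be split by spaces or line breaks."""
    seq = re.sub(r"\s+", "", listing)
    return seq.count(motif)


# Excerpt of SEQ ID NO: 428, with the listing's block breaks kept.
EXCERPT = (
    "LLATMDAQTR RRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGA "
    "GGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLES"
)
LAMBDA_N22 = "MDAQTRRRERRAEKQAQWKAAN"  # lambda N peptide, residues 1-22

print(count_motif(EXCERPT, LAMBDA_N22))  # 4
```

Each copy is followed by a short PPLD(GA)n-type spacer, so a plain substring count is sufficient here; overlapping matches cannot occur for this motif.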
LCK::FUS::2xPCP::CbzRS
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCCTCAAACGATTATACCCAACAAGCAACCCAA AGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGT TACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAG AACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCC CAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGT TACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGT GGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAAC AGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGT GGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGAC CGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGT GGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTC AATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATC TTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAG ACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCA ACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAAT CCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGG CGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGT GGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAAC TTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCT CACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATCGCAGGCGCCGATTACAAGGAC GATGATGACAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCATCGATAGAGCAGAAGCTGATC TCAGAGGAGGACCTGATCGAAGGCCGCCATATGCTAGCCTCCAAAACCATCGTTCTTTCGGTCGGCGAGGCTACT CGCACTCTGACTGAGATCCAGTCCACCGCAGACCGTCAGATCTTCGAAGAGAAGGTCGGGCCTCTGGTGGGTCGG CTGCGCCTCACGGCTTCGCTCCGTCAAAACGGAGCCAAGACCGCGTATCGCGTCAACCTAAAACTGGATCAGGCG GACGTCGTTGATTCCGGACTTCCGAAAGTGCGCTACACTCAGGTATGGTCGCACGACGTGACAATCGTTGCGAAT AGCACCGAGGCCTCGCGCAAATCGTTGTACGATTTGACCAAGTCCCTCGTCGCGACCTCGCAGGTCGAAGATCTT 
GTCGTCAACCTTGTGCCGCTGGGCCGTGCGGATCCGCTAGCCTCCAAAACCATCGTTCTTTCGGTCGGCGAGGCT ACTCGCACTCTGACTGAGATCCAGTCCACCGCAGACCGTCAGATCTTCGAAGAGAAGGTCGGGCCTCTGGTGGGT CGGCTGCGCCTCACGGCTTCGCTCCGTCAAAACGGAGCCAAGACCGCGTATCGCGTCAACCTAAAACTGGATCAG GCGGACGTCGTTGATTCCGGACTTCCGAAAGTGCGCTACACTCAGGTATGGTCGCACGACGTGACAATCGTTGCG AATAGCACCGAGGCCTCGCGCAAATCGTTGTACGATTTGACCAAGTCCCTCGTCGCGACCTCGCAGGTCGAAGAT CTTGTCGTCAACCTTGTGCCGCTGGGCCGTCCACCGGTCGCCACCGGTGCTCCTGGTTCAGCAGGAAGCGCAGCA GGATCAGGTGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGACAAAAAACCGCTG AATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTT AGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACA GCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAA TTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAA GCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGA AGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATT AGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCC CCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGAC GAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTG CAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGAT CGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGAT ACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTTGCACCAAATCTGATG AACTATGGACGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAA GAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTACACAAATGGGTTCAGGTTGTACTCGT GAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGT ATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCG CTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAA CACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 429)
Protein:
MGCVCSSNPEGTELASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQ NTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYG GQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQD RGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTI FVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGN PIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMN FSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGAIAGADYKDDDDKGAPGSAGSAAGSGASIEQKLI SEEDLIEGRHMLASKTIVLSVGEATRTLTEIQSTADRQIFEEKVGPLVGRLRLTASLRQNGAKTAYRVNLKLDQA DVVDSGLPKVRYTQVWSHDVTIVANSTEASRKSLYDLTKSLVATSQVEDLVVNLVPLGRADPLASKTIVLSVGEA TRTLTEIQSTADRQIFEEKVGPLVGRLRLTASLRQNGAKTAYRVNLKLDQADVVDSGLPKVRYTQVWSHDVTIVA NSTEASRKSLYDLTKSLVATSQVEDLVVNLVPLGRPPVATGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPL NTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNK FLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSI SSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDL QQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLM NYGRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFTQMGSGCTRENLESIITDFLNHLGIDFKIVGDSC MVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 430)
LCK::FUS::PCP::CbzRS
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCCTCAAACGATTATACCCAACAAGCAACCCAA AGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGT TACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAG AACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCC CAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGT TACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGT GGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAAC AGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGT GGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGAC CGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGT GGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTC AATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATC TTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAG ACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCA ACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAAT CCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGG CGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGT GGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAAC TTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCT CACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATCGCAGGCGCCGATTACAAGGAC GATGATGACAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCATCGATAGAGCAGAAGCTGATC TCAGAGGAGGACCTGATCGAAGGCCGCCATATGCTAGCCTCCAAAACCATCGTTCTTTCGGTCGGCGAGGCTACT CGCACTCTGACTGAGATCCAGTCCACCGCAGACCGTCAGATCTTCGAAGAGAAGGTCGGGCCTCTGGTGGGTCGG CTGCGCCTCACGGCTTCGCTCCGTCAAAACGGAGCCAAGACCGCGTATCGCGTCAACCTAAAACTGGATCAGGCG GACGTCGTTGATTCCGGACTTCCGAAAGTGCGCTACACTCAGGTATGGTCGCACGACGTGACAATCGTTGCGAAT AGCACCGAGGCCTCGCGCAAATCGTTGTACGATTTGACCAAGTCCCTCGTCGCGACCTCGCAGGTCGAAGATCTT 
GTCGTCAACCTTGTGCCGCTGGGCCGTGCGGATCCGCTAGCCGGTGCTCCTGGTTCAGCAGGAAGCGCAGCAGGA TCAGGTGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGACAAAAAACCGCTGAAT ACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGC CGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCA CGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTC CTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCA ATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGC AAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGC
AGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCG
GTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAA
ATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAA
CAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGT
GGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACC
GAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTTGCACCAAATCTGATGAAC
TATGGACGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAG
TCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTACACAAATGGGTTCAGGTTGTACTCGTGAG
AACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATG
GTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTG
GACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACAC
GACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 431)
Protein:
MGCVCSSNPEGTELASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQ NTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYG GQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQD RGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTI FVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGN PIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMN FSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGAIAGADYKDDDDKGAPGSAGSAAGSGASIEQKLI SEEDLIEGRHMLASKTIVLSVGEATRTLTEIQSTADRQIFEEKVGPLVGRLRLTASLRQNGAKTAYRVNLKLDQA DVVDSGLPKVRYTQVWSHDVTIVANSTEASRKSLYDLTKSLVATSQVEDLVVNLVPLGRADPLAGAPGSAGSAAG SGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTA RALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGS KFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDE ISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDT ELSKQIFRVDKNFCLRPMLAPNLMNYGRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFTQMGSGCTRE NLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKH DFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 432)
TOM20::FUS::CbzRS
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAGGCAAGCCTATTCCCAACCCCCTGCTGGGCCTGGATAGCACCGGAGCACCAGGAAGTGCTGGTTCTGCTGCT GGTAGTGGAGCATCGATAGAGCAGAAGCTGATCTCAGAGGAGGACCTGATCGAAGGCCGCCATATGCTAGCCTCC AAAACCATCGTTCTTTCGGTCGGCGAGGCTACTCGCACTCTGACTGAGATCCAGTCCACCGCAGACCGTCAGATC TTCGAAGAGAAGGTCGGGCCTCTGGTGGGTCGGCTGCGCCTCACGGCTTCGCTCCGTCAAAACGGAGCCAAGACC 
GCGTATCGCGTCAACCTAAAACTGGATCAGGCGGACGTCGTTGATTCCGGACTTCCGAAAGTGCGCTACACTCAG GTATGGTCGCACGACGTGACAATCGTTGCGAATAGCACCGAGGCCTCGCGCAAATCGTTGTACGATTTGACCAAG TCCCTCGTCGCGACCTCGCAGGTCGAAGATCTTGTCGTCAACCTTGTGCCGCTGGGCCGTGCGGATCCGCTAGCC GGTGCTCCTGGTTCAGCAGGAAGCGCAGCAGGATCAGGTGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAA CGCCTGACCCTGGATGACAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGA ACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTG GTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGT TGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAA GTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAAC ACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTT TCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGC AATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGAT CGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGC GAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTG GAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAG TATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGT CTGCGCCCTATGCTTGCACCAAATCTGATGAACTATGGACGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAA ATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAAC TTTACACAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGC ATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAA CTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGA TTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTAT TATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 433)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGAIAGKPIPNPLLGLDSTGAPGSAGSAAGSGASIEQKLISEEDLIEGRHMLAS KTIVLSVGEATRTLTEIQSTADRQIFEEKVGPLVGRLRLTASLRQNGAKTAYRVNLKLDQADVVDSGLPKVRYTQ VWSHDVTIVANSTEASRKSLYDLTKSLVATSQVEDLVVNLVPLGRADPLAGAPGSAGSAAGSGACPVPLQLPPLE RLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKR CRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESV SVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELES ELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFC LRPMLAPNLMNYGRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFTQMGSGCTRENLESIITDFLNHLG IDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESY YNGISTNL* (SEQ ID NO: 434)
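The PylRS (AF) and CbzRS synthetase domains in this listing are otherwise identical but differ at a handful of residues in their C-terminal region. A minimal Python sketch (the helper `substitutions` is hypothetical) that lists mismatches between two already aligned, equal-length excerpts is shown below, using matching stretches copied from SEQ ID NO: 424 and SEQ ID NO: 434. The positions it reports are 1-based offsets within these excerpts only, not residue numbers in full-length PylRS.

```python
from typing import List, Tuple


def substitutions(ref: str, var: str) -> List[Tuple[int, str, str]]:
    """Return (position, ref_residue, var_residue) for every mismatch
    between two equal-length, already aligned sequences (1-based)."""
    if len(ref) != len(var):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(ref, var)) if a != b]


# Matching stretches of the synthetase domains:
# SEQ ID NO: 424 (PylRS (AF)) and SEQ ID NO: 434 (CbzRS).
PYLRS_AF = ("LRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMG"
            "SGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTL")
CBZRS = ("LRPMLAPNLMNYGRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFTQMG"
         "SGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTL")

print(substitutions(PYLRS_AF, CBZRS))
# [(10, 'A', 'M'), (13, 'L', 'G'), (52, 'C', 'T'), (88, 'F', 'Y')]
```

A direct position-by-position comparison is adequate here because the two synthetase domains have the same length and no insertions or deletions in this region; unrelated sequences would need a proper alignment first.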
TOM20::FUS::2xPCP::CbzRS
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAGGCAAGCCTATTCCCAACCCCCTGCTGGGCCTGGATAGCACCGGAGCACCAGGAAGTGCTGGTTCTGCTGCT GGTAGTGGAGCATCGATAGAGCAGAAGCTGATCTCAGAGGAGGACCTGATCGAAGGCCGCCATATGCTAGCCTCC AAAACCATCGTTCTTTCGGTCGGCGAGGCTACTCGCACTCTGACTGAGATCCAGTCCACCGCAGACCGTCAGATC TTCGAAGAGAAGGTCGGGCCTCTGGTGGGTCGGCTGCGCCTCACGGCTTCGCTCCGTCAAAACGGAGCCAAGACC 
GCGTATCGCGTCAACCTAAAACTGGATCAGGCGGACGTCGTTGATTCCGGACTTCCGAAAGTGCGCTACACTCAG GTATGGTCGCACGACGTGACAATCGTTGCGAATAGCACCGAGGCCTCGCGCAAATCGTTGTACGATTTGACCAAG TCCCTCGTCGCGACCTCGCAGGTCGAAGATCTTGTCGTCAACCTTGTGCCGCTGGGCCGTGCGGATCCGCTAGCC TCCAAAACCATCGTTCTTTCGGTCGGCGAGGCTACTCGCACTCTGACTGAGATCCAGTCCACCGCAGACCGTCAG ATCTTCGAAGAGAAGGTCGGGCCTCTGGTGGGTCGGCTGCGCCTCACGGCTTCGCTCCGTCAAAACGGAGCCAAG ACCGCGTATCGCGTCAACCTAAAACTGGATCAGGCGGACGTCGTTGATTCCGGACTTCCGAAAGTGCGCTACACT CAGGTATGGTCGCACGACGTGACAATCGTTGCGAATAGCACCGAGGCCTCGCGCAAATCGTTGTACGATTTGACC AAGTCCCTCGTCGCGACCTCGCAGGTCGAAGATCTTGTCGTCAACCTTGTGCCGCTGGGCCGTCCACCGGTCGCC ACCGGTGCTCCTGGTTCAGCAGGAAGCGCAGCAGGATCAGGTGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTG GAACGCCTGACCCTGGATGACAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACC GGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCAT CTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAA CGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTG AAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAA AACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCC GTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAA GGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACC GATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAG AGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAA CTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTG GAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTC TGTCTGCGCCCTATGCTTGCACCAAATCTGATGAACTATGGACGCAAACTGGACCGTGCCCTGCCTGATCCTATC AAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTG AACTTTACACAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTG GGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTG GAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCA 
GGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCG TATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 435)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND
YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG
YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY
GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG
GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF
KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG
NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP
GGGPGGSHMGGNYGDDRRGGRGGAIAGKPIPNPLLGLDSTGAPGSAGSAAGSGASIEQKLISEEDLIEGRHMLAS
KTIVLSVGEATRTLTEIQSTADRQIFEEKVGPLVGRLRLTASLRQNGAKTAYRVNLKLDQADVVDSGLPKVRYTQ
VWSHDVTIVANSTEASRKSLYDLTKSLVATSQVEDLVVNLVPLGRADPLASKTIVLSVGEATRTLTEIQSTADRQ
IFEEKVGPLVGRLRLTASLRQNGAKTAYRVNLKLDQADVVDSGLPKVRYTQVWSHDVTIVANSTEASRKSLYDLT KSLVATSQVEDLVVNLVPLGRPPVATGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRT GTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKV KVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVK GNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGK LEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLMNYGRKLDRALPDPI KIFEIGPCYRKESDGKEHLEEFTMLNFTQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDL ELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 436)
TOM20::FUS::4clN22::CbzRS
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAGGCAAGCCTATTCCCAACCCCCTGCTGGGCCTGGATAGCACCGGAGCACCAGGAAGTGCTGGTTCTGCTGCT GGTAGTGGAGCATCGATAGAGCAGAAGCTGATCTCAGAGGAGGACCTGCTAGCCACCATGGACGCACAAACACGA
CGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGC GCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAG AAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCT GGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCT GCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGAC GCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGAGTCT AGAGGGCCCGTTGGTGCTCCTGGTTCAGCAGGAAGCGCAGCAGGATCAGGTGCGTGCCCGGTGCCGCTGCAGCTG CCGCCGCTGGAACGCCTGACCCTGGATGACAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATG AGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGT GGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAA ACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGC GTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAA CCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACC CAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCC CTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAA TCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGT GAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTAT CTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTG ATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGAT AAAAACTTCTGTCTGCGCCCTATGCTTGCACCAAATCTGATGAACTATGGACGCAAACTGGACCGTGCCCTGCCT GATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTT ACCATGCTGAACTTTACACAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTG AACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCAC GGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGG ATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGT TCTGAATCGTATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 437)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGAIAGKPIPNPLLGLDSTGAPGSAGSAAGSGASIEQKLISEEDLLATMDAQTR RRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGA GGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLES RGPVGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMAC GDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPK PLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTK SQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPIL IPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLMNYGRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEF TMLNFTQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPW IGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 438)
EBAG9I-29::FUS::PCP::PylRS (AF)
DNA:
ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA ATATGCAGATCTGGAGCACCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATGGCCTCAAACGATTATACCCAA CAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTAC GGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTAT GGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGT AGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGC ACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAG CCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAG AACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCC ATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTAT GGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAAC CGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGAC CGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGAC AACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATT GGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTG AAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAA TTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGT GGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGA GGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGT GAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGA CCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGTGGTGCGATCGCAGGCGCC GATTACAAGGACGATGATGACAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCATCGATAGAG CAGAAGCTGATCTCAGAGGAGGACCTGATCGAAGGCCGCCATATGCTAGCCTCCAAAACCATCGTTCTTTCGGTC GGCGAGGCTACTCGCACTCTGACTGAGATCCAGTCCACCGCAGACCGTCAGATCTTCGAAGAGAAGGTCGGGCCT CTGGTGGGTCGGCTGCGCCTCACGGCTTCGCTCCGTCAAAACGGAGCCAAGACCGCGTATCGCGTCAACCTAAAA CTGGATCAGGCGGACGTCGTTGATTCCGGACTTCCGAAAGTGCGCTACACTCAGGTATGGTCGCACGACGTGACA 
ATCGTTGCGAATAGCACCGAGGCCTCGCGCAAATCGTTGTACGATTTGACCAAGTCCCTCGTCGCGACCTCGCAG GTCGAAGATCTTGTCGTCAACCTTGTGCCGCTGGGCCGTGCGGATCCGCTAGCCGGTGCTCCTGGTTCAGCAGGA AGCGCAGCAGGATCAGGTGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAA AAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACAC CACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCT TCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGAT CTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGT ACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAG CCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGC ACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGC ATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAAT CCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAA AAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTT TTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATC GACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCA AATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGT TATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGT TGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGC GACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGC CCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTG AAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAAC CTGTAA (SEQ ID NO: 439)
Protein:
MAITQFRLFKFCTCLATVFSFLKRLICRSGAPGSAGSAAGSGMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPY GQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSS TSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSS MSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSD RGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKL KGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRG GFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGAIAGA DYKDDDDKGAPGSAGSAAGSGASIEQKLISEEDLIEGRHMLASKTIVLSVGEATRTLTEIQSTADRQIFEEKVGP LVGRLRLTASLRQNGAKTAYRVNLKLDQADVVDSGLPKVRYTQVWSHDVTIVANSTEASRKSLYDLTKSLVATSQ VEDLVVNLVPLGRADPLAGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKH HEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTR TKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITS MSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRF FVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPC YRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVG PIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 440)
EBAG9I-29::FUS::4clN22::IFRS1
DNA:
ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA ATATGCAGATCTGGAGCACCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATGGCCTCAAACGATTATACCCAA CAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTAC GGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTAT GGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGT AGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGC ACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAG CCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAG AACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCC ATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTAT GGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAAC CGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGAC CGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGAC AACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATT GGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTG AAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAA TTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGT GGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGA GGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGT GAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGA CCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGTGGTGCGATCGCAGGCGCC GATTACAAGGACGATGATGACAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCATCGATAGAG CAGAAGCTGATCTCAGAGGAGGACCTGCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAG AAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCT GGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCT GCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGAC 
GCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGA GCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAG CGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGAGTCTAGAGGGCCCGTTGGTGCTCCT GGTTCAGCAGGAAGCGCAGCAGGATCAGGTGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACC CTGGATGACAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCAT AAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAAC AATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTG TCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGC GCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCA GCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCA GCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAAT CCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAG GTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTG TCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAA ATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAG CGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCT ATGCTTGCACCAAATATGCTGAACTATAGCCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAG ATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGTCTTTTATGCAA ATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTC AAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGT GCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTG GAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGG ATCTCTACGAACCTGTAA (SEQ ID NO: 441)
Protein:
MAITQFRLFKFCTCLATVFSFLKRLICRSGAPGSAGSAAGSGMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPY GQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSS TSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSS MSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSD RGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKL KGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRG GFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGAIAGA DYKDDDDKGAPGSAGSAAGSGASIEQKLISEEDLLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGA GGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDG AGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLESRGPVGAPGSAGSAAGSGACPVPLQLPPLERLT LDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRV SDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVP ASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELL SRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRP MLAPNMLNYSRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLSFMQMGSGCTRENLESIITDFLNHLGIDF KIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNG ISTNL* (SEQ ID NO: 442)
KIF16B::EWSR1::Myc::2xPCP
DNA:
ATGGCATCGGTCAAGGTGGCCGTGAGGGTCCGGCCCATGAATCGCAGGGAAAAGGACTTGGAGGCCAAGTTCATT ATTCAGATGGAGAAAAGCAAAACGACAATCACAAACTTAAAGATACCAGAAGGAGGCACTGGGGACTCAGGAAGA GAACGGACCAAGACCTTCACCTATGACTTTTCTTTTTATTCTGCTGATACAAAAAGCCCAGATTACGTTTCACAA GAAATGGTTTTCAAAACCCTCGGCACAGATGTCGTGAAGTCTGCATTTGAAGGTTATAATGCTTGTGTCTTTGCA TATGGGCAAACTGGATCTGGAAAGTCATACACTATGATGGGAAATTCTGGAGATTCTGGCTTAATACCTCGGATC TGTGAAGGACTCTTCAGTCGGATAAATGAAACCACCAGATGGGATGAAGCTTCTTTTCGAACTGAAGTCAGCTAC TTAGAAATTTATAACGAACGTGTGAGAGATCTACTTCGGCGGAAGTCATCTAAAACCTTCAATTTGAGAGTCCGT GAGCATCCCAAAGAAGGCCCTTATGTTGAGGATTTATCCAAACATTTAGTACAGAATTATGGTGACGTAGAAGAA CTTATGGATGCGGGCAATATCAACCGGACCACCGCAGCGACTGGGATGAACGACGTCAGTAGCAGGTCTCATGCC ATCTTCACCATCAAGTTCACTCAGGCTAAATTTGATTCTGAAATGCCATGTGAAACCGTCAGTAAGATCCACTTG GTTGATCTTGCCGGAAGTGAGCGTGCAGATGCCACCGGAGCCACCGGGGTTAGGCTAAAGGAAGGGGGAAATATT AACAAGTCCCTCGTGACTCTGGGGAACGTCATTTCTGCCTTAGCTGATTTATCTCAGGATGCTGCAAATACTCTT GCAAAGAAGAAGCAAGTTTTCGTGCCTTACAGGGATTCTGTGTTGACTTGGTTGTTAAAAGATAGCCTTGGAGGA AACTCTAAAACTATCATGATTGCCACCATTTCACCTGCTGATGTCAATTATGGAGAAACCCTAAGTACTCTTCGC TATGCAAATAGAGCCAAAAACATCATCAACAAGCCTACCATTAATGAGGATGCCAACGTCAAACTTATCCGTGAG CTGCGAGCTGAAATAGCCAGACTGAAAACGCTGCTTGCTCAAGGGAATCAGATTGCCCTCTTAGACTCCCCCACA ATGGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCC ACTCAAGGATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTC AGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACT GGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACC ACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCA GCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAA CCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTA CCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCG ACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGC TATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCT 
CCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATG GGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGG GGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGC AGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCA CCTGTAGATCCAGATGAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGAT GATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATC TACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCT GCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCA ATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGA GGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGA GGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCC AATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGC TTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGA GGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGT GGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTG ATGGAACAGGCGATCGCAGAGCAGAAGCTGATCTCAGAGGAGGACCTGATCGAAGGCCGCCATATGCTAGCCTCC AAAACCATCGTTCTTTCGGTCGGCGAGGCTACTCGCACTCTGACTGAGATCCAGTCCACCGCAGACCGTCAGATC TTCGAAGAGAAGGTCGGGCCTCTGGTGGGTCGGCTGCGCCTCACGGCTTCGCTCCGTCAAAACGGAGCCAAGACC GCGTATCGCGTCAACCTAAAACTGGATCAGGCGGACGTCGTTGATTCCGGACTTCCGAAAGTGCGCTACACTCAG GTATGGTCGCACGACGTGACAATCGTTGCGAATAGCACCGAGGCCTCGCGCAAATCGTTGTACGATTTGACCAAG TCCCTCGTCGCGACCTCGCAGGTCGAAGATCTTGTCGTCAACCTTGTGCCGCTGGGCCGTGCGGATCCGCTAGCC TCCAAAACCATCGTTCTTTCGGTCGGCGAGGCTACTCGCACTCTGACTGAGATCCAGTCCACCGCAGACCGTCAG ATCTTCGAAGAGAAGGTCGGGCCTCTGGTGGGTCGGCTGCGCCTCACGGCTTCGCTCCGTCAAAACGGAGCCAAG ACCGCGTATCGCGTCAACCTAAAACTGGATCAGGCGGACGTCGTTGATTCCGGACTTCCGAAAGTGCGCTACACT CAGGTATGGTCGCACGACGTGACAATCGTTGCGAATAGCACCGAGGCCTCGCGCAAATCGTTGTACGATTTGACC AAGTCCCTCGTCGCGACCTCGCAGGTCGAAGATCTTGTCGTCAACCTTGTGCCGCTGGGCCGTCCACCGGTCGCC ACCTAA (SEQ ID NO: 443)
Protein:
MASVKVAVRVRPMNRREKDLEAKFIIQMEKSKTTITNLKIPEGGTGDSGRERTKTFTYDFSFYSADTKSPDYVSQ EMVFKTLGTDVVKSAFEGYNACVFAYGQTGSGKSYTMMGNSGDSGLIPRICEGLFSRINETTRWDEASFRTEVSY LEIYNERVRDLLRRKSSKTFNLRVREHPKEGPYVEDLSKHLVQNYGDVEELMDAGNINRTTAATGMNDVSSRSHA IFTIKFTQAKFDSEMPCETVSKIHLVDLAGSERADATGATGVRLKEGGNINKSLVTLGNVISALADLSQDAANTL AKKKQVFVPYRDSVLTWLLKDSLGGNSKTIMIATISPADVNYGETLSTLRYANRAKNIINKPTINEDANVKLIRE LRAEIARLKTLLAQGNQIALLDSPTMASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDV SYTQAQTTATYGQTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYP AYGQQPAATAPTRPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQP TSYDQSSYSQQNTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSM GVYGQESGGFSGPGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGP PVDPDEDSDNSAIYVQGLNDSVTLDDLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKA AVEWFDGKDFQGSKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPR GPRGSRGNPSGGGNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGR GGLMDRGGPGGMFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPPGPLMEQAIAEQKLISEEDLIEGRHMLAS KTIVLSVGEATRTLTEIQSTADRQIFEEKVGPLVGRLRLTASLRQNGAKTAYRVNLKLDQADVVDSGLPKVRYTQ VWSHDVTIVANSTEASRKSLYDLTKSLVATSQVEDLVVNLVPLGRADPLASKTIVLSVGEATRTLTEIQSTADRQ IFEEKVGPLVGRLRLTASLRQNGAKTAYRVNLKLDQADVVDSGLPKVRYTQVWSHDVTIVANSTEASRKSLYDLT KSLVATSQVEDLVVNLVPLGRPPVAT* (SEQ ID NO: 444)
KIF16B::EWSR1::HA::2xPCP
DNA:
ATGGCATCGGTCAAGGTGGCCGTGAGGGTCCGGCCCATGAATCGCAGGGAAAAGGACTTGGAGGCCAAGTTCATT ATTCAGATGGAGAAAAGCAAAACGACAATCACAAACTTAAAGATACCAGAAGGAGGCACTGGGGACTCAGGAAGA GAACGGACCAAGACCTTCACCTATGACTTTTCTTTTTATTCTGCTGATACAAAAAGCCCAGATTACGTTTCACAA GAAATGGTTTTCAAAACCCTCGGCACAGATGTCGTGAAGTCTGCATTTGAAGGTTATAATGCTTGTGTCTTTGCA TATGGGCAAACTGGATCTGGAAAGTCATACACTATGATGGGAAATTCTGGAGATTCTGGCTTAATACCTCGGATC TGTGAAGGACTCTTCAGTCGGATAAATGAAACCACCAGATGGGATGAAGCTTCTTTTCGAACTGAAGTCAGCTAC TTAGAAATTTATAACGAACGTGTGAGAGATCTACTTCGGCGGAAGTCATCTAAAACCTTCAATTTGAGAGTCCGT GAGCATCCCAAAGAAGGCCCTTATGTTGAGGATTTATCCAAACATTTAGTACAGAATTATGGTGACGTAGAAGAA CTTATGGATGCGGGCAATATCAACCGGACCACCGCAGCGACTGGGATGAACGACGTCAGTAGCAGGTCTCATGCC ATCTTCACCATCAAGTTCACTCAGGCTAAATTTGATTCTGAAATGCCATGTGAAACCGTCAGTAAGATCCACTTG GTTGATCTTGCCGGAAGTGAGCGTGCAGATGCCACCGGAGCCACCGGGGTTAGGCTAAAGGAAGGGGGAAATATT AACAAGTCCCTCGTGACTCTGGGGAACGTCATTTCTGCCTTAGCTGATTTATCTCAGGATGCTGCAAATACTCTT GCAAAGAAGAAGCAAGTTTTCGTGCCTTACAGGGATTCTGTGTTGACTTGGTTGTTAAAAGATAGCCTTGGAGGA AACTCTAAAACTATCATGATTGCCACCATTTCACCTGCTGATGTCAATTATGGAGAAACCCTAAGTACTCTTCGC TATGCAAATAGAGCCAAAAACATCATCAACAAGCCTACCATTAATGAGGATGCCAACGTCAAACTTATCCGTGAG CTGCGAGCTGAAATAGCCAGACTGAAAACGCTGCTTGCTCAAGGGAATCAGATTGCCCTCTTAGACTCCCCCACA ATGGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCC ACTCAAGGATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTC AGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACT GGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACC ACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCA GCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAA CCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTA
CCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCG ACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGC TATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCT CCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATG GGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGG GGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGC AGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCA CCTGTAGATCCAGATGAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGAT GATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATC TACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCT GCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCA ATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGA GGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGA GGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCC AATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGC TTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGA GGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGT GGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTG ATGGAACAGGCGATCGCATACCCCTACGACGTGCCCGACTACGCCATCGAAGGCCGCCATATGCTAGCCTCCAAA ACCATCGTTCTTTCGGTCGGCGAGGCTACTCGCACTCTGACTGAGATCCAGTCCACCGCAGACCGTCAGATCTTC GAAGAGAAGGTCGGGCCTCTGGTGGGTCGGCTGCGCCTCACGGCTTCGCTCCGTCAAAACGGAGCCAAGACCGCG TATCGCGTCAACCTAAAACTGGATCAGGCGGACGTCGTTGATTCCGGACTTCCGAAAGTGCGCTACACTCAGGTA TGGTCGCACGACGTGACAATCGTTGCGAATAGCACCGAGGCCTCGCGCAAATCGTTGTACGATTTGACCAAGTCC
CTCGTCGCGACCTCGCAGGTCGAAGATCTTGTCGTCAACCTTGTGCCGCTGGGCCGTGCGGATCCGCTAGCCTCC AAAACCATCGTTCTTTCGGTCGGCGAGGCTACTCGCACTCTGACTGAGATCCAGTCCACCGCAGACCGTCAGATC TTCGAAGAGAAGGTCGGGCCTCTGGTGGGTCGGCTGCGCCTCACGGCTTCGCTCCGTCAAAACGGAGCCAAGACC GCGTATCGCGTCAACCTAAAACTGGATCAGGCGGACGTCGTTGATTCCGGACTTCCGAAAGTGCGCTACACTCAG GTATGGTCGCACGACGTGACAATCGTTGCGAATAGCACCGAGGCCTCGCGCAAATCGTTGTACGATTTGACCAAG TCCCTCGTCGCGACCTCGCAGGTCGAAGATCTTGTCGTCAACCTTGTGCCGCTGGGCCGTCCACCGGTCGCCACC TAA (SEQ ID NO: 445)
Protein:
MASVKVAVRVRPMNRREKDLEAKFIIQMEKSKTTITNLKIPEGGTGDSGRERTKTFTYDFSFYSADTKSPDYVSQ EMVFKTLGTDVVKSAFEGYNACVFAYGQTGSGKSYTMMGNSGDSGLIPRICEGLFSRINETTRWDEASFRTEVSY LEIYNERVRDLLRRKSSKTFNLRVREHPKEGPYVEDLSKHLVQNYGDVEELMDAGNINRTTAATGMNDVSSRSHA IFTIKFTQAKFDSEMPCETVSKIHLVDLAGSERADATGATGVRLKEGGNINKSLVTLGNVISALADLSQDAANTL AKKKQVFVPYRDSVLTWLLKDSLGGNSKTIMIATISPADVNYGETLSTLRYANRAKNIINKPTINEDANVKLIRE LRAEIARLKTLLAQGNQIALLDSPTMASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDV SYTQAQTTATYGQTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYP AYGQQPAATAPTRPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQP TSYDQSSYSQQNTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSM GVYGQESGGFSGPGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGP PVDPDEDSDNSAIYVQGLNDSVTLDDLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKA AVEWFDGKDFQGSKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPR GPRGSRGNPSGGGNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGR GGLMDRGGPGGMFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPPGPLMEQAIAYPYDVPDYAIEGRHMLASK TIVLSVGEATRTLTEIQSTADRQIFEEKVGPLVGRLRLTASLRQNGAKTAYRVNLKLDQADVVDSGLPKVRYTQV WSHDVTIVANSTEASRKSLYDLTKSLVATSQVEDLVVNLVPLGRADPLASKTIVLSVGEATRTLTEIQSTADRQI FEEKVGPLVGRLRLTASLRQNGAKTAYRVNLKLDQADVVDSGLPKVRYTQVWSHDVTIVANSTEASRKSLYDLTK SLVATSQVEDLVVNLVPLGRPPVAT* (SEQ ID NO: 446)
EBAG9I-29::EWSR1::Myc::2xPCP
DNA:
ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA ATATGCAGATCTATGGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTAC ACCGCCCAGCCCACTCAAGGATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAG CCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGA CAGCCTCCCACTGGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGT GCTTATGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAG CCTGCTTATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACT GAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGT TATCCCCAGGTACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCC TCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGA CAGCAGAGTAGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCC TACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCAC CCCAGTAGCATGGGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGC CCTGATAACCGGGGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGC GGTGGAATGGGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTT GATCTAGGCCCACCTGTAGATCCAGATGAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGT GTGACTCTAGATGATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCC ATGATCCACATCTACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCT ACTGCCAAGGCTGCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGG AAGAAGCCTCCAATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGT GGAGGTCCAGGAGGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGC TTCCCTCCAAGAGGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGAC TGGCAGTGTCCCAATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCA AAGCCTGAAGGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATG CGGGGAGGAAGAGGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGA GGTGGCTTCCGTGGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCC 
CCTGGACCTTTGATGGAACAGGCGATCGCAGAGCAGAAGCTGATCTCAGAGGAGGACCTGATCGAAGGCCGCCAT ATGCTAGCCTCCAAAACCATCGTTCTTTCGGTCGGCGAGGCTACTCGCACTCTGACTGAGATCCAGTCCACCGCA GACCGTCAGATCTTCGAAGAGAAGGTCGGGCCTCTGGTGGGTCGGCTGCGCCTCACGGCTTCGCTCCGTCAAAAC GGAGCCAAGACCGCGTATCGCGTCAACCTAAAACTGGATCAGGCGGACGTCGTTGATTCCGGACTTCCGAAAGTG CGCTACACTCAGGTATGGTCGCACGACGTGACAATCGTTGCGAATAGCACCGAGGCCTCGCGCAAATCGTTGTAC GATTTGACCAAGTCCCTCGTCGCGACCTCGCAGGTCGAAGATCTTGTCGTCAACCTTGTGCCGCTGGGCCGTGCG GATCCGCTAGCCTCCAAAACCATCGTTCTTTCGGTCGGCGAGGCTACTCGCACTCTGACTGAGATCCAGTCCACC GCAGACCGTCAGATCTTCGAAGAGAAGGTCGGGCCTCTGGTGGGTCGGCTGCGCCTCACGGCTTCGCTCCGTCAA AACGGAGCCAAGACCGCGTATCGCGTCAACCTAAAACTGGATCAGGCGGACGTCGTTGATTCCGGACTTCCGAAA GTGCGCTACACTCAGGTATGGTCGCACGACGTGACAATCGTTGCGAATAGCACCGAGGCCTCGCGCAAATCGTTG TACGATTTGACCAAGTCCCTCGTCGCGACCTCGCAGGTCGAAGATCTTGTCGTCAACCTTGTGCCGCTGGGCCGT CCACCGGTCGCCACCTAA (SEQ ID NO: 447)
Protein:
MAITQFRLFKFCTCLATVFSFLKRLICRSMASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQ PTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQ PAYPAYGQQPAATAPTRPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYS STQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDH PSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDL DLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPP TAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGG FPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGM RGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPPGPLMEQAIAEQKLISEEDLIEGRH MLASKTIVLSVGEATRTLTEIQSTADRQIFEEKVGPLVGRLRLTASLRQNGAKTAYRVNLKLDQADVVDSGLPKV RYTQVWSHDVTIVANSTEASRKSLYDLTKSLVATSQVEDLVVNLVPLGRADPLASKTIVLSVGEATRTLTEIQST ADRQIFEEKVGPLVGRLRLTASLRQNGAKTAYRVNLKLDQADVVDSGLPKVRYTQVWSHDVTIVANSTEASRKSL YDLTKSLVATSQVEDLVVNLVPLGRPPVAT* (SEQ ID NO: 448)
EBAG9I-29::EWSR1::HA::2xPCP
DNA:
ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA ATATGCAGATCTATGGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTAC ACCGCCCAGCCCACTCAAGGATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAG CCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGA CAGCCTCCCACTGGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGT GCTTATGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAG CCTGCTTATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACT GAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGT TATCCCCAGGTACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCC TCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGA CAGCAGAGTAGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCC TACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCAC CCCAGTAGCATGGGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGC CCTGATAACCGGGGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGC GGTGGAATGGGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTT GATCTAGGCCCACCTGTAGATCCAGATGAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGT GTGACTCTAGATGATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCC ATGATCCACATCTACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCT ACTGCCAAGGCTGCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGG AAGAAGCCTCCAATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGT GGAGGTCCAGGAGGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGC TTCCCTCCAAGAGGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGAC TGGCAGTGTCCCAATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCA AAGCCTGAAGGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATG CGGGGAGGAAGAGGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGA GGTGGCTTCCGTGGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCC 
CCTGGACCTTTGATGGAACAGGCGATCGCATACCCCTACGACGTGCCCGACTACGCCATCGAAGGCCGCCATATG CTAGCCTCCAAAACCATCGTTCTTTCGGTCGGCGAGGCTACTCGCACTCTGACTGAGATCCAGTCCACCGCAGAC CGTCAGATCTTCGAAGAGAAGGTCGGGCCTCTGGTGGGTCGGCTGCGCCTCACGGCTTCGCTCCGTCAAAACGGA GCCAAGACCGCGTATCGCGTCAACCTAAAACTGGATCAGGCGGACGTCGTTGATTCCGGACTTCCGAAAGTGCGC TACACTCAGGTATGGTCGCACGACGTGACAATCGTTGCGAATAGCACCGAGGCCTCGCGCAAATCGTTGTACGAT TTGACCAAGTCCCTCGTCGCGACCTCGCAGGTCGAAGATCTTGTCGTCAACCTTGTGCCGCTGGGCCGTGCGGAT CCGCTAGCCTCCAAAACCATCGTTCTTTCGGTCGGCGAGGCTACTCGCACTCTGACTGAGATCCAGTCCACCGCA GACCGTCAGATCTTCGAAGAGAAGGTCGGGCCTCTGGTGGGTCGGCTGCGCCTCACGGCTTCGCTCCGTCAAAAC GGAGCCAAGACCGCGTATCGCGTCAACCTAAAACTGGATCAGGCGGACGTCGTTGATTCCGGACTTCCGAAAGTG CGCTACACTCAGGTATGGTCGCACGACGTGACAATCGTTGCGAATAGCACCGAGGCCTCGCGCAAATCGTTGTAC GATTTGACCAAGTCCCTCGTCGCGACCTCGCAGGTCGAAGATCTTGTCGTCAACCTTGTGCCGCTGGGCCGTCCA CCGGTCGCCACCTAA (SEQ ID NO: 449)
Protein:
MAITQFRLFKFCTCLATVFSFLKRLICRSMASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQ PTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQ PAYPAYGQQPAATAPTRPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYS STQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDH PSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDL DLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPP TAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGG FPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGM RGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPPGPLMEQAIAYPYDVPDYAIEGRHM LASKTIVLSVGEATRTLTEIQSTADRQIFEEKVGPLVGRLRLTASLRQNGAKTAYRVNLKLDQADVVDSGLPKVR YTQVWSHDVTIVANSTEASRKSLYDLTKSLVATSQVEDLVVNLVPLGRADPLASKTIVLSVGEATRTLTEIQSTA DRQIFEEKVGPLVGRLRLTASLRQNGAKTAYRVNLKLDQADVVDSGLPKVRYTQVWSHDVTIVANSTEASRKSLY DLTKSLVATSQVEDLVVNLVPLGRPPVAT* (SEQ ID NO: 450)
LCK::CbzRS
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTG GAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACC GGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCAT CTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAA CGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTG AAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAA AACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCC GTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAA GGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACC GATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAG AGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAA CTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTG GAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTC TGTCTGCGCCCTATGCTTGCACCAAATCTGATGAACTATGGACGCAAACTGGACCGTGCCCTGCCTGATCCTATC AAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTG AACTTTACACAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTG GGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTG GAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCA GGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCG TATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 451)
Protein:
MGCVCSSNPEGTELACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDH LVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLE NTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQT DRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPL EYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLMNYGRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTML NFTQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGA GFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 452)
LCK::FUS::CbzRS
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCCTCAAACGATTATACCCAACAAGCAACCCAA AGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGT TACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAG AACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCC CAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGT TACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGT GGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAAC AGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGT GGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGAC CGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGT GGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTC AATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATC TTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAG ACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCA ACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAAT CCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGG CGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGT GGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAAC TTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCT CACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGGCGCCCCCGGCTCCGCCGGCTCCGCC GCCGGCTCCGGCATGGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAA CCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCAC GAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCT CGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTG AACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACT AAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCG 
TCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACC AGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATG TCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCG AAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAA GACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTC GTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGAC AATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTTGCACCAAAT CTGATGAACTATGGACGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTAT CGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTACACAAATGGGTTCAGGTTGT ACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGAC AGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCA ATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAA GTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTG TAA (SEQ ID NO: 453)
Protein:
MGCVCSSNPEGTELASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQ NTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYG GQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQD RGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTI FVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGN PIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMN FSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGGAPGSAGSAAGSGMACPVPLQLPPLERLTLDDKK PLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDL NKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVST SISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKK DLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPN LMNYGRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFTQMGSGCTRENLESIITDFLNHLGIDFKIVGD SCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 454)
TOM20::FUS::SYNZIP1::CpkRS
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAAATCTGGTGGCCCAGCTGGAAAACGAGGTGGCCAGCCTGGAAAACGAGAACGAAACCCTGAAGAAAAAGAAC CTGCACAAGAAGGACCTGATCGCCTACCTGGAAAAGGAAATCGCCAACCTGAGAAAGAAGATCGAGGAAGGCAAG CCTATTCCCAACCCCCTGCTGGGCCTGGATAGCACCGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGA GCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTG 
ATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCG AAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCA CTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACA AAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCG AAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTC TCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATT AGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAA GCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGC CTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATC TATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTT CTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTG AGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTTTCACCAAATCTGTATAACTATCTG CGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGAC GGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTG GAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTAT GGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGT GAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTC AAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 455)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGAIANLVAQLENEVASLENENETLKKKNLHKKDLIAYLEKEIANLRKKIEEGK PIPNPLLGLDSTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRS KIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMP KSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQ ASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGF LEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLSPNLYNYLRKLDRALPDPIKIFEIGPCYRKESD GKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDR EWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 456)
KIF16B::FUS::CbzRS
DNA:
ATGGCATCGGTCAAGGTGGCCGTGAGGGTCCGGCCCATGAATCGCAGGGAAAAGGACTTGGAGGCCAAGTTCATT ATTCAGATGGAGAAAAGCAAAACGACAATCACAAACTTAAAGATACCAGAAGGAGGCACTGGGGACTCAGGAAGA GAACGGACCAAGACCTTCACCTATGACTTTTCTTTTTATTCTGCTGATACAAAAAGCCCAGATTACGTTTCACAA GAAATGGTTTTCAAAACCCTCGGCACAGATGTCGTGAAGTCTGCATTTGAAGGTTATAATGCTTGTGTCTTTGCA TATGGGCAAACTGGATCTGGAAAGTCATACACTATGATGGGAAATTCTGGAGATTCTGGCTTAATACCTCGGATC TGTGAAGGACTCTTCAGTCGGATAAATGAAACCACCAGATGGGATGAAGCTTCTTTTCGAACTGAAGTCAGCTAC TTAGAAATTTATAACGAACGTGTGAGAGATCTACTTCGGCGGAAGTCATCTAAAACCTTCAATTTGAGAGTCCGT GAGCATCCCAAAGAAGGCCCTTATGTTGAGGATTTATCCAAACATTTAGTACAGAATTATGGTGACGTAGAAGAA CTTATGGATGCGGGCAATATCAACCGGACCACCGCAGCGACTGGGATGAACGACGTCAGTAGCAGGTCTCATGCC ATCTTCACCATCAAGTTCACTCAGGCTAAATTTGATTCTGAAATGCCATGTGAAACCGTCAGTAAGATCCACTTG GTTGATCTTGCCGGAAGTGAGCGTGCAGATGCCACCGGAGCCACCGGGGTTAGGCTAAAGGAAGGGGGAAATATT AACAAGTCCCTCGTGACTCTGGGGAACGTCATTTCTGCCTTAGCTGATTTATCTCAGGATGCTGCAAATACTCTT GCAAAGAAGAAGCAAGTTTTCGTGCCTTACAGGGATTCTGTGTTGACTTGGTTGTTAAAAGATAGCCTTGGAGGA AACTCTAAAACTATCATGATTGCCACCATTTCACCTGCTGATGTCAATTATGGAGAAACCCTAAGTACTCTTCGC TATGCAAATAGAGCCAAAAACATCATCAACAAGCCTACCATTAATGAGGATGCCAACGTCAAACTTATCCGTGAG CTGCGAGCTGAAATAGCCAGACTGAAAACGCTGCTTGCTCAAGGGAATCAGATTGCCCTCTTAGACTCCCCCACA TATACAGATATTGAAATGAACAGATTGGGAAAGGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATG GCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCC CAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGC CAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGC TCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGC CAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAG AGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCC CCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAAC TATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGT GGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGC 
GGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGA GGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGAC TCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTG GCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACA GACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATT GACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAAT CGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGC AGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGG AAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAA CCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGA GGAGGCGATTACAAGGATGACGACGATAAGGGTACCGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGC GCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTG ATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCG AAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCA CTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACA AAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCG AAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTC TCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATT AGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAA GCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGC CTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATC TATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTT CTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTG AGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTTGCACCAAATCTGATGAACTATGGA CGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGAC GGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTACACAAATGGGTTCAGGTTGTACTCGTGAGAACCTG 
GAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTAT GGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGT GAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTC AAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 457)
Protein:
MASVKVAVRVRPMNRREKDLEAKFIIQMEKSKTTITNLKIPEGGTGDSGRERTKTFTYDFSFYSADTKSPDYVSQ EMVFKTLGTDVVKSAFEGYNACVFAYGQTGSGKSYTMMGNSGDSGLIPRICEGLFSRINETTRWDEASFRTEVSY LEIYNERVRDLLRRKSSKTFNLRVREHPKEGPYVEDLSKHLVQNYGDVEELMDAGNINRTTAATGMNDVSSRSHA IFTIKFTQAKFDSEMPCETVSKIHLVDLAGSERADATGATGVRLKEGGNINKSLVTLGNVISALADLSQDAANTL AKKKQVFVPYRDSVLTWLLKDSLGGNSKTIMIATISPADVNYGETLSTLRYANRAKNIINKPTINEDANVKLIRE LRAEIARLKTLLAQGNQIALLDSPTYTDIEMNRLGKGAPGSAGSAAGSGMASNDYTQQATQSYGAYPTQPGQGYS QQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYG QQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGN YGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGR GGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYT DRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGG SGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGR GGDYKDDDDKGTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRS KIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMP KSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQ ASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGF LEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLMNYGRKLDRALPDPIKIFEIGPCYRKESD GKEHLEEFTMLNFTQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDR EWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 458)
EBAG9(1-29)::FUS::CpkRS
DNA:
ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA ATATGCAGATCTGGAGCACCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATGGCCTCAAACGATTATACCCAA CAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTAC GGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTAT GGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGT AGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGC ACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAG CCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAG AACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCC ATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTAT GGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAAC CGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGAC CGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGAC AACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATT GGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTG AAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAA TTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGT GGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGA GGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGT GAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGA CCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGTGGTGCGATCGCAGGCGCC GATTACAAGGACGATGATGACAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTG CCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACT GGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATT GAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCAC AAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAG 
GACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCT CGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATT CCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCC ACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCA GCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGC AAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAA CGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAA TCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATT TTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTTTCACCAAATCTGTATAACTATCTGCGCAAACTGGAC CGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACAT CTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATC ACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTG GATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATC GACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAA CGTGCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 459)
Protein:
MAITQFRLFKFCTCLATVFSFLKRLICRSGAPGSAGSAAGSGMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPY GQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSS TSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSS MSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSD RGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKL KGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRG GFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGAIAGA DYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYI EMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVA RAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAP ALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIK SPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLSPNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEH LEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGI DKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 460)
TOM20::FUS::CbzRS
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT
GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT
GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC
AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT
GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT
CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA
GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC
GCAGGCAAGCCTATTCCCAACCCCCTGCTGGGCCTGGATAGCACCGGAGCACCAGGAAGTGCTGGTTCTGCTGCT
GGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTG
AATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTT
AGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACA
GCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAA
TTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAA
GCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGA
AGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATT
AGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCC
CCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGAC
GAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTG
CAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGAT
CGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGAT
ACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTTGCACCAAATCTGATG
AACTATGGACGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAA
GAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTACACAAATGGGTTCAGGTTGTACTCGT
GAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGT
ATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCG
CTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAA
CACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTGTAA
(SEQ ID NO: 461)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGAIAGKPIPNPLLGLDSTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPL NTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNK FLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSI SSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDL QQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLM NYGRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFTQMGSGCTRENLESIITDFLNHLGIDFKIVGDSC MVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 462)
EBAG9(1-29)::FUS::CbzRS
DNA:
ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA ATATGCAGATCTGGAGCACCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATGGCCTCAAACGATTATACCCAA CAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTAC GGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTAT GGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGT AGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGC ACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAG CCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAG AACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCC ATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTAT GGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAAC CGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGAC CGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGAC AACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATT GGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTG AAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAA TTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGT GGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGA GGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGT GAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGA CCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGTGGTGCGATCGCAGGCGCC GATTACAAGGACGATGATGACAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTG CCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACT GGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATT GAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCAC AAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAG 
GACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCT CGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATT CCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCC ACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCA GCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGC AAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAA CGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAA TCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATT TTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTTGCACCAAATCTGATGAACTATGGACGCAAACTGGAC CGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACAT CTGGAGGAGTTTACCATGCTGAACTTTACACAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATC ACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTG GATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATC GACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAA CGTGCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 463)
Protein:
MAITQFRLFKFCTCLATVFSFLKRLICRSGAPGSAGSAAGSGMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPY GQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSS TSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSS MSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSD RGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKL KGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRG GFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGAIAGA DYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYI EMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVA RAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAP ALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIK SPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLMNYGRKLDRALPDPIKIFEIGPCYRKESDGKEH LEEFTMLNFTQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGI DKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 464)
TOM20::FUS::SYNZIP1::CbzRS
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAAATCTGGTGGCCCAGCTGGAAAACGAGGTGGCCAGCCTGGAAAACGAGAACGAAACCCTGAAGAAAAAGAAC CTGCACAAGAAGGACCTGATCGCCTACCTGGAAAAGGAAATCGCCAACCTGAGAAAGAAGATCGAGGAAGGCAAG CCTATTCCCAACCCCCTGCTGGGCCTGGATAGCACCGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGA GCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTG 
ATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCG AAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCA CTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACA AAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCG AAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTC TCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATT AGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAA GCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGC CTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATC TATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTT CTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTG AGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTTGCACCAAATCTGATGAACTATGGA CGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGAC GGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTACACAAATGGGTTCAGGTTGTACTCGTGAGAACCTG GAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTAT GGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGT GAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTC AAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 465)
Protein :
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGAIANLVAQLENEVASLENENETLKKKNLHKKDLIAYLEKEIANLRKKIEEGK PIPNPLLGLDSTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRS KIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMP KSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQ ASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGF LEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLMNYGRKLDRALPDPIKIFEIGPCYRKESD GKEHLEEFTMLNFTQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDR EWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 466)
KIF16B::FUS::CpkRS
DNA:
ATGGCATCGGTCAAGGTGGCCGTGAGGGTCCGGCCCATGAATCGCAGGGAAAAGGACTTGGAGGCCAAGTTCATT ATTCAGATGGAGAAAAGCAAAACGACAATCACAAACTTAAAGATACCAGAAGGAGGCACTGGGGACTCAGGAAGA GAACGGACCAAGACCTTCACCTATGACTTTTCTTTTTATTCTGCTGATACAAAAAGCCCAGATTACGTTTCACAA GAAATGGTTTTCAAAACCCTCGGCACAGATGTCGTGAAGTCTGCATTTGAAGGTTATAATGCTTGTGTCTTTGCA TATGGGCAAACTGGATCTGGAAAGTCATACACTATGATGGGAAATTCTGGAGATTCTGGCTTAATACCTCGGATC TGTGAAGGACTCTTCAGTCGGATAAATGAAACCACCAGATGGGATGAAGCTTCTTTTCGAACTGAAGTCAGCTAC TTAGAAATTTATAACGAACGTGTGAGAGATCTACTTCGGCGGAAGTCATCTAAAACCTTCAATTTGAGAGTCCGT GAGCATCCCAAAGAAGGCCCTTATGTTGAGGATTTATCCAAACATTTAGTACAGAATTATGGTGACGTAGAAGAA CTTATGGATGCGGGCAATATCAACCGGACCACCGCAGCGACTGGGATGAACGACGTCAGTAGCAGGTCTCATGCC ATCTTCACCATCAAGTTCACTCAGGCTAAATTTGATTCTGAAATGCCATGTGAAACCGTCAGTAAGATCCACTTG GTTGATCTTGCCGGAAGTGAGCGTGCAGATGCCACCGGAGCCACCGGGGTTAGGCTAAAGGAAGGGGGAAATATT AACAAGTCCCTCGTGACTCTGGGGAACGTCATTTCTGCCTTAGCTGATTTATCTCAGGATGCTGCAAATACTCTT GCAAAGAAGAAGCAAGTTTTCGTGCCTTACAGGGATTCTGTGTTGACTTGGTTGTTAAAAGATAGCCTTGGAGGA AACTCTAAAACTATCATGATTGCCACCATTTCACCTGCTGATGTCAATTATGGAGAAACCCTAAGTACTCTTCGC TATGCAAATAGAGCCAAAAACATCATCAACAAGCCTACCATTAATGAGGATGCCAACGTCAAACTTATCCGTGAG CTGCGAGCTGAAATAGCCAGACTGAAAACGCTGCTTGCTCAAGGGAATCAGATTGCCCTCTTAGACTCCCCCACA TATACAGATATTGAAATGAACAGATTGGGAAAGGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATG GCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCC CAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGC CAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGC TCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGC CAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAG AGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCC
CCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAAC TATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGT GGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGC GGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGA GGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGAC TCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTG GCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACA GACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATT GACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAAT CGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGC AGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGG AAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAA CCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGA GGAGGCGATTACAAGGATGACGACGATAAGGGTACCGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGC GCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTG ATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCG AAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCA CTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACA AAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCG AAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTC TCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATT AGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAA GCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGC CTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATC TATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTT
CTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTG AGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTTTCACCAAATCTGTATAACTATCTG CGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGAC GGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTG GAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTAT GGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGT GAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTC AAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 467)
Protein :
MASVKVAVRVRPMNRREKDLEAKFIIQMEKSKTTITNLKIPEGGTGDSGRERTKTFTYDFSFYSADTKSPDYVSQ EMVFKTLGTDVVKSAFEGYNACVFAYGQTGSGKSYTMMGNSGDSGLIPRICEGLFSRINETTRWDEASFRTEVSY LEIYNERVRDLLRRKSSKTFNLRVREHPKEGPYVEDLSKHLVQNYGDVEELMDAGNINRTTAATGMNDVSSRSHA IFTIKFTQAKFDSEMPCETVSKIHLVDLAGSERADATGATGVRLKEGGNINKSLVTLGNVISALADLSQDAANTL AKKKQVFVPYRDSVLTWLLKDSLGGNSKTIMIATISPADVNYGETLSTLRYANRAKNIINKPTINEDANVKLIRE LRAEIARLKTLLAQGNQIALLDSPTYTDIEMNRLGKGAPGSAGSAAGSGMASNDYTQQATQSYGAYPTQPGQGYS QQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYG QQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGN YGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGR GGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYT DRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGG SGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGR GGDYKDDDDKGTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRS KIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMP KSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQ ASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGF LEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLSPNLYNYLRKLDRALPDPIKIFEIGPCYRKESD GKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDR EWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 468)
LCK::FUS::CpkRS
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCCTCAAACGATTATACCCAACAAGCAACCCAA AGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGT TACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAG AACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCC CAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGT TACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGT GGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAAC AGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGT GGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGAC CGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGT GGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTC AATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATC TTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAG ACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCA ACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAAT CCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGG CGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGT GGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAAC TTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCT CACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGGCGCCCCCGGCTCCGCCGGCTCCGCC GCCGGCTCCGGCATGGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAA CCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCAC GAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCT CGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTG AACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACT AAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCG 
TCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACC AGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATG TCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCG AAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAA GACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTC GTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGAC AATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTTTCACCAAAT CTGTATAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTAT CGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGT ACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGAC AGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCA ATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAA GTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTG TAA (SEQ ID NO: 469)
Protein :
MGCVCSSNPEGTELASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQ NTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYG GQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQD RGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTI FVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGN PIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMN FSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGGAPGSAGSAAGSGMACPVPLQLPPLERLTLDDKK PLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDL NKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVST SISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKK DLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLSPN LYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGD SCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 470)
LCK::CpkRS
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTG GAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACC GGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCAT CTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAA CGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTG AAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAA AACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCC GTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAA GGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACC GATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAG AGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAA CTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTG GAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTC TGTCTGCGCCCTATGCTTTCACCAAATCTGTATAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATC AAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTG AACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTG GGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTG GAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCA GGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCG TATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 471)
Protein :
MGCVCSSNPEGTELACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDH LVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLE NTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQT DRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPL EYIERMGIDNDTELSKQIFRVDKNFCLRPMLSPNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTML NFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGA GFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 472)
TOM20::FUS::SYNZIP3::CbzRS
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAAATGAGGTGACCACCCTGGAAAACGACGCCGCCTTCATCGAGAACGAGAACGCCTACCTGGAAAAAGAGATC GCCAGACTGAGAAAGGAAAAGGCCGCTCTGCGGAACAGACTGGCCCACAAGAAGGGCAAGCCTATTCCCAACCCC CTGCTGGGCCTGGATAGCACCGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCG CTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGT 
CTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAG ATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAA TATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGAC CAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGT GCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCT GTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACC GCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCA CTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAA CCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGT GAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCC CCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTC CGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTTGCACCAAATCTGATGAACTATGGACGCAAACTGGACCGT GCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTG GAGGAGTTTACCATGCTGAACTTTACACAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACC GATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGAT GTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGAC AAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGT GCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 473)
Protein :
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND
YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG
YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY
GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG
GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF
KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG
NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP
GGGPGGSHMGGNYGDDRRGGRGGAIANEVTTLENDAAFIENENAYLEKEIARLRKEKAALRNRLAHKKGKPIPNP LLGLDSTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIE MACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVAR APKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPA LTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKS PILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLMNYGRKLDRALPDPIKIFEIGPCYRKESDGKEHL EEFTMLNFTQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGID KPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 474)
TOM20::FUS::SYNZIP3::CpkRS
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAAATGAGGTGACCACCCTGGAAAACGACGCCGCCTTCATCGAGAACGAGAACGCCTACCTGGAAAAAGAGATC GCCAGACTGAGAAAGGAAAAGGCCGCTCTGCGGAACAGACTGGCCCACAAGAAGGGCAAGCCTATTCCCAACCCC CTGCTGGGCCTGGATAGCACCGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCG CTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGT 
CTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAG ATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAA TATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGAC CAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGT GCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCT GTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACC GCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCA CTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAA CCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGT GAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCC CCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTC CGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTTTCACCAAATCTGTATAACTATCTGCGCAAACTGGACCGT GCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTG GAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACC GATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGAT GTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGAC AAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGT GCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 475)
Protein :
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGAIANEVTTLENDAAFIENENAYLEKEIARLRKEKAALRNRLAHKKGKPIPNP LLGLDSTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIE MACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVAR APKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPA LTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKS PILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLSPNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHL EEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGID KPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 476)
TOM20::EWSR1::PylRS (AA)::FUS::PylRS (AA)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAGTTCTTCATGGCGTCCACGGAT TACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCA CAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCT CAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCA ACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTC ACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAG CCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACA GGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCC ATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAG AGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGC AGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGC CAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAG GAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGG GGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGA GGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGAT GAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTC TTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAA ACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTT GATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGG GGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCT GGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCC CGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGA AACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCC TTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGAT 
CGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATG GACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGCGATC GCATATCCCTATGATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCT AACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAAC GGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCT GCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTA ACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGA AACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGGCTAAGCCCCGACCGCGTTAGAGCCGTATCC CACTGGTCTTCCGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCG CTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAG GTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGT ACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAAC AAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAA AAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCT GGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGC ATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCT GCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAA GACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGAC CTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTG GATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAAT GATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTGGCACCAAATCTG TATAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGT AAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGTACT CGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGC TGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATT CCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTG 
AAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTGGCC TCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAG CAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAG AGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCG ACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAG CAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGT GGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCT CAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTAT GGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGA GGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGC GGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGT GGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCC GAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCT GATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGAC AGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGAC TGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGG GGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGT GGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAG TGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCA GATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGA GGCGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAA ATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAAT AGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCC GATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCT CCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCA CAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCA 
AGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCG ATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTT CTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCA CGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATC ACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGT ATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATG CTGGCACCAAATCTGTATAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATC GGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCCCAAATG GGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAA ATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCC GTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAA CGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGGATC TCTACGAACCTGTAA (SEQ ID NO: 477)
Protein :
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASTD
YSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTP
TAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSST
GGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQS SYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRG GFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADF FKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMR GGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCG NQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGM DRGGFGGGRRGGPGGPPGPLMEQAIAYPYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFAN GIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDG NPIPSAIAANSGIYGLSPDRVRAVSHWSSACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHE VSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTK KAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMS APVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFV DRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLYNYLRKLDRALPDPIKIFEIGPCYR KESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPI PLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNLASNDYTQQATQSYGAYPTQPGQGYSQ QSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQ QPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNY GQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRG GMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTD RETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGS GGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRG GDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVS DEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPA SVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLS RRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPM LAPNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFK
IVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGI STNL* (SEQ ID NO: 478)
LCK::PylRS(AF)::FUS::PylRS(AF)
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTG GAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACC GGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCAT CTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAA CGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTG AAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAA AACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCC GTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAA GGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACC GATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAG AGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAA CTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTG GAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTC TGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATC AAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTG AACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTG GGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTG GAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCG GGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCC TATTACAATGGTATTTCTACTAACCTGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTAC CCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGC CAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGA ACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGG CAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCT CAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGC TATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGT 
GGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGC GGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGC AGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGA GGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGC CCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTG GGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACG GGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGAT GACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCA TTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATG GGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGT GGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAAT GAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAAC TACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTG TGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATG GCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATAT CGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAA ACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCC CCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTT TCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCT AGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTG ACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCG TTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAG AACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCG ATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGT GTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCC CTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAG 
GAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGAT TTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTC ATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAA CCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCT GCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 479)
Protein:
MGCVCSSNPEGTELACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDH
LVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLE
NTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQT
DRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPL
EYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTML
NFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGA
GFGLERLLKVKHDFKNIKRAARSESYYNGISTNLASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYS
QSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSS
QSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGG
GYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGG
PRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFD
DPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGG
GGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGDKKPLNTLISATGL
WMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQ
TSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATA
SALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERE
NYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRA
LPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDV
MHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 480)
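Each Protein entry in this listing corresponds to the in-frame translation of the DNA entry above it; for example, SEQ ID NO: 479 opens ATG GGC TGC GTG and SEQ ID NO: 480 opens MGCV. A printed pair can therefore be cross-checked with a short translation routine. The sketch below is illustrative and not part of the original disclosure: it assumes the standard genetic code, treats the spaces between sequence blocks as separators to be discarded, and the names `translate` and `CODON_TABLE` are hypothetical.

```python
# Illustrative sketch (not from the original listing): translate an in-frame
# DNA entry with the standard genetic code so it can be compared with the
# corresponding Protein entry.

BASES = "TCAG"
# Amino acids in standard codon-table order (first, second and third base each
# cycling through T, C, A, G); "*" marks a stop codon.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string; whitespace between blocks is ignored."""
    seq = "".join(dna.split()).upper()
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


if __name__ == "__main__":
    # Opening of SEQ ID NO: 479 -> opening of SEQ ID NO: 480
    print(translate("ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCG"))
    # PylRS(AF) stretch from SEQ ID NO: 479
    print(translate("GGCGATCATCTGGTTGTGAACAAT"))
```

Building the table from the canonical TCAG ordering keeps all 64 codons in one compact string instead of a hand-typed dictionary, which makes transcription mistakes in the table itself easy to spot.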
LCK::FUS::PylRS(AF)::EWSR1::PylRS(AF)
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCCTCAAACGATTATACCCAACAAGCAACCCAA
AGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGT TACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAG AACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCC CAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGT TACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGT GGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAAC AGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGT GGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGAC CGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGT GGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTC AATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATC TTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAG ACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCA ACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAAT CCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGG CGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGT GGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAAC TTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCT CACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGGCGCCCCCGGCTCCGCCGGCTCCGCC GCCGGCTCCGGCATGGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAA CCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCAC GAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCT CGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTG
AACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACT AAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCG TCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACC AGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATG TCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCG AAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAA GACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTC GTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGAC AATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAAT CTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTAT CGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGT ACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGAC AGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCA ATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAA GTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTG ATGGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCC ACTCAAGGATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTC AGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACT GGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACC ACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCA GCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAA CCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTA CCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCG ACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTAT
GGACAGCAGAGTAGC TATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCT CCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATG GGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGG GGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGC AGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCA CCTGTAGATCCAGATGAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGAT GATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATC TACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCT GCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCA ATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGA GGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGA GGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCC AATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGC TTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGA GGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGT GGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTG ATGGAACAGGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATT CATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTG AACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGT GTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTT AGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAA GCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTT CCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACC
AATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTG GAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTG CTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGT GAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATC GAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGC CCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTC GAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGC CAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGAC TTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCT AGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGT CTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAAT GGTATTTCTACTAACCTGTAA (SEQ ID NO: 481)
Protein:
MGCVCSSNPEGTELASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQ NTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYG GQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQD RGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTI FVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGN PIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMN FSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGGAPGSAGSAAGSGMACPVPLQLPPLERLTLDDKK PLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDL NKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVST SISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKK DLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPN LANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGD SCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL MASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPT GYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQ PQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSS YGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNR GRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLD DLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPP MNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCP NPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFR GGRGMDRGGFGGGRRGGPGGPPGPLMEQDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVV NNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTE AAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRL EVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYI ERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFC
QMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFG LERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 482)
LCK::FUS::PylRS(AF)::FUS::PylRS(AF)
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCCTCAAACGATTATACCCAACAAGCAACCCAA AGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGT TACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAG AACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCC CAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGT TACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGT GGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAAC AGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGT GGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGAC CGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGT GGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTC AATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATC TTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAG ACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCA ACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAAT CCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGG CGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGT GGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAAC TTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCT CACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGGCGCCCCCGGCTCCGCCGGCTCCGCC GCCGGCTCCGGCATGGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAA CCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCAC GAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCT
CGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTG AACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACT AAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCG TCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACC AGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATG TCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCG AAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAA GACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTC GTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGAC AATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAAT CTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTAT CGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGT ACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGAC AGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCA ATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAA GTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTG GCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCC CAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGC CAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGC TCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGC CAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAG AGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCC CCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAAC TATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGT
GGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGC GGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGA GGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGAC TCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTG GCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACA GACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATT GACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAAT CGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGC AGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGG AAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAA CCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGA GGAGGCGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCAT AAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAAC AATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTG TCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGC GCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCA GCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCA GCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAAT CCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAG GTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTG TCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAA ATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAG CGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCT ATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAG ATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAA
ATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTC AAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGT GCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTG GAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGT ATTTCTACTAACCTGTAA (SEQ ID NO: 483)
Protein:
MGCVCSSNPEGTELASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQ NTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYG GQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQD RGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTI FVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGN PIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMN FSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGGAPGSAGSAAGSGMACPVPLQLPPLERLTLDDKK PLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDL NKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVST SISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKK DLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPN LANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGD SCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL ASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYG STGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNP PQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGG GGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESV ADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFN RGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPK PDGPGGGPGGSHMGGNYGDDRRGGRGGDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVN NSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEA AQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLE VLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIE RMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQ MGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGL ERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 484)
TOM20::FUS::PylRS(AF)::EWSR1::PylRS(AF)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAGGCAAGCCTATTCCCAACCCCCTGCTGGGCCTGGATAGCACCGGAGCACCAGGAAGTGCTGGTTCTGCTGCT GGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTG AATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTT
AGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACA GCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAA TTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAA GCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGA AGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATT AGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCC CCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGAC GAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTG CAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGAT CGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGAT ACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCT AACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAA GAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGT GAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGT ATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCG CTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAA CACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGATGGCG TCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAA GGATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTAT ACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTAT ACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACT GCTACAGTCACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTAT GGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAA TCTAGCACAGGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTG
GG AGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGT TATGATCAGAGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGT CAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGT CAATATAGCCAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTT TATGGGCAGGAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGG GGAAGAGGGGGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCT GGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTA GATCCAGATGAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTG GCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTG GACAAGGAAACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTG GAATGGTTTGATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAAC AGTATGCGGGGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCA GGAGGTCCTGGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCC CGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCG GGTTGTGGAAACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTC CCGCCACCCTTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGC CTCATGGATCGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGC CGGGGCATGGACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAA CAGGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAA ATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAAT AGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCC GATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCT CCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCA
CAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCA AGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCG ATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTT CTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCA CGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATC ACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGT ATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATG CTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATC GGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATG GGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAA ATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCC GTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAG CGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATT TCTACTAACCTGTAA (SEQ ID NO: 485)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGAIAGKPIPNPLLGLDSTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPL NTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNK FLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSI SSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDL QQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLA NYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSC MVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNLMA STDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGY TTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQ SSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYG QQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGR GRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDL ADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMN SMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNP GCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGG RGMDRGGFGGGRRGGPGGPPGPLMEQDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNN SRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAA QAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEV LLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIER
MGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQM GSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLE RLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 486)
TOM20::FUS::PylRS (AF)::FUS::PylRS (AF)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAGGCAAGCCTATTCCCAACCCCCTGCTGGGCCTGGATAGCACCGGAGCACCAGGAAGTGCTGGTTCTGCTGCT GGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTG
AATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTT AGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACA GCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAA TTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAA GCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGA AGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATT AGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCC CCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGAC GAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTG CAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGAT CGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGAT ACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCT AACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAA GAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGT GAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGT ATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCG CTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAA CACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGGCCTCA AACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAG AGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGC AGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACT GGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAG CCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGG AGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAG GGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGC
CAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGT GGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGC GGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGC ATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAA CAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGAT TACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGG GAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGG TTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGT GGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGT GGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGT CCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGAT GGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGC GATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATC AAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGC CGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGAT GAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCT ACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAG GCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGT GTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATT ACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTG CTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGT CGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACC CGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATG GGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTA GCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGC
CCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGT TCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATT GTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTT GTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGT CTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCT ACTAACCTGTAA (SEQ ID NO: 487)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGAIAGKPIPNPLLGLDSTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPL NTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNK FLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSI SSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDL QQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLA NYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSC MVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNLAS NDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGST GGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQ GYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGG GGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVAD YFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRG GGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPD GPGGGPGGSHMGGNYGDDRRGGRGGDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNS RSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQ AQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVL LNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERM GIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMG SGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLER
LLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 488)
TOM20::EWSR1::4clN22::PylRS (AA)::FUS::PylRS (AA)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAGTTCTTCATGGCGTCCACGGAT TACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCA CAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCT CAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCA ACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTC ACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAG CCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACA GGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCC ATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAG AGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGC AGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGC CAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAG GAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGG GGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGA GGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGAT GAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTC TTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAA ACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTT GATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGG GGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCT
GGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCC CGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGA AACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCC TTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGAT CGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATG GACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGCGATC GCAGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGAGCAGAAGCTGATCTCAGAGGAGGACCTGCTA GCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCA CCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACA CGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCT GGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCT GAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGA GCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAA GCTGCAAACCCACCGCTCGGGCTAAGCCCCGACCGCGTTAGAGCCGTATCCCACTGGTCTTCCGCGTGCCCGGTG CCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACT GGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATT GAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCAC AAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAG GACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCT CGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATT CCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCC ACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCA GCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGC AAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAA CGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAA
TCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATT TTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTGGCACCAAATCTGTATAACTATCTGCGCAAACTGGAC CGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACAT CTGGAGGAGTTTACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATC ACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTG GATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATC GACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAA CGTGCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTGGCCTCAAACGATTATACCCAACAAGCA ACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAG CAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAG AGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAG AGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCG GGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGC TATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAG TACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGT AGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAG CAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGC AGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGT
GGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAAC
ACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATT
ATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGA
GAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCC
GGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGC
CGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTT
CCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAAT
ATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGT
GGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGATAAAAAACCGCTGAATACC
CTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGT
TCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGT
GCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTG
ACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATG
CCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAA
TTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGT
ATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTT
CAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATC
AGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAA
ATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGC
TTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAA
CTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTGGCACCAAATCTGTATAACTAT
CTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCC
GACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAAC
CTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTG
TATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGAC
CGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGAC
TTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 489)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASTD YSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTP TAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSST GGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQS SYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRG GFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADF FKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMR GGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCG NQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGM DRGGFGGGRRGGPGGPPGPLMEQAIAGAPGSAGSAAGSGEQKLISEEDLLATMDAQTRRRERRAEKQAQWKAANP PLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRA EKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLGLSPDRVRAVSHWSSACPV PLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHH KYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAI PVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSG KPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQI FRVDKNFCLRPMLAPNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESII TDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIK RAARSESYYNGISTNLASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQ SQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPS YGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQ QDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNN TIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFS GNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCEN MNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGDKKPLNTLISATGLWMSRTGTIHKIKHHEVSR SKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAM
PKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPV QASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRG FLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLYNYLRKLDRALPDPIKIFEIGPCYRKES DGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLD REWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 490)
LCK::EWSR1::MCP::PylRS (AA)::FUS::PylRS (AA)
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCATGGCGTCCACGGATTACAGTACCTATAGCCAA GCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACCACCCAGGCATAT GGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTAT GGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTGCCCCCCAGGCATAC AGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCC TATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCACCT ACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCC AGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCCATGCAGCCAGTCACTGCA CCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAG AACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCT CCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTAC GGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAGGAGTCTGGAGGATTTTCC GGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGGGGATTTGATCGTGGAGGC ATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCT GGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGATGAAGACTCTGACAACAGT GCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTCTTTAAGCAGTGTGGGGTT GTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAAACAGGAAAGCCCAAAGGC GATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTTGATGGGAAAGATTTTCAA GGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGGGGTGGTCTGCCACCCCGT GAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCTGGGGGACCCATGGGTCGC ATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCCCGAGGGAACCCCTCTGGA GGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGAAACCAGAACTTCGCCTGG AGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGT GATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGATCGTGGTGGTCCCGGTGGA ATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATGGACCGAGGTGGCTTTGGT GGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGCGATCGCATATCCCTATGATGTG 
CCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCTAACTTTACTCAGTTCGTT CTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATC AGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATAC ACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCC ACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCA ATCGCAGCAAACTCCGGCATCTACGGGCTAAGCCCCGACCGCGTTAGAGCCGTATCCCACTGGTCTTCCGCGTGC CCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCT GCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATC TATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGT CACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCC AATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCC GTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCG GCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACC GGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCA GCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAAT TCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCC GAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAG ATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAA CAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTGGCACCAAATCTGTATAACTATCTGCGCAAA CTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAA GAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGC ATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGAC ACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGG GGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAAC ATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTGGCCTCAAACGATTATACCCAA
CAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTAC
GGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTAT
GGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGT
AGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGC
ACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAG
CCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAG
AACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCC
ATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTAT
GGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAAC
CGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGAC
CGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGAC
AACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATT
GGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTG
AAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAA
TTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGT
GGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGA
GGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGT
GAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGA
CCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGATAAAAAACCGCTG
AATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTT
AGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACA
GCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAA
TTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAA
GCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGA
AGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATT
AGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCC
CCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGAC
GAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTG
CAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGAT
CGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGAT
ACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTGGCACCAAATCTGTAT
AACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAA
GAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGT
GAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGT
ATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCG
CTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAA
CACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTGTAA
(SEQ ID NO: 491)
Protein:
MGCVCSSNPEGTELMASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATY GQTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAP TRPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQ NTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFS GPGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNS AIYVQGLNDSVTLDDLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQ GSKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSG GGNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGG MFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPPGPLMEQAIAYPYDVPDYAGAPGSAGSAAGSGASNFTQFV LVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFA TNSDCELIVKAMQGLLKDGNPIPSAIAANSGIYGLSPDRVRAVSHWSSACPVPLQLPPLERLTLDDKKPLNTLIS ATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKA NEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSIST GATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYA EERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLYNYLRK LDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGD TLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNLASNDYTQ QATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGS SQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQ NQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYN RSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQI GIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGR GGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGG PGGSHMGGNYGDDRRGGRGGDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRT ARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSG SKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKD
EISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDND TELSKQIFRVDKNFCLRPMLAPNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTR ENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVK HDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 492)
LCK::PylRS (AA)::FUS::PylRS (AA)
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTG GAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACC GGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCAT CTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAA CGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTG AAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAA AACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCC GTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAA GGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACC GATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAG AGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAA CTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTG GAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTC TGTCTGCGCCCTATGCTGGCACCAAATCTGTATAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATC AAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTG GCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTG GGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTG GAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCA GGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCG TATTATAACGGGATCTCTACGAACCTGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTAC CCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGC CAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGA ACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGG CAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCT CAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGC TATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGT 
GGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGC GGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGC AGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGA GGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGC CCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTG GGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACG GGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGAT GACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCA TTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATG GGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGT GGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAAT GAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAAC TACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTG TGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATG GCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATAT CGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAA ACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCC CCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTT TCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCT AGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTG ACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCG TTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAG AACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCG ATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGT GTGGATAAAAACTTCTGTCTGCGCCCTATGCTGGCACCAAATCTGTATAACTATCTGCGCAAACTGGACCGTGCC CTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAG 
GAGTTTACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGAT TTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTC ATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAA CCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCC GCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 493)
Protein:
MGCVCSSNPEGTELACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDH
LVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLE
NTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQT
DRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPL
EYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTML
AFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGA
GFGLERLLKVKHDFKNIKRAARSESYYNGISTNLASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYS
QSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSS
QSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGG
GYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGG
PRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFD
DPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGG
GGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGDKKPLNTLISATGL
WMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQ
TSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATA
SALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERE
NYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLYNYLRKLDRA
LPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDV
MHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 494)
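Each DNA entry in this listing can be checked against the protein entry that follows it by translating the coding sequence codon by codon. The sketch below is not part of the original disclosure; it assumes the standard genetic code and uses hypothetical helper names (`CODON_TABLE`, `translate`). For example, the codons CTG GTT GTG AAC AAT AGC in SEQ ID NO: 495 read LVVNNS.

```python
from itertools import product

# Standard genetic code (assumption: no alternative code and no amber suppression).
# Codons are enumerated TTT, TTC, TTA, TTG, TCT, ... with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate a coding DNA string; '*' marks a stop codon."""
    seq = "".join(dna.split()).upper()  # remove the spaces between sequence blocks
    if len(seq) % 3 != 0:
        raise ValueError(f"length {len(seq)} is not a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


print(translate("CTGGTTGTGAACAATAGC"))  # LVVNNS
print(translate("TCTACTAACCTGTAA"))     # STNL*
```

Note that TAG is reported as '*' here, even though in a genetic code expansion experiment an amber codon may instead be read as a noncanonical amino acid.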
LCK::PylRS(AF)::EWSR1::PylRS(AF)
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTG GAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACC GGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCAT CTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAA CGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTG AAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAA AACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCC GTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAA GGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACC GATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAG AGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAA CTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTG GAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTC TGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATC AAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTG AACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTG GGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTG GAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCG GGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCC TATTACAATGGTATTTCTACTAACCTGATGGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCAG GGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTAT GGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGCCTAT GCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAG GGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCTCAGTCT GCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGAT GGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTAGGATATGGA 
CAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCT CCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACACCTATGGGCAA CCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCA CCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGCAGCAGAGTTCA TTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGACCAGGAGAGAAC CGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATGAGCAGAGGTGGG CGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGACCCATGGAT GAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGATGAAGACTCTGACAACAGTGCAATTTATGTACAA GGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAGATGAACAAG AGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGATGCCACAGTGTCC TATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGGAGCAAACTTAAA GTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAGGGCAGAGGCATG CCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATGGGAGGCCGTGGA GGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGAAACGTCCAG CACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGAACAGAGTGCAAC CAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGT GGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGC CGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGAGGAAGACGAGGT GGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGT CTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAG ATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAA TATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGAC CAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGT GCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCT GTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACC GCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCA 
CTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAA CCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGT GAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCC CCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTC CGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGT GCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTG GAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACC GATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGAT GTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGAC AAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGT GCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 495)
Protein:
MGCVCSSNPEGTELACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDH
LVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLE
NTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQT
DRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPL
EYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTML
NFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGA
GFGLERLLKVKHDFKNIKRAARSESYYNGISTNLMASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSY
GTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQS
AYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYP
PTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSS
FRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMD
EGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVS
YEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRG
GDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRG
GPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPPGPLMEQDKKPLNTLISATG
LWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANED
QTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGAT
ASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEER
ENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDR
ALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLD
VMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 496)
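The PylRS portions of SEQ ID NO: 494 and SEQ ID NO: 496 differ at a few positions. A small helper that lists point substitutions between two aligned, equal-length fragments makes such differences explicit. This is an illustrative sketch with a hypothetical function name (`substitutions`); positions are counted within the fragments passed in and do not correspond to the residue numbering of full-length PylRS.

```python
from __future__ import annotations


def substitutions(ref: str, alt: str) -> list[str]:
    """Return point substitutions as '<ref><pos><alt>' using 1-based fragment positions."""
    if len(ref) != len(alt):
        raise ValueError("fragments must be aligned and of equal length")
    return [f"{r}{i}{a}" for i, (r, a) in enumerate(zip(ref, alt), start=1) if r != a]


# Fragments copied from the PylRS regions of SEQ ID NO: 494 and SEQ ID NO: 496.
print(substitutions("LAPNLYNYLRK", "LAPNLANYLRK"))                    # ['Y6A']
print(substitutions("KEHLEEFTMLAFAQMGSGCT", "KEHLEEFTMLNFCQMGSGCT"))  # ['A11N', 'A13C']
print(substitutions("DSCMVYGDTLD", "DSCMVFGDTLD"))                    # ['Y6F']
```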
TOM20::FUS::MCP::PylRS(AF)::EWSR1::PylRS(AF)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCATATCCCTATGATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCT AACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAAC GGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCT GCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTA 
ACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGA AACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGCGCCGATTACAAGGACGATGATGACAAGGGA GCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGC CTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACC ATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTT GTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGC CGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTC GTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACT GAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCT GTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAAT ACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGT CTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAA CTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAA CGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTAT ATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTG CGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATC TTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTT TGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATT GACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTG TCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTT GGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTAC AATGGTATTTCTACTAACCTGATGGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTAC AGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACC TATGGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACT TCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTAT GGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATAT 
GGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAAC AAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGT AACTACAGTTATCCCCAGGTACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACC AGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGC AGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAA ACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGA CAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGC ATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGA GGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGA CCAGATCTTGATCTAGGCCCACCTGTAGATCCAGATGAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTA AATGACAGTGTGACTCTAGATGATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACT GGGCAACCCATGATCCACATCTACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAA GACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCC CTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCA CCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGAT AGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGA GCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGT AAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCT GGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGT GGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCT GGGGGGCCCCCTGGACCTTTGATGGAACAGGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGG ATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCG TGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGT AAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACA AGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCT AAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCT 
ACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGC GCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACA AAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTT CGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAAC TATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATT CTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTG GATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTG CCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAG TTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTT CTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATG CACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCT TGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCA CGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 497)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND
YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG
YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY
GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG
GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF
KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG
NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP
GGGPGGSHMGGNYGDDRRGGRGGAIAYPYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFAN
GIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDG
NPIPSAIAANSGIYGADYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGT
IHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKV
VSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGN
TNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLE
REITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKI
FEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLEL
SSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNLMASTDYSTYSQAAAQQGY
SAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTPTAPQAYSQPVQGY
GTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSSTGGYNQPSLGYGQS
NYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQ
TGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRGGFDRGGMSRGGRG
GGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADFFKQCGVVKMNKRT
GQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMRGGLPPREGRGMPP
PLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCGNQNFAWRTECNQC
KAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGP
GGPPGPLMEQDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYR
KTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVS
TQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPF
RELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRV
DKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDF
LNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAA
RSESYYNGISTNL* (SEQ ID NO: 498)
TOM20::FUS::4clN22::PylRS(AF)::EWSR1::PylRS(AF)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAGGCAAGCCTATTCCCAACCCCCTGCTGGGCCTGGATAGCACCGGAGCACCAGGAAGTGCTGGTTCTGCTGCT GGTAGTGGAGCATCGATAGAGCAGAAGCTGATCTCAGAGGAGGACCTGCTAGCCACCATGGACGCACAAACACGA CGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGC GCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAG 
AAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCT GGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCT GCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGAC GCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGAGTCT AGAGGGCCCGTTGGTGCTCCTGGTTCAGCAGGAAGCGCAGCAGGATCAGGTGCGTGCCCGGTGCCGCTGCAGCTG CCGCCGCTGGAACGCCTGACCCTGGATGACAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATG AGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGT GGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAA ACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGC GTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAA CCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACC CAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCC CTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAA TCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGT GAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTAT CTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTG ATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGAT AAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCT GATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTT ACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTG AACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCAC GGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGG ATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGT TCCGAGTCCTATTACAATGGTATTTCTACTAACCTGATGGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCA GCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACCACCCAGGCATATGGGCAA
CAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAG ACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCAG CCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCA GCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGA CCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTA GGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCA TCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACACC TATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACT AGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGCAG CAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGACCA GGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATGAGC AGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGA CCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGATGAAGACTCTGACAACAGTGCAATT TATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAG ATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGATGCC ACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGGAGC AAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAGGGC AGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATGGGA GGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGA AACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGAACA GAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGATCGT
GGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATGTTC AGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGAGGA AGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGATAAAAAACCGCTGAATACCCTGATCTCT GCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATC TATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGT CACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCC AATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCC GTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCG GCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACC GGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCA GCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAAT TCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCC GAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAG ATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAA CAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAA CTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAA GAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGC ATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGAC ACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGG GGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAAC ATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 499)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND
YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG
YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY
GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG
GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF
KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG
NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP
GGGPGGSHMGGNYGDDRRGGRGGAIAGKPIPNPLLGLDSTGAPGSAGSAAGSGASIEQKLISEEDLLATMDAQTR
RRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGA
GGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLES
RGPVGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMAC
GDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPK
PLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTK
SQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPIL
IPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEF
TMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPW
IGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNLMASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQ
QSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYA
AQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPP
SYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQ
QSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGG
PMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDA
TVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMG
GRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDR
GRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPPGPLMEQDKKPLNTLIS
ATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKA
NEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSIST
GATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYA
EERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRK
LDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGD
TLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 500)
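The construct headings in this listing name the fused parts separated by '::', and judging by the sequences the parts appear to be listed from N- to C-terminus. When processing the listing programmatically, a heading can be split into its components. The sketch below uses a hypothetical `parse_construct` helper and tolerates the stray spaces that text extraction can insert around '::' and before '(AF)'.

```python
from __future__ import annotations

import re


def parse_construct(label: str) -> list[str]:
    """Split a fusion label into its parts, in the order written."""
    parts = re.split(r"\s*:\s*:\s*", label.strip())
    # Rejoin a variant tag separated from its part name, e.g. 'PylRS (AF)' -> 'PylRS(AF)'.
    return [re.sub(r"\s+\(", "(", part) for part in parts]


print(parse_construct("TOM20 : : FUS : : 4clN22 : : PylRS (AF) : : EWSR1 : : PylRS (AF)"))
# ['TOM20', 'FUS', '4clN22', 'PylRS(AF)', 'EWSR1', 'PylRS(AF)']
```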
TOM20::FUS::SYNZIP1::MCP::PylRS(AF)::EWSR1::PylRS(AF)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAAATCTGGTGGCCCAGCTGGAAAACGAGGTGGCCAGCCTGGAAAACGAGAACGAAACCCTGAAGAAAAAGAAC CTGCACAAGAAGGACCTGATCGCCTACCTGGAAAAGGAAATCGCCAACCTGAGAAAGAAGATCGAGGAAGCATCG ATATATCCCTATGATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCT
AACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAAC GGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCT GCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTA ACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGA AACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGCGCCGATTACAAGGACGATGATGACAAGGGA GCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGC CTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACC ATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTT GTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGC CGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTC GTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACT GAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCT GTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAAT ACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGT CTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAA CTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAA CGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTAT ATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTG CGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATC TTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTT TGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATT GACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTG TCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTT GGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTAC AATGGTATTTCTACTAACCTGATGGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTAC AGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACC
TATGGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACT TCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTAT GGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATAT GGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAAC AAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGT AACTACAGTTATCCCCAGGTACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACC AGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGC AGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAA ACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGA CAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGC ATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGA GGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGA CCAGATCTTGATCTAGGCCCACCTGTAGATCCAGATGAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTA AATGACAGTGTGACTCTAGATGATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACT GGGCAACCCATGATCCACATCTACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAA GACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCC CTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCA CCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGAT AGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGA GCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGT
AAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCT GGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGT GGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCT GGGGGGCCCCCTGGACCTTTGATGGAACAGGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGG ATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCG TGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGT AAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACA AGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCT AAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCT ACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGC GCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACA AAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTT CGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAAC TATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATT CTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTG GATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTG CCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAG TTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTT CTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATG CACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCT TGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCA CGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 501)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND
YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG
YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY
GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG
GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF
KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG
NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP
GGGPGGSHMGGNYGDDRRGGRGGAIANLVAQLENEVASLENENETLKKKNLHKKDLIAYLEKEIANLRKKIEEAS
IYPYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSS
AQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIYGADYKDDDDKG
APGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLV
VNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENT
EAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDR
LEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEY
IERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNF
CQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGF
GLERLLKVKHDFKNIKRAARSESYYNGISTNLMASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGT
YGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAY
GTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPT
SYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFR
QDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEG
PDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYE
DPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGD
RGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGP
GGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPPGPLMEQDKKPLNTLISATGLW
MSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQT
SVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATAS
ALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEEREN
YLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRAL
PDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVM
HGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 502)
TOM20::FUS::SYNZIP2::MCP::PylRS (AF)::EWSR1::PylRS (AF)
DNA: ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAGCTAGAAACGCCTACCTGAGAAAGAAAATCGCCAGACTGAAGAAGGACAACCTGCAGCTGGAAAGAGACGAG CAGAACCTGGAAAAGATCATCGCCAACCTCAGAGATGAGATCGCCAGACT
GGAAAACGAGGTGGCCAGCCACGAG CAGGCATCGATATATCCCTATGATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGT GGAGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAAC TTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGT CAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAAT ATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTA AAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGCGCCGATTACAAGGACGATGAT GACAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCG CTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGT ACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGAT CATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGT AAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAA GTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTG GAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAG TCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTT AAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAA ACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTG GAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGG AAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCT CTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAAC TTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCT ATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATG CTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCAC CTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGAC CTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGT
GCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAG TCCTATTACAATGGTATTTCTACTAACCTGATGGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAG CAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACCACCCAGGCATATGGGCAACAAAGC TATGGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGCC TATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTC CAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCTCAG TCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAG GATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTAGGATAT GGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTAC CCTCCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACACCTATGGG CAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTAC CCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGCAGCAGAGT TCATTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGACCAGGAGAG AACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATGAGCAGAGGT GGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGACCCATG GATGAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGATGAAGACTCTGACAACAGTGCAATTTATGTA CAAGGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAGATGAAC AAGAGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGATGCCACAGTG TCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGGAGCAAACTT AAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAGGGCAGAGGC ATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATGGGAGGCCGT GGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGAAACGTC
CAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGAACAGAGTGC AACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGATCGTGGCAGA GGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATGTTCAGAGGT GGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGAGGAAGACGA GGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGATAAAAAACCGCTGAATACCCTGATCTCTGCTACT GGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATT GAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCAC AAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAG GACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCT CGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATT CCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCC ACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCA GCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGC AAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAA CGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAA TCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATT TTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGAC CGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACAT CTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATC ACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTG GATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATC GACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAA CGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 503)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGAIAARNAYLRKKIARLKKDNLQLERDEQNLEKIIANLRDEIARLENEVASHE QASIYPYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVR QSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIYGADYKDDD DKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGD HLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPL ENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQ TDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIP LEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTM LNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIG AGFGLERLLKVKHDFKNIKRAARSESYYNGISTNLMASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQS YGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQ SAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSY PPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQS SFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPM DEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATV
SYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGR
GGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGR
GGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPPGPLMEQDKKPLNTLISAT
GLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANE
DQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGA
TASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEE
RENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLD
RALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTL
DVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 504)
LCK::FUS::SYNZIP1::MCP::PylRS (AF)::EWSR1::PylRS (AF)
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCCTCAAACGATTATACCCAACAAGCAACCCAA AGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGT TACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAG AACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCC CAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGT TACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGT GGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAAC AGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGT GGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGAC CGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGT GGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTC AATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATC TTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAG ACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCA ACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAAT CCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGG CGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGT GGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAAC TTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCT CACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATCGCAAATCTGGTGGCCCAGCTG GAAAACGAGGTGGCCAGCCTGGAAAACGAGAACGAAACCCTGAAGAAAAAGAACCTGCACAAGAAGGACCTGATC GCCTACCTGGAAAAGGAAATCGCCAACCTGAGAAAGAAGATCGAGGAAGCATCGATATATCCCTATGATGTGCCG GATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCTAACTTTACTCAGTTCGTTCTC GTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGC TCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACC ATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCCACG 
AATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATC GCAGCAAACTCCGGCATCTACGGCGCCGATTACAAGGACGATGATGACAAGGGAGCACCAGGAAGTGCTGGTTCT GCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAA CCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCAC GAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCT CGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTG AACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACT AAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCG TCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACC AGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATG TCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCG AAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAA GACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTC GTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGAC AATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAAT CTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTAT CGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGT ACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGAC AGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCA ATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAA GTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTG ATGGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCC ACTCAAGGATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTC AGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACT GGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACC ACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCA 
GCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAA CCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTA CCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCG ACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGC TATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCT CCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATG GGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGG GGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGC AGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCA CCTGTAGATCCAGATGAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGAT GATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATC TACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCT GCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCA ATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGA GGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGA GGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCC AATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGC TTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGA GGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGT GGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTG ATGGAACAGGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATT CATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTG AACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGT GTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTT AGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAA GCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTT 
CCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACC AATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTG GAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTG CTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGT GAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATC GAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGC CCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTC GAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGC CAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGAC TTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCT AGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGT CTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAAT GGTATTTCTACTAACCTGTAA (SEQ ID NO: 505)
Protein:
MGCVCSSNPEGTELASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQ
NTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYG
GQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQD
RGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTI
FVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGN
PIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMN
FSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGAIANLVAQLENEVASLENENETLKKKNLHKKDLI
AYLEKEIANLRKKIEEASIYPYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWIS SNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAI AANSGIYGADYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHH EVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRT KKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSM SAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFF VDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCY RKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGP IPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNLMASTDYSTYSQAAAQQGYSAYTAQP TQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGAYDT TTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSYPQV PGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSYSQA PSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRGGMG SAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADFFKQCGVVKMNKRTGQPMIHI YLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRGGPG GPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPKPEG FLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPPGPL MEQDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCR VSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSV PASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESEL LSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLR PMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGID FKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYN GISTNL* (SEQ ID NO: 506)
LCK::FUS::SYNZIP2::MCP::PylRS (AF)::EWSR1::PylRS (AF)
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCCTCAAACGATTATACCCAACAAGCAACCCAA AGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGT TACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAG AACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCC CAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGT TACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGT GGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAAC AGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGT GGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGAC CGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGT GGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTC AATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATC TTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAG ACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCA ACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAAT CCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGG CGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGT GGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAAC TTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCT CACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATCGCAGCTAGAAACGCCTACCTG AGAAAGAAAATCGCCAGACTGAAGAAGGACAACCTGCAGCTGGAAAGAGACGAGCAGAACCTGGAAAAGATCATC GCCAACCTCAGAGATGAGATCGCCAGACTGGAAAACGAGGTGGCCAGCCACGAGCAGGCATCGATATATCCCTAT GATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCTAACTTTACTCAG TTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAA TGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGC AAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATT 
TTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCC TCAGCAATCGCAGCAAACTCCGGCATCTACGGCGCCGATTACAAGGACGATGATGACAAGGGAGCACCAGGAAGT GCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGAT GATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATC AAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGC CGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGAT GAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCT ACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAG GCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGT GTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATT ACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTG CTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGT CGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACC CGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATG GGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTA GCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGC CCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGT TCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATT GTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTT GTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGT CTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCT ACTAACCTGATGGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACC GCCCAGCCCACTCAAGGATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCC
ACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAG CCTCCCACTGGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCT TATGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCT GCTTATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAG ACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTAT CCCCAGGTACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCT ACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAG CAGAGTAGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTAC AGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCC AGTAGCATGGGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCT GATAACCGGGGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGT GGAATGGGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGAT CTAGGCCCACCTGTAGATCCAGATGAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTG ACTCTAGATGATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATG ATCCACATCTACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCTACT GCCAAGGCTGCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAG AAGCCTCCAATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGA GGTCCAGGAGGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTC CCTCCAAGAGGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGG CAGTGTCCCAATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAG CCTGAAGGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGG GGAGGAAGAGGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGT
GGCTTCCGTGGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCT GGACCTTTGATGGAACAGGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACC GGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCAT CTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAA CGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTG AAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAA AACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCC GTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAA GGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACC GATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAG AGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAA CTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTG GAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTC TGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATC AAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTG AACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTG GGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTG GAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCG GGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCC TATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 507)
Protein:
MGCVCSSNPEGTELASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQ NTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYG GQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQD RGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTI FVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGN PIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMN FSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGAIAARNAYLRKKIARLKKDNLQLERDEQNLEKII ANLRDEIARLENEVASHEQASIYPYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAE WISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIP SAIAANSGIYGADYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKI KHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAP TRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPI TSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREIT RFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIG PCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAV VGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNLMASTDYSTYSQAAAQQGYSAYT AQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGA YDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSY PQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSY SQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRG GMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADFFKQCGVVKMNKRTGQPM IHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRG GPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPK PEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPP GPLMEQDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCK RCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQES
VSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELE SELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNF CLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHL GIDFKIVGDSCMVFGDTLDVMHGDLELSSAWGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSES YYNGISTNL* (SEQ ID NO: 508)
LCK::PylRS (AA)::EWSR1::PylRS (AA)
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTG GAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACC GGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCAT CTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAA CGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTG AAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAA AACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCC GTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAA GGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACC GATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAG AGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAA CTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTG GAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTC TGTCTGCGCCCTATGCTGGCACCAAATCTGTATAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATC AAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTG GCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTG GGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTG GAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCA GGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCG TATTATAACGGGATCTCTACGAACCTGATGGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCAG GGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTAT GGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGCCTAT GCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAG GGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCTCAGTCT GCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGAT GGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTAGGATATGGA 
CAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCT CCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACACCTATGGGCAA CCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCA CCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGCAGCAGAGTTCA TTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGACCAGGAGAGAAC CGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATGAGCAGAGGTGGG CGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGACCCATGGAT GAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGATGAAGACTCTGACAACAGTGCAATTTATGTACAA GGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAGATGAACAAG AGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGATGCCACAGTGTCC TATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGGAGCAAACTTAAA GTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAGGGCAGAGGCATG CCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATGGGAGGCCGTGGA GGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGAAACGTCCAG CACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGAACAGAGTGCAAC CAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGT GGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGC CGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGAGGAAGACGAGGT GGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGT CTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAG ATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAA TATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGAC CAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGT GCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCT GTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACC GCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCA 
CTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAA CCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGT GAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCC CCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTC CGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTGGCACCAAATCTGTATAACTATCTGCGCAAACTGGACCGT GCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTG GAGGAGTTTACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACC GATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGAT GTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGAC AAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGT GCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 509)
Protein:
MGCVCSSNPEGTELACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDH LVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLE NTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQT DRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPL EYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTML AFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGA GFGLERLLKVKHDFKNIKRAARSESYYNGISTNLMASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSY GTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQS AYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYP PTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSS FRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMD EGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVS YEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRG GDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRG GPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPPGPLMEQDKKPLNTLISATG LWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANED QTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGAT ASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEER ENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLYNYLRKLDR ALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLD VMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 510)
LCK::FUS::PylRS (AA)::EWSR1::PylRS (AA)
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCCTCAAACGATTATACCCAACAAGCAACCCAA AGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGT TACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAG AACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCC CAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGT TACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGT GGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAAC AGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGT GGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGAC CGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGT GGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTC AATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATC TTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAG ACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCA ACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAAT CCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGG CGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGT GGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAAC TTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCT CACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGGCGCCCCCGGCTCCGCCGGCTCCGCC GCCGGCTCCGGCATGGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAA CCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCAC GAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCT CGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTG AACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACT AAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCG 
TCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACC AGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATG TCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCG AAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAA GACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTC GTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGAC AATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTGGCACCAAAT CTGTATAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTAT CGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGT ACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGAC AGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCA ATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAA GTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTG ATGGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCC ACTCAAGGATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTC AGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACT GGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACC ACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCA GCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAA CCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTA CCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCG ACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGC TATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCT CCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATG GGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGG GGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGC 
AGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCA CCTGTAGATCCAGATGAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGAT GATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATC TACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCT GCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCA ATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGA GGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGA GGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCC AATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGC TTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGA GGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGT GGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTG ATGGAACAGGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATT CATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTG AACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGT GTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTT AGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAA GCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTT CCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACC AATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTG GAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTG CTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGT GAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATC GAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGC CCTATGCTGGCACCAAATCTGTATAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTC GAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCC 
CAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGAC TTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCT AGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGT CTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAAC GGGATCTCTACGAACCTGTAA (SEQ ID NO: 511)
Protein:
MGCVCSSNPEGTELASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQ NTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYG GQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQD RGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTI FVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGN PIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMN FSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGGAPGSAGSAAGSGMACPVPLQLPPLERLTLDDKK PLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDL NKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVST SISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKK DLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPN LYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGD SCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL MASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPT GYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQ PQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSS YGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNR GRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLD DLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPP MNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCP NPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFR GGRGMDRGGFGGGRRGGPGGPPGPLMEQDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVV NNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTE AAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRL EVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYI ERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFA QMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFG LERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 512)
TOM20::FUS::PylRS (AA)::EWSR1::PylRS (AA)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAGGCAAGCCTATTCCCAACCCCCTGCTGGGCCTGGATAGCACCGGAGCACCAGGAAGTGCTGGTTCTGCTGCT GGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTG AATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTT AGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACA 
GCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAA TTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAA GCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGA AGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATT AGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCC CCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGAC GAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTG CAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGAT CGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGAT ACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTGGCACCAAATCTGTAT AACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAA GAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGT GAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGT ATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCG CTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAA CACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTGATGGCG TCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAA GGATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTAT ACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTAT ACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACT GCTACAGTCACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTAT GGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAA TCTAGCACAGGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGG AGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGT TATGATCAGAGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGT CAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGT 
CAATATAGCCAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTT TATGGGCAGGAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGG GGAAGAGGGGGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCT GGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTA GATCCAGATGAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTG GCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTG GACAAGGAAACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTG GAATGGTTTGATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAAC AGTATGCGGGGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCA GGAGGTCCTGGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCC CGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCG GGTTGTGGAAACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTC CCGCCACCCTTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGC CTCATGGATCGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGC CGGGGCATGGACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAA CAGGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAA ATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAAT AGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCC GATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCT CCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCA CAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCA AGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCG ATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTT CTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCA CGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATC ACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGT 
ATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATG CTGGCACCAAATCTGTATAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATC GGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCCCAAATG GGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAA ATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCC GTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAA CGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGGATC TCTACGAACCTGTAA (SEQ ID NO: 513)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGAIAGKPIPNPLLGLDSTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPL NTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNK FLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSI SSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDL QQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLY NYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSC MVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNLMA STDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGY TTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQ SSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYG QQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGR GRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDL ADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMN SMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNP GCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGG RGMDRGGFGGGRRGGPGGPPGPLMEQDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNN SRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAA QAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEV LLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIER MGIDNDTELSKQIFRVDKNFCLRPMLAPNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQM GSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLE RLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 514)
TOM20::FUS::MCP::PylRS (AA)::EWSR1::PylRS (AA)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCATATCCCTATGATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCT AACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAAC GGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCT GCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTA 
ACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGA AACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGCGCCGATTACAAGGACGATGATGACAAGGGA GCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGC CTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACC ATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTT GTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGC CGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTC GTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACT GAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCT GTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAAT ACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGT CTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAA CTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAA CGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTAT ATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTG CGCCCTATGCTGGCACCAAATCTGTATAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATC TTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTT GCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATT GACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTG TCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTC GGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTAT AACGGGATCTCTACGAACCTGATGGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTAC AGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACC TATGGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACT TCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTAT GGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATAT 
GGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAAC AAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGT AACTACAGTTATCCCCAGGTACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACC AGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGC AGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAA ACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGA CAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGC ATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGA GGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGA CCAGATCTTGATCTAGGCCCACCTGTAGATCCAGATGAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTA AATGACAGTGTGACTCTAGATGATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACT GGGCAACCCATGATCCACATCTACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAA GACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCC CTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCA CCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGAT AGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGA GCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGT AAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCT GGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGT GGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCT GGGGGGCCCCCTGGACCTTTGATGGAACAGGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGG ATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCG TGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGT AAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACA AGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCT AAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCT 
ACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGC GCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACA AAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTT CGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAAC TATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATT CTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTG GATAAAAACTTCTGTCTGCGCCCTATGCTGGCACCAAATCTGTATAACTATCTGCGCAAACTGGACCGTGCCCTG CCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAG TTTACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTT CTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATG CACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCG TGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCC CGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 515)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND
YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG
YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY
GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG
GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF
KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGAIAYPYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFAN GIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDG NPIPSAIAANSGIYGADYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGT IHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKV VSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGN TNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLE REITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLYNYLRKLDRALPDPIKI FEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLEL SSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNLMASTDYSTYSQAAAQQGY SAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTPTAPQAYSQPVQGY GTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSSTGGYNQPSLGYGQS NYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQ TGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRGGFDRGGMSRGGRG GGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADFFKQCGVVKMNKRT GQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMRGGLPPREGRGMPP PLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCGNQNFAWRTECNQC KAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGP GGPPGPLMEQDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYR KTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVS TQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPF RELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRV DKNFCLRPMLAPNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDF LNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAA RSESYYNGISTNL* (SEQ ID NO: 516)
TOM20::FUS::4clN22::PylRS (AA)::EWSR1::PylRS (AA)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAGGCAAGCCTATTCCCAACCCCCTGCTGGGCCTGGATAGCACCGGAGCACCAGGAAGTGCTGGTTCTGCTGCT GGTAGTGGAGCATCGATAGAGCAGAAGCTGATCTCAGAGGAGGACCTGCTAGCCACCATGGACGCACAAACACGA CGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGC GCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAG 
AAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCT GGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCT GCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGAC GCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGAGTCT AGAGGGCCCGTTGGTGCTCCTGGTTCAGCAGGAAGCGCAGCAGGATCAGGTGCGTGCCCGGTGCCGCTGCAGCTG CCGCCGCTGGAACGCCTGACCCTGGATGACAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATG AGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGT GGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAA ACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGC GTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAA CCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACC CAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCC CTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAA TCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGT GAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTAT CTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTG ATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGAT AAAAACTTCTGTCTGCGCCCTATGCTGGCACCAAATCTGTATAACTATCTGCGCAAACTGGACCGTGCCCTGCCT GATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTT ACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTG AACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCAC GGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGG ATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGT TCTGAATCGTATTATAACGGGATCTCTACGAACCTGATGGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCA GCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACCACCCAGGCATATGGGCAA CAAAGCTATG
GAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAG ACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCAG CCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCA GCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGA CCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTA GGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCA TCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACACC TATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACT AGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGCAG CAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGACCA GGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATGAGC AGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGA CCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGATGAAGACTCTGACAACAGTGCAATT TATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAG ATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGATGCC ACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGGAGC AAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAGGGC AGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATGGGA GGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGA AACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGAACA GAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGATCGT GGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATGTTC
AGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGAGGA AGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGATAAAAAACCGCTGAATACCCTGATCTCT GCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATC TATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGT CACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCC AATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCC GTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCG GCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACC GGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCA GCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAAT TCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCC GAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAG ATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAA CAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTGGCACCAAATCTGTATAACTATCTGCGCAAA CTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAA GAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGC ATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGAC ACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGG GGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAAC ATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 517)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND
YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG
YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY
GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG
GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF
KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG
NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP
GGGPGGSHMGGNYGDDRRGGRGGAIAGKPIPNPLLGLDSTGAPGSAGSAAGSGASIEQKLISEEDLLATMDAQTR
RRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGA
GGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLES
RGPVGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMAC
GDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPK
PLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTK
SQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPIL
IPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEF
TMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPW
IGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNLMASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQ
QSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYA
AQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPP
SYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQ
QSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGG
PMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDA
TVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMG
GRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDR
GRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPPGPLMEQDKKPLNTLIS
ATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKA
NEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSIST
GATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYA
EERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLYNYLRK
LDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGD
TLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 518)
LCK::EWSR1::MCP::PylRS (AF)::FUS::PylRS (AF)
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCATGGCGTCCACGGATTACAGTACCTATAGCCAA GCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACCACCCAGGCATAT GGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTAT GGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTGCCCCCCAGGCATAC AGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCC TATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCACCT ACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCC AGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCCATGCAGCCAGTCACTGCA CCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAG AACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCT CCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTAC GGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAGGAGTCTGGAGGATTTTCC GGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGGGGATTTGATCGTGGAGGC ATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCT GGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGATGAAGACTCTGACAACAGT GCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTCTTTAAGCAGTGTGGGGTT GTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAAACAGGAAAGCCCAAAGGC GATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTTGATGGGAAAGATTTTCAA GGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGGGGTGGTCTGCCACCCCGT GAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCTGGGGGACCCATGGGTCGC ATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCCCGAGGGAACCCCTCTGGA GGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGAAACCAGAACTTCGCCTGG AGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGT GATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGATCGTGGTGGTCCCGGTGGA ATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATGGACCGAGGTGGCTTTGGT
GGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGCGATCGCATATCCCTATGATGTG CCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCTAACTTTACTCAGTTCGTT CTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATC AGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATAC ACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCC ACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCA ATCGCAGCAAACTCCGGCATCTACGGGCTAAGCTATACAGATATTGAAATGAACAGATTGGGAAAGGCGTGCCCG GTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCT ACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTAT ATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCAC CACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAAT GAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTT GCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCC ATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGT GCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCT CCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCC GGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAA GAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATC AAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAA ATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTG GACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAA CATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATC ATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACC CTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGT ATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATC AAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGGCCTCAAACGATTATACCCAACAA 
GCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGA CAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGC CAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGC CAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACC TCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCT AGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAAC CAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATG AGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGA CAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGC AGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGT GGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAAC AACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGT ATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAG GGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTC TCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGA GGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGA TTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAG AATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCA GGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGATAAAAAACCGCTGAAT ACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGC CGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCA CGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTC CTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCA
ATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGC
AAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGC
AGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCG
GTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAA
ATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAA
CAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGT
GGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACC
GAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAAC
TATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAG
TCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAG
AACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATG
GTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTG
GATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACAC
GACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 519)
Protein:
MGCVCSSNPEGTELMASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATY GQTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAP TRPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQ NTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFS GPGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNS AIYVQGLNDSVTLDDLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQ GSKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSG GGNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGG MFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPPGPLMEQAIAYPYDVPDYAGAPGSAGSAAGSGASNFTQFV LVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFA TNSDCELIVKAMQGLLKDGNPIPSAIAANSGIYGLSYTDIEMNRLGKACPVPLQLPPLERLTLDDKKPLNTLISA TGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKAN EDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTG ATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAE ERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKL DRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDT LDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNLASNDYTQQ ATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSS QSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQN QYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNR SSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIG IIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRG GRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGP GGSHMGGNYGDDRRGGRGGDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTA RALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGS KFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDE
ISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDT ELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRE NLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKH DFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 520)
TOM20::EWSR1::4clN22::PylRS (AF)::FUS::PylRS (AF)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAGTTCTTCATGGCGTCCACGGAT TACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCA CAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCT CAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCA ACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTC ACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAG CCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACA GGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCC ATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAG AGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGC AGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGC CAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAG GAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGG GGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGA GGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGAT GAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTC TTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAA ACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTT GATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGG GGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCT
GGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCC CGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGA AACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCC TTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGAT CGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATG GACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGCGATC GCAGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGAGCAGAAGCTGATCTCAGAGGAGGACCTGCTA GCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCA CCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACA CGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCT GGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCT GAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGA GCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAA GCTGCAAACCCACCGCTCGGGCTAAGCTATACAGATATTGAAATGAACAGATTGGGAAAGGCGTGCCCGGTGCCG CTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGT CTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAG ATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAA TATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGAC CAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGT GCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCT GTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACC GCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCA CTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAA CCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGT GAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCC
CCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTC CGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGT GCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTG GAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACC GATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGAT GTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGAC AAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGT GCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGGCCTCAAACGATTATACCCAACAAGCAACC CAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAG AGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGC CAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGC TCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGA AGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTAT GGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTAC AACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGT GGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAG GACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGT GGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGC TTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACC ATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATT AAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAG GCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGA AATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGA GGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCC
AGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATG AACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGC TCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGATAAAAAACCGCTGAATACCCTG ATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCG AAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCA CTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACA AAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCG AAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTC TCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATT AGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAA GCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGC CTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATC TATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTT CTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTG AGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTG CGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGAC GGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTG GAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTT GGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGT GAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTC AAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 521)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASTD YSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTP TAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSST GGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQS SYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRG GFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADF FKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMR GGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCG NQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGM DRGGFGGGRRGGPGGPPGPLMEQAIAGAPGSAGSAAGSGEQKLISEEDLLATMDAQTRRRERRAEKQAQWKAANP PLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRA EKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLGLSYTDIEMNRLGKACPVP LQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHK YRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIP VSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGK PFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIF RVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIIT DFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKR AARSESYYNGISTNLASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQS QNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSY GGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQ DRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNT IFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSG NPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENM NFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRS KIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMP KSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQ
ASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGF LEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESD GKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDR EWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 522)
TOM20::EWSR1::MCP::PylRS (AF)::FUS::PylRS (AF)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAGTTCTTCATGGCGTCCACGGAT TACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCA CAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCT CAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCA ACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTC ACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAG CCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACA GGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCC ATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAG AGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGC AGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGC CAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAG GAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGG GGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGA GGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGAT GAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTC TTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAA ACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTT GATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGG GGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCT GGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCC CGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGA AACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCC TTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGAT 
CGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATG GACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGCGATC GCATATCCCTATGATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCT AACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAAC GGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCT GCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTA ACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGA AACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGGCTAAGCTATACAGATATTGAAATGAACAGA TTGGGAAAGGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTG AATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTT AGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACA GCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAA TTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAA GCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGA AGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATT AGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCC CCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGAC GAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTG CAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGAT CGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGAT ACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCT AACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAA GAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGT GAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGT ATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCG CTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAA 
CACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGGCCTCA AACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAG AGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGC AGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACT GGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAG CCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGG AGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAG GGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGC CAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGT GGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGC GGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGC ATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAA CAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGAT TACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGG GAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGG TTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGT GGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGT GGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGT CCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGAT GGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGC GATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATC AAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGC CGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGAT GAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCT ACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAG GCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGT 
GTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATT ACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTG CTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGT CGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACC CGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATG GGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTA GCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGC CCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGT TCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATT GTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTT GTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGT CTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCT ACTAACCTGTAA (SEQ ID NO: 523)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASTD YSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTP TAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSST GGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQS SYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRG GFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADF FKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMR GGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCG NQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGM DRGGFGGGRRGGPGGPPGPLMEQAIAYPYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFAN GIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDG NPIPSAIAANSGIYGLSYTDIEMNRLGKACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEV SRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKK AMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSA PVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVD RGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRK ESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIP LDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNLASNDYTQQATQSYGAYPTQPGQGYSQQ SSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQ PAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYG QDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGG MGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDR ETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSG GGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGG DKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSD EDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPAS
VSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSR RKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPML APNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKI VGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGIS TNL* (SEQ ID NO: 524)
LCK::PylRS(AA)::FUS::PylRS(AA)
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTG GAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACC GGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCAT CTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAA CGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTG AAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAA AACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCC GTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAA GGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACC GATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAG AGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAA CTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTG GAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTC TGTCTGCGCCCTATGCTGGCACCAAATCTGTATAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATC AAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTG GCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTG GGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTG GAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCA GGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCG TATTATAACGGGATCTCTACGAACCTGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTAC CCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGC CAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGA ACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGG CAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCT CAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGC TATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGT 
GGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGC GGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGC AGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGA GGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGC CCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTG GGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACG GGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGAT GACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCA TTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATG GGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGT GGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAAT GAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAAC TACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTG TGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATG GCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATAT CGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAA ACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCC CCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTT TCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCT AGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTG ACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCG TTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAG AACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCG ATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGT GTGGATAAAAACTTCTGTCTGCGCCCTATGCTGGCACCAAATCTGTATAACTATCTGCGCAAACTGGACCGTGCC CTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAG 
GAGTTTACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGAT TTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTC ATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAA CCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCC GCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 525)
Protein:
MGCVCSSNPEGTELACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDH
LVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLE
NTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQT
DRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPL
EYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTML
AFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGA
GFGLERLLKVKHDFKNIKRAARSESYYNGISTNLASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYS
QSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSS
QSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGG
GYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGG
PRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFD
DPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGG
GGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGDKKPLNTLISATGL
WMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQ
TSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATA
SALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERE
NYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLYNYLRKLDRA
LPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDV
MHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 526)
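Each protein entry in this listing is the conceptual translation of the DNA entry above it, so the two can be cross-checked with the standard genetic code. The following is a minimal Python sketch for doing so, not part of the original disclosure; the helper name translate is hypothetical and the fragments were copied from SEQ ID NO: 525.

```python
from itertools import product

# Standard genetic code: the 64 codons enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...) map onto this 64-letter string; '*' = stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA fragment; whitespace from line wrapping is ignored."""
    seq = "".join(dna.split()).upper()
    if len(seq) % 3 != 0:
        raise ValueError("fragment length must be a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


# Fragments copied from SEQ ID NO: 525 (LCK::PylRS(AA)::FUS::PylRS(AA)).
print(translate("ATGGGCTGCGTGTGCAGCAGCAACCCCGAG"))  # MGCVCSSNPE (N-terminus)
print(translate("CATCTGGTTGTGAACAATAGCCGC"))        # HLVVNNSR
print(translate("GCCGTTGTTGGACCAATTCCG"))           # AVVGPIP
```

Comparing such translations with the corresponding stretches of SEQ ID NO: 526 is a quick way to confirm individual residues of the protein text.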
EBAG9(1-29)::EWSR1::SYNZIP4::4clN22
DNA:
ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA ATATGCAGATCTATGGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTAC ACCGCCCAGCCCACTCAAGGATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAG CCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGA CAGCCTCCCACTGGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGT GCTTATGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAG CCTGCTTATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACT GAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGT TATCCCCAGGTACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCC TCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGA CAGCAGAGTAGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCC TACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCAC CCCAGTAGCATGGGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGC CCTGATAACCGGGGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGC GGTGGAATGGGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTT GATCTAGGCCCACCTGTAGATCCAGATGAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGT GTGACTCTAGATGATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCC ATGATCCACATCTACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCT ACTGCCAAGGCTGCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGG AAGAAGCCTCCAATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGT GGAGGTCCAGGAGGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGC TTCCCTCCAAGAGGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGAC TGGCAGTGTCCCAATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCA AAGCCTGAAGGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATG CGGGGAGGAAGAGGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGA GGTGGCTTCCGTGGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCC 
CCTGGACCTTTGATGGAACAGGCGATCGCACAAAAGGTGGCTGAACTGAAAAATAGAGTGGCCGTGAAGCTGAAC CGGAACGAGCAGCTGAAGAACAAGGTGGAAGAGCTGAAGAACAGAAACGCCTACCTGAAGAATGAGCTGGCCACC CTGGAAAACGAGGTGGCCAGACTGGAAAACGACGTGGCCGAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGT AGTGGAGAGCAGAAGCTGATCTCAGAGGAGGACCTGCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGT CGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGA GCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAA TGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCC ACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCG CTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGA CGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGAGTCTAGAGGGCCCGTT TAA (SEQ ID NO: 527)
Protein:
MAITQFRLFKFCTCLATVFSFLKRLICRSMASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQ PTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQ PAYPAYGQQPAATAPTRPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYS STQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDH PSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDL DLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPP TAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGG FPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGM RGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPPGPLMEQAIAQKVAELKNRVAVKLN RNEQLKNKVEELKNRNAYLKNELATLENEVARLENDVAEGAPGSAGSAAGSGEQKLISEEDLLATMDAQTRRRER RAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLA TMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLESRGPV* (SEQ ID NO: 528)
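In SEQ ID NO: 528 the EQKLISEEDL (c-Myc) epitope is followed by four copies of the 22-residue motif MDAQTRRRERRAEKQAQWKAAN, i.e. the λN22 peptide, which appears to be what the label 4clN22 refers to. A minimal sketch that locates the copies and their spacing; the tail string is copied from the entry above and the variable names are illustrative.

```python
import re

# C-terminal tail of SEQ ID NO: 528, copied from the listing above.
TAIL = (
    "EQKLISEEDLLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGG"
    "LATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGG"
    "LATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGG"
    "LATMDAQTRRRERRAEKQAQWKAANPPLESRGPV"
)
MOTIF = "MDAQTRRRERRAEKQAQWKAAN"  # 22-residue repeat unit

# Start index of every occurrence, then the distance between neighbours.
starts = [m.start() for m in re.finditer(MOTIF, TAIL)]
spacing = [b - a for a, b in zip(starts, starts[1:])]
print(len(starts), spacing)  # 4 [43, 43, 43]
```

The constant spacing reflects the same PPLDGAGAGAGAGAGAGGLAT linker between every pair of copies.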
KIF16B::FUS::SYNZIP1::PylRS(AF)
DNA:
ATGGCATCGGTCAAGGTGGCCGTGAGGGTCCGGCCCATGAATCGCAGGGAAAAGGACTTGGAGGCCAAGTTCATT ATTCAGATGGAGAAAAGCAAAACGACAATCACAAACTTAAAGATACCAGAAGGAGGCACTGGGGACTCAGGAAGA GAACGGACCAAGACCTTCACCTATGACTTTTCTTTTTATTCTGCTGATACAAAAAGCCCAGATTACGTTTCACAA GAAATGGTTTTCAAAACCCTCGGCACAGATGTCGTGAAGTCTGCATTTGAAGGTTATAATGCTTGTGTCTTTGCA TATGGGCAAACTGGATCTGGAAAGTCATACACTATGATGGGAAATTCTGGAGATTCTGGCTTAATACCTCGGATC TGTGAAGGACTCTTCAGTCGGATAAATGAAACCACCAGATGGGATGAAGCTTCTTTTCGAACTGAAGTCAGCTAC TTAGAAATTTATAACGAACGTGTGAGAGATCTACTTCGGCGGAAGTCATCTAAAACCTTCAATTTGAGAGTCCGT GAGCATCCCAAAGAAGGCCCTTATGTTGAGGATTTATCCAAACATTTAGTACAGAATTATGGTGACGTAGAAGAA CTTATGGATGCGGGCAATATCAACCGGACCACCGCAGCGACTGGGATGAACGACGTCAGTAGCAGGTCTCATGCC ATCTTCACCATCAAGTTCACTCAGGCTAAATTTGATTCTGAAATGCCATGTGAAACCGTCAGTAAGATCCACTTG GTTGATCTTGCCGGAAGTGAGCGTGCAGATGCCACCGGAGCCACCGGGGTTAGGCTAAAGGAAGGGGGAAATATT AACAAGTCCCTCGTGACTCTGGGGAACGTCATTTCTGCCTTAGCTGATTTATCTCAGGATGCTGCAAATACTCTT GCAAAGAAGAAGCAAGTTTTCGTGCCTTACAGGGATTCTGTGTTGACTTGGTTGTTAAAAGATAGCCTTGGAGGA AACTCTAAAACTATCATGATTGCCACCATTTCACCTGCTGATGTCAATTATGGAGAAACCCTAAGTACTCTTCGC TATGCAAATAGAGCCAAAAACATCATCAACAAGCCTACCATTAATGAGGATGCCAACGTCAAACTTATCCGTGAG CTGCGAGCTGAAATAGCCAGACTGAAAACGCTGCTTGCTCAAGGGAATCAGATTGCCCTCTTAGACTCCCCCACA TATACAGATATTGAAATGAACAGATTGGGAAAGGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATG GCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCC CAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGC CAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGC TCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGC CAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAG AGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCC CCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAAC TATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGT GGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGC 
GGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGA GGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGAC TCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTG GCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACA GACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATT GACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAAT CGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGC AGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGG AAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAA CCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGA GGTGGTGCGATCGCAAATCTGGTGGCCCAGCTGGAAAACGAGGTGGCCAGCCTGGAAAACGAGAACGAAACCCTG AAGAAAAAGAACCTGCACAAGAAGGACCTGATCGCCTACCTGGAAAAGGAAATCGCCAACCTGAGAAAGAAGATC GAGGAAGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCG CTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGT ACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGAT CATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGT AAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAA GTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTG GAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAG TCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTT AAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAA ACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTG GAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGG AAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCT CTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAAC TTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCT 
ATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATG CTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCAC CTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGAC CTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGT GCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAG TCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 529)
Protein:
MASVKVAVRVRPMNRREKDLEAKFIIQMEKSKTTITNLKIPEGGTGDSGRERTKTFTYDFSFYSADTKSPDYVSQ EMVFKTLGTDVVKSAFEGYNACVFAYGQTGSGKSYTMMGNSGDSGLIPRICEGLFSRINETTRWDEASFRTEVSY LEIYNERVRDLLRRKSSKTFNLRVREHPKEGPYVEDLSKHLVQNYGDVEELMDAGNINRTTAATGMNDVSSRSHA IFTIKFTQAKFDSEMPCETVSKIHLVDLAGSERADATGATGVRLKEGGNINKSLVTLGNVISALADLSQDAANTL AKKKQVFVPYRDSVLTWLLKDSLGGNSKTIMIATISPADVNYGETLSTLRYANRAKNIINKPTINEDANVKLIRE LRAEIARLKTLLAQGNQIALLDSPTYTDIEMNRLGKGAPGSAGSAAGSGMASNDYTQQATQSYGAYPTQPGQGYS QQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYG QQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGN YGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGR GGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYT DRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGG SGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGR GGAIANLVAQLENEVASLENENETLKKKNLHKKDLIAYLEKEIANLRKKIEEGAPGSAGSAAGSGACPVPLQLPP LERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTC KRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQE SVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFREL ESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKN FCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNH LGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSE SYYNGISTNL* (SEQ ID NO: 530)
KIF16B::FUS::SYNZIP1::PylRS(AA)
DNA:
ATGGCATCGGTCAAGGTGGCCGTGAGGGTCCGGCCCATGAATCGCAGGGAAAAGGACTTGGAGGCCAAGTTCATT ATTCAGATGGAGAAAAGCAAAACGACAATCACAAACTTAAAGATACCAGAAGGAGGCACTGGGGACTCAGGAAGA GAACGGACCAAGACCTTCACCTATGACTTTTCTTTTTATTCTGCTGATACAAAAAGCCCAGATTACGTTTCACAA GAAATGGTTTTCAAAACCCTCGGCACAGATGTCGTGAAGTCTGCATTTGAAGGTTATAATGCTTGTGTCTTTGCA TATGGGCAAACTGGATCTGGAAAGTCATACACTATGATGGGAAATTCTGGAGATTCTGGCTTAATACCTCGGATC TGTGAAGGACTCTTCAGTCGGATAAATGAAACCACCAGATGGGATGAAGCTTCTTTTCGAACTGAAGTCAGCTAC TTAGAAATTTATAACGAACGTGTGAGAGATCTACTTCGGCGGAAGTCATCTAAAACCTTCAATTTGAGAGTCCGT GAGCATCCCAAAGAAGGCCCTTATGTTGAGGATTTATCCAAACATTTAGTACAGAATTATGGTGACGTAGAAGAA CTTATGGATGCGGGCAATATCAACCGGACCACCGCAGCGACTGGGATGAACGACGTCAGTAGCAGGTCTCATGCC ATCTTCACCATCAAGTTCACTCAGGCTAAATTTGATTCTGAAATGCCATGTGAAACCGTCAGTAAGATCCACTTG GTTGATCTTGCCGGAAGTGAGCGTGCAGATGCCACCGGAGCCACCGGGGTTAGGCTAAAGGAAGGGGGAAATATT AACAAGTCCCTCGTGACTCTGGGGAACGTCATTTCTGCCTTAGCTGATTTATCTCAGGATGCTGCAAATACTCTT GCAAAGAAGAAGCAAGTTTTCGTGCCTTACAGGGATTCTGTGTTGACTTGGTTGTTAAAAGATAGCCTTGGAGGA AACTCTAAAACTATCATGATTGCCACCATTTCACCTGCTGATGTCAATTATGGAGAAACCCTAAGTACTCTTCGC TATGCAAATAGAGCCAAAAACATCATCAACAAGCCTACCATTAATGAGGATGCCAACGTCAAACTTATCCGTGAG CTGCGAGCTGAAATAGCCAGACTGAAAACGCTGCTTGCTCAAGGGAATCAGATTGCCCTCTTAGACTCCCCCACA TATACAGATATTGAAATGAACAGATTGGGAAAGGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATG GCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCC CAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGC CAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGC TCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGC CAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAG AGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCC CCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAAC TATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGT GGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGC 
GGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGA GGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGAC TCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTG GCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACA GACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATT GACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAAT CGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGC AGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGG AAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAA CCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGA GGTGGTGCGATCGCAAATCTGGTGGCCCAGCTGGAAAACGAGGTGGCCAGCCTGGAAAACGAGAACGAAACCCTG AAGAAAAAGAACCTGCACAAGAAGGACCTGATCGCCTACCTGGAAAAGGAAATCGCCAACCTGAGAAAGAAGATC GAGGAAGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCG CTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGT ACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGAT CATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGT AAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAA GTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTG GAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAG TCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTT AAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAA ACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTG GAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGG AAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCT CTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAAC TTCTGTCTGCGCCCTATGCTGGCACCAAATCTGTATAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCT 
ATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATG CTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCAC CTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGAC CTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGA GCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAA TCGTATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 531)
Protein:
MASVKVAVRVRPMNRREKDLEAKFIIQMEKSKTTITNLKIPEGGTGDSGRERTKTFTYDFSFYSADTKSPDYVSQ EMVFKTLGTDVVKSAFEGYNACVFAYGQTGSGKSYTMMGNSGDSGLIPRICEGLFSRINETTRWDEASFRTEVSY LEIYNERVRDLLRRKSSKTFNLRVREHPKEGPYVEDLSKHLVQNYGDVEELMDAGNINRTTAATGMNDVSSRSHA IFTIKFTQAKFDSEMPCETVSKIHLVDLAGSERADATGATGVRLKEGGNINKSLVTLGNVISALADLSQDAANTL AKKKQVFVPYRDSVLTWLLKDSLGGNSKTIMIATISPADVNYGETLSTLRYANRAKNIINKPTINEDANVKLIRE LRAEIARLKTLLAQGNQIALLDSPTYTDIEMNRLGKGAPGSAGSAAGSGMASNDYTQQATQSYGAYPTQPGQGYS QQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYG QQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGN YGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGR GGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYT DRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGG SGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGR GGAIANLVAQLENEVASLENENETLKKKNLHKKDLIAYLEKEIANLRKKIEEGAPGSAGSAAGSGACPVPLQLPP LERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTC KRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQE SVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFREL ESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKN FCLRPMLAPNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNH LGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSE SYYNGISTNL* (SEQ ID NO: 532)
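SEQ ID NO: 530 and SEQ ID NO: 532 differ only in the PylRS module, labelled AF and AA. A minimal sketch that lists the substitutions between the two catalytic-domain stretches (copied from the entries above). The offset FIRST_RESIDUE = 295 is an assumption not stated in this listing: it places the first differing residue at position 306 in the numbering conventionally used for Methanosarcina mazei PylRS.

```python
# Catalytic-domain stretches copied from SEQ ID NO: 530 (PylRS(AF))
# and SEQ ID NO: 532 (PylRS(AA)); both strings have the same length.
AF = (
    "FCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTML"
    "NFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLD"
)
AA = (
    "FCLRPMLAPNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTML"
    "AFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLD"
)
# Assumption: the leading F is residue 295 in M. mazei PylRS numbering.
FIRST_RESIDUE = 295


def substitutions(a: str, b: str, start: int) -> list[tuple[int, str, str]]:
    """List (residue number, residue in a, residue in b) wherever a and b differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned and of equal length")
    return [(start + i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


print(substitutions(AF, AA, FIRST_RESIDUE))
# [(306, 'A', 'Y'), (346, 'N', 'A'), (348, 'C', 'A'), (384, 'F', 'Y')]
```

Read with that offset, the AF module carries A306 and F384 while the AA module carries A346 and A348, and the rest of this stretch is identical.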
EBAG9(1-29)::EWSR1::SYNZIP2::MCP
DNA:
ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA ATATGCAGATCTATGGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTAC ACCGCCCAGCCCACTCAAGGATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAG CCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGA CAGCCTCCCACTGGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGT GCTTATGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAG CCTGCTTATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACT GAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGT TATCCCCAGGTACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCC TCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGA CAGCAGAGTAGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCC TACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCAC CCCAGTAGCATGGGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGC CCTGATAACCGGGGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGC GGTGGAATGGGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTT GATCTAGGCCCACCTGTAGATCCAGATGAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGT GTGACTCTAGATGATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCC ATGATCCACATCTACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCT ACTGCCAAGGCTGCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGG AAGAAGCCTCCAATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGT GGAGGTCCAGGAGGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGC TTCCCTCCAAGAGGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGAC
TGGCAGTGTCCCAATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCA
AAGCCTGAAGGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATG
CGGGGAGGAAGAGGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGA
GGTGGCTTCCGTGGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCC
CCTGGACCTTTGATGGAACAGGCGATCGCAGCTAGAAACGCCTACCTGAGAAAGAAAATCGCCAGACTGAAGAAG
GACAACCTGCAGCTGGAAAGAGACGAGCAGAACCTGGAAAAGATCATCGCCAACCTCAGAGATGAGATCGCCAGA
CTGGAAAACGAGGTGGCCAGCCACGAGCAGTATCCCTATGATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCT
GGTTCTGCTGCTGGTAGTGGAGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTG
ACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAA
GTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCC
TGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAG
GCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACTAA
(SEQ ID NO: 533)
Protein:
MAITQFRLFKFCTCLATVFSFLKRLICRSMASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQ PTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQ PAYPAYGQQPAATAPTRPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYS STQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDH PSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDL DLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPP TAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGG FPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGM RGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPPGPLMEQAIAARNAYLRKKIARLKK DNLQLERDEQNLEKIIANLRDEIARLENEVASHEQYPYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDV TVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVK AMQGLLKDGNPIPSAIAANSGIY* (SEQ ID NO: 534)
TOM20::EWSR1::SYNZIP2::MCP
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAGTTCTTCATGGCGTCCACGGAT TACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCA CAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCT CAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCA ACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTC ACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAG CCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACA GGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCC ATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAG AGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGC AGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGC CAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAG GAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGG GGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGA GGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGAT GAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTC TTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAA ACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTT GATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGG GGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCT GGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCC CGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGA AACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCC TTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGAT 
CGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATG GACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGCGATC GCAGCTAGAAACGCCTACCTGAGAAAGAAAATCGCCAGACTGAAGAAGGACAACCTGCAGCTGGAAAGAGACGAG CAGAACCTGGAAAAGATCATCGCCAACCTCAGAGATGAGATCGCCAGACTGGAAAACGAGGTGGCCAGCCACGAG CAGTATCCCTATGATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCT AACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAAC GGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCT GCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTA ACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGA AACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACTAA (SEQ ID NO: 535)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASTD
YSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTP
TAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSST
GGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQS
SYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRG
GFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADF
FKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMR
GGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCG
NQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGM
DRGGFGGGRRGGPGGPPGPLMEQAIAARNAYLRKKIARLKKDNLQLERDEQNLEKIIANLRDEIARLENEVASHE
QYPYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSS
AQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIY* (SEQ ID NO: 536)
TOM20::FUS::SYNZIP4::4xλN22::PylRS (AA)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCACAAAAGGTGGCTGAACTGAAAAATAGAGTGGCCGTGAAGCTGAACCGGAACGAGCAGCTGAAGAACAAGGTG GAAGAGCTGAAGAACAGAAACGCCTACCTGAAGAATGAGCTGGCCACCCTGGAAAACGAGGTGGCCAGACTGGAA AACGACGTGGCCGAGTTAATCGCAGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCATCGATAGAG CAGAAGCTGATCTCAGAGGAGGACCTGCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAG 
AAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCT GGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCT GCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGAC GCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGA GCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAG CGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGAGTCTAGAGGGCCCGTTGGTGCTCCT GGTTCAGCAGGAAGCGCAGCAGGATCAGGTGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACC CTGGATGACAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCAT AAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAAC AATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTG TCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGC GCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCA GCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCA GCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAAT CCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAG GTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTG TCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAA ATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAG CGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCT ATGCTGGCACCAAATCTGTATAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAG ATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCCCAA ATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTC AAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGT GCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTG GAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGG ATCTCTACGAACCTGTAA (SEQ ID NO: 537)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGAIAQKVAELKNRVAVKLNRNEQLKNKVEELKNRNAYLKNELATLENEVARLE NDVAELIAGAPGSAGSAAGSGASIEQKLISEEDLLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGA GGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDG AGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLESRGPVGAPGSAGSAAGSGACPVPLQLPPLERLT LDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRV SDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVP ASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELL SRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRP MLAPNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDF KIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNG ISTNL* (SEQ ID NO: 538)
KIF16B::EWSR1::SYNZIP4::4xλN22
DNA:
ATGGCATCGGTCAAGGTGGCCGTGAGGGTCCGGCCCATGAATCGCAGGGAAAAGGACTTGGAGGCCAAGTTCATT ATTCAGATGGAGAAAAGCAAAACGACAATCACAAACTTAAAGATACCAGAAGGAGGCACTGGGGACTCAGGAAGA GAACGGACCAAGACCTTCACCTATGACTTTTCTTTTTATTCTGCTGATACAAAAAGCCCAGATTACGTTTCACAA GAAATGGTTTTCAAAACCCTCGGCACAGATGTCGTGAAGTCTGCATTTGAAGGTTATAATGCTTGTGTCTTTGCA TATGGGCAAACTGGATCTGGAAAGTCATACACTATGATGGGAAATTCTGGAGATTCTGGCTTAATACCTCGGATC TGTGAAGGACTCTTCAGTCGGATAAATGAAACCACCAGATGGGATGAAGCTTCTTTTCGAACTGAAGTCAGCTAC TTAGAAATTTATAACGAACGTGTGAGAGATCTACTTCGGCGGAAGTCATCTAAAACCTTCAATTTGAGAGTCCGT GAGCATCCCAAAGAAGGCCCTTATGTTGAGGATTTATCCAAACATTTAGTACAGAATTATGGTGACGTAGAAGAA CTTATGGATGCGGGCAATATCAACCGGACCACCGCAGCGACTGGGATGAACGACGTCAGTAGCAGGTCTCATGCC ATCTTCACCATCAAGTTCACTCAGGCTAAATTTGATTCTGAAATGCCATGTGAAACCGTCAGTAAGATCCACTTG GTTGATCTTGCCGGAAGTGAGCGTGCAGATGCCACCGGAGCCACCGGGGTTAGGCTAAAGGAAGGGGGAAATATT AACAAGTCCCTCGTGACTCTGGGGAACGTCATTTCTGCCTTAGCTGATTTATCTCAGGATGCTGCAAATACTCTT
GCAAAGAAGAAGCAAGTTTTCGTGCCTTACAGGGATTCTGTGTTGACTTGGTTGTTAAAAGATAGCCTTGGAGGA
AACTCTAAAACTATCATGATTGCCACCATTTCACCTGCTGATGTCAATTATGGAGAAACCCTAAGTACTCTTCGC
TATGCAAATAGAGCCAAAAACATCATCAACAAGCCTACCATTAATGAGGATGCCAACGTCAAACTTATCCGTGAG
CTGCGAGCTGAAATAGCCAGACTGAAAACGCTGCTTGCTCAAGGGAATCAGATTGCCCTCTTAGACTCCCCCACA
ATGGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCC
ACTCAAGGATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTC
AGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACT
GGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACC
ACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCA
GCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAA
CCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTA
CCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCG
ACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGC
TATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCT
CCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATG
GGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGG
GGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGC
AGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCA
CCTGTAGATCCAGATGAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGAT
GATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATC
TACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCT
GCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCA
ATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGA
GGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGA
GGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCC
AATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGC
TTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGA
GGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGT
GGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTG
ATGGAACAGGCGATCGCACAAAAGGTGGCTGAACTGAAAAATAGAGTGGCCGTGAAGCTGAACCGGAACGAGCAG
CTGAAGAACAAGGTGGAAGAGCTGAAGAACAGAAACGCCTACCTGAAGAATGAGCTGGCCACCCTGGAAAACGAG
GTGGCCAGACTGGAAAACGACGTGGCCGAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGAGCAG
AAGCTGATCTCAGAGGAGGACCTGCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAA
CAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGC
GGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCA
AACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCA
CAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCC
GGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGT
CGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGAGTCTAGAGGGCCCGTTTAA (SEQ ID NO: 539)
Protein:
MASVKVAVRVRPMNRREKDLEAKFIIQMEKSKTTITNLKIPEGGTGDSGRERTKTFTYDFSFYSADTKSPDYVSQ EMVFKTLGTDVVKSAFEGYNACVFAYGQTGSGKSYTMMGNSGDSGLIPRICEGLFSRINETTRWDEASFRTEVSY LEIYNERVRDLLRRKSSKTFNLRVREHPKEGPYVEDLSKHLVQNYGDVEELMDAGNINRTTAATGMNDVSSRSHA IFTIKFTQAKFDSEMPCETVSKIHLVDLAGSERADATGATGVRLKEGGNINKSLVTLGNVISALADLSQDAANTL AKKKQVFVPYRDSVLTWLLKDSLGGNSKTIMIATISPADVNYGETLSTLRYANRAKNIINKPTINEDANVKLIRE LRAEIARLKTLLAQGNQIALLDSPTMASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDV SYTQAQTTATYGQTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYP AYGQQPAATAPTRPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQP TSYDQSSYSQQNTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSM GVYGQESGGFSGPGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGP PVDPDEDSDNSAIYVQGLNDSVTLDDLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKA AVEWFDGKDFQGSKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPR GPRGSRGNPSGGGNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGR GGLMDRGGPGGMFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPPGPLMEQAIAQKVAELKNRVAVKLNRNEQ LKNKVEELKNRNAYLKNELATLENEVARLENDVAEGAPGSAGSAAGSGEQKLISEEDLLATMDAQTRRRERRAEK QAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDA QTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLESRGPV* (SEQ ID NO: 540)
TOM20::EWSR1::SYNZIP4::4xλN22
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAGTTCTTCATGGCGTCCACGGAT TACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCA CAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCT CAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCA ACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTC ACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAG CCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACA GGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCC ATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAG AGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGC AGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGC CAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAG GAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGG GGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGA GGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGAT GAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTC TTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAA ACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTT GATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGG GGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCT GGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCC CGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGA AACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCC TTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGAT 
CGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATG GACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGCGATC GCACAAAAGGTGGCTGAACTGAAAAATAGAGTGGCCGTGAAGCTGAACCGGAACGAGCAGCTGAAGAACAAGGTG GAAGAGCTGAAGAACAGAAACGCCTACCTGAAGAATGAGCTGGCCACCCTGGAAAACGAGGTGGCCAGACTGGAA AACGACGTGGCCGAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGAGCAGAAGCTGATCTCAGAG GAGGACCTGCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAA GCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATG GACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGAC GGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGT GAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGA GCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAA GCTCAATGGAAAGCTGCAAACCCACCGCTCGAGTCTAGAGGGCCCGTTTAA (SEQ ID NO: 541)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASTD
YSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTP
TAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSST
GGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQS
SYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRG
GFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADF
FKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMR
GGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCG
NQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGM
DRGGFGGGRRGGPGGPPGPLMEQAIAQKVAELKNRVAVKLNRNEQLKNKVEELKNRNAYLKNELATLENEVARLE
NDVAEGAPGSAGSAAGSGEQKLISEEDLLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATM DAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAG AGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLESRGPV* (SEQ ID NO: 542)
TOM20::FUS::SYNZIP1::PylRS (AF)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAAATCTGGTGGCCCAGCTGGAAAACGAGGTGGCCAGCCTGGAAAACGAGAACGAAACCCTGAAGAAAAAGAAC CTGCACAAGAAGGACCTGATCGCCTACCTGGAAAAGGAAATCGCCAACCTGAGAAAGAAGATCGAGGAAGGCAAG CCTATTCCCAACCCCCTGCTGGGCCTGGATAGCACCGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGA GCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTG 
ATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCG AAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCA CTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACA AAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCG AAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTC TCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATT AGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAA GCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGC CTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATC TATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTT CTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTG AGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTG CGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGAC GGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTG GAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTT GGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGT GAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTC AAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 543)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND
YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG
YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY
GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG
GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGAIANLVAQLENEVASLENENETLKKKNLHKKDLIAYLEKEIANLRKKIEEGK PIPNPLLGLDSTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRS KIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMP KSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQ ASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGF LEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESD GKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDR EWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 544)
TOM20::FUS::SYNZIP3::PylRS (AF)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAAATGAGGTGACCACCCTGGAAAACGACGCCGCCTTCATCGAGAACGAGAACGCCTACCTGGAAAAAGAGATC GCCAGACTGAGAAAGGAAAAGGCCGCTCTGCGGAACAGACTGGCCCACAAGAAGGGCAAGCCTATTCCCAACCCC CTGCTGGGCCTGGATAGCACCGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCG CTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGT 
CTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAG ATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAA TATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGAC CAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGT GCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCT GTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACC GCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCA CTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAA CCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGT GAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCC CCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTC CGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGT GCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTG GAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACC GATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGAT GTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGAC AAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGT GCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 545)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGAIANEVTTLENDAAFIENENAYLEKEIARLRKEKAALRNRLAHKKGKPIPNP LLGLDSTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIE MACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVAR APKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPA LTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKS PILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHL EEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGID KPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 546)
EBAG9(1-29)::FUS::SYNZIP1::PylRS (AF)
DNA:
ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA ATATGCAGATCTGGAGCACCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATGGCCTCAAACGATTATACCCAA CAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTAC GGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTAT GGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGT AGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGC ACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAG CCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAG AACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCC ATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTAT GGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAAC CGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGAC CGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGAC AACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATT GGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTG AAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAA TTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGT GGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGA GGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGT GAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGA CCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGTGGTGCGATCGCAAATCTG GTGGCCCAGCTGGAAAACGAGGTGGCCAGCCTGGAAAACGAGAACGAAACCCTGAAGAAAAAGAACCTGCACAAG AAGGACCTGATCGCCTACCTGGAAAAGGAAATCGCCAACCTGAGAAAGAAGATCGAGGAAGGCGCCGATTACAAG GACGATGATGACAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAG CTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGG ATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCG 
TGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGT AAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACA AGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCT AAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCT ACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGC GCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACA AAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTT CGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAAC TATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATT CTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTG GATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTG CCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAG TTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTT CTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATG CACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCT TGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCA CGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 547)
Protein:
MAITQFRLFKFCTCLATVFSFLKRLICRSGAPGSAGSAAGSGMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPY GQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSS TSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSS MSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSD RGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKL KGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRG GFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGAIANL VAQLENEVASLENENETLKKKNLHKKDLIAYLEKEIANLRKKIEEGADYKDDDDKGAPGSAGSAAGSGACPVPLQ LPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYR KTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVS TQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPF RELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRV DKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDF LNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAA RSESYYNGISTNL* (SEQ ID NO: 548)
EBAG9(1-29)::FUS::SYNZIP3::PylRS (AF)
DNA:
ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA ATATGCAGATCTGGAGCACCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATGGCCTCAAACGATTATACCCAA CAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTAC GGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTAT GGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGT AGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGC ACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAG CCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAG AACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCC ATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTAT GGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAAC CGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGAC CGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGAC AACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATT GGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTG AAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAA TTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGT GGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGA GGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGT GAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGA CCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGTGGTGCGATCGCAAATGAG GTGACCACCCTGGAAAACGACGCCGCCTTCATCGAGAACGAGAACGCCTACCTGGAAAAAGAGATCGCCAGACTG AGAAAGGAAAAGGCCGCTCTGCGGAACAGACTGGCCCACAAGAAGGGCGCCGATTACAAGGACGATGATGACAAG GGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAA CGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGA ACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTG 
GTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGT TGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAA GTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAAC ACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTT TCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGC AATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGAT CGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGC GAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTG GAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAG TATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGT CTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAA ATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAAC TTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGC ATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAA CTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGT TTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTAT TACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 549)
Protein:
MAITQFRLFKFCTCLATVFSFLKRLICRSGAPGSAGSAAGSGMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPY GQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSS TSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSS MSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSD RGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKL KGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRG GFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGAIANE VTTLENDAAFIENENAYLEKEIARLRKEKAALRNRLAHKKGADYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLE RLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKR CRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESV SVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELES ELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFC LRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLG IDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESY YNGISTNL* (SEQ ID NO: 550)
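Each DNA entry in this listing is paired with the protein it encodes, so the two can be cross-checked codon by codon, which helps wherever an extracted protein string looks doubtful. A minimal Python sketch, assuming the standard genetic code (NCBI translation table 1), reading frame 0 and no introns; `translate` and the table names are illustrative helpers, not part of the disclosure:

```python
# Standard genetic code (NCBI table 1). Codons are enumerated in TCAG order
# for each of the three positions; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a coding sequence in frame 0.

    Whitespace is removed first, because the listing breaks every
    sequence into space-separated blocks.
    """
    seq = "".join(dna.split()).upper()
    if len(seq) % 3:
        raise ValueError(f"length {len(seq)} is not a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


# Start of SEQ ID NO: 549, and a PylRS stretch spanning one of its block breaks.
print(translate("ATGGCCATCACCCAGTTTCGG"))         # MAITQFR
print(translate("GGCGATCATCTG GTTGTGAACAATAGC"))  # GDHLVVNNS
```

Translating at the codon level makes adjacent residues such as the "VV" in "HLVVNNS" unambiguous, where a printed protein string can blur them together.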
TOM20::FUS::SYNZIP1::PylRS (AA)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAAATCTGGTGGCCCAGCTGGAAAACGAGGTGGCCAGCCTGGAAAACGAGAACGAAACCCTGAAGAAAAAGAAC CTGCACAAGAAGGACCTGATCGCCTACCTGGAAAAGGAAATCGCCAACCTGAGAAAGAAGATCGAGGAAGGCAAG CCTATTCCCAACCCCCTGCTGGGCCTGGATAGCACCGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGA GCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGACAAAAAACCGCTGAATACCCTG 
ATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCG AAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCA CTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACA AAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCG AAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTC TCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATT AGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAA GCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGC CTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATC TATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTT CTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTG AGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTGGCACCAAATCTGTATAACTATCTG CGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGAC GGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTG GAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTAT GGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGT GAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTC AAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 551)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGAIANLVAQLENEVASLENENETLKKKNLHKKDLIAYLEKEIANLRKKIEEGK PIPNPLLGLDSTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRS KIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMP KSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQ ASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGF LEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLYNYLRKLDRALPDPIKIFEIGPCYRKESD GKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDR EWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 552)
TOM20::FUS::SYNZIP3::PylRS (AA)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAAATGAGGTGACCACCCTGGAAAACGACGCCGCCTTCATCGAGAACGAGAACGCCTACCTGGAAAAAGAGATC GCCAGACTGAGAAAGGAAAAGGCCGCTCTGCGGAACAGACTGGCCCACAAGAAGGGCAAGCCTATTCCCAACCCC CTGCTGGGCCTGGATAGCACCGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCG CTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGACAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGT 
CTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAG ATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAA TATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGAC CAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGT GCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCT GTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACC GCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCA CTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAA CCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGT GAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCC CCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTC CGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTGGCACCAAATCTGTATAACTATCTGCGCAAACTGGACCGT GCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTG GAGGAGTTTACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACC GATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGAT GTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGAC AAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGT GCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 553)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGAIANEVTTLENDAAFIENENAYLEKEIARLRKEKAALRNRLAHKKGKPIPNP LLGLDSTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIE MACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVAR APKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPA LTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKS PILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHL EEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGID KPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 554)
TOM20::FUS::SYNZIP3::PylRS (AAAF)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAAATGAGGTGACCACCCTGGAAAACGACGCCGCCTTCATCGAGAACGAGAACGCCTACCTGGAAAAAGAGATC GCCAGACTGAGAAAGGAAAAGGCCGCTCTGCGGAACAGACTGGCCCACAAGAAGGGCAAGCCTATTCCCAACCCC CTGCTGGGCCTGGATAGCACCGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCG CTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGACAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGT 
CTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAG ATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAA TATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGAC CAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGT GCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCT GTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACC GCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCA CTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAA CCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGT GAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCC CCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTC CGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGT GCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTG GAGGAGTTTACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACC GATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGAT GTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGAC AAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGT GCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 555)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGAIANEVTTLENDAAFIENENAYLEKEIARLRKEKAALRNRLAHKKGKPIPNP LLGLDSTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIE MACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVAR APKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPA LTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKS PILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHL EEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGID KPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 556)
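The PylRS variants in these constructs differ at only a few residues, so the suffix in each header can be checked against the protein entry. The sketch below assumes the suffixes denote the substitutions commonly used for *M. mazei* PylRS (AF = Y306A/Y384F, AA = N346A/C348A, AAAF = all four) and detects them through short local motifs taken from the listed proteins; the motif table, the label table and `pylrs_variant` are illustrative assumptions, not part of the disclosure.

```python
# Hypothetical motif table: (wild-type context, mutant context) around each
# site, named with M. mazei PylRS numbering (assumed).
SITE_MOTIFS = {
    "Y306A": ("PNLYNY", "PNLANY"),
    "N346A/C348A": ("MLNFCQ", "MLAFAQ"),
    "Y384F": ("MVYGD", "MVFGD"),
}

# Assumed meaning of the header suffixes.
VARIANT_LABELS = {
    frozenset(): "wild type",
    frozenset({"Y306A", "Y384F"}): "AF",
    frozenset({"N346A/C348A"}): "AA",
    frozenset({"Y306A", "N346A/C348A", "Y384F"}): "AAAF",
}


def pylrs_variant(protein: str) -> str:
    """Return the suffix label matching the PylRS motifs found in protein."""
    seq = "".join(protein.split())  # drop the listing's block spaces
    found = set()
    for site, (wild_type, mutant) in SITE_MOTIFS.items():
        if mutant in seq:
            found.add(site)
        elif wild_type not in seq:
            raise ValueError(f"no motif found for {site}")
    return VARIANT_LABELS.get(frozenset(found), "unlabelled: " + ", ".join(sorted(found)))


# Fragments around the three sites of SEQ ID NO: 554.
print(pylrs_variant("LRPMLAPNLYNYLRK EEFTMLAFAQMG DSCMVYGDTL"))  # AA
```

Matching short motifs rather than absolute positions keeps the check independent of the N-terminal fusion partners (TOM20, LCK, FUS, SYNZIP), which shift the PylRS numbering inside each construct.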
LCK::FUS::SYNZIP3::PylRS (AF)
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCATGGCCTCAAACGATTATACCCAACAAGCAACC CAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAG AGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGC CAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGC TCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGA AGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTAT GGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTAC AACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGT GGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAG GACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGT GGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGC TTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACC ATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATT AAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAG GCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGA AATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGA GGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCC AGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATG AACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGC TCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGTGGTGCGATCGCAAATGAGGTGACCACC CTGGAAAACGACGCCGCCTTCATCGAGAACGAGAACGCCTACCTGGAAAAAGAGATCGCCAGACTGAGAAAGGAA AAGGCCGCTCTGCGGAACAGACTGGCCCACAAGAAGGGCGCCGATTACAAGGACGATGATGACAAGGGAGCACCA GGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACC CTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCAT AAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAAC AATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTG 
TCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGC GCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCA GCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCA GCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAAT CCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAG GTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTG TCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAA ATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAG CGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCT ATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAG ATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAA ATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTC AAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGT GCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTG GAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGT ATTTCTACTAACCTGTAA (SEQ ID NO: 557)
Protein:
MGCVCSSNPEGTELMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQS QNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSY GGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQ DRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNT IFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSG NPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENM NFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGAIANEVTTLENDAAFIENENAYLEKEIARLRKE KAALRNRLAHKKGADYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIH KIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVS APTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTN PITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLERE ITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFE IGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSS AVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 558)
LCK::SYNZIP1::PylRS (AF)
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCGATCGCAAATCTGGTGGCCCAGCTGGAAAAC GAGGTGGCCAGCCTGGAAAACGAGAACGAAACCCTGAAGAAAAAGAACCTGCACAAGAAGGACCTGATCGCCTAC CTGGAAAAGGAAATCGCCAACCTGAGAAAGAAGATCGAGGAAGGCAAGCCTATTCCCAACCCCCTGCTGGGCCTG GATAGCACCGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCG CCGCTGGAACGCCTGACCCTGGATGACAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGT CGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGC GATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACC TGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTG AAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCA CTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAG GAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTG GTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCC CAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAA CTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTG GGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATT CCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAA AACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGAT CCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACC ATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAAC CACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGC GACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATC GGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCC GAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 559)
Protein:
MGCVCSSNPEGTELAIANLVAQLENEVASLENENETLKKKNLHKKDLIAYLEKEIANLRKKIEEGKPIPNPLLGL DSTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACG DHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKP LENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKS QTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILI PLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFT MLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWI GAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 560)
LCK::SYNZIP3::PylRS (AF)
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCGATCGCAAATGAGGTGACCACCCTGGAAAAC
GACGCCGCCTTCATCGAGAACGAGAACGCCTACCTGGAAAAAGAGATCGCCAGACTGAGAAAGGAAAAGGCCGCT
CTGCGGAACAGACTGGCCCACAAGAAGGGCAAGCCTATTCCCAACCCCCTGCTGGGCCTGGATAGCACCGGAGCA
CCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTG
ACCCTGGATGACAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATT
CATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTG
AACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGT
GTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTT
AGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAA
GCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTT
CCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACC
AATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTG
GAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTG
CTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGT
GAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATC
GAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGC
CCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTC
GAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGC
CAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGAC
TTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCT AGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGT CTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAAT GGTATTTCTACTAACCTGTAA (SEQ ID NO: 561)
Protein:
MGCVCSSNPEGTELAIANEVTTLENDAAFIENENAYLEKEIARLRKEKAALRNRLAHKKGKPIPNPLLGLDSTGA PGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVV NNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTE AAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRL EVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYI ERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFC QMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFG LERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 562)
SYNZIP2::MCP
DNA:
ATGGCGATCGCAGCTAGAAACGCCTACCTGAGAAAGAAAATCGCCAGACTGAAGAAGGACAACCTGCAGCTGGAA AGAGACGAGCAGAACCTGGAAAAGATCATCGCCAACCTCAGAGATGAGATCGCCAGACTGGAAAACGAGGTGGCC AGCCACGAGCAGTATCCCTATGATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGT GGAGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAAC TTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGT CAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAAT ATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTA AAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACTAA (SEQ ID NO: 563)
Protein:
MAIAARNAYLRKKIARLKKDNLQLERDEQNLEKIIANLRDEIARLENEVASHEQYPYDVPDYAGAPGSAGSAAGS GASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLN MELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIY* (SEQ ID NO: 564)
LCK::EWSR1::SYNZIP2::MCP
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCATGGCGTCCACGGATTACAGTACCTATAGCCAA GCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACCACCCAGGCATAT GGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTAT GGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTGCCCCCCAGGCATAC AGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCC TATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCACCT ACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCC AGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCCATGCAGCCAGTCACTGCA CCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAG AACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCT CCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTAC GGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAGGAGTCTGGAGGATTTTCC GGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGGGGATTTGATCGTGGAGGC ATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCT GGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGATGAAGACTCTGACAACAGT GCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTCTTTAAGCAGTGTGGGGTT GTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAAACAGGAAAGCCCAAAGGC GATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTTGATGGGAAAGATTTTCAA GGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGGGGTGGTCTGCCACCCCGT GAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCTGGGGGACCCATGGGTCGC ATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCCCGAGGGAACCCCTCTGGA GGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGAAACCAGAACTTCGCCTGG AGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGT GATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGATCGTGGTGGTCCCGGTGGA ATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATGGACCGAGGTGGCTTTGGT GGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGCGATCGCAGCTAGAAACGCCTAC 
CTGAGAAAGAAAATCGCCAGACTGAAGAAGGACAACCTGCAGCTGGAAAGAGACGAGCAGAACCTGGAAAAGATC ATCGCCAACCTCAGAGATGAGATCGCCAGACTGGAAAACGAGGTGGCCAGCCACGAGCAGTATCCCTATGATGTG CCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCTAACTTTACTCAGTTCGTT CTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATC AGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATAC ACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCC ACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCA ATCGCAGCAAACTCCGGCATCTACTAA (SEQ ID NO: 565)
Protein:
MGCVCSSNPEGTELMASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATY GQTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAP TRPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQ NTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFS GPGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNS AIYVQGLNDSVTLDDLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQ GSKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSG GGNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGG MFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPPGPLMEQAIAARNAYLRKKIARLKKDNLQLERDEQNLEKI IANLRDEIARLENEVASHEQYPYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWI SSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSA IAANSGIY* (SEQ ID NO: 566)
LCK::EWSR1::SYNZIP4::4xλN22
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCATGGCGTCCACGGATTACAGTACCTATAGCCAA GCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACCACCCAGGCATAT GGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTAT GGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTGCCCCCCAGGCATAC AGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCC TATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCACCT ACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCC AGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCCATGCAGCCAGTCACTGCA CCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAG AACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCT CCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTAC GGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAGGAGTCTGGAGGATTTTCC GGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGGGGATTTGATCGTGGAGGC ATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCT GGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGATGAAGACTCTGACAACAGT GCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTCTTTAAGCAGTGTGGGGTT GTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAAACAGGAAAGCCCAAAGGC GATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTTGATGGGAAAGATTTTCAA GGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGGGGTGGTCTGCCACCCCGT GAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCTGGGGGACCCATGGGTCGC ATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCCCGAGGGAACCCCTCTGGA GGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGAAACCAGAACTTCGCCTGG AGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGT GATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGATCGTGGTGGTCCCGGTGGA ATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATGGACCGAGGTGGCTTTGGT GGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGCGATCGCACAAAAGGTGGCTGAA 
CTGAAAAATAGAGTGGCCGTGAAGCTGAACCGGAACGAGCAGCTGAAGAACAAGGTGGAAGAGCTGAAGAACAGA AACGCCTACCTGAAGAATGAGCTGGCCACCCTGGAAAACGAGGTGGCCAGACTGGAAAACGACGTGGCCGAGGGA GCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGAGCAGAAGCTGATCTCAGAGGAGGACCTGCTAGCCACC ATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTC GACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGA CGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCT GGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAA CAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGC GGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCA AACCCACCGCTCGAGTCTAGAGGGCCCGTTTAA (SEQ ID NO: 567)
Protein:
MGCVCSSNPEGTELMASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATY GQTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAP TRPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQ NTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFS GPGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNS AIYVQGLNDSVTLDDLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQ GSKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSG GGNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGG MFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPPGPLMEQAIAQKVAELKNRVAVKLNRNEQLKNKVEELKNR NAYLKNELATLENEVARLENDVAEGAPGSAGSAAGSGEQKLISEEDLLATMDAQTRRRERRAEKQAQWKAANPPL DGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEK QAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLESRGPV* (SEQ ID NO: 568)
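The C-terminal part of SEQ ID NO: 568 carries a tandem array of a single 22-residue peptide separated by Gly/Ala-rich linkers. A short sketch for counting the copies, assuming the repeated unit is the phage lambda N22 peptide MDAQTRRRERRAEKQAQWKAAN; `count_motif` is a hypothetical helper. The listing's block spaces are removed before counting because one copy is split across a block break:

```python
LAMBDA_N22 = "MDAQTRRRERRAEKQAQWKAAN"  # assumed identity of the repeated unit


def count_motif(protein: str, motif: str) -> int:
    """Count non-overlapping occurrences of motif, ignoring block spaces."""
    seq = "".join(protein.split())
    return seq.count(motif)


# C-terminal segment of SEQ ID NO: 568, copied with its original block breaks.
tail = (
    "EQKLISEEDLLATMDAQTRRRERRAEKQAQWKAANPPL "
    "DGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEK "
    "QAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLESRGPV*"
)
print(count_motif(tail, LAMBDA_N22))  # 4
```

Counting on the raw string instead (`tail.count(LAMBDA_N22)`) would report only three copies, since the third one is interrupted by a block space.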
LCK::SYNZIP2::MCP
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCGATCGCAGCTAGAAACGCCTACCTGAGAAAG AAAATCGCCAGACTGAAGAAGGACAACCTGCAGCTGGAAAGAGACGAGCAGAACCTGGAAAAGATCATCGCCAAC CTCAGAGATGAGATCGCCAGACTGGAAAACGAGGTGGCCAGCCACGAGCAGTATCCCTATGATGTGCCGGATTAT GCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCTAACTTTACTCAGTTCGTTCTCGTCGAC AATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAAC TCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAA GTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCCACGAATTCC GACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCA AACTCCGGCATCTACTAA (SEQ ID NO: 569)
Protein:
MGCVCSSNPEGTELAIAARNAYLRKKIARLKKDNLQLERDEQNLEKIIANLRDEIARLENEVASHEQYPYDVPDY AGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIK VEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIY* (SEQ ID NO: 570)
EWSR1::SYNZIP2::MCP
DNA:
ATGATGGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAG CCCACTCAAGGATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGAT
GTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCC ACTGGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGAT ACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTAT CCAGCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGT CAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAG
GTACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAG CCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGT AGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAA GCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGC ATGGGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAAC CGGGGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATG GGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGC CCACCTGTAGATCCAGATGAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTA GATGATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCAC ATCTACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAG GCTGCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCT CCAATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCA GGAGGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCA AGAGGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGT CCCAATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAA GGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGA AGAGGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTC CGTGGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCT TTGATGGAACAGGCGATCGCAGCTAGAAACGCCTACCTGAGAAAGAAAATCGCCAGACTGAAGAAGGACAACCTG CAGCTGGAAAGAGACGAGCAGAACCTGGAAAAGATCATCGCCAACCTCAGAGATGAGATCGCCAGACTGGAAAAC GAGGTGGCCAGCCACGAGCAGTATCCCTATGATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCT GCTGGTAGTGGAGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCC CCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGT AGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCG TACTTAAATATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAA GGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACTAA (SEQ ID NO: 571)
Protein:
MMASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPP
TGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETS
QPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQS
SYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDN
RGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTL
DDLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKP
PMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQC
PNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGF
RGGRGMDRGGFGGGRRGGPGGPPGPLMEQAIAARNAYLRKKIARLKKDNLQLERDEQNLEKIIANLRDEIARLEN
EVASHEQYPYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTC
SVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIY*
(SEQ ID NO: 572)
LCK::FUS::SYNZIP1::PylRS(AF)
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCATGGCCTCAAACGATTATACCCAACAAGCAACC CAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAG AGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGC CAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGC TCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGA AGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTAT GGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTAC AACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGT GGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAG GACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGT GGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGC TTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACC ATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATT AAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAG GCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGA AATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGA GGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCC AGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATG AACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGC TCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATCGCAAATCTGGTGGCCCAG CTGGAAAACGAGGTGGCCAGCCTGGAAAACGAGAACGAAACCCTGAAGAAAAAGAACCTGCACAAGAAGGACCTG ATCGCCTACCTGGAAAAGGAAATCGCCAACCTGAGAAAGAAGATCGAGGAAGGCAAGCCTATTCCCAACCCCCTG CTGGGCCTGGATAGCACCGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTG CAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGACAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTG TGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATG GCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATAT 
CGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAA ACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCC CCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTT TCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCT AGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTG ACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCG TTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAG AACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCG ATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGT GTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCC CTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAG GAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGAT TTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTC ATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAA CCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCT GCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 573)
Protein:
MGCVCSSNPEGTELMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQS
QNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSY
GGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQ
DRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNT
IFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSG
NPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENM
NFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGAIANLVAQLENEVASLENENETLKKKNLHKKDL
IAYLEKEIANLRKKIEEGKPIPNPLLGLDSTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGL
WMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQ
TSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATA
SALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERE
NYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRA
LPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDV
MHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 574)
LCK::FUS::SYNZIP3::PylRS(AF)
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCATGGCCTCAAACGATTATACCCAACAAGCAACC CAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAG AGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGC CAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGC TCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGA AGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTAT GGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTAC AACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGT GGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAG GACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGT GGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGC TTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACC ATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATT AAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAG GCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGA AATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGA GGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCC AGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATG AACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGC TCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATCGCAAATGAGGTGACCACC CTGGAAAACGACGCCGCCTTCATCGAGAACGAGAACGCCTACCTGGAAAAAGAGATCGCCAGACTGAGAAAGGAA AAGGCCGCTCTGCGGAACAGACTGGCCCACAAGAAGGGCAAGCCTATTCCCAACCCCCTGCTGGGCCTGGATAGC ACCGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTG GAACGCCTGACCCTGGATGACAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACC GGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCAT CTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAA 
CGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTG AAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAA AACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCC GTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAA GGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACC GATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAG AGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAA CTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTG GAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTC TGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATC AAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTG AACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTG GGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTG GAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCG GGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCC TATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 575)
Protein:
MGCVCSSNPEGTELMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQS QNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSY GGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQ DRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNT IFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSG NPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENM NFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGAIANEVTTLENDAAFIENENAYLEKEIARLRKE KAALRNRLAHKKGKPIPNPLLGLDSTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRT GTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKV KVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVK GNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGK LEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPI KIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDL ELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 576)
TOM20::EWSR1::SYNZIP4::2xPCP
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAGTTCTTCATGGCGTCCACGGAT TACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCA CAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCT CAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCA ACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTC ACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAG CCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACA GGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCC ATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAG AGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGC AGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGC CAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAG GAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGG GGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGA GGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGAT GAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTC TTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAA ACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTT GATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGG GGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCT GGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCC CGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGA AACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCC TTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGAT 
CGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATG GACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGCGATC GCACAAAAGGTGGCTGAACTGAAAAATAGAGTGGCCGTGAAGCTGAACCGGAACGAGCAGCTGAAGAACAAGGTG GAAGAGCTGAAGAACAGAAACGCCTACCTGAAGAATGAGCTGGCCACCCTGGAAAACGAGGTGGCCAGACTGGAA AACGACGTGGCCGAGTTAATCGCAGAGCAGAAGCTGATCTCAGAGGAGGACCTGATCGAAGGCCGCCATATGCTA GCCTCCAAAACCATCGTTCTTTCGGTCGGCGAGGCTACTCGCACTCTGACTGAGATCCAGTCCACCGCAGACCGT CAGATCTTCGAAGAGAAGGTCGGGCCTCTGGTGGGTCGGCTGCGCCTCACGGCTTCGCTCCGTCAAAACGGAGCC AAGACCGCGTATCGCGTCAACCTAAAACTGGATCAGGCGGACGTCGTTGATTCCGGACTTCCGAAAGTGCGCTAC ACTCAGGTATGGTCGCACGACGTGACAATCGTTGCGAATAGCACCGAGGCCTCGCGCAAATCGTTGTACGATTTG ACCAAGTCCCTCGTCGCGACCTCGCAGGTCGAAGATCTTGTCGTCAACCTTGTGCCGCTGGGCCGTGCGGATCCG CTAGCCTCCAAAACCATCGTTCTTTCGGTCGGCGAGGCTACTCGCACTCTGACTGAGATCCAGTCCACCGCAGAC CGTCAGATCTTCGAAGAGAAGGTCGGGCCTCTGGTGGGTCGGCTGCGCCTCACGGCTTCGCTCCGTCAAAACGGA GCCAAGACCGCGTATCGCGTCAACCTAAAACTGGATCAGGCGGACGTCGTTGATTCCGGACTTCCGAAAGTGCGC TACACTCAGGTATGGTCGCACGACGTGACAATCGTTGCGAATAGCACCGAGGCCTCGCGCAAATCGTTGTACGAT TTGACCAAGTCCCTCGTCGCGACCTCGCAGGTCGAAGATCTTGTCGTCAACCTTGTGCCGCTGGGCCGTCCACCG GTCGCCACCTAA (SEQ ID NO: 577)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASTD YSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTP TAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSST GGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQS SYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRG GFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADF FKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMR GGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCG NQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGM DRGGFGGGRRGGPGGPPGPLMEQAIAQKVAELKNRVAVKLNRNEQLKNKVEELKNRNAYLKNELATLENEVARLE NDVAELIAEQKLISEEDLIEGRHMLASKTIVLSVGEATRTLTEIQSTADRQIFEEKVGPLVGRLRLTASLRQNGA KTAYRVNLKLDQADVVDSGLPKVRYTQVWSHDVTIVANSTEASRKSLYDLTKSLVATSQVEDLVVNLVPLGRADP LASKTIVLSVGEATRTLTEIQSTADRQIFEEKVGPLVGRLRLTASLRQNGAKTAYRVNLKLDQADVVDSGLPKVR YTQVWSHDVTIVANSTEASRKSLYDLTKSLVATSQVEDLVVNLVPLGRPPVAT* (SEQ ID NO: 578)
TOM20::EWSR1::SYNZIP2::2xPCP
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAGTTCTTCATGGCGTCCACGGAT TACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCA CAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCT CAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCA ACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTC ACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAG
CCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACA
GGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCC
ATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAG
AGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGC
AGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGC
CAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAG
GAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGG
GGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGA
GGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGAT
GAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTC
TTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAA
ACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTT
GATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGG
GGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCT
GGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCC
CGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGA
AACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCC
TTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGAT
CGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATG
GACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGCGATC
GCAGCTAGAAACGCCTACCTGAGAAAGAAAATCGCCAGACTGAAGAAGGACAACCTGCAGCTGGAAAGAGACGAG
CAGAACCTGGAAAAGATCATCGCCAACCTCAGAGATGAGATCGCCAGACTGGAAAACGAGGTGGCCAGCCACGAG
CAGTTAATCGCATACCCCTACGACGTGCCCGACTACGCCATCGAAGGCCGCCATATGCTAGCCTCCAAAACCATC
GTTCTTTCGGTCGGCGAGGCTACTCGCACTCTGACTGAGATCCAGTCCACCGCAGACCGTCAGATCTTCGAAGAG
AAGGTCGGGCCTCTGGTGGGTCGGCTGCGCCTCACGGCTTCGCTCCGTCAAAACGGAGCCAAGACCGCGTATCGC
GTCAACCTAAAACTGGATCAGGCGGACGTCGTTGATTCCGGACTTCCGAAAGTGCGCTACACTCAGGTATGGTCG
CACGACGTGACAATCGTTGCGAATAGCACCGAGGCCTCGCGCAAATCGTTGTACGATTTGACCAAGTCCCTCGTC
GCGACCTCGCAGGTCGAAGATCTTGTCGTCAACCTTGTGCCGCTGGGCCGTGCGGATCCGCTAGCCTCCAAAACC
ATCGTTCTTTCGGTCGGCGAGGCTACTCGCACTCTGACTGAGATCCAGTCCACCGCAGACCGTCAGATCTTCGAA
GAGAAGGTCGGGCCTCTGGTGGGTCGGCTGCGCCTCACGGCTTCGCTCCGTCAAAACGGAGCCAAGACCGCGTAT
CGCGTCAACCTAAAACTGGATCAGGCGGACGTCGTTGATTCCGGACTTCCGAAAGTGCGCTACACTCAGGTATGG
TCGCACGACGTGACAATCGTTGCGAATAGCACCGAGGCCTCGCGCAAATCGTTGTACGATTTGACCAAGTCCCTC
GTCGCGACCTCGCAGGTCGAAGATCTTGTCGTCAACCTTGTGCCGCTGGGCCGTCCACCGGTCGCCACCTAA
(SEQ ID NO: 579)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASTD YSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTP TAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSST GGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQS SYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRG GFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADF FKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMR GGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCG NQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGM DRGGFGGGRRGGPGGPPGPLMEQAIAARNAYLRKKIARLKKDNLQLERDEQNLEKIIANLRDEIARLENEVASHE QLIAYPYDVPDYAIEGRHMLASKTIVLSVGEATRTLTEIQSTADRQIFEEKVGPLVGRLRLTASLRQNGAKTAYR VNLKLDQADVVDSGLPKVRYTQVWSHDVTIVANSTEASRKSLYDLTKSLVATSQVEDLVVNLVPLGRADPLASKT IVLSVGEATRTLTEIQSTADRQIFEEKVGPLVGRLRLTASLRQNGAKTAYRVNLKLDQADVVDSGLPKVRYTQVW SHDVTIVANSTEASRKSLYDLTKSLVATSQVEDLVVNLVPLGRPPVAT* (SEQ ID NO: 580)
KIF16B::EWSR1::SYNZIP2::MCP
DNA:
ATGGCATCGGTCAAGGTGGCCGTGAGGGTCCGGCCCATGAATCGCAGGGAAAAGGACTTGGAGGCCAAGTTCATT ATTCAGATGGAGAAAAGCAAAACGACAATCACAAACTTAAAGATACCAGAAGGAGGCACTGGGGACTCAGGAAGA GAACGGACCAAGACCTTCACCTATGACTTTTCTTTTTATTCTGCTGATACAAAAAGCCCAGATTACGTTTCACAA GAAATGGTTTTCAAAACCCTCGGCACAGATGTCGTGAAGTCTGCATTTGAAGGTTATAATGCTTGTGTCTTTGCA TATGGGCAAACTGGATCTGGAAAGTCATACACTATGATGGGAAATTCTGGAGATTCTGGCTTAATACCTCGGATC TGTGAAGGACTCTTCAGTCGGATAAATGAAACCACCAGATGGGATGAAGCTTCTTTTCGAACTGAAGTCAGCTAC TTAGAAATTTATAACGAACGTGTGAGAGATCTACTTCGGCGGAAGTCATCTAAAACCTTCAATTTGAGAGTCCGT GAGCATCCCAAAGAAGGCCCTTATGTTGAGGATTTATCCAAACATTTAGTACAGAATTATGGTGACGTAGAAGAA CTTATGGATGCGGGCAATATCAACCGGACCACCGCAGCGACTGGGATGAACGACGTCAGTAGCAGGTCTCATGCC ATCTTCACCATCAAGTTCACTCAGGCTAAATTTGATTCTGAAATGCCATGTGAAACCGTCAGTAAGATCCACTTG GTTGATCTTGCCGGAAGTGAGCGTGCAGATGCCACCGGAGCCACCGGGGTTAGGCTAAAGGAAGGGGGAAATATT AACAAGTCCCTCGTGACTCTGGGGAACGTCATTTCTGCCTTAGCTGATTTATCTCAGGATGCTGCAAATACTCTT GCAAAGAAGAAGCAAGTTTTCGTGCCTTACAGGGATTCTGTGTTGACTTGGTTGTTAAAAGATAGCCTTGGAGGA AACTCTAAAACTATCATGATTGCCACCATTTCACCTGCTGATGTCAATTATGGAGAAACCCTAAGTACTCTTCGC TATGCAAATAGAGCCAAAAACATCATCAACAAGCCTACCATTAATGAGGATGCCAACGTCAAACTTATCCGTGAG CTGCGAGCTGAAATAGCCAGACTGAAAACGCTGCTTGCTCAAGGGAATCAGATTGCCCTCTTAGACTCCCCCACA ATGGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCC ACTCAAGGATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTC AGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACT GGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACC ACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCA GCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAA CCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTA CCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCG ACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGC TATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCT 
CCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATG GGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGG GGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGC AGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCA CCTGTAGATCCAGATGAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGAT GATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATC TACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCT GCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCA ATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGA GGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGA GGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCC AATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGC TTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGA GGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGT GGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTG ATGGAACAGGCGATCGCAGCTAGAAACGCCTACCTGAGAAAGAAAATCGCCAGACTGAAGAAGGACAACCTGCAG CTGGAAAGAGACGAGCAGAACCTGGAAAAGATCATCGCCAACCTCAGAGATGAGATCGCCAGACTGGAAAACGAG GTGGCCAGCCACGAGCAGTATCCCTATGATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCT GGTAGTGGAGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCA AGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGC GTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTAC TTAAATATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGT CTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACTAA (SEQ ID NO: 581)
Protein:
MASVKVAVRVRPMNRREKDLEAKFIIQMEKSKTTITNLKIPEGGTGDSGRERTKTFTYDFSFYSADTKSPDYVSQ EMVFKTLGTDVVKSAFEGYNACVFAYGQTGSGKSYTMMGNSGDSGLIPRICEGLFSRINETTRWDEASFRTEVSY LEIYNERVRDLLRRKSSKTFNLRVREHPKEGPYVEDLSKHLVQNYGDVEELMDAGNINRTTAATGMNDVSSRSHA IFTIKFTQAKFDSEMPCETVSKIHLVDLAGSERADATGATGVRLKEGGNINKSLVTLGNVISALADLSQDAANTL AKKKQVFVPYRDSVLTWLLKDSLGGNSKTIMIATISPADVNYGETLSTLRYANRAKNIINKPTINEDANVKLIRE LRAEIARLKTLLAQGNQIALLDSPTMASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDV SYTQAQTTATYGQTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYP AYGQQPAATAPTRPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQP TSYDQSSYSQQNTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSM GVYGQESGGFSGPGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGP PVDPDEDSDNSAIYVQGLNDSVTLDDLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKA AVEWFDGKDFQGSKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPR GPRGSRGNPSGGGNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGR GGLMDRGGPGGMFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPPGPLMEQAIAARNAYLRKKIARLKKDNLQ LERDEQNLEKIIANLRDEIARLENEVASHEQYPYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAP SNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQG LLKDGNPIPSAIAANSGIY* (SEQ ID NO: 582)
LCK::SYNZIP1::PylRS(AF)
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCGATCGCAAATCTGGTGGCCCAGCTGGAAAAC GAGGTGGCCAGCCTGGAAAACGAGAACGAAACCCTGAAGAAAAAGAACCTGCACAAGAAGGACCTGATCGCCTAC CTGGAAAAGGAAATCGCCAACCTGAGAAAGAAGATCGAGGAAGGCGCCGATTACAAGGACGATGATGACAAGGGA
GCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGC CTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACC ATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTT GTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGC CGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTC GTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACT GAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCT GTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAAT ACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGT CTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAA CTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAA CGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTAT ATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTG CGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATC TTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTT TGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATT GACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTG TCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTT GGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTAC AATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 583)
Protein:
MGCVCSSNPEGTELAIANLVAQLENEVASLENENETLKKKNLHKKDLIAYLEKEIANLRKKIEEGADYKDDDDKG APGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLV VNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENT EAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDR LEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEY IERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNF CQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGF GLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 584)
LCK::FUS::SYNZIP1::PylRS(AF)
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCATGGCCTCAAACGATTATACCCAACAAGCAACC CAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAG AGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGC CAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGC TCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGA AGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTAT GGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTAC AACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGT GGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAG GACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGT GGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGC TTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACC ATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATT AAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAG GCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGA AATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGA GGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCC AGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATG AACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGC TCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGTGGTGCGATCGCAAATCTGGTGGCCCAG CTGGAAAACGAGGTGGCCAGCCTGGAAAACGAGAACGAAACCCTGAAGAAAAAGAACCTGCACAAGAAGGACCTG ATCGCCTACCTGGAAAAGGAAATCGCCAACCTGAGAAAGAAGATCGAGGAAGGCGCCGATTACAAGGACGATGAT GACAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCG CTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGT ACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGAT CATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGT 
AAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAA GTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTG GAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAG TCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTT AAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAA ACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTG GAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGG AAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCT CTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAAC TTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCT ATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATG CTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCAC CTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGAC CTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGT GCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAG TCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 585)
Protein:
MGCVCSSNPEGTELMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQS QNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSY GGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQ DRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNT IFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSG NPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENM NFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGAIANLVAQLENEVASLENENETLKKKNLHKKDL IAYLEKEIANLRKKIEEGADYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSR TGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVK VKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALV KGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLG KLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDP IKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGD LELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 586)
SYNZIP4::4clN22
DNA:
ATGGCGATCGCACAAAAGGTGGCTGAACTGAAAAATAGAGTGGCCGTGAAGCTGAACCGGAACGAGCAGCTGAAG AACAAGGTGGAAGAGCTGAAGAACAGAAACGCCTACCTGAAGAATGAGCTGGCCACCCTGGAAAACGAGGTGGCC AGACTGGAAAACGACGTGGCCGAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGAGCAGAAGCTG ATCTCAGAGGAGGACCTGCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCT CAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTA GCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCA CCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACA CGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCT GGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCT GAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGAGTCTAGAGGGCCCGTTTAA (SEQ ID NO: 587)
Protein:
MAIAQKVAELKNRVAVKLNRNEQLKNKVEELKNRNAYLKNELATLENEVARLENDVAEGAPGSAGSAAGSGEQKL ISEEDLLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANP PLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRA EKQAQWKAANPPLESRGPV* (SEQ ID NO: 588)
TOM20::FUS::SYNZIP1::MCP::PylRS (AF)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAAATCTGGTGGCCCAGCTGGAAAACGAGGTGGCCAGCCTGGAAAACGAGAACGAAACCCTGAAGAAAAAGAAC CTGCACAAGAAGGACCTGATCGCCTACCTGGAAAAGGAAATCGCCAACCTGAGAAAGAAGATCGAGGAAGCATCG ATATATCCCTATGATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCT AACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAAC 
GGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCT GCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTA ACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGA AACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGCGCCGATTACAAGGACGATGATGACAAGGGA GCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGC CTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACC ATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTT GTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGC CGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTC GTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACT GAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCT GTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAAT ACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGT CTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAA CTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAA CGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTAT ATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTG CGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATC TTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTT TGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATT GACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTG TCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTT GGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTAC AATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 589)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGAIANLVAQLENEVASLENENETLKKKNLHKKDLIAYLEKEIANLRKKIEEAS IYPYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSS AQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIYGADYKDDDDKG APGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLV VNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENT EAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDR LEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEY IERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNF CQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGF GLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 590)
TOM20::FUS::SYNZIP2::MCP::PylRS (AF)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAGCTAGAAACGCCTACCTGAGAAAGAAAATCGCCAGACTGAAGAAGGACAACCTGCAGCTGGAAAGAGACGAG CAGAACCTGGAAAAGATCATCGCCAACCTCAGAGATGAGATCGCCAGACTGGAAAACGAGGTGGCCAGCCACGAG CAGGCATCGATATATCCCTATGATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGT GGAGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAAC 
TTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGT CAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAAT ATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTA AAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGCGCCGATTACAAGGACGATGAT GACAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCG CTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGT ACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGAT CATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGT AAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAA GTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTG GAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAG TCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTT AAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAA ACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTG GAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGG AAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCT CTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAAC TTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCT ATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATG CTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCAC CTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGAC CTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGT GCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAG TCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 591)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGAIAARNAYLRKKIARLKKDNLQLERDEQNLEKIIANLRDEIARLENEVASHE QASIYPYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVR QSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIYGADYKDDD DKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGD HLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPL ENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQ TDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIP LEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTM LNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIG AGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 592)
TOM20::FUS::SYNZIP1::MCP::PylRS (AA)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAAATCTGGTGGCCCAGCTGGAAAACGAGGTGGCCAGCCTGGAAAACGAGAACGAAACCCTGAAGAAAAAGAAC CTGCACAAGAAGGACCTGATCGCCTACCTGGAAAAGGAAATCGCCAACCTGAGAAAGAAGATCGAGGAAGCATCG ATATATCCCTATGATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCT AACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAAC 
GGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCT GCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTA ACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGA AACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGCGCCGATTACAAGGACGATGATGACAAGGGA GCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGC CTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACC ATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTT GTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGC CGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTC GTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACT GAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCT GTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAAT ACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGT CTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAA CTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAA CGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTAT ATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTG CGCCCTATGCTGGCACCAAATCTGTATAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATC TTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTT GCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATT GACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTG TCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTC GGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTAT AACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 593)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGAIANLVAQLENEVASLENENETLKKKNLHKKDLIAYLEKEIANLRKKIEEAS IYPYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSS AQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIYGADYKDDDDKG APGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLV VNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENT EAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDR LEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEY IERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAF AQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGF GLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 594)
TOM20::FUS::SYNZIP2::MCP::PylRS (AA)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAGCTAGAAACGCCTACCTGAGAAAGAAAATCGCCAGACTGAAGAAGGACAACCTGCAGCTGGAAAGAGACGAG CAGAACCTGGAAAAGATCATCGCCAACCTCAGAGATGAGATCGCCAGACTGGAAAACGAGGTGGCCAGCCACGAG CAGGCATCGATATATCCCTATGATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGT GGAGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAAC 
TTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGT CAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAAT ATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTA AAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGCGCCGATTACAAGGACGATGAT GACAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCG CTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGT ACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGAT CATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGT AAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAA GTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTG GAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAG TCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTT AAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAA ACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTG GAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGG AAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCT CTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAAC TTCTGTCTGCGCCCTATGCTGGCACCAAATCTGTATAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCT ATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATG CTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCAC CTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGAC CTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGA GCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAA TCGTATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 595)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGAIAARNAYLRKKIARLKKDNLQLERDEQNLEKIIANLRDEIARLENEVASHE QASIYPYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVR QSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIYGADYKDDD DKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGD HLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPL ENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQ TDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIP LEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTM LAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIG AGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 596)
TOM20::FUS::SYNZIP1::MCP::IFRS1
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAAATCTGGTGGCCCAGCTGGAAAACGAGGTGGCCAGCCTGGAAAACGAGAACGAAACCCTGAAGAAAAAGAAC CTGCACAAGAAGGACCTGATCGCCTACCTGGAAAAGGAAATCGCCAACCTGAGAAAGAAGATCGAGGAAGCATCG ATATATCCCTATGATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCT AACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAAC 
GGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCT GCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTA ACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGA AACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGCGCCGATTACAAGGACGATGATGACAAGGGA GCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGC CTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACC ATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTT GTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGC CGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTC GTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACT GAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCT GTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAAT ACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGT CTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAA CTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAA CGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTAT ATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTG CGCCCTATGCTTGCACCAAATATGCTGAACTATAGCCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATC TTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGTCTTTT ATGCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATT GACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTG TCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTC GGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTAT AACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 597)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGAIANLVAQLENEVASLENENETLKKKNLHKKDLIAYLEKEIANLRKKIEEAS IYPYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSS AQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIYGADYKDDDDKG APGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLV VNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENT EAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDR LEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEY IERMGIDNDTELSKQIFRVDKNFCLRPMLAPNMLNYSRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLSF MQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGF GLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 598)
TOM20::FUS::SYNZIP2::MCP::IFRS1
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAGCTAGAAACGCCTACCTGAGAAAGAAAATCGCCAGACTGAAGAAGGACAACCTGCAGCTGGAAAGAGACGAG CAGAACCTGGAAAAGATCATCGCCAACCTCAGAGATGAGATCGCCAGACTGGAAAACGAGGTGGCCAGCCACGAG CAGGCATCGATATATCCCTATGATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGT GGAGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAAC 
TTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGT CAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAAT ATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTA AAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGCGCCGATTACAAGGACGATGAT GACAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCG CTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGT ACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGAT CATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGT AAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAA GTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTG GAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAG TCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTT AAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAA ACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTG GAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGG AAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCT CTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAAC TTCTGTCTGCGCCCTATGCTTGCACCAAATATGCTGAACTATAGCCGCAAACTGGACCGTGCCCTGCCTGATCCT ATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATG CTGTCTTTTATGCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCAC CTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGAC CTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGA GCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAA TCGTATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 599)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGAIAARNAYLRKKIARLKKDNLQLERDEQNLEKIIANLRDEIARLENEVASHE QASIYPYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVR QSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIYGADYKDDD DKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGD HLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPL ENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQ TDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIP LEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNMLNYSRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTM LSFMQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIG AGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 600)
TOM20::FUS::SYNZIP3::4clN22::CbzRS
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAAATGAGGTGACCACCCTGGAAAACGACGCCGCCTTCATCGAGAACGAGAACGCCTACCTGGAAAAAGAGATC GCCAGACTGAGAAAGGAAAAGGCCGCTCTGCGGAACAGACTGGCCCACAAGAAGGGAGCACCAGGAAGTGCTGGT TCTGCTGCTGGTAGTGGAGCATCGATAGAGCAGAAGCTGATCTCAGAGGAGGACCTGCTAGCCACCATGGACGCA CAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCC 
GGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGT CGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGA GCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAA TGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCC ACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCG CTCGAGTCTAGAGGGCCCGTTGGTGCTCCTGGTTCAGCAGGAAGCGCAGCAGGATCAGGTGCGTGCCCGGTGCCG CTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGACAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGT CTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAG ATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAA TATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGAC CAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGT GCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCT GTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACC GCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCA CTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAA CCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGT GAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCC CCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTC CGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTTGCACCAAATCTGATGAACTATGGACGCAAACTGGACCGT GCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTG GAGGAGTTTACCATGCTGAACTTTACACAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACC GATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGAT GTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGAC AAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGT GCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 601)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGAIANEVTTLENDAAFIENENAYLEKEIARLRKEKAALRNRLAHKKGAPGSAG SAAGSGASIEQKLISEEDLLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRER RAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLA TMDAQTRRRERRAEKQAQWKAANPPLESRGPVGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATG LWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANED QTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGAT ASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEER ENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLMNYGRKLDR ALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFTQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLD VMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 602)
EBAG9(1-29)::FUS::SYNZIP3::PCP::PylRS (AF)
DNA:
ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA ATATGCAGATCTGGAGCACCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATGGCCTCAAACGATTATACCCAA CAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTAC GGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTAT GGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGT AGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGC ACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAG CCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAG AACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCC ATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTAT GGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAAC CGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGAC CGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGAC AACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATT GGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTG AAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAA TTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGT GGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGA GGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGT GAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGA CCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGTGGTGCGATCGCAAATGAG GTGACCACCCTGGAAAACGACGCCGCCTTCATCGAGAACGAGAACGCCTACCTGGAAAAAGAGATCGCCAGACTG AGAAAGGAAAAGGCCGCTCTGCGGAACAGACTGGCCCACAAGAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCT GGTAGTGGAGCATCGATAGAGCAGAAGCTGATCTCAGAGGAGGACCTGATCGAAGGCCGCCATATGCTAGCCTCC AAAACCATCGTTCTTTCGGTCGGCGAGGCTACTCGCACTCTGACTGAGATCCAGTCCACCGCAGACCGTCAGATC TTCGAAGAGAAGGTCGGGCCTCTGGTGGGTCGGCTGCGCCTCACGGCTTCGCTCCGTCAAAACGGAGCCAAGACC 
GCGTATCGCGTCAACCTAAAACTGGATCAGGCGGACGTCGTTGATTCCGGACTTCCGAAAGTGCGCTACACTCAG GTATGGTCGCACGACGTGACAATCGTTGCGAATAGCACCGAGGCCTCGCGCAAATCGTTGTACGATTTGACCAAG TCCCTCGTCGCGACCTCGCAGGTCGAAGATCTTGTCGTCAACCTTGTGCCGCTGGGCCGTGCGGATCCGCTAGCC GGTGCTCCTGGTTCAGCAGGAAGCGCAGCAGGATCAGGTGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAA CGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGA ACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTG GTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGT TGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAA GTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAAC ACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTT TCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGC AATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGAT CGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGC GAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTG GAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAG TATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGT CTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAA ATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAAC TTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGC ATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAA CTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGT TTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTAT TACAATGGTATTTCTACTAACCTTTAA (SEQ ID NO: 603)
Protein:
MAITQFRLFKFCTCLATVFSFLKRLICRSGAPGSAGSAAGSGMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPY GQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSS TSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSS MSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSD RGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKL KGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRG GFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGAIANE VTTLENDAAFIENENAYLEKEIARLRKEKAALRNRLAHKKGAPGSAGSAAGSGASIEQKLISEEDLIEGRHMLAS KTIVLSVGEATRTLTEIQSTADRQIFEEKVGPLVGRLRLTASLRQNGAKTAYRVNLKLDQADVVDSGLPKVRYTQ VWSHDVTIVANSTEASRKSLYDLTKSLVATSQVEDLVVNLVPLGRADPLAGAPGSAGSAAGSGACPVPLQLPPLE RLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKR CRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESV SVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELES ELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFC LRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLG IDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESY YNGISTNL* (SEQ ID NO: 604)
EBAG9(1-29)::FUS::SYNZIP4::PCP::PylRS (AF)
DNA:
ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA ATATGCAGATCTGGAGCACCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATGGCCTCAAACGATTATACCCAA CAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTAC GGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTAT GGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGT AGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGC ACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAG CCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAG AACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCC ATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTAT GGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAAC CGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGAC CGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGAC AACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATT GGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTG AAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAA TTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGT GGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGA GGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGT GAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGA CCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGTGGTGCGATCGCACAAAAG GTGGCTGAACTGAAAAATAGAGTGGCCGTGAAGCTGAACCGGAACGAGCAGCTGAAGAACAAGGTGGAAGAGCTG AAGAACAGAAACGCCTACCTGAAGAATGAGCTGGCCACCCTGGAAAACGAGGTGGCCAGACTGGAAAACGACGTG GCCGAGTTAATCGCAGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCATCGATAGAGCAGAAGCTG ATCTCAGAGGAGGACCTGATCGAAGGCCGCCATATGCTAGCCTCCAAAACCATCGTTCTTTCGGTCGGCGAGGCT ACTCGCACTCTGACTGAGATCCAGTCCACCGCAGACCGTCAGATCTTCGAAGAGAAGGTCGGGCCTCTGGTGGGT 
CGGCTGCGCCTCACGGCTTCGCTCCGTCAAAACGGAGCCAAGACCGCGTATCGCGTCAACCTAAAACTGGATCAG GCGGACGTCGTTGATTCCGGACTTCCGAAAGTGCGCTACACTCAGGTATGGTCGCACGACGTGACAATCGTTGCG AATAGCACCGAGGCCTCGCGCAAATCGTTGTACGATTTGACCAAGTCCCTCGTCGCGACCTCGCAGGTCGAAGAT CTTGTCGTCAACCTTGTGCCGCTGGGCCGTGCGGATCCGCTAGCCGGTGCTCCTGGTTCAGCAGGAAGCGCAGCA GGATCAGGTGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTG AATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTT AGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACA GCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAA TTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAA GCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGA AGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATT AGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCC
CCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGAC
GAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTG
CAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGAT
CGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGAT
ACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCT
AACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAA
GAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGT
GAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGT
ATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCG
CTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAA
CACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA
(SEQ ID NO: 605)
Protein:
MAITQFRLFKFCTCLATVFSFLKRLICRSGAPGSAGSAAGSGMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPY GQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSS TSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSS MSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSD RGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKL KGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRG GFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGAIAQK VAELKNRVAVKLNRNEQLKNKVEELKNRNAYLKNELATLENEVARLENDVAELIAGAPGSAGSAAGSGASIEQKL ISEEDLIEGRHMLASKTIVLSVGEATRTLTEIQSTADRQIFEEKVGPLVGRLRLTASLRQNGAKTAYRVNLKLDQ ADVVDSGLPKVRYTQVWSHDVTIVANSTEASRKSLYDLTKSLVATSQVEDLVVNLVPLGRADPLAGAPGSAGSAA GSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRT ARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSG SKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKD EISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDND TELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTR ENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVK HDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 606)
EBAG9(1-29)::FUS::SYNZIP3::4clN22::IFRS1
DNA:
ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA ATATGCAGATCTGGAGCACCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATGGCCTCAAACGATTATACCCAA CAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTAC GGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTAT GGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGT AGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGC ACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAG CCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAG AACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCC ATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTAT GGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAAC CGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGAC CGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGAC AACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATT GGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTG AAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAA TTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGT GGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGA GGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGT GAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGA CCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGTGGTGCGATCGCAAATGAG GTGACCACCCTGGAAAACGACGCCGCCTTCATCGAGAACGAGAACGCCTACCTGGAAAAAGAGATCGCCAGACTG AGAAAGGAAAAGGCCGCTCTGCGGAACAGACTGGCCCACAAGAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCT GGTAGTGGAGCATCGATAGAGCAGAAGCTGATCTCAGAGGAGGACCTGCTAGCCACCATGGACGCACAAACACGA CGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGC GCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAG 
AAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCT GGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCT GCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGAC GCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGAGTCT AGAGGGCCCGTTGGTGCTCCTGGTTCAGCAGGAAGCGCAGCAGGATCAGGTGCGTGCCCGGTGCCGCTGCAGCTG CCGCCGCTGGAACGCCTGACCCTGGATGACAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATG AGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGT GGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAA ACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGC GTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAA CCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACC CAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCC CTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAA TCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGT GAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTAT CTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTG ATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGAT AAAAACTTCTGTCTGCGCCCTATGCTTGCACCAAATATGCTGAACTATAGCCGCAAACTGGACCGTGCCCTGCCT GATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTT ACCATGCTGTCTTTTATGCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTG AACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCAC GGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGG ATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGT TCTGAATCGTATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 607)
Protein:
MAITQFRLFKFCTCLATVFSFLKRLICRSGAPGSAGSAAGSGMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPY GQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSS TSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSS MSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSD RGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKL KGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRG GFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGAIANE VTTLENDAAFIENENAYLEKEIARLRKEKAALRNRLAHKKGAPGSAGSAAGSGASIEQKLISEEDLLATMDAQTR RRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGA GGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLES RGPVGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMAC GDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPK PLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTK SQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPIL IPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNMLNYSRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEF TMLSFMQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPW IGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 608)
LCK::FUS::SYNZIP1::MCP::PylRS (AF)
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCCTCAAACGATTATACCCAACAAGCAACCCAA AGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGT TACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAG AACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCC CAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGT TACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGT GGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAAC AGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGT GGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGAC CGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGT GGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTC AATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATC TTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAG ACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCA ACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAAT CCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGG CGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGT GGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAAC TTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCT CACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATCGCAAATCTGGTGGCCCAGCTG GAAAACGAGGTGGCCAGCCTGGAAAACGAGAACGAAACCCTGAAGAAAAAGAACCTGCACAAGAAGGACCTGATC GCCTACCTGGAAAAGGAAATCGCCAACCTGAGAAAGAAGATCGAGGAAGCATCGATATATCCCTATGATGTGCCG GATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCTAACTTTACTCAGTTCGTTCTC GTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGC TCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACC ATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCCACG 
AATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATC GCAGCAAACTCCGGCATCTACGGCGCCGATTACAAGGACGATGATGACAAGGGAGCACCAGGAAGTGCTGGTTCT GCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAA CCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCAC GAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCT CGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTG AACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACT AAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCG TCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACC AGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATG TCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCG AAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAA GACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTC GTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGAC AATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAAT CTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTAT CGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGT ACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGAC AGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCA ATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAA GTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTG TAA (SEQ ID NO: 609)
Protein:
MGCVCSSNPEGTELASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQ NTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYG GQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQD RGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTI FVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGN PIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMN FSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGAIANLVAQLENEVASLENENETLKKKNLHKKDLI AYLEKEIANLRKKIEEASIYPYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWIS SNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAI AANSGIYGADYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHH EVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRT KKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSM SAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFF VDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCY RKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGP IPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 610)
LCK::FUS::SYNZIP2::MCP::PylRS (AF)
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCCTCAAACGATTATACCCAACAAGCAACCCAA AGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGT TACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAG AACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCC CAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGT TACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGT GGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAAC AGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGT GGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGAC CGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGT GGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTC AATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATC TTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAG ACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCA ACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAAT CCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGG CGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGT GGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAAC TTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCT CACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATCGCAGCTAGAAACGCCTACCTG AGAAAGAAAATCGCCAGACTGAAGAAGGACAACCTGCAGCTGGAAAGAGACGAGCAGAACCTGGAAAAGATCATC GCCAACCTCAGAGATGAGATCGCCAGACTGGAAAACGAGGTGGCCAGCCACGAGCAGGCATCGATATATCCCTAT GATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCTAACTTTACTCAG TTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAA TGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGC AAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATT 
TTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCC TCAGCAATCGCAGCAAACTCCGGCATCTACGGCGCCGATTACAAGGACGATGATGACAAGGGAGCACCAGGAAGT GCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGAT GATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATC AAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGC CGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGAT GAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCT ACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAG GCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGT GTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATT ACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTG CTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGT CGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACC CGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATG GGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTA GCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGC CCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGT TCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATT GTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTT GTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGT CTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCT ACTAACCTGTAA (SEQ ID NO: 611)
Protein:
MGCVCSSNPEGTELASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQ
NTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYG GQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQD RGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTI FVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGN PIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMN FSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGAIAARNAYLRKKIARLKKDNLQLERDEQNLEKII ANLRDEIARLENEVASHEQASIYPYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAE WISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIP SAIAANSGIYGADYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKI KHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAP TRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPI TSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREIT RFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIG PCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAV VGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 612)
CG1::FUS::SYNZIP1::MCP::PylRS (AF)
DNA:
ATGGCCATTTGTCAATTCTTCCTTCAAGGCCGGTGCCGCTTTGGAGATCGGTGCTGGAACGAACATCCCGGTGCT AGGGGTGCAGGAGGAGGACGGCAGCAACCGCAGCAGCAGCCTTCAGGTAATAATAGACGTGGATGGAATACAACT AGCCAGAGATATTCCAATGTCATCCAGCCATCCAGTTTCTCCAAATCCACACCATGGGGGGGCAGCAGAGATCAA GAAAAGCCATATTTCAGTTCTTTTGATTCTGGAGCTTCAACTAACAGGAAGGAAGGCTTTGGATTGTCTGAGAAC CCATTTGCTTCACTTAGTCCTGATGAGCAGAAAGATGAAAAGAAACTTCTGGAAGGAATTGTAAAAGATATGGAG GTTTGGGAATCATCAGGGCAGTGGATGTTTTCTGTTTATTCACCAGTGAAAAAGAAACCTAATATTTCAGGTTTT ACAGACATTTCACCAGAGGAATTGAGGCTTGAATACCATAACTTCTTAACCAGCAATAACTTACAGAGTTATCTA AATTCTGTCCAACGTTTAATAAATCAATGGAGGAACAGGGTAAATGAACTGAAAAGTCTAAATATATCAACTAAA GTAGCTTTGCTCTCTGATGTAAAGGATGGAGTAAATCAAGCAGCACCTGCATTTGGATTTGGCAGCAGTCAAGCA GCAACATTTATGTCGCCAGGCTTTCCAGTCAATAACAGCAGCAGTGATAATGCTCAGAACTTTAGTTTTAAAACA AACTCTGGATTTGCTGCTGCCTCTTCTGGAAGCCCTGCTGGTTTTGGGAGTTCCCCAGCATTTGGAGCTGCAGCC TCTACCAGTTCAGGTATCTCTACTTCTGCTCCAGCTTTTGGATTTGGGAAGCCTGAAGTCACATCGGCTGCATCA TTTTCATTCAAAAGCCCTGCAGCTTCCAGTTTTGGATCACCTGGATTTTCAGGACTTCCAGCTTCCTTGGCAACA GGTCCTGTCAGAGCTCCAGTGGCCCCAGCCTTTGGAGGTGGCAGTTCTGTGGCTGGTTTTGGTAGTCCGGGCTCA CATTCTCACACTGCTTTTTCTAAGCCATCCAGTGACACTTTTGGAAATAGCAGCATATCCACTTCTCTGTCAGCC TCAAGCAGCATCATTGCAACAGATAATGTGTTATTCACACCCAGAGATAAACTAACAGTAGAAGAACTGGAACAA TTTCAATCCAAGAAATTTACTCTGGGAAAAATTCCATTAAAGCCTCCACCTCTGGAACTTCTAAATGTTGGAGCA CCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTAT GGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGT GGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACA GGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCG TCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGT AGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAG CAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGC AGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGC AGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGA 
GGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTAT GAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAA TTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTG CAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAAC AAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTC TCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATC AAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGA GGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGT GGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCT TGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATG GGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGTGGTGCGATCGCAAATCTGGTGGCCCAGCTGGAAAAC GAGGTGGCCAGCCTGGAAAACGAGAACGAAACCCTGAAGAAAAAGAACCTGCACAAGAAGGACCTGATCGCCTAC CTGGAAAAGGAAATCGCCAACCTGAGAAAGAAGATCGAGGAAGCATCGATATATCCCTATGATGTGCCGGATTAT GCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCTAACTTTACTCAGTTCGTTCTCGTCGAC AATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAAC
TCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAA
GTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCCACGAATTCC
GACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCA
AACTCCGGCATCTACGGCGCCGATTACAAGGACGATGATGACAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCT
GGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTG
AATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTT
AGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACA
GCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAA
TTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAA
GCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGA
AGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATT
AGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCC
CCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGAC
GAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTG
CAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGAT
CGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGAT
ACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCT
AACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAA
GAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGT
GAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGT
ATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCG
CTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAA
CACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA
(SEQ ID NO: 613)
Protein:
MAICQFFLQGRCRFGDRCWNEHPGARGAGGGRQQPQQQPSGNNRRGWNTTSQRYSNVIQPSSFSKSTPWGGSRDQ EKPYFSSFDSGASTNRKEGFGLSENPFASLSPDEQKDEKKLLEGIVKDMEVWESSGQWMFSVYSPVKKKPNISGF TDISPEELRLEYHNFLTSNNLQSYLNSVQRLINQWRNRVNELKSLNISTKVALLSDVKDGVNQAAPAFGFGSSQA ATFMSPGFPVNNSSSDNAQNFSFKTNSGFAAASSGSPAGFGSSPAFGAAASTSSGISTSAPAFGFGKPEVTSAAS FSFKSPAASSFGSPGFSGLPASLATGPVRAPVAPAFGGGSSVAGFGSPGSHSHTAFSKPSSDTFGNSSISTSLSA SSSIIATDNVLFTPRDKLTVEELEQFQSKKFTLGKIPLKPPPLELLNVGAPGSAGSAAGSGMASNDYTQQATQSY GAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQS SYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSS SGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGY EPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTN KKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRG GPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHM GGNYGDDRRGGRGGAIANLVAQLENEVASLENENETLKKKNLHKKDLIAYLEKEIANLRKKIEEASIYPYDVPDY AGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIK VEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIYGADYKDDDDKGAPGSAGSAA GSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRT ARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSG SKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKD EISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDND TELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTR ENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVK HDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 614)
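Each DNA entry in this listing is an open reading frame written in whitespace-separated 75-nt blocks, and the protein entry that follows it is the corresponding translation ending at the stop codon (`*`). The sketch below is one way to check a DNA/protein pair against each other, for example to catch text-extraction errors in the listing. It assumes the standard genetic code (NCBI translation table 1); the helper names `translate` and `first_mismatch` are hypothetical and not part of the original disclosure.

```python
# Standard genetic code (NCBI translation table 1); codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an ORF given as whitespace-separated blocks; stops after the first '*'."""
    seq = "".join(dna.split()).upper()
    if len(seq) % 3 != 0:
        raise ValueError(f"length {len(seq)} is not a multiple of 3")
    protein = []
    for pos in range(0, len(seq), 3):
        # Raises KeyError if a codon contains anything other than A/C/G/T.
        aa = CODON_TABLE[seq[pos:pos + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


def first_mismatch(translated: str, listed: str):
    """Return the 0-based index of the first differing residue, or None if identical."""
    listed = "".join(listed.split())
    for idx, (x, y) in enumerate(zip(translated, listed)):
        if x != y:
            return idx
    if len(translated) != len(listed):
        return min(len(translated), len(listed))
    return None


if __name__ == "__main__":
    # Codon run taken from the SEQ ID NO: 613 DNA entry.
    fragment = "GTAGCTTTGCTCTCTGATGTAAAGGATGGAGTAAATCAAGCAGCACCTGCA"
    print(translate(fragment))  # VALLSDVKDGVNQAAPA
```

Comparing that translation with a protein string in which the residues "VN" were misread as a single "W" (`first_mismatch("VALLSDVKDGVNQAAPA", "VALLSDVKDGWQAAPA")`) returns 10, which points directly at the damaged position.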
CGI::FUS::SYNZIP2::MCP::PylRS (AF)
DNA:
ATGGCCATTTGTCAATTCTTCCTTCAAGGCCGGTGCCGCTTTGGAGATCGGTGCTGGAACGAACATCCCGGTGCT AGGGGTGCAGGAGGAGGACGGCAGCAACCGCAGCAGCAGCCTTCAGGTAATAATAGACGTGGATGGAATACAACT AGCCAGAGATATTCCAATGTCATCCAGCCATCCAGTTTCTCCAAATCCACACCATGGGGGGGCAGCAGAGATCAA GAAAAGCCATATTTCAGTTCTTTTGATTCTGGAGCTTCAACTAACAGGAAGGAAGGCTTTGGATTGTCTGAGAAC CCATTTGCTTCACTTAGTCCTGATGAGCAGAAAGATGAAAAGAAACTTCTGGAAGGAATTGTAAAAGATATGGAG GTTTGGGAATCATCAGGGCAGTGGATGTTTTCTGTTTATTCACCAGTGAAAAAGAAACCTAATATTTCAGGTTTT ACAGACATTTCACCAGAGGAATTGAGGCTTGAATACCATAACTTCTTAACCAGCAATAACTTACAGAGTTATCTA AATTCTGTCCAACGTTTAATAAATCAATGGAGGAACAGGGTAAATGAACTGAAAAGTCTAAATATATCAACTAAA GTAGCTTTGCTCTCTGATGTAAAGGATGGAGTAAATCAAGCAGCACCTGCATTTGGATTTGGCAGCAGTCAAGCA GCAACATTTATGTCGCCAGGCTTTCCAGTCAATAACAGCAGCAGTGATAATGCTCAGAACTTTAGTTTTAAAACA AACTCTGGATTTGCTGCTGCCTCTTCTGGAAGCCCTGCTGGTTTTGGGAGTTCCCCAGCATTTGGAGCTGCAGCC TCTACCAGTTCAGGTATCTCTACTTCTGCTCCAGCTTTTGGATTTGGGAAGCCTGAAGTCACATCGGCTGCATCA TTTTCATTCAAAAGCCCTGCAGCTTCCAGTTTTGGATCACCTGGATTTTCAGGACTTCCAGCTTCCTTGGCAACA GGTCCTGTCAGAGCTCCAGTGGCCCCAGCCTTTGGAGGTGGCAGTTCTGTGGCTGGTTTTGGTAGTCCGGGCTCA CATTCTCACACTGCTTTTTCTAAGCCATCCAGTGACACTTTTGGAAATAGCAGCATATCCACTTCTCTGTCAGCC TCAAGCAGCATCATTGCAACAGATAATGTGTTATTCACACCCAGAGATAAACTAACAGTAGAAGAACTGGAACAA TTTCAATCCAAGAAATTTACTCTGGGAAAAATTCCATTAAAGCCTCCACCTCTGGAACTTCTAAATGTTGGAGCA CCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATGGCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTAT GGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGT GGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACA GGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCG TCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGT AGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAG CAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGC
AGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGC AGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGA GGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTAT GAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAA TTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATCTTTGTG CAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAAC AAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTC TCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATC AAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGA GGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGT GGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCT TGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATG GGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGTGGTGCGATCGCAGCTAGAAACGCCTACCTGAGAAAG AAAATCGCCAGACTGAAGAAGGACAACCTGCAGCTGGAAAGAGACGAGCAGAACCTGGAAAAGATCATCGCCAAC CTCAGAGATGAGATCGCCAGACTGGAAAACGAGGTGGCCAGCCACGAGCAGGCATCGATATATCCCTATGATGTG CCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCTAACTTTACTCAGTTCGTT CTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATC AGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATAC ACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCC ACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCA ATCGCAGCAAACTCCGGCATCTACGGCGCCGATTACAAGGACGATGATGACAAGGGAGCACCAGGAAGTGCTGGT TCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAA AAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACAC
CACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCT TCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGAT CTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGT ACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAG CCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGC ACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGC ATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAAT CCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAA AAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTT TTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATC GACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCA AATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGT TATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGT TGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGC GACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGC CCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTG AAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAAC CTGTAA (SEQ ID NO: 615)
Protein:
MAICQFFLQGRCRFGDRCWNEHPGARGAGGGRQQPQQQPSGNNRRGWNTTSQRYSNVIQPSSFSKSTPWGGSRDQ EKPYFSSFDSGASTNRKEGFGLSENPFASLSPDEQKDEKKLLEGIVKDMEVWESSGQWMFSVYSPVKKKPNISGF TDISPEELRLEYHNFLTSNNLQSYLNSVQRLINQWRNRVNELKSLNISTKVALLSDVKDGVNQAAPAFGFGSSQA ATFMSPGFPVNNSSSDNAQNFSFKTNSGFAAASSGSPAGFGSSPAFGAAASTSSGISTSAPAFGFGKPEVTSAAS FSFKSPAASSFGSPGFSGLPASLATGPVRAPVAPAFGGGSSVAGFGSPGSHSHTAFSKPSSDTFGNSSISTSLSA SSSIIATDNVLFTPRDKLTVEELEQFQSKKFTLGKIPLKPPPLELLNVGAPGSAGSAAGSGMASNDYTQQATQSY GAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQS SYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSS SGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGY EPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTN KKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRG GPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHM GGNYGDDRRGGRGGAIAARNAYLRKKIARLKKDNLQLERDEQNLEKIIANLRDEIARLENEVASHEQASIYPYDV PDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKY TIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIYGADYKDDDDKGAPGSAG SAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRS SRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQ PSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLN PKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGI DNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSG CTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLL KVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 616)
TOM20::FUS::SYNZIP4::lN22::CbzRS
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCACAAAAGGTGGCTGAACTGAAAAATAGAGTGGCCGTGAAGCTGAACCGGAACGAGCAGCTGAAGAACAAGGTG GAAGAGCTGAAGAACAGAAACGCCTACCTGAAGAATGAGCTGGCCACCCTGGAAAACGAGGTGGCCAGACTGGAA AACGACGTGGCCGAGTTAATCGCAGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCATCGATAGAG CAGAAGCTGATCTCAGAGGAGGACCTGCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAG 
AAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCT GGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCT GCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGAC GCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGA GCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAG CGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGAGTCTAGAGGGCCCGTTGGTGCTCCT GGTTCAGCAGGAAGCGCAGCAGGATCAGGTGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACC CTGGATGACAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCAT AAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAAC AATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTG TCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGC GCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCA GCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCA GCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAAT CCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAG GTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTG TCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAA ATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAG CGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCT ATGCTTGCACCAAATCTGATGAACTATGGACGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAG ATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTACACAA ATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTC AAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGT GCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTG GAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGG ATCTCTACGAACCTGTAA (SEQ ID NO: 617)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGAIAQKVAELKNRVAVKLNRNEQLKNKVEELKNRNAYLKNELATLENEVARLE NDVAELIAGAPGSAGSAAGSGASIEQKLISEEDLLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGA GGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDG AGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLESRGPVGAPGSAGSAAGSGACPVPLQLPPLERLT LDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRV SDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVP ASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELL SRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRP MLAPNLMNYGRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFTQMGSGCTRENLESIITDFLNHLGIDF KIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNG ISTNL* (SEQ ID NO: 618)
EBAG9I-29::FUS::SYNZIP4::4clN22::IFRS1
DNA:
ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA ATATGCAGATCTGGAGCACCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATGGCCTCAAACGATTATACCCAA CAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTAC GGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTAT GGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGT AGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGC ACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAG CCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAG AACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCC ATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTAT GGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAAC CGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGAC CGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGAC AACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATT GGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTG AAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAA TTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGT GGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGA GGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGT GAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGA CCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGTGGTGCGATCGCACAAAAG GTGGCTGAACTGAAAAATAGAGTGGCCGTGAAGCTGAACCGGAACGAGCAGCTGAAGAACAAGGTGGAAGAGCTG AAGAACAGAAACGCCTACCTGAAGAATGAGCTGGCCACCCTGGAAAACGAGGTGGCCAGACTGGAAAACGACGTG GCCGAGTTAATCGCAGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCATCGATAGAGCAGAAGCTG ATCTCAGAGGAGGACCTGCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCT CAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTA 
GCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCA CCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACA CGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCT GGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCT GAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGAGTCTAGAGGGCCCGTTGGTGCTCCTGGTTCAGCA GGAAGCGCAGCAGGATCAGGTGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGAC AAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAA CACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGC TCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAG GATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACC CGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCA CAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTG AGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACA AGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTG AATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGT AAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGC TTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGC ATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTTGCA CCAAATATGCTGAACTATAGCCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCG TGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGTCTTTTATGCAAATGGGTTCA GGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTG GGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTT GGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTG CTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACG AACCTGTAA (SEQ ID NO: 619)
Protein:
MAITQFRLFKFCTCLATVFSFLKRLICRSGAPGSAGSAAGSGMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPY GQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSS TSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSS MSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSD RGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKL KGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRG GFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGAIAQK VAELKNRVAVKLNRNEQLKNKVEELKNRNAYLKNELATLENEVARLENDVAELIAGAPGSAGSAAGSGASIEQKL ISEEDLLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANP PLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRA EKQAQWKAANPPLESRGPVGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIK HHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPT RTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPIT SMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITR FFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNMLNYSRKLDRALPDPIKIFEIGP CYRKESDGKEHLEEFTMLSFMQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVV GPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 620)
TOM20::FUS::SYNZIP3::4clN22::PylRS (AA)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAAATGAGGTGACCACCCTGGAAAACGACGCCGCCTTCATCGAGAACGAGAACGCCTACCTGGAAAAAGAGATC GCCAGACTGAGAAAGGAAAAGGCCGCTCTGCGGAACAGACTGGCCCACAAGAAGGGAGCACCAGGAAGTGCTGGT TCTGCTGCTGGTAGTGGAGCATCGATAGAGCAGAAGCTGATCTCAGAGGAGGACCTGCTAGCCACCATGGACGCA CAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCC 
GGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGT CGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGA GCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAA TGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCC ACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCG CTCGAGTCTAGAGGGCCCGTTGGTGCTCCTGGTTCAGCAGGAAGCGCAGCAGGATCAGGTGCGTGCCCGGTGCCG CTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGACAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGT CTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAG ATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAA TATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGAC CAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGT GCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCT GTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACC GCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCA CTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAA CCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGT GAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCC CCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTC CGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTGGCACCAAATCTGTATAACTATCTGCGCAAACTGGACCGT GCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTG GAGGAGTTTACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACC GATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGAT GTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGAC AAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGT GCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 621)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND
YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG
YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY
GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG
GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF
KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG
NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP
GGGPGGSHMGGNYGDDRRGGRGGAIANEVTTLENDAAFIENENAYLEKEIARLRKEKAALRNRLAHKKGAPGSAG
SAAGSGASIEQKLISEEDLLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRER
RAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLA
TMDAQTRRRERRAEKQAQWKAANPPLESRGPVGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATG
LWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANED
QTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGAT
ASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEER
ENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLYNYLRKLDR
ALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLD
VMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 622)
LCK::FUS::SYNZIP1::MCP::PylRS (AA)
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCCTCAAACGATTATACCCAACAAGCAACCCAA AGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGT TACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAG AACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCC CAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGT TACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGT GGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAAC AGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGT GGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGAC CGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGT GGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTC AATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATC TTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAG ACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCA ACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAAT CCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGG CGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGT GGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAAC TTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCT CACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATCGCAAATCTGGTGGCCCAGCTG GAAAACGAGGTGGCCAGCCTGGAAAACGAGAACGAAACCCTGAAGAAAAAGAACCTGCACAAGAAGGACCTGATC GCCTACCTGGAAAAGGAAATCGCCAACCTGAGAAAGAAGATCGAGGAAGCATCGATATATCCCTATGATGTGCCG GATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCTAACTTTACTCAGTTCGTTCTC GTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGC TCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACC ATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCCACG 
AATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATC GCAGCAAACTCCGGCATCTACGGCGCCGATTACAAGGACGATGATGACAAGGGAGCACCAGGAAGTGCTGGTTCT GCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAA CCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCAC GAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCT CGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTG AACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACT AAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCG TCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACC AGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATG TCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCG AAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAA GACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTC GTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGAC AATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTGGCACCAAAT CTGTATAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTAT CGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGCCTTTGCCCAAATGGGTTCAGGTTGT ACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGAC AGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTTGGACCA ATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCAGGATTCGGTCTGGAACGCCTGCTGAAA GTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCGTATTATAACGGGATCTCTACGAACCTG TAA (SEQ ID NO: 623)
Protein:
MGCVCSSNPEGTELASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQ NTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYG GQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQD RGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTI FVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGN PIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMN FSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGAIANLVAQLENEVASLENENETLKKKNLHKKDLI AYLEKEIANLRKKIEEASIYPYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWIS SNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAI AANSGIYGADYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHH EVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRT KKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSM SAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFF VDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLYNYLRKLDRALPDPIKIFEIGPCY RKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGP IPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 624)
TOM20::EWSR1::SYNZIP4::4clN22::SYNZIP4::PylRS (AF)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAGTTCTTCATGGCGTCCACGGAT TACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCA CAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCT CAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCA ACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTC ACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAG CCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACA GGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCC ATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAG AGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGC AGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGC CAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAG GAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGG GGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGA GGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGAT GAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTC TTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAA ACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTT GATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGG GGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCT GGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCC CGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGA AACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCC TTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGAT 
CGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATG GACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGCGATC GCACAAAAGGTGGCTGAACTGAAAAATAGAGTGGCCGTGAAGCTGAACCGGAACGAGCAGCTGAAGAACAAGGTG GAAGAGCTGAAGAACAGAAACGCCTACCTGAAGAATGAGCTGGCCACCCTGGAAAACGAGGTGGCCAGACTGGAA AACGACGTGGCCGAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGAGCAGAAGCTGATCTCAGAG GAGGACCTGCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAA GCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATG GACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGAC GGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGT GAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGA GCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAA GCTCAATGGAAAGCTGCAAACCCACCGCTCGAGTCTAGAGGGCCCGTTTCTGGGCTAAGCGGTGCTCCGGGGTCA GCCGGAAGTGCAGCAGGATCAGGTCAAAAGGTGGCTGAACTGAAAAATAGAGTGGCCGTGAAGCTGAACCGGAAC GAGCAGCTGAAGAACAAGGTGGAAGAGCTGAAGAACAGAAACGCCTACCTGAAGAATGAGCTGGCCACCCTGGAA AACGAGGTGGCCAGACTGGAAAACGACGTGGCCGAGGGCAAGCCTATTCCCAACCCCCTGCTGGGCCTGGATAGC ACCGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTG GAACGCCTGACCCTGGATGACAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACC GGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCAT CTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAA CGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTG AAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAA AACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCC GTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAA GGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACC GATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAG AGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAA 
CTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTG GAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTC TGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATC AAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTG AACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTG GGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTG GAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCG GGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCC TATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 625)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASTD YSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTP TAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSST GGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQS SYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRG GFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADF FKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMR GGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCG NQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGM DRGGFGGGRRGGPGGPPGPLMEQAIAQKVAELKNRVAVKLNRNEQLKNKVEELKNRNAYLKNELATLENEVARLE NDVAEGAPGSAGSAAGSGEQKLISEEDLLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATM DAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAG AGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLESRGPVSGLSGAPGSAGSAAGSGQKVAELKNRVAVKLNRN EQLKNKVEELKNRNAYLKNELATLENEVARLENDVAEGKPIPNPLLGLDSTGAPGSAGSAAGSGACPVPLQLPPL ERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCK RCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQES VSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELE SELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNF CLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHL GIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSES YYNGISTNL* (SEQ ID NO: 626)
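Each DNA entry encodes the protein entry listed under the same construct name (for example, SEQ ID NO: 625 encodes SEQ ID NO: 626, ending in the TAA stop codon shown as `*`). As an illustration that is not part of the original disclosure, the minimal Python sketch below translates short fragments copied from SEQ ID NO: 625 so they can be compared with the corresponding protein entry, which is a quick way to spot transcription errors. It assumes the standard genetic code; the names `CODON_TABLE` and `translate` are hypothetical helpers, not an established tool.

```python
# Minimal sketch: translate in-frame DNA with the standard genetic code.
# Assumption: the constructs use the standard code; '*' marks a stop codon,
# matching how the protein entries in this listing end.
BASES = "TCAG"
# Amino acids for codons in the order TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string; whitespace from the wrapped listing is ignored."""
    seq = "".join(dna.split()).upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


# Fragments copied from SEQ ID NO: 625: the TOM20 start, a PylRS stretch, the 3' end.
print(translate("ATGGTGGGTCGGAACAGCGCCATCGCCGCC"))  # MVGRNSAIAA
print(translate("GTGAAAGTGAAAGTCGTTAGCGCTCCTACC"))  # VKVKVVSAPT
print(translate("ACTAACCTGTAA"))                    # TNL*
```

Building the table from the `TCAG` ordering keeps all 64 codons in one string instead of a hand-typed dictionary, which makes it easy to audit against a printed codon chart.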
TOM20::FUS::SYNZIP1::MCP::SYNZIP1::PylRS (AF)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAAATCTGGTGGCCCAGCTGGAAAACGAGGTGGCCAGCCTGGAAAACGAGAACGAAACCCTGAAGAAAAAGAAC CTGCACAAGAAGGACCTGATCGCCTACCTGGAAAAGGAAATCGCCAACCTGAGAAAGAAGATCGAGGAAGCATCG ATATATCCCTATGATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCT AACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAAC 
GGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCT GCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTA ACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGA AACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGGCTAAGCGGTGCTCCGGGGTCAGCCGGAAGT GCAGCAGGATCAGGTAATCTGGTGGCCCAGCTGGAAAACGAGGTGGCCAGCCTGGAAAACGAGAACGAAACCCTG AAGAAAAAGAACCTGCACAAGAAGGACCTGATCGCCTACCTGGAAAAGGAAATCGCCAACCTGAGAAAGAAGATC GAGGAAGGCAAGCCTATTCCCAACCCCCTGCTGGGCCTGGATAGCACCGGAGCACCAGGAAGTGCTGGTTCTGCT GCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGACAAAAAACCG CTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAG GTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGT ACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAAC AAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAA AAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCT GGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGC ATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCT GCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAA GACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGAC CTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTG GATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAAT GATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTG GCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGT AAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACT CGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGC TGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATC CCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTA AAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 627)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGAIANLVAQLENEVASLENENETLKKKNLHKKDLIAYLEKEIANLRKKIEEAS IYPYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSS AQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIYGLSGAPGSAGS AAGSGNLVAQLENEVASLENENETLKKKNLHKKDLIAYLEKEIANLRKKIEEGKPIPNPLLGLDSTGAPGSAGSA AGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSR TARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPS GSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPK DEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDN DTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCT RENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKV KHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 628)
TOM20::EWSR1::SYNZIP4::4clN22::SYNZIP4::PylRS (AA)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAGTTCTTCATGGCGTCCACGGAT TACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCA CAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCT CAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCA ACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTC ACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAG CCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACA GGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCC ATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAG AGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGC AGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGC CAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAG GAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGG GGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGA GGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGAT GAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTC TTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAA ACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTT GATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGG GGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCT GGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCC CGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGA AACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCC TTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGAT 
CGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATG GACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGCGATC GCACAAAAGGTGGCTGAACTGAAAAATAGAGTGGCCGTGAAGCTGAACCGGAACGAGCAGCTGAAGAACAAGGTG GAAGAGCTGAAGAACAGAAACGCCTACCTGAAGAATGAGCTGGCCACCCTGGAAAACGAGGTGGCCAGACTGGAA AACGACGTGGCCGAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGAGCAGAAGCTGATCTCAGAG GAGGACCTGCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAA GCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATG GACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGAC GGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGT GAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGA GCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAA GCTCAATGGAAAGCTGCAAACCCACCGCTCGAGTCTAGAGGGCCCGTTTCTGGGCTAAGCGGTGCTCCGGGGTCA GCCGGAAGTGCAGCAGGATCAGGTCAAAAGGTGGCTGAACTGAAAAATAGAGTGGCCGTGAAGCTGAACCGGAAC GAGCAGCTGAAGAACAAGGTGGAAGAGCTGAAGAACAGAAACGCCTACCTGAAGAATGAGCTGGCCACCCTGGAA AACGAGGTGGCCAGACTGGAAAACGACGTGGCCGAGGGCAAGCCTATTCCCAACCCCCTGCTGGGCCTGGATAGC ACCGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTG GAACGCCTGACCCTGGATGACAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACC GGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCAT CTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAA CGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTG AAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAA AACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCC GTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAA GGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACC GATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAG AGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAA 
CTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTG GAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTC TGTCTGCGCCCTATGCTGGCACCAAATCTGTATAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATC AAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTG GCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTG GGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTG GAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCA GGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCG TATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 629)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASTD YSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTP TAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSST GGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQS SYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRG GFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADF FKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMR GGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCG NQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGM DRGGFGGGRRGGPGGPPGPLMEQAIAQKVAELKNRVAVKLNRNEQLKNKVEELKNRNAYLKNELATLENEVARLE NDVAEGAPGSAGSAAGSGEQKLISEEDLLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATM DAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAG AGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLESRGPVSGLSGAPGSAGSAAGSGQKVAELKNRVAVKLNRN EQLKNKVEELKNRNAYLKNELATLENEVARLENDVAEGKPIPNPLLGLDSTGAPGSAGSAAGSGACPVPLQLPPL ERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCK RCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQES VSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELE SELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNF CLRPMLAPNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHL GIDFKIVGDSCMVYGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSES YYNGISTNL* (SEQ ID NO: 630)
TOM20::FUS::SYNZIP3::4clN22::SYNZIP3::PylRS (AA)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAAATGAGGTGACCACCCTGGAAAACGACGCCGCCTTCATCGAGAACGAGAACGCCTACCTGGAAAAAGAGATC GCCAGACTGAGAAAGGAAAAGGCCGCTCTGCGGAACAGACTGGCCCACAAGAAGGGAGCACCAGGAAGTGCTGGT TCTGCTGCTGGTAGTGGAGCATCGATAGAGCAGAAGCTGATCTCAGAGGAGGACCTGCTAGCCACCATGGACGCA CAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCC 
GGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGT CGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGA GCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAA TGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCC ACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCG CTCGAGTCTAGAGGGCTAAGCGGTGCTCCGGGGTCAGCCGGAAGTGCAGCAGGATCAGGTAATGAGGTGACCACC CTGGAAAACGACGCCGCCTTCATCGAGAACGAGAACGCCTACCTGGAAAAAGAGATCGCCAGACTGAGAAAGGAA AAGGCCGCTCTGCGGAACAGACTGGCCCACAAGAAGGGCAAGCCTATTCCCAACCCCCTGCTGGGCCTGGATAGC ACCGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTG GAACGCCTGACCCTGGATGACAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACC GGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCAT CTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAA CGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTG AAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAA AACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCC GTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAA GGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACC GATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAG AGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAA CTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTG GAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTC TGTCTGCGCCCTATGCTGGCACCAAATCTGTATAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATC AAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTG GCCTTTGCCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTG GGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTATGGCGACACCCTGGATGTCATGCACGGCGACCTG GAACTGTCTAGTGCCGTTGTTGGACCAATTCCGCTGGACCGTGAGTGGGGTATCGACAAACCGTGGATCGGAGCA 
GGATTCGGTCTGGAACGCCTGCTGAAAGTGAAACACGACTTCAAAAACATCAAACGTGCCGCCCGTTCTGAATCG TATTATAACGGGATCTCTACGAACCTGTAA (SEQ ID NO: 631)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGAIANEVTTLENDAAFIENENAYLEKEIARLRKEKAALRNRLAHKKGAPGSAG SAAGSGASIEQKLISEEDLLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRER RAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLA TMDAQTRRRERRAEKQAQWKAANPPLESRGLSGAPGSAGSAAGSGNEVTTLENDAAFIENENAYLEKEIARLRKE KAALRNRLAHKKGKPIPNPLLGLDSTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRT GTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKV KVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVK GNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGK LEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLYNYLRKLDRALPDPI KIFEIGPCYRKESDGKEHLEEFTMLAFAQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVYGDTLDVMHGDL ELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 632)
LCK::EWSR1::SYNZIP4::4clN22::SYNZIP4::PylRS (AF)
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCATGGCGTCCACGGATTACAGTACCTATAGCCAA GCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACCACCCAGGCATAT GGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTAT GGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTGCCCCCCAGGCATAC AGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCC TATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCACCT ACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCC AGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCCATGCAGCCAGTCACTGCA CCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAG AACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCT CCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTAC GGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAGGAGTCTGGAGGATTTTCC GGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGGGGATTTGATCGTGGAGGC ATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCT GGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGATGAAGACTCTGACAACAGT GCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTCTTTAAGCAGTGTGGGGTT GTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAAACAGGAAAGCCCAAAGGC GATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTTGATGGGAAAGATTTTCAA GGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGGGGTGGTCTGCCACCCCGT GAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCTGGGGGACCCATGGGTCGC ATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCCCGAGGGAACCCCTCTGGA GGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGAAACCAGAACTTCGCCTGG AGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGT GATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGATCGTGGTGGTCCCGGTGGA ATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATGGACCGAGGTGGCTTTGGT GGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGCGATCGCACAAAAGGTGGCTGAA 
CTGAAAAATAGAGTGGCCGTGAAGCTGAACCGGAACGAGCAGCTGAAGAACAAGGTGGAAGAGCTGAAGAACAGA AACGCCTACCTGAAGAATGAGCTGGCCACCCTGGAAAACGAGGTGGCCAGACTGGAAAACGACGTGGCCGAGGGA GCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGAGCAGAAGCTGATCTCAGAGGAGGACCTGCTAGCCACC ATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTC GACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGA CGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCT GGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAA CAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGC GGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCA AACCCACCGCTCGAGTCTAGAGGGCCCGTTTCTGGGCTAAGCGGTGCTCCGGGGTCAGCCGGAAGTGCAGCAGGA TCAGGTCAAAAGGTGGCTGAACTGAAAAATAGAGTGGCCGTGAAGCTGAACCGGAACGAGCAGCTGAAGAACAAG GTGGAAGAGCTGAAGAACAGAAACGCCTACCTGAAGAATGAGCTGGCCACCCTGGAAAACGAGGTGGCCAGACTG GAAAACGACGTGGCCGAGGGCAAGCCTATTCCCAACCCCCTGCTGGGCCTGGATAGCACCGGAGCACCAGGAAGT GCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGAT GACAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATC AAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGC CGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGAT GAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCT ACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAG GCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGT GTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATT ACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTG CTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGT CGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACC CGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATG GGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTA 
GCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGC CCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGT TCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATT GTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTT GTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGT CTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCT ACTAACCTGTAA (SEQ ID NO: 633)
Protein:
MGCVCSSNPEGTELMASTDYSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATY GQTAYATSYGQPPTGYTTPTAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAP TRPQDGNKPTETSQPQSSTGGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQ NTYGQPSSYGQQSSYGQQSSYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFS GPGENRSMSGPDNRGRGRGGFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNS AIYVQGLNDSVTLDDLADFFKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQ GSKLKVSLARKKPPMNSMRGGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSG GGNVQHRAGDWQCPNPGCGNQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGG MFRGGRGGDRGGFRGGRGMDRGGFGGGRRGGPGGPPGPLMEQAIAQKVAELKNRVAVKLNRNEQLKNKVEELKNR NAYLKNELATLENEVARLENDVAEGAPGSAGSAAGSGEQKLISEEDLLATMDAQTRRRERRAEKQAQWKAANPPL DGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEK QAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLESRGPVSGLSGAPGSAGSAAG SGQKVAELKNRVAVKLNRNEQLKNKVEELKNRNAYLKNELATLENEVARLENDVAEGKPIPNPLLGLDSTGAPGS AGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNS RSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQ AQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVL LNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERM GIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMG SGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLER LLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 634)
LCK::FUS::SYNZIP3::4clN22::SYNZIP3::PylRS (AF)
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCCTCAAACGATTATACCCAACAAGCAACCCAA AGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGT TACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAG AACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCC CAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGT TACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGT GGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAAC AGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGT GGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGAC CGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGT GGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTC AATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATC TTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAG ACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCA ACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAAT CCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGG CGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGT GGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAAC TTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCT CACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATCGCAAATGAGGTGACCACCCTG GAAAACGACGCCGCCTTCATCGAGAACGAGAACGCCTACCTGGAAAAAGAGATCGCCAGACTGAGAAAGGAAAAG GCCGCTCTGCGGAACAGACTGGCCCACAAGAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCA TCGATAGAGCAGAAGCTGATCTCAGAGGAGGACCTGCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGT CGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGA GCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAA TGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCC 
ACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCG CTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGA CGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGAGTCTAGAGGGCTAAGC GGTGCTCCGGGGTCAGCCGGAAGTGCAGCAGGATCAGGTAATGAGGTGACCACCCTGGAAAACGACGCCGCCTTC ATCGAGAACGAGAACGCCTACCTGGAAAAAGAGATCGCCAGACTGAGAAAGGAAAAGGCCGCTCTGCGGAACAGA CTGGCCCACAAGAAGGGCAAGCCTATTCCCAACCCCCTGCTGGGCCTGGATAGCACCGGAGCACCAGGAAGTGCT GGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGAC AAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAA CACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGC TCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAG GATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACC CGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCA CAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTG AGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACA AGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTG AATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGT AAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGC TTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGC ATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCA CCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCG TGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCA GGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTG GGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTG GGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTG CTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACT AACCTGTAA (SEQ ID NO: 635)
Protein:
MGCVCSSNPEGTELASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQ NTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYG GQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQD RGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTI FVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGN PIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMN FSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGAIANEVTTLENDAAFIENENAYLEKEIARLRKEK AALRNRLAHKKGAPGSAGSAAGSGASIEQKLISEEDLLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAG AGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPP LDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLESRGLSGAPGSAGSAAGSGNEVTTLENDAAF IENENAYLEKEIARLRKEKAALRNRLAHKKGKPIPNPLLGLDSTGAPGSAGSAAGSGACPVPLQLPPLERLTLDD KKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDE DLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASV STSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRR KKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLA PNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIV GDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGIST NL* (SEQ ID NO: 636)
TOM20::FUS::SYNZIP3::4clN22::SYNZIP3::PylRS (AF)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAAATGAGGTGACCACCCTGGAAAACGACGCCGCCTTCATCGAGAACGAGAACGCCTACCTGGAAAAAGAGATC GCCAGACTGAGAAAGGAAAAGGCCGCTCTGCGGAACAGACTGGCCCACAAGAAGGGAGCACCAGGAAGTGCTGGT TCTGCTGCTGGTAGTGGAGCATCGATAGAGCAGAAGCTGATCTCAGAGGAGGACCTGCTAGCCACCATGGACGCA CAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCC 
GGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGT CGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGA GCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAA TGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCC ACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCG CTCGAGTCTAGAGGGCTAAGCGGTGCTCCGGGGTCAGCCGGAAGTGCAGCAGGATCAGGTAATGAGGTGACCACC CTGGAAAACGACGCCGCCTTCATCGAGAACGAGAACGCCTACCTGGAAAAAGAGATCGCCAGACTGAGAAAGGAA AAGGCCGCTCTGCGGAACAGACTGGCCCACAAGAAGGGCAAGCCTATTCCCAACCCCCTGCTGGGCCTGGATAGC ACCGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTG GAACGCCTGACCCTGGATGACAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACC GGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCAT CTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAA CGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTG AAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAA AACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCC GTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAA GGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACC GATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAG AGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAA CTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTG GAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTC TGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATC AAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTG AACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTG GGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTG GAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCG 
GGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCC TATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 637)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGAIANEVTTLENDAAFIENENAYLEKEIARLRKEKAALRNRLAHKKGAPGSAG SAAGSGASIEQKLISEEDLLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRER RAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLA TMDAQTRRRERRAEKQAQWKAANPPLESRGLSGAPGSAGSAAGSGNEVTTLENDAAFIENENAYLEKEIARLRKE KAALRNRLAHKKGKPIPNPLLGLDSTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRT GTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKV KVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVK GNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGK LEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPI KIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDL ELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 638)
TOM20::EWSR1::SYNZIP2::MCP::SYNZIP2::PylRS (AF)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAGTTCTTCATGGCGTCCACGGAT TACAGTACCTATAGCCAAGCTGCAGCGCAGCAGGGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCA CAGACCACCCAGGCATATGGGCAACAAAGCTATGGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCT CAGACCACTGCAACCTATGGGCAGACCGCCTATGCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCA ACTGCCCCCCAGGCATACAGCCAGCCTGTCCAGGGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTC ACCACCACCCAGGCCTCCTATGCAGCTCAGTCTGCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAG CCAGCAGCCACTGCACCTACAAGACCGCAGGATGGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACA GGGGGTTACAACCAGCCCAGCCTAGGATATGGACAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCC ATGCAGCCAGTCACTGCACCTCCATCCTACCCTCCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAG AGCAGTTACTCTCAGCAGAACACCTATGGGCAACCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGC AGCTATGGGCAGCAGCCTCCCACTAGTTACCCACCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGC CAACAGAGCAGCAGCTACGGGCAGCAGAGTTCATTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAG GAGTCTGGAGGATTTTCCGGACCAGGAGAGAACCGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGG GGATTTGATCGTGGAGGCATGAGCAGAGGTGGGCGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGA GGTGGCTTCAATAAGCCTGGTGGACCCATGGATGAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGAT GAAGACTCTGACAACAGTGCAATTTATGTACAAGGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTC TTTAAGCAGTGTGGGGTTGTTAAGATGAACAAGAGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAA ACAGGAAAGCCCAAAGGCGATGCCACAGTGTCCTATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTT GATGGGAAAGATTTTCAAGGGAGCAAACTTAAAGTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGG GGTGGTCTGCCACCCCGTGAGGGCAGAGGCATGCCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCT GGGGGACCCATGGGTCGCATGGGAGGCCGTGGAGGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCC CGAGGGAACCCCTCTGGAGGAGGAAACGTCCAGCACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGA AACCAGAACTTCGCCTGGAGAACAGAGTGCAACCAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCC TTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGTGGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGAT 
CGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGCCGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATG GACCGAGGTGGCTTTGGTGGAGGAAGACGAGGTGGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGCGATC GCAGCTAGAAACGCCTACCTGAGAAAGAAAATCGCCAGACTGAAGAAGGACAACCTGCAGCTGGAAAGAGACGAG CAGAACCTGGAAAAGATCATCGCCAACCTCAGAGATGAGATCGCCAGACTGGAAAACGAGGTGGCCAGCCACGAG CAGTATCCCTATGATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCT AACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAAC GGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCT GCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTA ACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGA AACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGGCTAAGCGGTGCTCCGGGGTCAGCCGGAAGT GCAGCAGGATCAGGTGCTAGAAACGCCTACCTGAGAAAGAAAATCGCCAGACTGAAGAAGGACAACCTGCAGCTG GAAAGAGACGAGCAGAACCTGGAAAAGATCATCGCCAACCTCAGAGATGAGATCGCCAGACTGGAAAACGAGGTG GCCAGCCACGAGCAGGGCAAGCCTATTCCCAACCCCCTGCTGGGCCTGGATAGCACCGGAGCACCAGGAAGTGCT GGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGAC AAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAA CACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGC TCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAG GATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACC CGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCA CAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTG AGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACA AGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTG AATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGT AAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGC TTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGC ATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCA 
CCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCG TGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCA GGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTG GGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTG GGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTG CTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACT AACCTGTAA (SEQ ID NO: 639)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASTD YSTYSQAAAQQGYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTP TAPQAYSQPVQGYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSST GGYNQPSLGYGQSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQS SYGQQPPTSYPPQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRG GFDRGGMSRGGRGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADF FKQCGVVKMNKRTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMR GGLPPREGRGMPPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCG NQNFAWRTECNQCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGM DRGGFGGGRRGGPGGPPGPLMEQAIAARNAYLRKKIARLKKDNLQLERDEQNLEKIIANLRDEIARLENEVASHE QYPYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSS AQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIYGLSGAPGSAGS AAGSGARNAYLRKKIARLKKDNLQLERDEQNLEKIIANLRDEIARLENEVASHEQGKPIPNPLLGLDSTGAPGSA GSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSR SSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQA QPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLL NPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMG IDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGS GCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERL LKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 640)
LCK::OMeRS
DNA:
ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTG GAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACC GGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCAT CTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAA CGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTG AAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAA AACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCC GTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAA GGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACC GATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAG AGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAA CTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTG GAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTC TGTCTGCGCCCTATGCTAACACCAAATCTGTATAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATC AAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTG GTCTTTTGGCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTG GGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTG GAACTGTCTAGTGCCCTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCG GGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCC TATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 641)
Protein:
MGCVCSSNPEGTELACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDH LVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLE NTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQT DRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPL EYIERMGIDNDTELSKQIFRVDKNFCLRPMLTPNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTML VFWQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSALVGPIPLDREWGIDKPWIGA GFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 642)
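Each DNA entry can be checked against the Protein entry that follows it by translating the DNA codon by codon. The Python sketch below is illustrative only. The names CODON_TABLE and translate are hypothetical helpers and are not part of the application. The sketch assumes the standard genetic code (NCBI translation table 1) and assumes that the spaces inside each entry only split the sequence into display blocks.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), listed in TCAG order:
# TTT, TTC, TTA, TTG, TCT, ... GGG. "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence into one-letter amino acid codes."""
    # Drop the whitespace that splits the listing into fixed-width blocks.
    seq = "".join(dna.split()).upper()
    if len(seq) % 3 != 0:
        raise ValueError(f"sequence length {len(seq)} is not a multiple of 3")
    # Non-ACGT characters (e.g. N) raise KeyError: the table covers only the 64 codons.
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


if __name__ == "__main__":
    # First 45 nt of SEQ ID NO: 641 (LCK-derived N-terminus).
    print(translate("ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCG"))
    # A stretch from the synthetase coding region of these constructs.
    print(translate("CATCTGGTTGTGAACAATAGC"))
```

Run as written, the script prints MGCVCSSNPEGTELA, which matches the first 15 residues of SEQ ID NO: 642. It then prints HLVVNNS, the residues encoded by the synthetase stretch. In the same way, the final codons shared by the DNA entries (AACCTGTAA) translate to NL*, which matches the end of each Protein entry.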
TOM20::FUS::OMeRS
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAGGCAAGCCTATTCCCAACCCCCTGCTGGGCCTGGATAGCACCGGAGCACCAGGAAGTGCTGGTTCTGCTGCT GGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTG AATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTT AGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACA 
GCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAA TTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAA GCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGA AGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATT AGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCC
CCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGAC
GAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTG
CAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGAT
CGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGAT
ACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAACACCAAATCTGTAT
AACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAA
GAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGTCTTTTGGCAAATGGGTTCAGGTTGTACTCGT
GAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGT
ATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCCTTGTGGGCCCAATCCCG
CTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAA
CACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA
(SEQ ID NO: 643)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP GGGPGGSHMGGNYGDDRRGGRGGAIAGKPIPNPLLGLDSTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPL NTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNK FLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSI SSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDL QQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLTPNLY NYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLVFWQMGSGCTRENLESIITDFLNHLGIDFKIVGDSC MVFGDTLDVMHGDLELSSALVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 644)
KIF16B::FUS::OMeRS
DNA:
ATGGCATCGGTCAAGGTGGCCGTGAGGGTCCGGCCCATGAATCGCAGGGAAAAGGACTTGGAGGCCAAGTTCATT ATTCAGATGGAGAAAAGCAAAACGACAATCACAAACTTAAAGATACCAGAAGGAGGCACTGGGGACTCAGGAAGA GAACGGACCAAGACCTTCACCTATGACTTTTCTTTTTATTCTGCTGATACAAAAAGCCCAGATTACGTTTCACAA GAAATGGTTTTCAAAACCCTCGGCACAGATGTCGTGAAGTCTGCATTTGAAGGTTATAATGCTTGTGTCTTTGCA TATGGGCAAACTGGATCTGGAAAGTCATACACTATGATGGGAAATTCTGGAGATTCTGGCTTAATACCTCGGATC TGTGAAGGACTCTTCAGTCGGATAAATGAAACCACCAGATGGGATGAAGCTTCTTTTCGAACTGAAGTCAGCTAC TTAGAAATTTATAACGAACGTGTGAGAGATCTACTTCGGCGGAAGTCATCTAAAACCTTCAATTTGAGAGTCCGT GAGCATCCCAAAGAAGGCCCTTATGTTGAGGATTTATCCAAACATTTAGTACAGAATTATGGTGACGTAGAAGAA CTTATGGATGCGGGCAATATCAACCGGACCACCGCAGCGACTGGGATGAACGACGTCAGTAGCAGGTCTCATGCC ATCTTCACCATCAAGTTCACTCAGGCTAAATTTGATTCTGAAATGCCATGTGAAACCGTCAGTAAGATCCACTTG GTTGATCTTGCCGGAAGTGAGCGTGCAGATGCCACCGGAGCCACCGGGGTTAGGCTAAAGGAAGGGGGAAATATT AACAAGTCCCTCGTGACTCTGGGGAACGTCATTTCTGCCTTAGCTGATTTATCTCAGGATGCTGCAAATACTCTT GCAAAGAAGAAGCAAGTTTTCGTGCCTTACAGGGATTCTGTGTTGACTTGGTTGTTAAAAGATAGCCTTGGAGGA AACTCTAAAACTATCATGATTGCCACCATTTCACCTGCTGATGTCAATTATGGAGAAACCCTAAGTACTCTTCGC TATGCAAATAGAGCCAAAAACATCATCAACAAGCCTACCATTAATGAGGATGCCAACGTCAAACTTATCCGTGAG CTGCGAGCTGAAATAGCCAGACTGAAAACGCTGCTTGCTCAAGGGAATCAGATTGCCCTCTTAGACTCCCCCACA TATACAGATATTGAAATGAACAGATTGGGAAAGGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATG GCCTCAAACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCC CAGCAGAGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGC CAGAGCAGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGC TCGACTGGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGC CAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAG AGTGGGAGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCC CCTCAGGGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAAC TATGGCCAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGT GGAGGTGGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGC 
GGCGGCGGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGA GGTGGCATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGAC TCCGAACAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTG GCTGATTACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACA GACAGGGAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATT GACTGGTTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAAT CGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGC AGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGG AAGTGTCCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAA CCAGATGGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGA GGAGGCGATTACAAGGATGACGACGATAAGGGTACCGGCGCCCCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGC GCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTG ATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCG AAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCA CTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACA AAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCG AAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTC TCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATT AGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAA GCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGC CTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATC TATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTT CTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTG AGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAACACCAAATCTGTATAACTATCTG CGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGAC GGTAAAGAACATCTGGAGGAGTTTACCATGCTGGTCTTTTGGCAAATGGGTTCAGGTTGTACTCGTGAGAACCTG 
GAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTT GGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCCTTGTGGGCCCAATCCCGCTGGATCGT GAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTC AAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 645)
Protein:
MASVKVAVRVRPMNRREKDLEAKFIIQMEKSKTTITNLKIPEGGTGDSGRERTKTFTYDFSFYSADTKSPDYVSQ EMVFKTLGTDVVKSAFEGYNACVFAYGQTGSGKSYTMMGNSGDSGLIPRICEGLFSRINETTRWDEASFRTEVSY LEIYNERVRDLLRRKSSKTFNLRVREHPKEGPYVEDLSKHLVQNYGDVEELMDAGNINRTTAATGMNDVSSRSHA IFTIKFTQAKFDSEMPCETVSKIHLVDLAGSERADATGATGVRLKEGGNINKSLVTLGNVISALADLSQDAANTL AKKKQVFVPYRDSVLTWLLKDSLGGNSKTIMIATISPADVNYGETLSTLRYANRAKNIINKPTINEDANVKLIRE LRAEIARLKTLLAQGNQIALLDSPTYTDIEMNRLGKGAPGSAGSAAGSGMASNDYTQQATQSYGAYPTQPGQGYS QQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYG QQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGN YGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGR GGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYT DRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGG SGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGR GGDYKDDDDKGTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRS KIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMP KSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQ ASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGF LEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLTPNLYNYLRKLDRALPDPIKIFEIGPCYRKESD GKEHLEEFTMLVFWQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSALVGPIPLDR EWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 646)
LCK::FUS::OMeRS
DNA: ATGGGCTGCGTGTGCAGCAGCAACCCCGAGGGTACCGAGCTCGCCTCAAACGATTATACCCAACAAGCAACCCAA AGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTACGGACAGCAGAGT TACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTATGGCCAGAGCCAG AACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGTAGCCAGAGCTCC CAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGCACCTCGGGAAGT TACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAGCCTAGCTATGGT GGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAGAACCAGTACAAC AGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCCATGAGTAGTGGT GGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTATGGACAGCAGGAC CGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAACCGCAGCAGTGGT GGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGACCGTGGTGGCTTC AATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGACAACAACACCATC TTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATTGGTATTATTAAG ACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTGAAGGGAGAGGCA ACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAATTCTCCGGAAAT CCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGTGGAGGCCGAGGG CGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGAGGATTTCCCAGT GGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGTGAGAATATGAAC TTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGACCAGGTGGCTCT CACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGGCGCCCCCGGCTCCGCCGGCTCCGCC GCCGGCTCCGGCATGGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAA CCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCAC GAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCT CGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTG AACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACT AAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCG 
TCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACC AGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATG TCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCG AAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAA GACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTC GTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGAC AATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAACACCAAAT CTGTATAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTAT CGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGGTCTTTTGGCAAATGGGTTCAGGTTGT ACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGAC AGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCCTTGTGGGCCCA ATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAA GTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTG TAA (SEQ ID NO: 647)
Protein:
MGCVCSSNPEGTELASNDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQ NTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYG GQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQD RGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTI FVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGN PIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMN FSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGGAPGSAGSAAGSGMACPVPLQLPPLERLTLDDKK PLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDL NKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVST SISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKK DLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLTPN LYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLVFWQMGSGCTRENLESIITDFLNHLGIDFKIVGD SCMVFGDTLDVMHGDLELSSALVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 648)
EBAG9(1-29)::FUS::OMeRS
DNA:
ATGGCCATCACCCAGTTTCGGTTATTTAAATTTTGTACCTGCCTAGCAACAGTATTCTCATTCCTAAAGAGATTA ATATGCAGATCTGGAGCACCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATGGCCTCAAACGATTATACCCAA CAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGTCAGCCCTAC GGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTATTCTTCTTAT GGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGCTATGGCAGT AGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCTCCCAGCAGC ACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTACAGCCAGCAG CCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTATGGACAGCAG AACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGATCAATCCTCC ATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGCGGTGGCTAT GGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGTGGTTACAAC CGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGCGGAAGTGAC CGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGATAATTCAGAC AACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTCAAGCAGATT GGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACTGGCAAGCTG AAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGATGGTAAAGAA TTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGCAATGGTCGT GGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGTGGCCGAGGA GGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAATCCCACCTGT GAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCAGGAGGGGGA CCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGTGGTGCGATCGCAGGCGCC GATTACAAGGACGATGATGACAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTG CCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACT GGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATT GAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCAC AAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAG 
GACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCT CGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATT CCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCC ACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCA GCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGC AAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAGCAAATCTATGCCGAAGAA CGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAA TCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATT TTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAACACCAAATCTGTATAACTATCTGCGCAAACTGGAC CGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACAT CTGGAGGAGTTTACCATGCTGGTCTTTTGGCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATC ACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTG GATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCCTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATC GACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAA CGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 649)
Protein:
MAITQFRLFKFCTCLATVFSFLKRLICRSGAPGSAGSAAGSGMASNDYTQQATQSYGAYPTQPGQGYSQQSSQPY GQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGGYGSSQSSQSSYGQQSSYPGYGQQPAPSS TSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGYGQQNQYNSSSGGGGGGGGGGNYGQDQSS MSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGGGYNRSSGGYEPRGRGGGRGGRGGMGGSD RGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYFKQIGIIKTNKKTGQPMINLYTDRETGKL KGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGGNGRGGRGRGGPMGRGGYGGGGSGGGGRG GFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGPGGGPGGSHMGGNYGDDRRGGRGGAIAGA DYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYI EMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVA RAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAP ALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIK SPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLTPNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEH LEEFTMLVFWQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSALVGPIPLDREWGI DKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 650)
TOM20::FUS::SYNZIP1::OMeRS
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAAATCTGGTGGCCCAGCTGGAAAACGAGGTGGCCAGCCTGGAAAACGAGAACGAAACCCTGAAGAAAAAGAAC CTGCACAAGAAGGACCTGATCGCCTACCTGGAAAAGGAAATCGCCAACCTGAGAAAGAAGATCGAGGAAGGCAAG CCTATTCCCAACCCCCTGCTGGGCCTGGATAGCACCGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGA GCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTG 
ATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCG AAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCA CTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACA AAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCG AAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTC TCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATT AGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAA GCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGC CTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATC TATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTT CTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTG AGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAACACCAAATCTGTATAACTATCTG CGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGAC GGTAAAGAACATCTGGAGGAGTTTACCATGCTGGTCTTTTGGCAAATGGGTTCAGGTTGTACTCGTGAGAACCTG GAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTT GGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCCTTGTGGGCCCAATCCCGCTGGATCGT GAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTC AAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 651)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND
YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG
YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY
GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG
GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF
KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG
NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP
GGGPGGSHMGGNYGDDRRGGRGGAIANLVAQLENEVASLENENETLKKKNLHKKDLIAYLEKEIANLRKKIEEGK
PIPNPLLGLDSTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRS
KIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMP
KSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQ
ASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGF
LEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLTPNLYNYLRKLDRALPDPIKIFEIGPCYRKESD
GKEHLEEFTMLVFWQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSALVGPIPLDR
EWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 652)
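Each Protein entry in this listing is the translation, in reading frame 1, of the DNA entry directly above it; the spaces inside the DNA entries are layout only, and `*` marks the stop codon. The sketch below is one way to cross-check a protein entry against its DNA. It assumes the standard genetic code (NCBI translation table 1); the names `CODON_TABLE` and `translate` are illustrative and not part of the source. The example fragments are copied from SEQ ID NO: 651.

```python
# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# the bases ordered T, C, A, G at each position; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a coding sequence in frame 1, ignoring layout whitespace."""
    seq = "".join(dna.split()).upper()
    if len(seq) % 3 != 0:
        raise ValueError(f"sequence length {len(seq)} is not a multiple of 3")
    return "".join(CODON_TABLE[seq[n:n + 3]] for n in range(0, len(seq), 3))


# Fragments copied from SEQ ID NO: 651 (start, an internal stretch, and the end).
print(translate("ATGGTGGGTCGGAACAGCGCCATC"))  # MVGRNSAI
print(translate("GATCATCTGGTTGTGAACAATAGC"))  # DHLVVNNS
print(translate("AACCTG TAA"))                # NL*
```

Joining a full DNA entry (for example, all of SEQ ID NO: 651) and passing it to `translate` should give a string that can be compared residue by residue with the corresponding Protein entry.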
TOM20::FUS::SYNZIP3::OMeRS
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCAAATGAGGTGACCACCCTGGAAAACGACGCCGCCTTCATCGAGAACGAGAACGCCTACCTGGAAAAAGAGATC GCCAGACTGAGAAAGGAAAAGGCCGCTCTGCGGAACAGACTGGCCCACAAGAAGGGCAAGCCTATTCCCAACCCC CTGCTGGGCCTGGATAGCACCGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCG CTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGT 
CTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAG ATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAA TATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGAC CAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGT GCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCT GTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACC GCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCA CTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAA CCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGT GAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCC CCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTC CGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAACACCAAATCTGTATAACTATCTGCGCAAACTGGACCGT GCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTG GAGGAGTTTACCATGCTGGTCTTTTGGCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACC GATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGAT GTCATGCACGGCGACCTGGAACTGTCTAGTGCCCTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGAC AAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGT GCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 653) Protein :
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND
YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG
YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY
GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG
GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF
KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG
NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP
GGGPGGSHMGGNYGDDRRGGRGGAIANEVTTLENDAAFIENENAYLEKEIARLRKEKAALRNRLAHKKGKPIPNP
LLGLDSTGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIE
MACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVAR
APKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPA
LTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKS
PILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLTPNLYNYLRKLDRALPDPIKIFEIGPCYRKESDGKEHL
EEFTMLVFWQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSALVGPIPLDREWGID
KPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 654)
SLP3::FUS::PylRS (AF)
DNA:
ATGGATTCTAGGGTGTCTTCACCTGAGAAGCAAGATAAAGAGAATTTCGTGGGTGTCAACAATAAACGGCTTGGT GTATGTGGCTGGATCCTGTTTTCCCTCTCTTTCCTGTTGGTGATCATTACCTTCCCCATCTCCATATGGATGTGC TTGAAGATCATTAAGGAGTATGAACGTGGAGCACCCGGCTCCGCCGGCTCCGCCGCCGGCTCCGGCATGGCCTCA AACGATTATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAG AGCAGTCAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGC AGCTATTCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACT GGCGGCTATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAG CCAGCTCCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGG AGCTACAGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAG GGCTATGGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGC CAAGATCAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGT GGCAGCGGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGC GGTGGTGGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGC ATGGGCGGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAA CAGGATAATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGAT TACTTCAAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGG GAAACTGGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGG TTTGATGGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGT GGTGGCAATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGT GGTGGTGGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGT CCTAATCCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGAT GGCCCAGGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGTGGT GCGATCGCAGGCGCCGATTACAAGGACGATGATGACAAGGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGT GGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACC CTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGT TCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGT 
GCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTG ACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATG CCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAA TTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGT ATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTT CAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATC AGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAA ATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGC TTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAA CTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTAT CTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCC GACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAAC CTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTG TTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGAT CGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGAC TTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA ( SEQ ID NO : 655 )
Protein:
MDSRVSSPEKQDKENFVGVNNKRLGVCGWILFSLSFLLVIITFPISIWMCLKIIKEYERGAPGSAGSAAGSGMAS
NDYTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGST
GGYGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQ
GYGQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGG
GGGYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVAD
YFKQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRG
GGNGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPD
GPGGGPGGSHMGGNYGDDRRGGRGGAIAGADYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNT
LISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFL
TKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISS
ISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQ
IYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANY
LRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMV
FGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL*
(SEQ ID NO: 656)
SLP3::MCP
DNA:
ATGGATTCTAGGGTGTCTTCACCTGAGAAGCAAGATAAAGAGAATTTCGTGGGTGTCAACAATAAACGGCTTGGT GTATGTGGCTGGATCCTGTTTTCCCTCTCTTTCCTGTTGGTGATCATTACCTTCCCCATCTCCATATGGATGTGC TTGAAGATCATTAAGGAGTATGAACGTGCGATCGCATATCCCTATGATGTGCCGGATTATGCTGGAGCACCAGGA AGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGC GACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCT TACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAA GGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATT GTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTAC TAA (SEQ ID NO: 657)
Protein:
MDSRVSSPEKQDKENFVGVNNKRLGVCGWILFSLSFLLVIITFPISIWMCLKIIKEYERAIAYPYDVPDYAGAPG SAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPK GAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIY* (SEQ ID NO: 658)
SLP3::EWSR1::MCP
DNA:
ATGGATTCTAGGGTGTCTTCACCTGAGAAGCAAGATAAAGAGAATTTCGTGGGTGTCAACAATAAACGGCTTGGT GTATGTGGCTGGATCCTGTTTTCCCTCTCTTTCCTGTTGGTGATCATTACCTTCCCCATCTCCATATGGATGTGC TTGAAGATCATTAAGGAGTATGAACGTATGGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCAG GGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTAT GGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGCCTAT GCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAG GGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCTCAGTCT GCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGAT GGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTAGGATATGGA CAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCT CCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACACCTATGGGCAA CCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCA CCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGCAGCAGAGTTCA TTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGACCAGGAGAGAAC CGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATGAGCAGAGGTGGG CGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGACCCATGGAT GAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGATGAAGACTCTGACAACAGTGCAATTTATGTACAA GGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAGATGAACAAG AGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGATGCCACAGTGTCC TATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGGAGCAAACTTAAA GTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAGGGCAGAGGCATG CCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATGGGAGGCCGTGGA GGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGAAACGTCCAG CACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGAACAGAGTGCAAC CAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGT GGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGC 
CGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGAGGAAGACGAGGT GGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGCGATCGCATATCCCTATGATGTGCCGGATTATGCTGGA GCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGC GGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGT TCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAG GTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGC GAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCC GGCATCTACTAA (SEQ ID NO: 659)
Protein:
MDSRVSSPEKQDKENFVGVNNKRLGVCGWILFSLSFLLVIITFPISIWMCLKIIKEYERMASTDYSTYSQAAAQQ
GYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTPTAPQAYSQPVQ
GYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSSTGGYNQPSLGYG
QSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQSSYGQQPPTSYP
PQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRGGFDRGGMSRGG
RGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADFFKQCGVVKMNK
RTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMRGGLPPREGRGM
PPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCGNQNFAWRTECN
QCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGMDRGGFGGGRRG
GPGGPPGPLMEQAIAYPYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSR
SQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANS
GIY* (SEQ ID NO: 660)
SLP3::EWSR1::4xλN22
DNA:
ATGGATTCTAGGGTGTCTTCACCTGAGAAGCAAGATAAAGAGAATTTCGTGGGTGTCAACAATAAACGGCTTGGT GTATGTGGCTGGATCCTGTTTTCCCTCTCTTTCCTGTTGGTGATCATTACCTTCCCCATCTCCATATGGATGTGC TTGAAGATCATTAAGGAGTATGAACGTATGGCGTCCACGGATTACAGTACCTATAGCCAAGCTGCAGCGCAGCAG GGCTACAGTGCTTACACCGCCCAGCCCACTCAAGGATATGCACAGACCACCCAGGCATATGGGCAACAAAGCTAT GGAACCTATGGACAGCCCACTGATGTCAGCTATACCCAGGCTCAGACCACTGCAACCTATGGGCAGACCGCCTAT GCAACTTCTTATGGACAGCCTCCCACTGGTTATACTACTCCAACTGCCCCCCAGGCATACAGCCAGCCTGTCCAG GGGTATGGCACTGGTGCTTATGATACCACCACTGCTACAGTCACCACCACCCAGGCCTCCTATGCAGCTCAGTCT GCATATGGCACTCAGCCTGCTTATCCAGCCTATGGGCAGCAGCCAGCAGCCACTGCACCTACAAGACCGCAGGAT GGAAACAAGCCCACTGAGACTAGTCAACCTCAATCTAGCACAGGGGGTTACAACCAGCCCAGCCTAGGATATGGA CAGAGTAACTACAGTTATCCCCAGGTACCTGGGAGCTACCCCATGCAGCCAGTCACTGCACCTCCATCCTACCCT CCTACCAGCTATTCCTCTACACAGCCGACTAGTTATGATCAGAGCAGTTACTCTCAGCAGAACACCTATGGGCAA CCGAGCAGCTATGGACAGCAGAGTAGCTATGGTCAACAAAGCAGCTATGGGCAGCAGCCTCCCACTAGTTACCCA CCCCAAACTGGATCCTACAGCCAAGCTCCAAGTCAATATAGCCAACAGAGCAGCAGCTACGGGCAGCAGAGTTCA TTCCGACAGGACCACCCCAGTAGCATGGGTGTTTATGGGCAGGAGTCTGGAGGATTTTCCGGACCAGGAGAGAAC CGGAGCATGAGTGGCCCTGATAACCGGGGCAGGGGAAGAGGGGGATTTGATCGTGGAGGCATGAGCAGAGGTGGG CGGGGAGGAGGACGCGGTGGAATGGGCAGCGCTGGAGAGCGAGGTGGCTTCAATAAGCCTGGTGGACCCATGGAT GAAGGACCAGATCTTGATCTAGGCCCACCTGTAGATCCAGATGAAGACTCTGACAACAGTGCAATTTATGTACAA GGATTAAATGACAGTGTGACTCTAGATGATCTGGCAGACTTCTTTAAGCAGTGTGGGGTTGTTAAGATGAACAAG AGAACTGGGCAACCCATGATCCACATCTACCTGGACAAGGAAACAGGAAAGCCCAAAGGCGATGCCACAGTGTCC TATGAAGACCCACCTACTGCCAAGGCTGCCGTGGAATGGTTTGATGGGAAAGATTTTCAAGGGAGCAAACTTAAA GTCTCCCTTGCTCGGAAGAAGCCTCCAATGAACAGTATGCGGGGTGGTCTGCCACCCCGTGAGGGCAGAGGCATG CCACCACCACTCCGTGGAGGTCCAGGAGGCCCAGGAGGTCCTGGGGGACCCATGGGTCGCATGGGAGGCCGTGGA GGAGATAGAGGAGGCTTCCCTCCAAGAGGACCCCGGGGTTCCCGAGGGAACCCCTCTGGAGGAGGAAACGTCCAG CACCGAGCTGGAGACTGGCAGTGTCCCAATCCGGGTTGTGGAAACCAGAACTTCGCCTGGAGAACAGAGTGCAAC CAGTGTAAGGCCCCAAAGCCTGAAGGCTTCCTCCCGCCACCCTTTCCGCCCCCGGGTGGTGATCGTGGCAGAGGT GGCCCTGGTGGCATGCGGGGAGGAAGAGGTGGCCTCATGGATCGTGGTGGTCCCGGTGGAATGTTCAGAGGTGGC 
CGTGGTGGAGACAGAGGTGGCTTCCGTGGTGGCCGGGGCATGGACCGAGGTGGCTTTGGTGGAGGAAGACGAGGT GGCCCTGGGGGGCCCCCTGGACCTTTGATGGAACAGGCGATCGCAGGAGCACCAGGAAGTGCTGGTTCTGCTGCT GGTAGTGGAGAGCAGAAGCTGATCTCAGAGGAGGACCTGCTAGCCACCATGGACGCACAAACACGACGACGTGAG CGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCT GGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCT CAATGGAAAGCTGCAAACCCACCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTA GCCACCATGGACGCACAAACACGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCA CCGCTCGACGGAGCCGGAGCTGGCGCTGGAGCTGGAGCCGGAGCTGGCGGTCTAGCCACCATGGACGCACAAACA CGACGACGTGAGCGTCGCGCTGAGAAACAAGCTCAATGGAAAGCTGCAAACCCACCGCTCGAGTCTAGAGGGCCC GTTTAA (SEQ ID NO: 661)
Protein:
MDSRVSSPEKQDKENFVGVNNKRLGVCGWILFSLSFLLVIITFPISIWMCLKIIKEYERMASTDYSTYSQAAAQQ
GYSAYTAQPTQGYAQTTQAYGQQSYGTYGQPTDVSYTQAQTTATYGQTAYATSYGQPPTGYTTPTAPQAYSQPVQ
GYGTGAYDTTTATVTTTQASYAAQSAYGTQPAYPAYGQQPAATAPTRPQDGNKPTETSQPQSSTGGYNQPSLGYG
QSNYSYPQVPGSYPMQPVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYGQPSSYGQQSSYGQQSSYGQQPPTSYP
PQTGSYSQAPSQYSQQSSSYGQQSSFRQDHPSSMGVYGQESGGFSGPGENRSMSGPDNRGRGRGGFDRGGMSRGG
RGGGRGGMGSAGERGGFNKPGGPMDEGPDLDLGPPVDPDEDSDNSAIYVQGLNDSVTLDDLADFFKQCGVVKMNK
RTGQPMIHIYLDKETGKPKGDATVSYEDPPTAKAAVEWFDGKDFQGSKLKVSLARKKPPMNSMRGGLPPREGRGM
PPPLRGGPGGPGGPGGPMGRMGGRGGDRGGFPPRGPRGSRGNPSGGGNVQHRAGDWQCPNPGCGNQNFAWRTECN
QCKAPKPEGFLPPPFPPPGGDRGRGGPGGMRGGRGGLMDRGGPGGMFRGGRGGDRGGFRGGRGMDRGGFGGGRRG
GPGGPPGPLMEQAIAGAPGSAGSAAGSGEQKLISEEDLLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGA
GAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANP
PLDGAGAGAGAGAGAGGLATMDAQTRRRERRAEKQAQWKAANPPLESRGPV* (SEQ ID NO: 662)
SLP3::PylRS (AF)
DNA:
ATGGATTCTAGGGTGTCTTCACCTGAGAAGCAAGATAAAGAGAATTTCGTGGGTGTCAACAATAAACGGCTTGGT GTATGTGGCTGGATCCTGTTTTCCCTCTCTTTCCTGTTGGTGATCATTACCTTCCCCATCTCCATATGGATGTGC TTGAAGATCATTAAGGAGTATGAACGTGGCGCCGATTACAAGGACGATGATGACAAGGGAGCACCAGGAAGTGCT GGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGAT AAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAA CACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGC TCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAG GATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACC CGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCA CAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTG AGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACA AGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTG AATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGT AAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGC TTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGC ATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCA CCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCG TGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCA GGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTG GGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTG GGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTG CTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACT AACCTGTAA (SEQ ID NO: 663) Protein :
MDSRVSSPEKQDKENFVGVNNKRLGVCGWILFSLSFLLVIITFPISIWMCLKIIKEYERGADYKDDDDKGAPGSA
GSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSR
SSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQA
QPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLL
NPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMG
IDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGS
GCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERL
LKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 664)
TOM20::FUS::MCP::PylRS (AF)
DNA:
ATGGTGGGTCGGAACAGCGCCATCGCCGCCGGTGTATGCGGGGCCCTTTTCATTGGGTACTGCATCTACTTCGAC CGCAAAAGACGAAGTGACCCCAACTTCAAGAACAGGCTTCGAGAACGAAGAAAGAAACAGAAGCTTGCCAAGGAG AGAGCTGGGCTTTCCAAGTTACCTGACCTTAAAGATGCTGAAGCTGTTCAGAAATTCTTCATGGCCTCAAACGAT TATACCCAACAAGCAACCCAAAGCTATGGGGCCTACCCCACCCAGCCCGGGCAGGGCTATTCCCAGCAGAGCAGT CAGCCCTACGGACAGCAGAGTTACAGTGGTTATAGCCAGTCCACGGACACTTCAGGATATGGCCAGAGCAGCTAT TCTTCTTATGGCCAGAGCCAGAACACAGGCTATGGAACTCAGTCAACTCCCCAGGGATATGGCTCGACTGGCGGC TATGGCAGTAGCCAGAGCTCCCAATCGTCTTACGGGCAGCAGTCCTCCTACCCTGGCTATGGCCAGCAGCCAGCT CCCAGCAGCACCTCGGGAAGTTACGGTAGCAGTTCTCAGAGCAGCAGCTATGGGCAGCCCCAGAGTGGGAGCTAC AGCCAGCAGCCTAGCTATGGTGGACAGCAGCAAAGCTATGGACAGCAGCAAAGCTATAATCCCCCTCAGGGCTAT GGACAGCAGAACCAGTACAACAGCAGCAGTGGTGGTGGAGGTGGAGGTGGAGGTGGAGGTAACTATGGCCAAGAT CAATCCTCCATGAGTAGTGGTGGTGGCAGTGGTGGCGGTTATGGCAATCAAGACCAGAGTGGTGGAGGTGGCAGC GGTGGCTATGGACAGCAGGACCGTGGAGGCCGCGGCAGGGGTGGCAGTGGTGGCGGCGGCGGCGGCGGCGGTGGT GGTTACAACCGCAGCAGTGGTGGCTATGAACCCAGAGGTCGTGGAGGTGGCCGTGGAGGCAGAGGTGGCATGGGC GGAAGTGACCGTGGTGGCTTCAATAAATTTGGTGGCCCTCGGGACCAAGGATCACGTCATGACTCCGAACAGGAT AATTCAGACAACAACACCATCTTTGTGCAAGGCCTGGGTGAGAATGTTACAATTGAGTCTGTGGCTGATTACTTC AAGCAGATTGGTATTATTAAGACAAACAAGAAAACGGGACAGCCCATGATTAATTTGTACACAGACAGGGAAACT GGCAAGCTGAAGGGAGAGGCAACGGTCTCTTTTGATGACCCACCTTCAGCTAAAGCAGCTATTGACTGGTTTGAT GGTAAAGAATTCTCCGGAAATCCTATCAAGGTCTCATTTGCTACTCGCCGGGCAGACTTTAATCGGGGTGGTGGC AATGGTCGTGGAGGCCGAGGGCGAGGAGGACCCATGGGCCGTGGAGGCTATGGAGGTGGTGGCAGTGGTGGTGGT GGCCGAGGAGGATTTCCCAGTGGAGGTGGTGGCGGTGGAGGACAGCAGCGAGCTGGTGACTGGAAGTGTCCTAAT CCCACCTGTGAGAATATGAACTTCTCTTGGAGGAATGAATGCAACCAGTGTAAGGCCCCTAAACCAGATGGCCCA GGAGGGGGACCAGGTGGCTCTCACATGGGGGGTAACTACGGGGATGATCGTCGTGGTGGCAGAGGAGGCGCGATC GCATATCCCTATGATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCT AACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAAC GGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCT GCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTA 
ACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGA AACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACGGCGCCGATTACAAGGACGATGATGACAAGGGA GCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGC CTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACC ATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTT GTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGC CGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTC GTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACT GAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCT GTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAAT ACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGT CTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAA CTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAA CGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTAT ATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTG CGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATC TTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTT TGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATT GACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTG TCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTT GGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTAC AATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 665)
Protein:
MVGRNSAIAAGVCGALFIGYCIYFDRKRRSDPNFKNRLRERRKKQKLAKERAGLSKLPDLKDAEAVQKFFMASND
YTQQATQSYGAYPTQPGQGYSQQSSQPYGQQSYSGYSQSTDTSGYGQSSYSSYGQSQNTGYGTQSTPQGYGSTGG
YGSSQSSQSSYGQQSSYPGYGQQPAPSSTSGSYGSSSQSSSYGQPQSGSYSQQPSYGGQQQSYGQQQSYNPPQGY
GQQNQYNSSSGGGGGGGGGGNYGQDQSSMSSGGGSGGGYGNQDQSGGGGSGGYGQQDRGGRGRGGSGGGGGGGGG
GYNRSSGGYEPRGRGGGRGGRGGMGGSDRGGFNKFGGPRDQGSRHDSEQDNSDNNTIFVQGLGENVTIESVADYF
KQIGIIKTNKKTGQPMINLYTDRETGKLKGEATVSFDDPPSAKAAIDWFDGKEFSGNPIKVSFATRRADFNRGGG
NGRGGRGRGGPMGRGGYGGGGSGGGGRGGFPSGGGGGGGQQRAGDWKCPNPTCENMNFSWRNECNQCKAPKPDGP
GGGPGGSHMGGNYGDDRRGGRGGAIAYPYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFAN
GIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDG
NPIPSAIAANSGIYGADYKDDDDKGAPGSAGSAAGSGACPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGT
IHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKV
VSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSISSISTGATASALVKGN
TNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLE
REITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKI
FEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLEL
SSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 666)
KIF16B::1xLAF-1::PylRS (AF)
DNA:
ATGGCATCGGTCAAGGTGGCCGTGAGGGTCCGGCCCATGAATCGCAGGGAAAAGGACTTGGAGGCCAAGTTCATT ATTCAGATGGAGAAAAGCAAAACGACAATCACAAACTTAAAGATACCAGAAGGAGGCACTGGGGACTCAGGAAGA GAACGGACCAAGACCTTCACCTATGACTTTTCTTTTTATTCTGCTGATACAAAAAGCCCAGATTACGTTTCACAA GAAATGGTTTTCAAAACCCTCGGCACAGATGTCGTGAAGTCTGCATTTGAAGGTTATAATGCTTGTGTCTTTGCA TATGGGCAAACTGGATCTGGAAAGTCATACACTATGATGGGAAATTCTGGAGATTCTGGCTTAATACCTCGGATC TGTGAAGGACTCTTCAGTCGGATAAATGAAACCACCAGATGGGATGAAGCTTCTTTTCGAACTGAAGTCAGCTAC TTAGAAATTTATAACGAACGTGTGAGAGATCTACTTCGGCGGAAGTCATCTAAAACCTTCAATTTGAGAGTCCGT GAGCATCCCAAAGAAGGCCCTTATGTTGAGGATTTATCCAAACATTTAGTACAGAATTATGGTGACGTAGAAGAA CTTATGGATGCGGGCAATATCAACCGGACCACCGCAGCGACTGGGATGAACGACGTCAGTAGCAGGTCTCATGCC ATCTTCACCATCAAGTTCACTCAGGCTAAATTTGATTCTGAAATGCCATGTGAAACCGTCAGTAAGATCCACTTG GTTGATCTTGCCGGAAGTGAGCGTGCAGATGCCACCGGAGCCACCGGGGTTAGGCTAAAGGAAGGGGGAAATATT AACAAGTCCCTCGTGACTCTGGGGAACGTCATTTCTGCCTTAGCTGATTTATCTCAGGATGCTGCAAATACTCTT GCAAAGAAGAAGCAAGTTTTCGTGCCTTACAGGGATTCTGTGTTGACTTGGTTGTTAAAAGATAGCCTTGGAGGA AACTCTAAAACTATCATGATTGCCACCATTTCACCTGCTGATGTCAATTATGGAGAAACCCTAAGTACTCTTCGC TATGCAAATAGAGCCAAAAACATCATCAACAAGCCTACCATTAATGAGGATGCCAACGTCAAACTTATCCGTGAG CTGCGAGCTGAAATAGCCAGACTGAAAACGCTGCTTGCTCAAGGGAATCAGATTGCCCTCTTAGACTCCCCCACA GACTACAAGGACGACGATGATAAGATGGAAAGCAACCAGAGCAACAACGGCGGCTCTGGCAACGCCGCTCTGAAC AGAGGCGGCAGATACGTGCCCCCCCACCTGAGAGGAGGCGACGGCGGCGCCGCCGCCGCTGCATCTGCCGGCGGA GATGACAGAAGAGGCGGAGCCGGAGGCGGCGGCTATAGACGGGGAGGCGGAAACAGCGGCGGCGGAGGCGGAGGC GGCTACGACAGAGGCTACAACGACAACCGGGACGACCGGGACAACAGAGGCGGCAGCGGCGGATACGGCAGAGAT CGAAACTACGAGGACAGAGGCTACAATGGCGGAGGCGGAGGCGGCGGCAACCGGGGCTACAACAACAACAGAGGA GGCGGCGGCGGCGGCTACAACCGCCAGGACAGAGGCGATGGCGGATCTAGCAATTTCAGCAGAGGCGGCTACAAC AACCGGGACGAGGGCAGCGACAACAGAGGCAGCGGAAGAAGCTACAACAATGACCGGAGAGATAATGGCGGAGAT GGCTCCGGAGCGTGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTG AATACCCTGATCTCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTT AGCCGTTCGAAAATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACA 
GCACGTGCACTGCGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAA TTCCTGACAAAAGCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAA GCAATGCCGAAATCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGA AGCAAATTCTCTCCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATT AGCAGTATTAGCACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCC CCGGTTCAAGCATCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGAC GAAATCAGCCTGAATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTG CAACAAATCTATGCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGAT
CGTGGCTTTCTGGAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGAT
ACCGAACTGAGCAAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCT
AACTATCTGCGCAAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAA
GAGTCCGACGGTAAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGT
GAGAACCTGGAAAGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGT
ATGGTGTTTGGCGACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCG
CTGGATCGTGAGTGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAA
CACGACTTCAAGAACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA
(SEQ ID NO: 667)
Protein:
MASVKVAVRVRPMNRREKDLEAKFIIQMEKSKTTITNLKIPEGGTGDSGRERTKTFTYDFSFYSADTKSPDYVSQ
EMVFKTLGTDVVKSAFEGYNACVFAYGQTGSGKSYTMMGNSGDSGLIPRICEGLFSRINETTRWDEASFRTEVSY
LEIYNERVRDLLRRKSSKTFNLRVREHPKEGPYVEDLSKHLVQNYGDVEELMDAGNINRTTAATGMNDVSSRSHA
IFTIKFTQAKFDSEMPCETVSKIHLVDLAGSERADATGATGVRLKEGGNINKSLVTLGNVISALADLSQDAANTL
AKKKQVFVPYRDSVLTWLLKDSLGGNSKTIMIATISPADVNYGETLSTLRYANRAKNIINKPTINEDANVKLIRE
LRAEIARLKTLLAQGNQIALLDSPTDYKDDDDKMESNQSNNGGSGNAALNRGGRYVPPHLRGGDGGAAAAASAGG
DDRRGGAGGGGYRRGGGNSGGGGGGGYDRGYNDNRDDRDNRGGSGGYGRDRNYEDRGYNGGGGGGGNRGYNNNRG
GGGGGYNRQDRGDGGSSNFSRGGYNNRDEGSDNRGSGRSYNNDRRDNGGDGSGACPVPLQLPPLERLTLDDKKPL
NTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARALRHHKYRKTCKRCRVSDEDLNK
FLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFSPAIPVSTQESVSVPASVSTSI
SSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISLNSGKPFRELESELLSRRKKDL
QQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELSKQIFRVDKNFCLRPMLAPNLA
NYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLESIITDFLNHLGIDFKIVGDSC
MVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFKNIKRAARSESYYNGISTNL* (SEQ ID NO: 668)
KIF16B::1xLAF-1::MCP
DNA:
ATGGCATCGGTCAAGGTGGCCGTGAGGGTCCGGCCCATGAATCGCAGGGAAAAGGACTTGGAGGCCAAGTTCATT ATTCAGATGGAGAAAAGCAAAACGACAATCACAAACTTAAAGATACCAGAAGGAGGCACTGGGGACTCAGGAAGA GAACGGACCAAGACCTTCACCTATGACTTTTCTTTTTATTCTGCTGATACAAAAAGCCCAGATTACGTTTCACAA GAAATGGTTTTCAAAACCCTCGGCACAGATGTCGTGAAGTCTGCATTTGAAGGTTATAATGCTTGTGTCTTTGCA TATGGGCAAACTGGATCTGGAAAGTCATACACTATGATGGGAAATTCTGGAGATTCTGGCTTAATACCTCGGATC TGTGAAGGACTCTTCAGTCGGATAAATGAAACCACCAGATGGGATGAAGCTTCTTTTCGAACTGAAGTCAGCTAC TTAGAAATTTATAACGAACGTGTGAGAGATCTACTTCGGCGGAAGTCATCTAAAACCTTCAATTTGAGAGTCCGT GAGCATCCCAAAGAAGGCCCTTATGTTGAGGATTTATCCAAACATTTAGTACAGAATTATGGTGACGTAGAAGAA CTTATGGATGCGGGCAATATCAACCGGACCACCGCAGCGACTGGGATGAACGACGTCAGTAGCAGGTCTCATGCC ATCTTCACCATCAAGTTCACTCAGGCTAAATTTGATTCTGAAATGCCATGTGAAACCGTCAGTAAGATCCACTTG GTTGATCTTGCCGGAAGTGAGCGTGCAGATGCCACCGGAGCCACCGGGGTTAGGCTAAAGGAAGGGGGAAATATT AACAAGTCCCTCGTGACTCTGGGGAACGTCATTTCTGCCTTAGCTGATTTATCTCAGGATGCTGCAAATACTCTT GCAAAGAAGAAGCAAGTTTTCGTGCCTTACAGGGATTCTGTGTTGACTTGGTTGTTAAAAGATAGCCTTGGAGGA AACTCTAAAACTATCATGATTGCCACCATTTCACCTGCTGATGTCAATTATGGAGAAACCCTAAGTACTCTTCGC TATGCAAATAGAGCCAAAAACATCATCAACAAGCCTACCATTAATGAGGATGCCAACGTCAAACTTATCCGTGAG CTGCGAGCTGAAATAGCCAGACTGAAAACGCTGCTTGCTCAAGGGAATCAGATTGCCCTCTTAGACTCCCCCACA GACTACAAGGACGACGATGATAAGATGGAAAGCAACCAGAGCAACAACGGCGGCTCTGGCAACGCCGCTCTGAAC AGAGGCGGCAGATACGTGCCCCCCCACCTGAGAGGAGGCGACGGCGGCGCCGCCGCCGCTGCATCTGCCGGCGGA GATGACAGAAGAGGCGGAGCCGGAGGCGGCGGCTATAGACGGGGAGGCGGAAACAGCGGCGGCGGAGGCGGAGGC GGCTACGACAGAGGCTACAACGACAACCGGGACGACCGGGACAACAGAGGCGGCAGCGGCGGATACGGCAGAGAT CGAAACTACGAGGACAGAGGCTACAATGGCGGAGGCGGAGGCGGCGGCAACCGGGGCTACAACAACAACAGAGGA GGCGGCGGCGGCGGCTACAACCGCCAGGACAGAGGCGATGGCGGATCTAGCAATTTCAGCAGAGGCGGCTACAAC AACCGGGACGAGGGCAGCGACAACAGAGGCAGCGGAAGAAGCTACAACAATGACCGGAGAGATAATGGCGGAGAT GGCTCCGGATATCCCTATGATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGA GCTTCTAACTTTACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTC GCTAACGGGATCGCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAG 
AGCTCTGCGCAGAATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATG GAACTAACCATTCCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAA GATGGAAACCCGATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACTAA (SEQ ID NO: 669)
Protein:
MASVKVAVRVRPMNRREKDLEAKFIIQMEKSKTTITNLKIPEGGTGDSGRERTKTFTYDFSFYSADTKSPDYVSQ EMVFKTLGTDVVKSAFEGYNACVFAYGQTGSGKSYTMMGNSGDSGLIPRICEGLFSRINETTRWDEASFRTEVSY LEIYNERVRDLLRRKSSKTFNLRVREHPKEGPYVEDLSKHLVQNYGDVEELMDAGNINRTTAATGMNDVSSRSHA IFTIKFTQAKFDSEMPCETVSKIHLVDLAGSERADATGATGVRLKEGGNINKSLVTLGNVISALADLSQDAANTL AKKKQVFVPYRDSVLTWLLKDSLGGNSKTIMIATISPADVNYGETLSTLRYANRAKNIINKPTINEDANVKLIRE LRAEIARLKTLLAQGNQIALLDSPTDYKDDDDKMESNQSNNGGSGNAALNRGGRYVPPHLRGGDGGAAAAASAGG DDRRGGAGGGGYRRGGGNSGGGGGGGYDRGYNDNRDDRDNRGGSGGYGRDRNYEDRGYNGGGGGGGNRGYNNNRG GGGGGYNRQDRGDGGSSNFSRGGYNNRDEGSDNRGSGRSYNNDRRDNGGDGSGYPYDVPDYAGAPGSAGSAAGSG ASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQNRKYTIKVEVPKGAWRSYLNM ELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIY* (SEQ ID NO: 670)
KIF16B::1xLAF-1::2xPCP
DNA:
ATGGCATCGGTCAAGGTGGCCGTGAGGGTCCGGCCCATGAATCGCAGGGAAAAGGACTTGGAGGCCAAGTTCATT
ATTCAGATGGAGAAAAGCAAAACGACAATCACAAACTTAAAGATACCAGAAGGAGGCACTGGGGACTCAGGAAGA
GAACGGACCAAGACCTTCACCTATGACTTTTCTTTTTATTCTGCTGATACAAAAAGCCCAGATTACGTTTCACAA
GAAATGGTTTTCAAAACCCTCGGCACAGATGTCGTGAAGTCTGCATTTGAAGGTTATAATGCTTGTGTCTTTGCA
TATGGGCAAACTGGATCTGGAAAGTCATACACTATGATGGGAAATTCTGGAGATTCTGGCTTAATACCTCGGATC
TGTGAAGGACTCTTCAGTCGGATAAATGAAACCACCAGATGGGATGAAGCTTCTTTTCGAACTGAAGTCAGCTAC
TTAGAAATTTATAACGAACGTGTGAGAGATCTACTTCGGCGGAAGTCATCTAAAACCTTCAATTTGAGAGTCCGT
GAGCATCCCAAAGAAGGCCCTTATGTTGAGGATTTATCCAAACATTTAGTACAGAATTATGGTGACGTAGAAGAA
CTTATGGATGCGGGCAATATCAACCGGACCACCGCAGCGACTGGGATGAACGACGTCAGTAGCAGGTCTCATGCC
ATCTTCACCATCAAGTTCACTCAGGCTAAATTTGATTCTGAAATGCCATGTGAAACCGTCAGTAAGATCCACTTG
GTTGATCTTGCCGGAAGTGAGCGTGCAGATGCCACCGGAGCCACCGGGGTTAGGCTAAAGGAAGGGGGAAATATT
AACAAGTCCCTCGTGACTCTGGGGAACGTCATTTCTGCCTTAGCTGATTTATCTCAGGATGCTGCAAATACTCTT
GCAAAGAAGAAGCAAGTTTTCGTGCCTTACAGGGATTCTGTGTTGACTTGGTTGTTAAAAGATAGCCTTGGAGGA
AACTCTAAAACTATCATGATTGCCACCATTTCACCTGCTGATGTCAATTATGGAGAAACCCTAAGTACTCTTCGC
TATGCAAATAGAGCCAAAAACATCATCAACAAGCCTACCATTAATGAGGATGCCAACGTCAAACTTATCCGTGAG
CTGCGAGCTGAAATAGCCAGACTGAAAACGCTGCTTGCTCAAGGGAATCAGATTGCCCTCTTAGACTCCCCCACA
GACTACAAGGACGACGATGATAAGATGGAAAGCAACCAGAGCAACAACGGCGGCTCTGGCAACGCCGCTCTGAAC
AGAGGCGGCAGATACGTGCCCCCCCACCTGAGAGGAGGCGACGGCGGCGCCGCCGCCGCTGCATCTGCCGGCGGA
GATGACAGAAGAGGCGGAGCCGGAGGCGGCGGCTATAGACGGGGAGGCGGAAACAGCGGCGGCGGAGGCGGAGGC
GGCTACGACAGAGGCTACAACGACAACCGGGACGACCGGGACAACAGAGGCGGCAGCGGCGGATACGGCAGAGAT
CGAAACTACGAGGACAGAGGCTACAATGGCGGAGGCGGAGGCGGCGGCAACCGGGGCTACAACAACAACAGAGGA
GGCGGCGGCGGCGGCTACAACCGCCAGGACAGAGGCGATGGCGGATCTAGCAATTTCAGCAGAGGCGGCTACAAC
AACCGGGACGAGGGCAGCGACAACAGAGGCAGCGGAAGAAGCTACAACAATGACCGGAGAGATAATGGCGGAGAT
GGCTCCGGCGAGCAGAAGCTGATCTCAGAGGAGGACCTGATCGAAGGCCGCCATATGCTAGCCTCCAAAACCATC
GTTCTTTCGGTCGGCGAGGCTACTCGCACTCTGACTGAGATCCAGTCCACCGCAGACCGTCAGATCTTCGAAGAG
AAGGTCGGGCCTCTGGTGGGTCGGCTGCGCCTCACGGCTTCGCTCCGTCAAAACGGAGCCAAGACCGCGTATCGC
GTCAACCTAAAACTGGATCAGGCGGACGTCGTTGATTCCGGACTTCCGAAAGTGCGCTACACTCAGGTATGGTCG
CACGACGTGACAATCGTTGCGAATAGCACCGAGGCCTCGCGCAAATCGTTGTACGATTTGACCAAGTCCCTCGTC
GCGACCTCGCAGGTCGAAGATCTTGTCGTCAACCTTGTGCCGCTGGGCCGTGCGGATCCGCTAGCCTCCAAAACC
ATCGTTCTTTCGGTCGGCGAGGCTACTCGCACTCTGACTGAGATCCAGTCCACCGCAGACCGTCAGATCTTCGAA
GAGAAGGTCGGGCCTCTGGTGGGTCGGCTGCGCCTCACGGCTTCGCTCCGTCAAAACGGAGCCAAGACCGCGTAT
CGCGTCAACCTAAAACTGGATCAGGCGGACGTCGTTGATTCCGGACTTCCGAAAGTGCGCTACACTCAGGTATGG
TCGCACGACGTGACAATCGTTGCGAATAGCACCGAGGCCTCGCGCAAATCGTTGTACGATTTGACCAAGTCCCTC
GTCGCGACCTCGCAGGTCGAAGATCTTGTCGTCAACCTTGTGCCGCTGGGCCGTCCACCGGTCGCCACCTAA
(SEQ ID NO: 671)
Protein:
MASVKVAVRVRPMNRREKDLEAKFIIQMEKSKTTITNLKIPEGGTGDSGRERTKTFTYDFSFYSADTKSPDYVSQ
EMVFKTLGTDVVKSAFEGYNACVFAYGQTGSGKSYTMMGNSGDSGLIPRICEGLFSRINETTRWDEASFRTEVSY
LEIYNERVRDLLRRKSSKTFNLRVREHPKEGPYVEDLSKHLVQNYGDVEELMDAGNINRTTAATGMNDVSSRSHA
IFTIKFTQAKFDSEMPCETVSKIHLVDLAGSERADATGATGVRLKEGGNINKSLVTLGNVISALADLSQDAANTL
AKKKQVFVPYRDSVLTWLLKDSLGGNSKTIMIATISPADVNYGETLSTLRYANRAKNIINKPTINEDANVKLIRE
LRAEIARLKTLLAQGNQIALLDSPTDYKDDDDKMESNQSNNGGSGNAALNRGGRYVPPHLRGGDGGAAAAASAGG
DDRRGGAGGGGYRRGGGNSGGGGGGGYDRGYNDNRDDRDNRGGSGGYGRDRNYEDRGYNGGGGGGGNRGYNNNRG
GGGGGYNRQDRGDGGSSNFSRGGYNNRDEGSDNRGSGRSYNNDRRDNGGDGSGEQKLISEEDLIEGRHMLASKTI
VLSVGEATRTLTEIQSTADRQIFEEKVGPLVGRLRLTASLRQNGAKTAYRVNLKLDQADVVDSGLPKVRYTQVWS
HDVTIVANSTEASRKSLYDLTKSLVATSQVEDLVVNLVPLGRADPLASKTIVLSVGEATRTLTEIQSTADRQIFE
EKVGPLVGRLRLTASLRQNGAKTAYRVNLKLDQADVVDSGLPKVRYTQVWSHDVTIVANSTEASRKSLYDLTKSL
VATSQVEDLVVNLVPLGRPPVAT* (SEQ ID NO: 672)
KIF16B::2xLAF-1::2xPCP
DNA:
ATGGCATCGGTCAAGGTGGCCGTGAGGGTCCGGCCCATGAATCGCAGGGAAAAGGACTTGGAGGCCAAGTTCATT ATTCAGATGGAGAAAAGCAAAACGACAATCACAAACTTAAAGATACCAGAAGGAGGCACTGGGGACTCAGGAAGA GAACGGACCAAGACCTTCACCTATGACTTTTCTTTTTATTCTGCTGATACAAAAAGCCCAGATTACGTTTCACAA GAAATGGTTTTCAAAACCCTCGGCACAGATGTCGTGAAGTCTGCATTTGAAGGTTATAATGCTTGTGTCTTTGCA TATGGGCAAACTGGATCTGGAAAGTCATACACTATGATGGGAAATTCTGGAGATTCTGGCTTAATACCTCGGATC TGTGAAGGACTCTTCAGTCGGATAAATGAAACCACCAGATGGGATGAAGCTTCTTTTCGAACTGAAGTCAGCTAC TTAGAAATTTATAACGAACGTGTGAGAGATCTACTTCGGCGGAAGTCATCTAAAACCTTCAATTTGAGAGTCCGT GAGCATCCCAAAGAAGGCCCTTATGTTGAGGATTTATCCAAACATTTAGTACAGAATTATGGTGACGTAGAAGAA CTTATGGATGCGGGCAATATCAACCGGACCACCGCAGCGACTGGGATGAACGACGTCAGTAGCAGGTCTCATGCC ATCTTCACCATCAAGTTCACTCAGGCTAAATTTGATTCTGAAATGCCATGTGAAACCGTCAGTAAGATCCACTTG GTTGATCTTGCCGGAAGTGAGCGTGCAGATGCCACCGGAGCCACCGGGGTTAGGCTAAAGGAAGGGGGAAATATT AACAAGTCCCTCGTGACTCTGGGGAACGTCATTTCTGCCTTAGCTGATTTATCTCAGGATGCTGCAAATACTCTT GCAAAGAAGAAGCAAGTTTTCGTGCCTTACAGGGATTCTGTGTTGACTTGGTTGTTAAAAGATAGCCTTGGAGGA AACTCTAAAACTATCATGATTGCCACCATTTCACCTGCTGATGTCAATTATGGAGAAACCCTAAGTACTCTTCGC TATGCAAATAGAGCCAAAAACATCATCAACAAGCCTACCATTAATGAGGATGCCAACGTCAAACTTATCCGTGAG CTGCGAGCTGAAATAGCCAGACTGAAAACGCTGCTTGCTCAAGGGAATCAGATTGCCCTCTTAGACTCCCCCACA GACTACAAGGACGACGATGATAAGATGGAAAGCAACCAGAGCAACAACGGCGGCTCTGGCAACGCCGCTCTGAAC AGAGGCGGCAGATACGTGCCCCCCCACCTGAGAGGAGGCGACGGCGGCGCCGCCGCCGCTGCATCTGCCGGCGGA GATGACAGAAGAGGCGGAGCCGGAGGCGGCGGCTATAGACGGGGAGGCGGAAACAGCGGCGGCGGAGGCGGAGGC GGCTACGACAGAGGCTACAACGACAACCGGGACGACCGGGACAACAGAGGCGGCAGCGGCGGATACGGCAGAGAT CGAAACTACGAGGACAGAGGCTACAATGGCGGAGGCGGAGGCGGCGGCAACCGGGGCTACAACAACAACAGAGGA GGCGGCGGCGGCGGCTACAACCGCCAGGACAGAGGCGATGGCGGATCTAGCAATTTCAGCAGAGGCGGCTACAAC AACCGGGACGAGGGCAGCGACAACAGAGGCAGCGGAAGAAGCTACAACAATGACCGGAGAGATAATGGCGGAGAT GGCTCCGGCGGAATGGAAAGCAACCAGAGCAACAACGGCGGCTCTGGCAACGCCGCTCTGAACAGAGGCGGCAGA TACGTGCCCCCCCACCTGAGAGGAGGCGACGGCGGCGCCGCCGCCGCTGCATCTGCCGGCGGAGATGACAGAAGA GGCGGAGCCGGAGGCGGCGGCTATAGACGGGGAGGCGGAAACAGCGGCGGCGGAGGCGGAGGCGGCTACGACAGA 
GGCTACAACGACAACCGGGACGACCGGGACAACAGAGGCGGCAGCGGCGGATACGGCAGAGATCGAAACTACGAG GACAGAGGCTACAATGGCGGAGGCGGAGGCGGCGGCAACCGGGGCTACAACAACAACAGAGGAGGCGGCGGCGGC GGCTACAACCGCCAGGACAGAGGCGATGGCGGATCTAGCAATTTCAGCAGAGGCGGCTACAACAACCGGGACGAG GGCAGCGACAACAGAGGCAGCGGAAGAAGCTACAACAATGACCGGAGAGATAATGGCGGAGATGGCTCCGGCGAG CAGAAGCTGATCTCAGAGGAGGACCTGATCGAAGGCCGCCATATGCTAGCCTCCAAAACCATCGTTCTTTCGGTC GGCGAGGCTACTCGCACTCTGACTGAGATCCAGTCCACCGCAGACCGTCAGATCTTCGAAGAGAAGGTCGGGCCT CTGGTGGGTCGGCTGCGCCTCACGGCTTCGCTCCGTCAAAACGGAGCCAAGACCGCGTATCGCGTCAACCTAAAA CTGGATCAGGCGGACGTCGTTGATTCCGGACTTCCGAAAGTGCGCTACACTCAGGTATGGTCGCACGACGTGACA ATCGTTGCGAATAGCACCGAGGCCTCGCGCAAATCGTTGTACGATTTGACCAAGTCCCTCGTCGCGACCTCGCAG GTCGAAGATCTTGTCGTCAACCTTGTGCCGCTGGGCCGTGCGGATCCGCTAGCCTCCAAAACCATCGTTCTTTCG GTCGGCGAGGCTACTCGCACTCTGACTGAGATCCAGTCCACCGCAGACCGTCAGATCTTCGAAGAGAAGGTCGGG CCTCTGGTGGGTCGGCTGCGCCTCACGGCTTCGCTCCGTCAAAACGGAGCCAAGACCGCGTATCGCGTCAACCTA AAACTGGATCAGGCGGACGTCGTTGATTCCGGACTTCCGAAAGTGCGCTACACTCAGGTATGGTCGCACGACGTG ACAATCGTTGCGAATAGCACCGAGGCCTCGCGCAAATCGTTGTACGATTTGACCAAGTCCCTCGTCGCGACCTCG CAGGTCGAAGATCTTGTCGTCAACCTTGTGCCGCTGGGCCGTCCACCGGTCGCCACCTAA (SEQ ID NO: 673)
Protein:
MASVKVAVRVRPMNRREKDLEAKFIIQMEKSKTTITNLKIPEGGTGDSGRERTKTFTYDFSFYSADTKSPDYVSQ EMVFKTLGTDVVKSAFEGYNACVFAYGQTGSGKSYTMMGNSGDSGLIPRICEGLFSRINETTRWDEASFRTEVSY LEIYNERVRDLLRRKSSKTFNLRVREHPKEGPYVEDLSKHLVQNYGDVEELMDAGNINRTTAATGMNDVSSRSHA IFTIKFTQAKFDSEMPCETVSKIHLVDLAGSERADATGATGVRLKEGGNINKSLVTLGNVISALADLSQDAANTL AKKKQVFVPYRDSVLTWLLKDSLGGNSKTIMIATISPADVNYGETLSTLRYANRAKNIINKPTINEDANVKLIRE LRAEIARLKTLLAQGNQIALLDSPTDYKDDDDKMESNQSNNGGSGNAALNRGGRYVPPHLRGGDGGAAAAASAGG DDRRGGAGGGGYRRGGGNSGGGGGGGYDRGYNDNRDDRDNRGGSGGYGRDRNYEDRGYNGGGGGGGNRGYNNNRG GGGGGYNRQDRGDGGSSNFSRGGYNNRDEGSDNRGSGRSYNNDRRDNGGDGSGGMESNQSNNGGSGNAALNRGGR YVPPHLRGGDGGAAAAASAGGDDRRGGAGGGGYRRGGGNSGGGGGGGYDRGYNDNRDDRDNRGGSGGYGRDRNYE DRGYNGGGGGGGNRGYNNNRGGGGGGYNRQDRGDGGSSNFSRGGYNNRDEGSDNRGSGRSYNNDRRDNGGDGSGE QKLISEEDLIEGRHMLASKTIVLSVGEATRTLTEIQSTADRQIFEEKVGPLVGRLRLTASLRQNGAKTAYRVNLK LDQADVVDSGLPKVRYTQVWSHDVTIVANSTEASRKSLYDLTKSLVATSQVEDLVVNLVPLGRADPLASKTIVLS VGEATRTLTEIQSTADRQIFEEKVGPLVGRLRLTASLRQNGAKTAYRVNLKLDQADVVDSGLPKVRYTQVWSHDV TIVANSTEASRKSLYDLTKSLVATSQVEDLVVNLVPLGRPPVAT* (SEQ ID NO: 674)
KIF16B::2xLAF-1::PylRS(AF)
DNA:
ATGGCATCGGTCAAGGTGGCCGTGAGGGTCCGGCCCATGAATCGCAGGGAAAAGGACTTGGAGGCCAAGTTCATT ATTCAGATGGAGAAAAGCAAAACGACAATCACAAACTTAAAGATACCAGAAGGAGGCACTGGGGACTCAGGAAGA GAACGGACCAAGACCTTCACCTATGACTTTTCTTTTTATTCTGCTGATACAAAAAGCCCAGATTACGTTTCACAA GAAATGGTTTTCAAAACCCTCGGCACAGATGTCGTGAAGTCTGCATTTGAAGGTTATAATGCTTGTGTCTTTGCA TATGGGCAAACTGGATCTGGAAAGTCATACACTATGATGGGAAATTCTGGAGATTCTGGCTTAATACCTCGGATC TGTGAAGGACTCTTCAGTCGGATAAATGAAACCACCAGATGGGATGAAGCTTCTTTTCGAACTGAAGTCAGCTAC TTAGAAATTTATAACGAACGTGTGAGAGATCTACTTCGGCGGAAGTCATCTAAAACCTTCAATTTGAGAGTCCGT GAGCATCCCAAAGAAGGCCCTTATGTTGAGGATTTATCCAAACATTTAGTACAGAATTATGGTGACGTAGAAGAA CTTATGGATGCGGGCAATATCAACCGGACCACCGCAGCGACTGGGATGAACGACGTCAGTAGCAGGTCTCATGCC ATCTTCACCATCAAGTTCACTCAGGCTAAATTTGATTCTGAAATGCCATGTGAAACCGTCAGTAAGATCCACTTG GTTGATCTTGCCGGAAGTGAGCGTGCAGATGCCACCGGAGCCACCGGGGTTAGGCTAAAGGAAGGGGGAAATATT AACAAGTCCCTCGTGACTCTGGGGAACGTCATTTCTGCCTTAGCTGATTTATCTCAGGATGCTGCAAATACTCTT GCAAAGAAGAAGCAAGTTTTCGTGCCTTACAGGGATTCTGTGTTGACTTGGTTGTTAAAAGATAGCCTTGGAGGA AACTCTAAAACTATCATGATTGCCACCATTTCACCTGCTGATGTCAATTATGGAGAAACCCTAAGTACTCTTCGC TATGCAAATAGAGCCAAAAACATCATCAACAAGCCTACCATTAATGAGGATGCCAACGTCAAACTTATCCGTGAG CTGCGAGCTGAAATAGCCAGACTGAAAACGCTGCTTGCTCAAGGGAATCAGATTGCCCTCTTAGACTCCCCCACA GACTACAAGGACGACGATGATAAGATGGAAAGCAACCAGAGCAACAACGGCGGCTCTGGCAACGCCGCTCTGAAC AGAGGCGGCAGATACGTGCCCCCCCACCTGAGAGGAGGCGACGGCGGCGCCGCCGCCGCTGCATCTGCCGGCGGA GATGACAGAAGAGGCGGAGCCGGAGGCGGCGGCTATAGACGGGGAGGCGGAAACAGCGGCGGCGGAGGCGGAGGC GGCTACGACAGAGGCTACAACGACAACCGGGACGACCGGGACAACAGAGGCGGCAGCGGCGGATACGGCAGAGAT CGAAACTACGAGGACAGAGGCTACAATGGCGGAGGCGGAGGCGGCGGCAACCGGGGCTACAACAACAACAGAGGA GGCGGCGGCGGCGGCTACAACCGCCAGGACAGAGGCGATGGCGGATCTAGCAATTTCAGCAGAGGCGGCTACAAC AACCGGGACGAGGGCAGCGACAACAGAGGCAGCGGAAGAAGCTACAACAATGACCGGAGAGATAATGGCGGAGAT
GGCTCCGGCGGAATGGAAAGCAACCAGAGCAACAACGGCGGCTCTGGCAACGCCGCTCTGAACAGAGGCGGCAGA TACGTGCCCCCCCACCTGAGAGGAGGCGACGGCGGCGCCGCCGCCGCTGCATCTGCCGGCGGAGATGACAGAAGA GGCGGAGCCGGAGGCGGCGGCTATAGACGGGGAGGCGGAAACAGCGGCGGCGGAGGCGGAGGCGGCTACGACAGA GGCTACAACGACAACCGGGACGACCGGGACAACAGAGGCGGCAGCGGCGGATACGGCAGAGATCGAAACTACGAG GACAGAGGCTACAATGGCGGAGGCGGAGGCGGCGGCAACCGGGGCTACAACAACAACAGAGGAGGCGGCGGCGGC GGCTACAACCGCCAGGACAGAGGCGATGGCGGATCTAGCAATTTCAGCAGAGGCGGCTACAACAACCGGGACGAG GGCAGCGACAACAGAGGCAGCGGAAGAAGCTACAACAATGACCGGAGAGATAATGGCGGAGATGGCTCCGGAGCG TGCCCGGTGCCGCTGCAGCTGCCGCCGCTGGAACGCCTGACCCTGGATGATAAAAAACCGCTGAATACCCTGATC TCTGCTACTGGTCTGTGGATGAGTCGTACCGGAACCATTCATAAAATCAAACACCACGAGGTTAGCCGTTCGAAA ATCTATATTGAGATGGCGTGTGGCGATCATCTGGTTGTGAACAATAGCCGCTCTTCTCGTACAGCACGTGCACTG CGTCACCACAAATATCGTAAAACCTGTAAACGTTGCCGTGTGTCCGATGAGGATCTGAACAAATTCCTGACAAAA GCCAATGAGGACCAAACAAGCGTGAAAGTGAAAGTCGTTAGCGCTCCTACCCGTACTAAAAAAGCAATGCCGAAA TCCGTTGCTCGTGCCCCTAAACCACTGGAAAACACTGAAGCAGCACAGGCACAGCCGTCTGGAAGCAAATTCTCT CCGGCCATTCCTGTTTCTACCCAGGAGTCCGTTTCTGTTCCAGCAAGTGTGAGCACCAGCATTAGCAGTATTAGC ACCGGTGCCACCGCTAGCGCCCTGGTTAAAGGCAATACCAATCCGATTACAAGCATGTCTGCCCCGGTTCAAGCA TCAGCTCCAGCACTGACAAAATCCCAAACCGATCGTCTGGAGGTTCTGCTGAATCCGAAAGACGAAATCAGCCTG AATTCCGGCAAACCGTTTCGTGAACTGGAGAGCGAACTGCTGTCACGTCGTAAAAAAGACCTGCAACAAATCTAT GCCGAAGAACGTGAGAACTATCTGGGGAAACTGGAACGTGAAATCACCCGCTTTTTCGTGGATCGTGGCTTTCTG GAGATCAAATCCCCGATTCTGATTCCTCTGGAGTATATCGAGCGTATGGGCATCGACAATGATACCGAACTGAGC AAACAAATTTTCCGTGTGGATAAAAACTTCTGTCTGCGCCCTATGCTAGCACCAAATCTGGCTAACTATCTGCGC AAACTGGACCGTGCCCTGCCTGATCCTATCAAAATCTTCGAGATCGGCCCGTGTTATCGTAAAGAGTCCGACGGT AAAGAACATCTGGAGGAGTTTACCATGCTGAACTTTTGCCAAATGGGTTCAGGTTGTACTCGTGAGAACCTGGAA AGCATCATCACCGATTTTCTGAACCACCTGGGCATTGACTTCAAAATTGTGGGCGACAGCTGTATGGTGTTTGGC GACACCCTGGATGTCATGCACGGCGACCTGGAACTGTCTAGTGCCGTTGTGGGCCCAATCCCGCTGGATCGTGAG
TGGGGTATCGACAAACCTTGGATCGGTGCGGGTTTTGGTCTGGAGCGTCTGCTGAAAGTAAAACACGACTTCAAG AACATCAAACGTGCTGCACGTTCCGAGTCCTATTACAATGGTATTTCTACTAACCTGTAA (SEQ ID NO: 675)
Protein:
MASVKVAVRVRPMNRREKDLEAKFIIQMEKSKTTITNLKIPEGGTGDSGRERTKTFTYDFSFYSADTKSPDYVSQ EMVFKTLGTDVVKSAFEGYNACVFAYGQTGSGKSYTMMGNSGDSGLIPRICEGLFSRINETTRWDEASFRTEVSY LEIYNERVRDLLRRKSSKTFNLRVREHPKEGPYVEDLSKHLVQNYGDVEELMDAGNINRTTAATGMNDVSSRSHA IFTIKFTQAKFDSEMPCETVSKIHLVDLAGSERADATGATGVRLKEGGNINKSLVTLGNVISALADLSQDAANTL AKKKQVFVPYRDSVLTWLLKDSLGGNSKTIMIATISPADVNYGETLSTLRYANRAKNIINKPTINEDANVKLIRE LRAEIARLKTLLAQGNQIALLDSPTDYKDDDDKMESNQSNNGGSGNAALNRGGRYVPPHLRGGDGGAAAAASAGG DDRRGGAGGGGYRRGGGNSGGGGGGGYDRGYNDNRDDRDNRGGSGGYGRDRNYEDRGYNGGGGGGGNRGYNNNRG GGGGGYNRQDRGDGGSSNFSRGGYNNRDEGSDNRGSGRSYNNDRRDNGGDGSGGMESNQSNNGGSGNAALNRGGR YVPPHLRGGDGGAAAAASAGGDDRRGGAGGGGYRRGGGNSGGGGGGGYDRGYNDNRDDRDNRGGSGGYGRDRNYE DRGYNGGGGGGGNRGYNNNRGGGGGGYNRQDRGDGGSSNFSRGGYNNRDEGSDNRGSGRSYNNDRRDNGGDGSGA CPVPLQLPPLERLTLDDKKPLNTLISATGLWMSRTGTIHKIKHHEVSRSKIYIEMACGDHLVVNNSRSSRTARAL RHHKYRKTCKRCRVSDEDLNKFLTKANEDQTSVKVKVVSAPTRTKKAMPKSVARAPKPLENTEAAQAQPSGSKFS PAIPVSTQESVSVPASVSTSISSISTGATASALVKGNTNPITSMSAPVQASAPALTKSQTDRLEVLLNPKDEISL NSGKPFRELESELLSRRKKDLQQIYAEERENYLGKLEREITRFFVDRGFLEIKSPILIPLEYIERMGIDNDTELS KQIFRVDKNFCLRPMLAPNLANYLRKLDRALPDPIKIFEIGPCYRKESDGKEHLEEFTMLNFCQMGSGCTRENLE SIITDFLNHLGIDFKIVGDSCMVFGDTLDVMHGDLELSSAVVGPIPLDREWGIDKPWIGAGFGLERLLKVKHDFK NIKRAARSESYYNGISTNL* (SEQ ID NO: 676)
KIF16B::2xLAF-1::MCP
DNA:
ATGGCATCGGTCAAGGTGGCCGTGAGGGTCCGGCCCATGAATCGCAGGGAAAAGGACTTGGAGGCCAAGTTCATT ATTCAGATGGAGAAAAGCAAAACGACAATCACAAACTTAAAGATACCAGAAGGAGGCACTGGGGACTCAGGAAGA GAACGGACCAAGACCTTCACCTATGACTTTTCTTTTTATTCTGCTGATACAAAAAGCCCAGATTACGTTTCACAA GAAATGGTTTTCAAAACCCTCGGCACAGATGTCGTGAAGTCTGCATTTGAAGGTTATAATGCTTGTGTCTTTGCA TATGGGCAAACTGGATCTGGAAAGTCATACACTATGATGGGAAATTCTGGAGATTCTGGCTTAATACCTCGGATC TGTGAAGGACTCTTCAGTCGGATAAATGAAACCACCAGATGGGATGAAGCTTCTTTTCGAACTGAAGTCAGCTAC TTAGAAATTTATAACGAACGTGTGAGAGATCTACTTCGGCGGAAGTCATCTAAAACCTTCAATTTGAGAGTCCGT GAGCATCCCAAAGAAGGCCCTTATGTTGAGGATTTATCCAAACATTTAGTACAGAATTATGGTGACGTAGAAGAA CTTATGGATGCGGGCAATATCAACCGGACCACCGCAGCGACTGGGATGAACGACGTCAGTAGCAGGTCTCATGCC ATCTTCACCATCAAGTTCACTCAGGCTAAATTTGATTCTGAAATGCCATGTGAAACCGTCAGTAAGATCCACTTG GTTGATCTTGCCGGAAGTGAGCGTGCAGATGCCACCGGAGCCACCGGGGTTAGGCTAAAGGAAGGGGGAAATATT AACAAGTCCCTCGTGACTCTGGGGAACGTCATTTCTGCCTTAGCTGATTTATCTCAGGATGCTGCAAATACTCTT GCAAAGAAGAAGCAAGTTTTCGTGCCTTACAGGGATTCTGTGTTGACTTGGTTGTTAAAAGATAGCCTTGGAGGA AACTCTAAAACTATCATGATTGCCACCATTTCACCTGCTGATGTCAATTATGGAGAAACCCTAAGTACTCTTCGC TATGCAAATAGAGCCAAAAACATCATCAACAAGCCTACCATTAATGAGGATGCCAACGTCAAACTTATCCGTGAG CTGCGAGCTGAAATAGCCAGACTGAAAACGCTGCTTGCTCAAGGGAATCAGATTGCCCTCTTAGACTCCCCCACA GACTACAAGGACGACGATGATAAGATGGAAAGCAACCAGAGCAACAACGGCGGCTCTGGCAACGCCGCTCTGAAC AGAGGCGGCAGATACGTGCCCCCCCACCTGAGAGGAGGCGACGGCGGCGCCGCCGCCGCTGCATCTGCCGGCGGA GATGACAGAAGAGGCGGAGCCGGAGGCGGCGGCTATAGACGGGGAGGCGGAAACAGCGGCGGCGGAGGCGGAGGC GGCTACGACAGAGGCTACAACGACAACCGGGACGACCGGGACAACAGAGGCGGCAGCGGCGGATACGGCAGAGAT CGAAACTACGAGGACAGAGGCTACAATGGCGGAGGCGGAGGCGGCGGCAACCGGGGCTACAACAACAACAGAGGA GGCGGCGGCGGCGGCTACAACCGCCAGGACAGAGGCGATGGCGGATCTAGCAATTTCAGCAGAGGCGGCTACAAC AACCGGGACGAGGGCAGCGACAACAGAGGCAGCGGAAGAAGCTACAACAATGACCGGAGAGATAATGGCGGAGAT GGCTCCGGCGGAATGGAAAGCAACCAGAGCAACAACGGCGGCTCTGGCAACGCCGCTCTGAACAGAGGCGGCAGA TACGTGCCCCCCCACCTGAGAGGAGGCGACGGCGGCGCCGCCGCCGCTGCATCTGCCGGCGGAGATGACAGAAGA GGCGGAGCCGGAGGCGGCGGCTATAGACGGGGAGGCGGAAACAGCGGCGGCGGAGGCGGAGGCGGCTACGACAGA 
GGCTACAACGACAACCGGGACGACCGGGACAACAGAGGCGGCAGCGGCGGATACGGCAGAGATCGAAACTACGAG GACAGAGGCTACAATGGCGGAGGCGGAGGCGGCGGCAACCGGGGCTACAACAACAACAGAGGAGGCGGCGGCGGC GGCTACAACCGCCAGGACAGAGGCGATGGCGGATCTAGCAATTTCAGCAGAGGCGGCTACAACAACCGGGACGAG GGCAGCGACAACAGAGGCAGCGGAAGAAGCTACAACAATGACCGGAGAGATAATGGCGGAGATGGCTCCGGATAT CCCTATGATGTGCCGGATTATGCTGGAGCACCAGGAAGTGCTGGTTCTGCTGCTGGTAGTGGAGCTTCTAACTTT ACTCAGTTCGTTCTCGTCGACAATGGCGGAACTGGCGACGTGACTGTCGCCCCAAGCAACTTCGCTAACGGGATC GCTGAATGGATCAGCTCTAACTCGCGTTCACAGGCTTACAAAGTAACCTGTAGCGTTCGTCAGAGCTCTGCGCAG AATCGCAAATACACCATCAAAGTCGAGGTGCCTAAAGGCGCCTGGCGTTCGTACTTAAATATGGAACTAACCATT CCAATTTTCGCCACGAATTCCGACTGCGAGCTTATTGTTAAGGCAATGCAAGGTCTCCTAAAAGATGGAAACCCG ATTCCCTCAGCAATCGCAGCAAACTCCGGCATCTACTAA (SEQ ID NO: 677)
Protein:
MASVKVAVRVRPMNRREKDLEAKFIIQMEKSKTTITNLKIPEGGTGDSGRERTKTFTYDFSFYSADTKSPDYVSQ EMVFKTLGTDVVKSAFEGYNACVFAYGQTGSGKSYTMMGNSGDSGLIPRICEGLFSRINETTRWDEASFRTEVSY LEIYNERVRDLLRRKSSKTFNLRVREHPKEGPYVEDLSKHLVQNYGDVEELMDAGNINRTTAATGMNDVSSRSHA IFTIKFTQAKFDSEMPCETVSKIHLVDLAGSERADATGATGVRLKEGGNINKSLVTLGNVISALADLSQDAANTL AKKKQVFVPYRDSVLTWLLKDSLGGNSKTIMIATISPADVNYGETLSTLRYANRAKNIINKPTINEDANVKLIRE LRAEIARLKTLLAQGNQIALLDSPTDYKDDDDKMESNQSNNGGSGNAALNRGGRYVPPHLRGGDGGAAAAASAGG DDRRGGAGGGGYRRGGGNSGGGGGGGYDRGYNDNRDDRDNRGGSGGYGRDRNYEDRGYNGGGGGGGNRGYNNNRG GGGGGYNRQDRGDGGSSNFSRGGYNNRDEGSDNRGSGRSYNNDRRDNGGDGSGGMESNQSNNGGSGNAALNRGGR YVPPHLRGGDGGAAAAASAGGDDRRGGAGGGGYRRGGGNSGGGGGGGYDRGYNDNRDDRDNRGGSGGYGRDRNYE DRGYNGGGGGGGNRGYNNNRGGGGGGYNRQDRGDGGSSNFSRGGYNNRDEGSDNRGSGRSYNNDRRDNGGDGSGY PYDVPDYAGAPGSAGSAAGSGASNFTQFVLVDNGGTGDVTVAPSNFANGIAEWISSNSRSQAYKVTCSVRQSSAQ NRKYTIKVEVPKGAWRSYLNMELTIPIFATNSDCELIVKAMQGLLKDGNPIPSAIAANSGIY* (SEQ ID NO: 678)
3. Epitope tags:
VSV-G: Vesicular stomatitis virus glycoprotein epitope tag
DNA:
TATACAGATATTGAAATGAACAGATTGGGAAAG (SEQ ID NO: 679)
Protein:
YTDIEMNRLGK (SEQ ID NO: 680)
HA: Human influenza hemagglutinin epitope tag
DNA:
TACCCCTACGACGTGCCCGACTACGCC (SEQ ID NO: 681)
Protein:
YPYDVPDYA (SEQ ID NO: 682)
Myc: Human c-Myc proto-oncogene epitope tag
DNA:
GAGCAGAAGCTGATCTCAGAGGAGGACCTG (SEQ ID NO: 683)
Protein:
EQKLISEEDL (SEQ ID NO: 684)
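Each DNA entry in this listing should translate, codon by codon under the standard genetic code, into the protein entry that follows it. The Python sketch below makes that check explicit for the three epitope tags; the helper name `translate` and the compact 64-letter codon-table string are illustrative choices, not part of the source. Whitespace is stripped first, so the space-separated block layout used in the longer entries can be passed in directly.

```python
"""Translate a coding DNA sequence with the standard genetic code (sketch)."""

BASES = "TCAG"
# One-letter amino acids for all 64 codons in TCAG order (first, second,
# third base); "*" marks the stop codons TAA, TAG and TGA.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(dna: str) -> str:
    """Return the one-letter protein sequence encoded by `dna`.

    Whitespace (e.g. the block spacing used in the listing) is ignored;
    stop codons are kept in the output as "*".
    """
    seq = "".join(dna.split()).upper()
    if len(seq) % 3 != 0:
        raise ValueError(f"length {len(seq)} is not a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


if __name__ == "__main__":
    # DNA / protein pairs copied from SEQ ID NOs: 679 to 684.
    tags = {
        "VSV-G": ("TATACAGATATTGAAATGAACAGATTGGGAAAG", "YTDIEMNRLGK"),
        "HA": ("TACCCCTACGACGTGCCCGACTACGCC", "YPYDVPDYA"),
        "Myc": ("GAGCAGAAGCTGATCTCAGAGGAGGACCTG", "EQKLISEEDL"),
    }
    for name, (dna, protein) in tags.items():
        result = translate(dna)
        print(name, result, "OK" if result == protein else "MISMATCH")
```

The same helper can be pointed at any coding region of the longer constructs; for instance, the KIF16B codons `GATGTCGTGAAG` present in each KIF16B DNA entry translate to `DVVK`.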

Claims

1. An assembler fusion protein (AFP) comprising:
(a) at least one first polypeptide segment acting as assembler (AP) that is selected from:
(a1) a polypeptide segment derived from an intracellular targeting polypeptide (IC-TP segment), wherein said intracellular targeting polypeptide targets, and thus becomes locally enriched at, an intracellular structural element within or directly adjacent to the cytoplasm; and
(a2) a polypeptide segment derived from a phase separation polypeptide (PSP segment), wherein said phase separation polypeptide has the ability to undergo self-association in the cytoplasm of a cell so as to create sites of high local concentration in the cytoplasm, and
(b) at least one second polypeptide segment acting as an effector (EP) that is selected from:
(b1) an RNA-targeting polypeptide (RNA-TP) segment, and
(b2) an orthogonal aminoacyl tRNA synthetase (O-RS) segment;
wherein said polypeptide segments are functionally linked in said AFP.
2. An assembler fusion protein (AFP) combination comprising at least two AFPs of claim 1.
3. A fusion protein (RNA-TP/O-RS fusion protein) comprising:
(i) at least one RNA-targeting polypeptide (RNA-TP) segment; and
(ii) at least one orthogonal aminoacyl tRNA synthetase (O-RS) segment,
wherein said polypeptide segments are functionally linked in said RNA-TP/O-RS fusion protein.
4. A nucleic acid molecule, or a combination of two or more nucleic acid molecules, comprising:
(i) a nucleotide sequence that encodes at least one AFP of claim 1, or at least one AFP combination of claim 2, or
(ii) a nucleic acid sequence complementary to the nucleotide sequence of (i), or
(iii) both of (i) and (ii).
5. A nucleic acid molecule, or a combination of two or more nucleic acid molecules, comprising:
(i) a nucleotide sequence that encodes at least one RNA-TP/O-RS fusion protein of claim 3, or
(ii) a nucleic acid sequence complementary to (i), or
(iii) both of (i) and (ii).
6. An expression cassette comprising the nucleotide sequence of the nucleic acid molecule, or the combination of nucleic acid molecules, of claim 4 or claim 5.
7. An expression vector comprising at least one expression cassette of claim 6.
8. A cell comprising at least one nucleic acid molecule, or combination of nucleic acid molecules, of claim 4 or claim 5, at least one expression cassette of claim 6, or at least one expression vector of claim 7.
9. The cell of claim 8 comprising a nucleotide sequence that encodes, or is complementary to a nucleotide sequence encoding, at least one AFP of claim 1 comprising (i) at least one EP selected from RNA-TP segments and (ii) at least one EP selected from O-RS segments.
10. The cell of claim 8 comprising a nucleotide sequence that encodes, or is complementary to a nucleotide sequence encoding, a combination of at least two AFPs of claim 1, wherein one of the at least two AFPs comprises at least one RNA-TP segment and another one of the at least two AFPs comprises at least one O-RS segment.
11. The cell of claim 8 comprising a nucleotide sequence that encodes, or is complementary to a nucleotide sequence encoding, at least one RNA-TP/O-RS fusion protein of claim 3.
12. A method for preparing a polypeptide of interest (POI) comprising in its amino acid sequence one or more non-canonical amino acid (ncAA) residues, wherein the method comprises expressing the POI in a cell of claim 9 or claim 10 in the presence of said one or more ncAAs, wherein the cell comprises:
(i) a POI-encoding nucleotide sequence (CSPOI) wherein said one or more ncAA residues of the POI are encoded by selector codon(s),
(ii) a targeting nucleotide sequence (TN) that is functionally linked to the CSPOI and is able to interact with an RNA-TP segment of at least one of the AFPs in the cell;
(iii) one or more orthogonal tRNAncAA (O-tRNAncAA) molecules which carry the anticodon(s) complementary to the selector codon(s) of the CSPOI, and wherein said O-tRNAncAA molecules together with one or more O-RS segments of at least one of the AFPs in the cell form one or more orthogonal O-RS/O-tRNAncAA pairs which allow for the introduction of said one or more ncAA residues into the amino acid sequence of the POI;
and wherein the method optionally further comprises recovering the expressed POI.
13. A method for preparing a polypeptide of interest (POI) comprising in its amino acid sequence one or more non-canonical amino acid (ncAA) residues, wherein the method comprises expressing the POI in a cell of claim 11 in the presence of said one or more ncAAs, wherein the cell comprises:
(iv) a POI-encoding nucleotide sequence (CSPOI) wherein said one or more ncAA residues of the POI are encoded by selector codon(s),
(v) a targeting nucleotide sequence (TN) that is functionally linked to the CSPOI and is able to interact with an RNA-TP segment of at least one of the RNA-TP/O-RS fusion proteins in the cell;
(vi) one or more orthogonal tRNAncAA (O-tRNAncAA) molecules which carry the anticodon(s) complementary to the selector codon(s) of the CSPOI, and wherein said O-tRNAncAA molecules together with one or more O-RS segments of the RNA-TP/O-RS fusion proteins in the cell form one or more orthogonal O-RS/O-tRNAncAA pairs which allow for the introduction of said one or more ncAA residues into the amino acid sequence of the POI;
and wherein the method optionally further comprises recovering the expressed POI.
14. A nucleic acid molecule comprising:
(i) a nucleotide sequence (CSPOI) that encodes a polypeptide of interest (POI), said POI comprising one or more non-canonical amino acid (ncAA) residues which are encoded in the CSPOI by selector codons, and
(ii) a targeting nucleotide sequence (TN), wherein an RNA molecule comprising said TN is able to interact via said TN with an RNA-targeting polypeptide (RNA-TP).
15. A kit for preparing a polypeptide of interest (POI) having at least one non-canonical amino acid (ncAA) residue, the kit comprising:
at least one ncAA, or salt thereof, corresponding to the at least one ncAA residue of the POI, and
at least one expression vector of claim 7.
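Claims 12 to 14 combine three elements: selector codon(s) in the POI-encoding sequence, a targeting nucleotide sequence (TN) through which the transcript interacts with an RNA-TP segment, and an O-RS/O-tRNAncAA pair that introduces the ncAA. The Python toy model below sketches the intended target-selective outcome under stated assumptions: the selector codon is taken to be the amber codon TAG (a common choice in genetic code expansion, but not specified in these claims), the ncAA is written as a hypothetical placeholder `X`, and decoding is idealized as all-or-nothing depending on whether the transcript carries a TN. `Transcript` and `translate_transcript` are hypothetical names; in real cells suppression is partial and competes with translation termination.

```python
"""Toy model of target-selective selector-codon decoding (illustrative only)."""
from dataclasses import dataclass

BASES = "TCAG"
# Standard genetic code, codons in TCAG order; "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)

SELECTOR_CODON = "TAG"  # assumption: amber codon used as the selector codon
NCAA = "X"              # hypothetical one-letter placeholder for the ncAA


@dataclass
class Transcript:
    cds: str      # POI-encoding sequence containing selector codon(s)
    has_tn: bool  # True if a targeting nucleotide sequence (TN) is linked


def translate_transcript(t: Transcript) -> str:
    """Translate codon by codon until a terminating stop codon.

    Idealization: the selector codon is decoded as the ncAA only for
    TN-carrying transcripts (assumed to be recruited to the RNA-TP near
    the O-RS/O-tRNA pair); for all other transcripts it terminates
    translation like an ordinary stop codon.
    """
    seq = t.cds.upper()
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if codon == SELECTOR_CODON and t.has_tn:
            protein.append(NCAA)
            continue
        residue = CODON_TABLE[codon]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    cds = "ATGGCCTAGGAGCTGTAA"  # hypothetical mini-ORF: Met-Ala-TAG-Glu-Leu-stop
    print(translate_transcript(Transcript(cds, has_tn=True)))   # MAXEL
    print(translate_transcript(Transcript(cds, has_tn=False)))  # MA (truncated)
```

Representing the TN as a per-transcript flag keeps the POI-encoding sequence and its functionally linked TN separate, as in claim 14; a more realistic sketch would replace the boolean with a suppression probability.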

Priority Applications (6)

Application Number Priority Date Filing Date Title
EP20703782.1A EP3924365A1 (en) 2019-02-14 2020-02-14 Means and methods for preparing engineered target proteins by genetic code expansion in a target protein-selective manner
US17/426,338 US20230098002A1 (en) 2019-02-14 2020-02-14 Means and methods for preparing engineered target proteins by genetic code expansion in a target protein-selective manner
CN202080028507.1A CN113727993A (en) 2019-02-14 2020-02-14 Means and methods for producing engineered target proteins in a target protein selective manner by genetic codon expansion
JP2021545719A JP2022521049A (en) 2019-02-14 2020-02-14 Means and Methods for Preparing Target Proteins Manipulated by Genetic Code Expansion in a Selective Mode of Target Proteins
CA3129336A CA3129336A1 (en) 2019-02-14 2020-02-14 Means and methods for preparing engineered target proteins by genetic code expansion in a target protein-selective manner
IL285405A IL285405A (en) 2019-02-14 2021-08-05 Means and methods for preparing engineered target proteins by genetic code expansion in a target protein-selective manner

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP19157257.7 2019-02-14
EP19157257.7A EP3696189A1 (en) 2019-02-14 2019-02-14 Means and methods for preparing engineered target proteins by genetic code expansion in a target protein selective manner

Publications (1)

Publication Number Publication Date
WO2020165408A1 (en) 2020-08-20

Family

ID=65685108

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2020/053883 WO2020165408A1 (en) 2019-02-14 2020-02-14 Means and methods for preparing engineered target proteins by genetic code expansion in a target protein-selective manner

Country Status (8)

Country Link
US (1) US20230098002A1 (en)
EP (2) EP3696189A1 (en)
JP (1) JP2022521049A (en)
CN (1) CN113727993A (en)
CA (1) CA3129336A1 (en)
IL (1) IL285405A (en)
MA (1) MA54934A (en)
WO (1) WO2020165408A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4186529A1 (en) 2021-11-25 2023-05-31 Veraxa Biotech GmbH Improved antibody-payload conjugates (apcs) prepared by site-specific conjugation utilizing genetic code expansion
WO2023094525A1 (en) 2021-11-25 2023-06-01 Veraxa Biotech Gmbh Improved antibody-payload conjugates (apcs) prepared by site-specific conjugation utilizing genetic code expansion

Families Citing this family (2)

Publication number Priority date Publication date Assignee Title
CN111304234A (en) * 2020-02-27 2020-06-19 江南大学 Unnatural amino acid utilization tool suitable for bacillus subtilis
CN115896144B (en) * 2022-10-17 2024-01-02 湖南诺合新生物科技有限公司 Application of FUS protein as fusion tag, recombinant protein and expression method thereof

Citations (6)

Publication number Priority date Publication date Assignee Title
WO2002006075A1 (en) 2000-07-17 2002-01-24 Magnetar Technologies Ltd Eddy current brake system with dual use conductor fin
WO2002085923A2 (en) 2001-04-19 2002-10-31 The Scripps Research Institute In vivo incorporation of unnatural amino acids
EP2192185A1 (en) 2007-09-20 2010-06-02 Riken Mutant pyrrolysyl-trna synthetase, and method for production of protein having non-natural amino acid integrated therein by using the same
WO2012104422A1 (en) 2011-02-03 2012-08-09 Embl Unnatural amino acids comprising a cyclooctynyl or trans-cyclooctenyl analog group and uses thereof
WO2015107064A1 (en) 2014-01-14 2015-07-23 European Molecular Biology Laboratory Multiple cycloaddition reactions for labeling of molecules
WO2018069481A1 (en) 2016-10-14 2018-04-19 European Molecular Biology Laboratory Archaeal pyrrolysyl trna synthetases for orthogonal use

Non-Patent Citations (67)

Title
"Uniprot", Database accession no. P06899
AGARD ET AL., J AM CHEM SOC, vol. 126, 2004, pages 15046
ALBERTI ET AL., BIOESSAYS, vol. 38, 2016, pages 959 - 968
ALBERTI ET AL., CELL, vol. 137, 2009, pages 146 - 158
ALTMEYER ET AL., NAT COMMUN, vol. 6, 2015, pages 8088
AUSUBEL, F.M. ET AL.: "Current Protocols in Molecular Biology", 1997, GREENE PUBLISHING ASSOC. AND WILEY INTERSCIENCE
BENJAMIN S. SCHUSTER ET AL: "Controllable protein phase separation and modular recruitment to form responsive membraneless organelles", NATURE COMMUNICATIONS, vol. 9, no. 1, 30 July 2018 (2018-07-30), XP055526920, DOI: 10.1038/s41467-018-05403-1 *
BERTRAND ET AL., MOL CELL, vol. 2, 1998, pages 437 - 445
CHANG C. LIU ET AL: "Adding New Chemistries to the Genetic Code", ANNUAL REVIEW OF BIOCHEMISTRY, vol. 79, no. 1, 7 June 2010 (2010-06-07), pages 413 - 444, XP055026250, ISSN: 0066-4154, DOI: 10.1146/annurev.biochem.052308.105824 *
CHIN ET AL., J AM CHEM SOC, vol. 124, 2001, pages 9026
CHIN ET AL., SCIENCE, vol. 301, 2003, pages 964
CHIN J W: "Expanding and reprogramming the genetic code of cells and animals", ANNUAL REVIEW OF BIOCHEMISTRY, PALO ALTO, CA, US, vol. 83, 1 June 2014 (2014-06-01), pages 379 - 408, XP002753743, ISSN: 0066-4154, [retrieved on 20140210], DOI: 10.1146/ANNUREV-BIOCHEM-060713-035737 *
CHIN, ANNU REV BIOCHEM, vol. 83, 2014, pages 379 - 408
DEUTSCHER: "Methods in Enzymology Vol. 182: Guide to Protein Purification", vol. 182, 1990, ACADEMIC PRESS
DEVARAJ ET AL., ANGEW CHEM INT ED ENGL, vol. 48, 2009, pages 7013
ECKHARDT M, GOTZA B, GERARDY-SCHAHN R: "Membrane topology of the mammalian CMP-sialic acid transporter", J BIOL CHEM., vol. 274, no. 13, 1999, pages 8779 - 8787
ENGELSBERG A, HERMOSILLA R, KARSTEN U, SCHULEIN R, DARKEN B, REHM A: "The Golgi protein RCAS1 controls cell surface expression of tumor-associated O-linked glycan antigens", J BIOL CHEM., vol. 278, no. 25, 2003, pages 22998 - 23007, XP008141356, DOI: 10.1074/jbc.M301361200
FAZAL FM, HAN S, PARKER KR ET AL.: "Atlas of Subcellular RNA Localization Revealed by APEX-Seq", CELL, vol. 178, no. 2, 2019, pages 473 - 490.e26, XP085739959, DOI: 10.1016/j.cell.2019.05.027
FERNANDEZ-MARTINEZ J, KIM SJ, SHI Y ET AL.: "Structure and Function of the Nuclear Pore Complex Cytoplasmic mRNA Export Platform", CELL, vol. 167, no. 5, 2016, pages 1215 - 1228.e25, XP029812215, DOI: 10.1016/j.cell.2016.10.028
FRIED, ANGEW CHEM, vol. 54, 2015, pages 12791 - 12794
GIBSON ET AL., NAT METHODS, vol. 6, 2009, pages 343 - 345
HARLOW, E., LANE, D.: "Antibodies: A Laboratory Manual", 1988, COLD SPRING HARBOR
NEUMANN, NATURE, vol. 464, 2010, pages 441 - 444
HIGGINS ET AL., COMPUT APPL. BIOSCI., vol. 5, no. 2, 1989, pages 151 - 1
ISAACS, SCIENCE, vol. 333, 2011, pages 348 - 353
JUNGMANN ET AL., NAT METHODS, vol. 11, 2014, pages 313 - 318
KATO ET AL., CELL, vol. 149, 2012, pages 753 - 767
KIM SJ, FERNANDEZ-MARTINEZ J, NUDELMAN I ET AL.: "Integrative structure and functional anatomy of a nuclear pore complex", NATURE, vol. 555, no. 7697, 2018, pages 475 - 482
LAJOIE, SCIENCE, vol. 342, 2013, pages 357 - 360
LAPATSINA L, JIRA JA, SMITH ES ET AL.: "Regulation of ASIC channels by a stomatin/STOML3 complex located in a mobile vesicle pool in sensory neurons", OPEN BIOL., vol. 2, no. 6, 2012, pages 120096
LEMKE, CHEMBIOCHEM, vol. 15, 2014, pages 1691 - 1694
LIU ET AL., ANNU REV BIOCHEM, vol. 79, 2010, pages 413 - 444
MALINOVSKA ET AL., BIOCHIM BIOPHYS ACTA, vol. 1834, 2013, pages 918 - 931
MALINOVSKA ET AL., PRION, vol. 9, 2015, pages 339 - 346
NEHLIG A, MOLINA A, RODRIGUES-FERREIRA S, HONORE S, NAHMIAS C: "Regulation of end-binding protein EB1 in the control of microtubule dynamics", CELL MOL LIFE SCI., vol. 74, no. 13, 2017, pages 2381 - 2393, XP036257906, DOI: 10.1007/s00018-017-2476-2
NEUMANN ET AL., NAT CHEM BIOL, vol. 4, 2008, pages 232
NGUYEN ET AL., J AM CHEM SOC, vol. 131, 2009, pages 8720
NIKIC ET AL., ANGEW CHEM INT ED ENGL, vol. 55, no. 52, 2016, pages 16172 - 16176
NIKIC ET AL., ANGEW CHEM, vol. 53, 2014, pages 2245 - 2249
NIKIC ET AL., ANGEW CHEM, vol. 55, 2016, pages 16172 - 16176
NIKIC ET AL., NAT PROTOC, vol. 10, no. 5, 2015, pages 780 - 791
OHTA ET AL., CHEMBIOCHEM, vol. 9, 2008, pages 2773 - 2778
ORELLE, NATURE, vol. 524, 2015, pages 119 - 124
OSTROV, SCIENCE, vol. 353, 2016, pages 819 - 822
PATEL ET AL., CELL, vol. 162, 2015, pages 1066 - 1077
PIERCE ET AL., METHODS CELL BIOL, vol. 122, 2014, pages 415 - 436
PLASS, ANGEW CHEM, vol. 50, 2011, pages 3878 - 3881
PLASS, ANGEW CHEM, vol. 51, 2012, pages 4166 - 4170
POUWELS P. H. ET AL.: "Cloning Vectors", 1985, ELSEVIER
REINKE, A.W., GRANT, R.A., KEATING, A.E., J AM CHEM SOC, vol. 133, 2010, pages 6025 - 6031
RESH, BBA-MOL CELL RES, vol. 1451, 1999, pages 1 - 16
SAMBROOK, J., FRITSCH, E.F., MANIATIS, T.: "Molecular Cloning: A Laboratory Manual", 1989, COLD SPRING HARBOR LABORATORY, COLD SPRING HARBOR LABORATORY PRESS
SCHMIED ET AL., J AM CHEM SOC, vol. 136, 2014, pages 15577 - 15583
SCHUSTER BS, REED EH, PARTHASARATHY R ET AL.: "Controllable protein phase separation and modular recruitment to form responsive membraneless organelles", NAT COMMUN. 2018, vol. 9, no. 1, 30 July 2018 (2018-07-30), pages 2985, XP055526920, DOI: 10.1038/s41467-018-05403-1
SCOPES: "Protein Purification", 1993, SPRINGER
SOPPINA ET AL., PROC NATL ACAD SCI U.S.A., vol. 111, 2014, pages 5562 - 5567
STEVEN BOEYNAEMS ET AL: "Protein Phase Separation: A New Phase in Cell Biology", TRENDS IN CELL BIOLOGY., vol. 28, no. 6, 1 June 2018 (2018-06-01), XX, pages 420 - 435, XP055583882, ISSN: 0962-8924, DOI: 10.1016/j.tcb.2018.02.004 *
T.J. SILHAVY, M.L. BERMAN, L.W. ENQUIST: "Experiments with Gene Fusions", 1984, COLD SPRING HARBOR LABORATORY
THOMPSON, ACS CHEM BIOL, vol. 13, 2018, pages 313 - 325
WANG ET AL., ACS CHEM BIOL, vol. 8, 2013, pages 405 - 415
WANG, NATURE, vol. 539, 2016, pages 59 - 64
WOODRUFF ET AL., CELL, vol. 169, 2017, pages 1066 - 1077.e10
WU B, CHAO JA, SINGER RH: "Fluorescence fluctuation spectroscopy enables quantitative imaging of single mRNAs in living cells", BIOPHYS J., vol. 102, no. 12, 2012, pages 2936 - 2944, XP028495708, DOI: 10.1016/j.bpj.2012.05.017
XIAO ET AL., ANGEW CHEM, vol. 52, 2013, pages 14080 - 14083
YANAGISAWA ET AL., CHEM BIOL, vol. 15, 2008, pages 1187
ZHANG ET AL., BIOCHEM BIOPHYS RES CO, vol. 489, 2017, pages 490 - 496
ZHANG ET AL., NATURE, vol. 551, 2017, pages 644 - 647

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4186529A1 (en) 2021-11-25 2023-05-31 Veraxa Biotech GmbH Improved antibody-payload conjugates (apcs) prepared by site-specific conjugation utilizing genetic code expansion
WO2023094525A1 (en) 2021-11-25 2023-06-01 Veraxa Biotech Gmbh Improved antibody-payload conjugates (apcs) prepared by site-specific conjugation utilizing genetic code expansion

Also Published As

Publication number Publication date
CN113727993A (en) 2021-11-30
EP3696189A1 (en) 2020-08-19
MA54934A (en) 2021-12-22
IL285405A (en) 2021-09-30
US20230098002A1 (en) 2023-03-30
CA3129336A1 (en) 2020-08-20
JP2022521049A (en) 2022-04-05
EP3924365A1 (en) 2021-12-22

Similar Documents

Publication Publication Date Title
WO2020165408A1 (en) Means and methods for preparing engineered target proteins by genetic code expansion in a target protein-selective manner
Reinkemeier et al. Designer membraneless organelles enable codon reassignment of selected mRNAs in eukaryotes
Brödel et al. Cell‐free protein expression based on extracts from CHO cells
JP7292442B2 (en) Modified aminoacyl-tRNA synthetase and uses thereof
AU2004321117B2 (en) Selective incorporation of 5-hydroxytryptophan into proteins in mammalian cells
US20180171321A1 (en) Platform for a non-natural amino acid incorporation into proteins
JP7277361B2 (en) Archaeal Pyrrolidyl-tRNA Synthetases for Orthogonal Applications
JP2023514384A (en) Archaeal pyrrolidyl-tRNA synthetase for use in orthogonal methods
EP4031670A1 (en) Systems and methods for protein expression
JP2017216961A (en) Non-natural amino acid-containing peptide library
Lee et al. Multicistronic IVT mRNA for simultaneous expression of multiple fluorescent proteins
KR20200076603A (en) Cell penetrating Domain derived from human CLK2 protein
KR20200076582A (en) Cell penetrating Domain derived from human LRRC24 protein
Hino et al. Site-specific incorporation of unnatural amino acids into proteins in mammalian cells
KR20200076604A (en) Cell penetrating Domain derived from human GPATCH4 protein
CA3164788A1 (en) Peptide

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application
Ref document number: 20703782; Country of ref document: EP; Kind code of ref document: A1

ENP Entry into the national phase
Ref document number: 2021545719; Country of ref document: JP; Kind code of ref document: A

ENP Entry into the national phase
Ref document number: 3129336; Country of ref document: CA

NENP Non-entry into the national phase
Ref country code: DE

ENP Entry into the national phase
Ref document number: 2020703782; Country of ref document: EP; Effective date: 20210914